# Scheduled ETL Job
The Scheduled ETL Job DocType provides automated scheduling for ETL processes. It combines an ETL Data Source and an ETL Transform Map into a scheduled job that runs extraction and transformation automatically.
## Field Reference
### Job Configuration
| Field | Type | Required | Description |
|---|---|---|---|
| Display Name | Data | Yes | Human-readable name for this job |
| Application | Link | Yes | My Application this job belongs to |
| Active | Check | No | Enable/disable job execution (default: checked) |
### ETL Configuration
| Field | Type | Required | Description |
|---|---|---|---|
| Data Source | Link | Yes | ETL Data Source to extract data from |
| Transform Map | Link | Yes | ETL Transform Map to apply during transformation |
### Scheduling
| Field | Type | Required | Description |
|---|---|---|---|
| Event Frequency | Select | Yes | How often to run (All, Hourly, Daily, Weekly, Monthly, Yearly, Cron, Custom) |
| Time | Time | No | What time of day to run (default: 00:00:00) |
#### Custom Frequency Fields
Available when Event Frequency = "Custom".
| Field | Type | Description |
|---|---|---|
| Custom Frequency | Select | Daily, Weekly, Monthly, or Yearly |
| Weekday | Select | Day of week (for Weekly) |
| Day | Select | Day of month (for Monthly/Yearly) |
| Month | Select | Month (for Yearly) |
#### Cron Schedule
Available when Event Frequency = "Cron".
| Field | Type | Description |
|---|---|---|
| Cron Format | Data | Standard cron expression (e.g., "0 2 * * *" for daily at 2 AM) |
## How ETL Jobs Work
When you save an ETL Job, the system automatically:
- Creates a Server Script with the Scheduler Event script type
- Generates execution code that calls the ETL pipeline:
  ```python
  mantera.run_etl_job("data_source_name", "transform_map_name")
  ```
- Configures scheduling based on the Event Frequency settings
- Manages the Server Script lifecycle (updating or deleting the script when the job changes)
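You can verify what was generated from a bench console. A minimal sketch, assuming the generated Server Script shares the job's display name (see Monitoring Jobs below) and using core Frappe's Server Script fields:
```python
import frappe

# Inspect the auto-generated Server Script for the example job used later
# on this page. The script name is assumed to match the job's display name.
script = frappe.get_doc("Server Script", "Daily Customer Sync")
print(script.script_type)      # expected: "Scheduler Event"
print(script.event_frequency)  # e.g. "Daily"
print(script.script)           # the generated mantera.run_etl_job(...) call
```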
### ETL Pipeline Execution
Each scheduled execution:
- Extracts data from the configured Data Source
- Creates Import Batch with extracted data in chunks
- Applies Transform Map to create/update target records
- Logs results in Transform Run with detailed events
- Runs in the background to avoid blocking the system
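As an illustration of these steps, here is a hedged pseudocode outline; the method names on the ETL DocTypes (`extract_in_chunks`, `store_rows`, `apply`) are hypothetical stand-ins, not mantera's actual internals:
```python
import frappe

def run_etl_job(data_source_name, transform_map_name):
    """Illustrative outline of the pipeline steps above; not the app's real code."""
    source = frappe.get_doc("ETL Data Source", data_source_name)
    transform_map = frappe.get_doc("ETL Transform Map", transform_map_name)

    # Extract from the source and stage each chunk as an ETL Import Batch.
    for chunk in source.extract_in_chunks():   # hypothetical method
        batch = frappe.new_doc("ETL Import Batch")
        batch.store_rows(chunk)                # hypothetical method
        batch.insert()

        # Apply the Transform Map to create/update target records;
        # results are logged to an ETL Transform Run.
        transform_map.apply(batch)             # hypothetical method
```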
## Actions
### Run ETL Job
Manually triggers job execution for testing purposes. The job runs in the background queue with a 1-hour timeout.
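A hedged equivalent from the bench console, useful when scripting test runs; the dotted path and keyword argument names are assumptions to verify against the installed app:
```python
import frappe

# Queue the pipeline manually on the long-running worker, mirroring the
# action's 1-hour timeout. Dotted path and kwarg names are assumptions.
frappe.enqueue(
    "mantera.run_etl_job",
    queue="long",
    timeout=3600,  # 1 hour
    data_source_name="Customer Database Extract",
    transform_map_name="Customer Import Mapping",
)
```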
## Frequency Options
### Standard Frequencies
- All: Runs every minute (use with caution)
- Hourly: Every hour at minute 0
- Daily: Once per day at specified time
- Weekly: Once per week on Sunday at specified time
- Monthly: Once per month on the 1st at specified time
- Yearly: Once per year on January 1st at specified time
### Long-Running Variants
- Hourly Long, Daily Long, etc.: the same schedules as above, but with an extended timeout for large datasets
### Custom Scheduling
Provides more granular control:
- Daily: Specify exact time
- Weekly: Choose day of week and time
- Monthly: Choose day of month and time
- Yearly: Choose month, day, and time
### Cron Expressions
Full flexibility using standard cron syntax:
```
* * * * *
┬ ┬ ┬ ┬ ┬
│ │ │ │ └─ day of week (0-6; 0 is Sunday)
│ │ │ └─── month (1-12)
│ │ └───── day of month (1-31)
│ └─────── hour (0-23)
└───────── minute (0-59)
```
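To sanity-check an expression before saving the job, you can preview its next firings with the croniter package (one of Frappe's dependencies); a minimal sketch:
```python
from datetime import datetime
from croniter import croniter

# Preview the next firings of the business-hours example shown below.
schedule = croniter("0 9,13,17 * * 1-5", datetime(2024, 1, 1))
for _ in range(5):
    print(schedule.get_next(datetime))  # 2024-01-01 09:00, 13:00, 17:00, ...
```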
## Example Configurations
### Daily Customer Import
```
Display Name:    Daily Customer Sync
Data Source:     Customer Database Extract
Transform Map:   Customer Import Mapping
Event Frequency: Daily
Time:            02:00:00
```
### Weekly Sales Report
```
Display Name:     Weekly Sales Data
Data Source:      Sales API Endpoint
Transform Map:    Sales Record Transform
Event Frequency:  Custom
Custom Frequency: Weekly
Weekday:          Monday
Time:             08:00:00
```
### Complex Cron Schedule
```
Display Name:    Business Hours API Sync
Data Source:     External CRM API
Transform Map:   Contact Data Transform
Event Frequency: Cron
Cron Format:     0 9,13,17 * * 1-5
```
Runs at 9 AM, 1 PM, and 5 PM on weekdays.
## Monitoring Jobs
### Job Status
- Active jobs appear in the Server Script list with matching names
- Inactive jobs have disabled Server Scripts
- Job execution logs appear in the Scheduled Job Log DocType
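For example, recent failures can be pulled from the log table in a bench console; this sketch uses core Frappe's Scheduled Job Log fields:
```python
import frappe

# List the ten most recent failed scheduler runs.
failures = frappe.get_all(
    "Scheduled Job Log",
    filters={"status": "Failed"},
    fields=["scheduled_job_type", "status", "creation"],
    order_by="creation desc",
    limit=10,
)
for row in failures:
    print(row.creation, row.scheduled_job_type)
```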
### Troubleshooting
- Check Error Log for background job failures
- Review ETL Transform Run records for processing details
- Monitor ETL Import Batch status for extraction issues
- Verify Data Connection is active and accessible
## Best Practices
### Scheduling
- Avoid overlapping executions for the same data source
- Schedule during low-traffic hours for better performance
- Use timeouts appropriate to the data volume
- Consider time zones when setting execution times
### Error Handling
- Monitor job execution logs regularly
- Set up email notifications for critical failures
- Test with small datasets before full production runs
- Have rollback procedures for data quality issues
### Performance
- Optimize chunk sizes based on data volume
- Index coalesce fields in target DocTypes
- Use incremental extraction where possible
- Archive old Import Batches and Transform Runs periodically, as in the sketch below
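A hedged sketch of the archival step, assuming a plain delete is acceptable for your retention policy (export first if the history matters):
```python
import frappe
from frappe.utils import add_days, nowdate

# Purge ETL Import Batches older than 90 days; repeat with
# "ETL Transform Run" if desired. Adjust the retention window to your needs.
cutoff = add_days(nowdate(), -90)
for name in frappe.get_all(
    "ETL Import Batch",
    filters={"creation": ["<", cutoff]},
    pluck="name",
):
    frappe.delete_doc("ETL Import Batch", name, force=True)
frappe.db.commit()
```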
## Security Considerations
- ETL Jobs inherit permissions from the Server Script system
- Background execution runs with system-level privileges
- Ensure proper data validation in Transform Maps
- Monitor for unauthorized schedule changes
- Use SSL connections for all external data sources
## Related DocTypes
- ETL Data Source: Defines what data to extract
- ETL Transform Map: Defines how to transform and load data
- Server Script: Automatically generated scheduler script
- ETL Import Batch: Created during each job execution
- ETL Transform Run: Logs each transformation operation