Scheduled ETL Job

ETL Job provides automated scheduling for ETL processes. It combines an ETL Data Source and an ETL Transform Map into a scheduled job that runs extraction and transformation automatically.

Field Reference

Job Configuration

Field         Type   Required  Description
Display Name  Data   Yes       Human-readable name for this job
Application   Link   Yes       My Application this job belongs to
Active        Check  No        Enable/disable job execution (default: checked)

ETL Configuration

Field          Type  Required  Description
Data Source    Link  Yes       ETL Data Source to extract data from
Transform Map  Link  Yes       ETL Transform Map to apply during transformation

Scheduling

Field            Type    Required  Description
Event Frequency  Select  Yes       How often to run (All, Hourly, Daily, Weekly, Monthly, Yearly, Cron, Custom)
Time             Time    No        What time of day to run (default: 00:00:00)

Custom Frequency Fields

Available when Event Frequency = "Custom"

Field             Type    Description
Custom Frequency  Select  Daily, Weekly, Monthly, or Yearly
Weekday           Select  Day of week (for Weekly)
Day               Select  Day of month (for Monthly/Yearly)
Month             Select  Month (for Yearly)

Cron Schedule

Available when Event Frequency = "Cron"

Field        Type  Description
Cron Format  Data  Standard cron expression (e.g., "0 2 * * *" for daily at 2 AM)
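
Putting the field reference together, a job can also be created from code. The following is a sketch assuming a Frappe-style backend; the DocType name "ETL Job" and the snake_case field names are inferred from the labels above, not confirmed against the actual schema.

    import frappe

    # Illustrative only: field names are guesses derived from the field labels.
    job = frappe.get_doc({
        "doctype": "ETL Job",
        "display_name": "Daily Customer Sync",
        "data_source": "Customer Database Extract",
        "transform_map": "Customer Import Mapping",
        "event_frequency": "Daily",
        "time": "02:00:00",
        "active": 1,
    })
    job.insert()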

How ETL Jobs Work

When you save an ETL Job, the system automatically:

  1. Creates a Server Script with the scheduler event type (a sketch follows this list)
  2. Generates execution code that calls the ETL pipeline:

    mantera.run_etl_job("data_source_name", "transform_map_name")

  3. Configures scheduling based on the Event Frequency settings
  4. Manages the Server Script lifecycle (updating or deleting the script when the job changes)
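
For illustration, a daily job's generated artifact is roughly equivalent to creating this Server Script by hand. This is a sketch assuming Frappe's standard Server Script DocType; the exact document the system generates may differ.

    import frappe

    # Roughly what the auto-generated Server Script looks like as a document.
    script = frappe.get_doc({
        "doctype": "Server Script",
        "name": "Daily Customer Sync",   # mirrors the job's Display Name
        "script_type": "Scheduler Event",
        "event_frequency": "Daily",      # from the job's Event Frequency
        "disabled": 0,                   # follows the job's Active checkbox
        "script": 'mantera.run_etl_job("data_source_name", "transform_map_name")',
    })
    script.insert()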

ETL Pipeline Execution

Each scheduled execution (sketched in pseudocode after this list):

  1. Extracts data from the configured Data Source
  2. Creates Import Batch with extracted data in chunks
  3. Applies Transform Map to create/update target records
  4. Logs results in Transform Run with detailed events
  5. Runs in background to avoid blocking the system
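
In pseudocode, one execution looks roughly like this. Every helper below is hypothetical and exists only to mirror the five steps above; none of it is the actual API.

    # Hypothetical helpers throughout; this only illustrates the flow.
    def run_etl_job(data_source_name, transform_map_name):
        source = get_data_source(data_source_name)            # 1. extract
        batch = create_import_batch(source, chunk_size=1000)  # 2. chunked Import Batch
        run = start_transform_run(transform_map_name, batch)
        for chunk in batch.chunks:
            apply_transform_map(run, chunk)                   # 3. create/update targets
            log_transform_events(run, chunk)                  # 4. detailed event log
        finalize_transform_run(run)                           # 5. all of this runs in a
                                                              #    background worker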

Actions

Run ETL Job

Manually triggers job execution for testing purposes. The job runs in the background queue with a 1-hour timeout.
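
Behind the scenes this is equivalent to enqueueing the pipeline entry point in a background worker, roughly as follows. This is a sketch; the dotted method path is an assumption.

    import frappe

    # Manual trigger sketch: run in the background queue with a 1-hour timeout.
    # The dotted path to run_etl_job is an assumption, not a confirmed API.
    frappe.enqueue(
        "mantera.run_etl_job",
        queue="long",
        timeout=3600,  # 1 hour, matching the action described above
        data_source_name="data_source_name",
        transform_map_name="transform_map_name",
    )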

Frequency Options

Standard Frequencies

  • All: Runs every minute (use with caution)
  • Hourly: Every hour at minute 0
  • Daily: Once per day at specified time
  • Weekly: Once per week on Sunday at specified time
  • Monthly: Once per month on the 1st at specified time
  • Yearly: Once per year on January 1st at specified time

Long-Running Variants

  • Hourly Long, Daily Long, etc.: Same schedules as the standard frequencies, but with an extended timeout for large datasets

Custom Scheduling

Provides more granular control:

  • Daily: Specify exact time
  • Weekly: Choose day of week and time
  • Monthly: Choose day of month and time
  • Yearly: Choose month, day, and time

Cron Expressions

Full flexibility using standard cron syntax:

*  *  *  *  *
┬  ┬  ┬  ┬  ┬
│  │  │  │  └ day of week (0-6) (0 is Sunday)
│  │  │  └──── month (1-12)
│  │  └─────── day of month (1-31)
│  └────────── hour (0-23)
└───────────── minute (0-59)
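
To sanity-check an expression before saving, you can preview its next run times with the croniter library. This is a hedged example; croniter is a common cron parser for Python, not necessarily what this system uses internally.

    from datetime import datetime
    from croniter import croniter

    # Preview the next three runs of "0 9,13,17 * * 1-5"
    # (9 AM, 1 PM, and 5 PM on weekdays).
    schedule = croniter("0 9,13,17 * * 1-5", datetime(2024, 1, 1))
    for _ in range(3):
        print(schedule.get_next(datetime))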

Example Configurations

Daily Customer Import

Display Name: Daily Customer Sync
Data Source: Customer Database Extract  
Transform Map: Customer Import Mapping
Event Frequency: Daily
Time: 02:00:00

Weekly Sales Report

Display Name: Weekly Sales Data
Data Source: Sales API Endpoint
Transform Map: Sales Record Transform
Event Frequency: Custom
Custom Frequency: Weekly
Weekday: Monday
Time: 08:00:00

Complex Cron Schedule

Display Name: Business Hours API Sync
Data Source: External CRM API
Transform Map: Contact Data Transform
Event Frequency: Cron
Cron Format: 0 9,13,17 * * 1-5
(Runs at 9 AM, 1 PM, and 5 PM on weekdays)

Monitoring Jobs

Job Status

  • Active jobs appear in the Server Script list with matching names
  • Inactive jobs have disabled Server Scripts
  • Job execution logs appear in Scheduled Job Log DocType

Troubleshooting

  • Check Error Log for background job failures
  • Review ETL Transform Run records for processing details
  • Monitor ETL Import Batch status for extraction issues
  • Verify Data Connection is active and accessible
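
From a bench console, recent failures and runs can be pulled like this. This is a sketch: Error Log is a standard DocType, but the "status" field on ETL Transform Run is an assumption.

    import frappe

    # Ten most recent background-job errors.
    errors = frappe.get_all(
        "Error Log",
        fields=["method", "creation"],
        order_by="creation desc",
        limit=10,
    )

    # Recent failed transform runs; the "status" field name is an assumption.
    failed_runs = frappe.get_all(
        "ETL Transform Run",
        filters={"status": "Failed"},
        order_by="creation desc",
        limit=10,
    )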

Best Practices

Scheduling

  • Avoid overlapping executions for the same data source
  • Schedule during low-traffic hours for better performance
  • Use timeouts appropriate to the data volume (e.g., the Long frequency variants for large datasets)
  • Consider time zones when setting execution times

Error Handling

  • Monitor job execution logs regularly
  • Set up email notifications for critical failures
  • Test with small datasets before full production runs
  • Have rollback procedures for data quality issues

Performance

  • Optimize chunk sizes based on data volume
  • Index coalesce fields in target DocTypes
  • Use incremental extraction where possible
  • Archive old Import Batches and Transform Runs periodically
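
Archiving can be as simple as a periodic cleanup script. The sketch below deletes old batches; the retention period and permissions handling are up to you.

    import frappe
    from frappe.utils import add_days, now_datetime

    # Delete Import Batches older than 90 days; "creation" is Frappe's
    # standard timestamp column. Apply the same pattern to ETL Transform Run.
    cutoff = add_days(now_datetime(), -90)
    for name in frappe.get_all(
        "ETL Import Batch",
        filters={"creation": ["<", cutoff]},
        pluck="name",
    ):
        frappe.delete_doc("ETL Import Batch", name, ignore_permissions=True)
    frappe.db.commit()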

Security Considerations

  • ETL Jobs inherit permissions from the Server Script system
  • Background execution runs with system-level privileges
  • Ensure proper data validation in Transform Maps
  • Monitor for unauthorized schedule changes
  • Use SSL connections for all external data sources

Related DocTypes

  • ETL Data Source: Defines what data to extract
  • ETL Transform Map: Defines how to transform and load data
  • Server Script: Automatically generated scheduler script
  • ETL Import Batch: Created during each job execution
  • ETL Transform Run: Logs each transformation operation