Data Source
ETL Data Source defines what data to extract from an ETL Data Connection. It specifies the source (table or query) and extraction parameters like chunking and filtering.
Field Reference
Basic Configuration
| Field | Type | Required | Description |
|---|---|---|---|
| Source Name | Data | Yes | Unique identifier for this data source |
| Source Kind | Select | Yes | Type of data source (SQL Table, SQL Query, REST) |
| Data Connection | Link | Yes | ETL Data Connection to use |
| Application | Link | Yes | My Application this source belongs to |
| Active | Check | No | Enable/disable this source (default: checked) |
SQL Table Configuration
Available when Source Kind = "SQL Table"
| Field | Type | Required | Description |
|---|---|---|---|
| Schema Name | Data | No | Database schema containing the table |
| Table Name | Data | Yes | Name of the table to extract |
| Select Fields | Long Text | No | Comma-separated field list (default: *) |
| Distinct | Check | No | Add DISTINCT clause to query |
| Where Clause | Code | No | SQL WHERE conditions without "WHERE" keyword |
| Order By | Data | No | SQL ORDER BY clause without "ORDER BY" keyword |
| Limit Rows | Int | No | Maximum number of rows to extract |
SQL Query Configuration
Available when Source Kind = "SQL Query"
| Field | Type | Required | Description |
|---|---|---|---|
| SQL Query | Code | Yes | Complete SELECT statement to execute |
Security Note: Only SELECT statements are allowed. The system validates the statement and rejects INSERT, UPDATE, DELETE, and other data-modifying or DDL operations.
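The exact validation logic is not part of this reference, but a minimal sketch of such a guard might look like the following (hypothetical function name and keyword blocklist; the real validator may be stricter, for example by parsing the statement):

```python
import re

# Hypothetical blocklist of statements we refuse to run against the source database.
BLOCKED_KEYWORDS = re.compile(
    r"\b(INSERT|UPDATE|DELETE|DROP|ALTER|TRUNCATE|CREATE|GRANT|REVOKE)\b",
    re.IGNORECASE,
)

def validate_sql_query(sql: str) -> None:
    """Raise if the query is not a single, plain SELECT statement."""
    stripped = sql.strip().rstrip(";")
    if not stripped.upper().startswith("SELECT"):
        raise ValueError("Only SELECT statements are allowed")
    if BLOCKED_KEYWORDS.search(stripped):
        raise ValueError("Query contains a blocked keyword")
    if ";" in stripped:
        raise ValueError("Multiple statements are not allowed")
```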
Runtime Configuration
| Field | Type | Required | Description |
|---|---|---|---|
| Chunk Size | Int | No | Number of records per chunk (default: 1000) |
| Max Rows | Int | No | Maximum total records to extract |
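As a rough illustration of how these two settings interact, assuming extraction simply slices the result set into fixed-size chunks (the real extractor may page differently, e.g. with LIMIT/OFFSET or server-side cursors):

```python
import math

def plan_chunks(total_rows: int, chunk_size: int = 1000, max_rows: int | None = None):
    """Return (rows_to_extract, number_of_chunks) for one extraction run."""
    rows = total_rows if max_rows is None else min(total_rows, max_rows)
    return rows, math.ceil(rows / chunk_size)

# e.g. 250,000 source rows, Chunk Size 500, Max Rows 10,000 -> (10000, 20)
print(plan_chunks(250_000, chunk_size=500, max_rows=10_000))
```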
SQL Query Generation
For SQL Table sources, the system generates queries in this format:
```sql
SELECT [DISTINCT] {select_fields}
FROM {schema_name}.{table_name}
WHERE {where_clause}
ORDER BY {order_by}
LIMIT {limit_rows}
```
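The optional clauses are only included when the corresponding field is filled in. A hedged sketch of that assembly in Python (hypothetical function; the actual implementation may quote identifiers and bind parameters differently):

```python
def build_table_query(table_name, schema_name=None, select_fields="*",
                      distinct=False, where_clause=None, order_by=None,
                      limit_rows=None):
    """Assemble the SELECT statement described above from the SQL Table fields."""
    target = f"{schema_name}.{table_name}" if schema_name else table_name
    parts = [f"SELECT {'DISTINCT ' if distinct else ''}{select_fields}", f"FROM {target}"]
    if where_clause:
        parts.append(f"WHERE {where_clause}")
    if order_by:
        parts.append(f"ORDER BY {order_by}")
    if limit_rows:
        parts.append(f"LIMIT {limit_rows}")
    return "\n".join(parts)

print(build_table_query(
    "customers", schema_name="public",
    select_fields="customer_id, name, email",
    where_clause="active = true", order_by="created_date DESC", limit_rows=1000,
))
```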
Actions
Preview Data
Extracts a small sample (10 records) to verify the source configuration without creating a full import batch.
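Conceptually, Preview Data runs the configured query with a small limit. A minimal sketch using a generic DB-API connection (the wrapping strategy and function name are assumptions, not the documented implementation):

```python
def preview_data(connection, query: str, sample_size: int = 10):
    """Fetch a small sample by wrapping the configured query in a LIMIT."""
    wrapped = f"SELECT * FROM ({query.rstrip().rstrip(';')}) AS preview LIMIT {sample_size}"
    cursor = connection.cursor()
    cursor.execute(wrapped)
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]
```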
Extract to Staging
Begins the full extraction process:
- Creates an ETL Import Batch
- Extracts data in chunks based on Chunk Size
- Stores raw data as JSONL in ETL Import Chunks
- Returns the batch name for transformation
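A rough sketch of that flow, assuming a DB-API cursor and fixed-size chunks (the generator and its name are hypothetical; the real action persists each chunk on an ETL Import Chunk record linked to the batch):

```python
import json

def extract_in_chunks(cursor, query: str, chunk_size: int = 1000, max_rows: int | None = None):
    """Stream the result set and yield one JSONL string per chunk."""
    cursor.execute(query)
    columns = [col[0] for col in cursor.description]
    extracted = 0
    while True:
        remaining = None if max_rows is None else max_rows - extracted
        to_fetch = chunk_size if remaining is None else min(chunk_size, remaining)
        if to_fetch <= 0:
            break
        rows = cursor.fetchmany(to_fetch)
        if not rows:
            break
        extracted += len(rows)
        # One JSONL document per row; default=str keeps dates/decimals serializable.
        yield "\n".join(json.dumps(dict(zip(columns, row)), default=str) for row in rows)
```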
Usage Examples
SQL Table Example
- Source Kind: SQL Table
- Schema Name: public
- Table Name: customers
- Select Fields: customer_id, name, email, created_date
- Where Clause: active = true AND created_date >= '2024-01-01'
- Order By: created_date DESC
- Chunk Size: 500
SQL Query Example
- Source Kind: SQL Query
- SQL Query:

```sql
SELECT c.customer_id, c.name, c.email, COUNT(o.order_id) AS order_count
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
WHERE c.active = true
GROUP BY c.customer_id, c.name, c.email
HAVING COUNT(o.order_id) > 0
```
Performance Considerations
- Chunk Size: Balance between memory usage and database round trips
- Max Rows: Use for testing to limit data volume
- Indexing: Ensure ORDER BY fields are indexed for better performance
- Where Clauses: Add selective filters to reduce data volume