Data Source

ETL Data Source defines what data to extract from an ETL Data Connection. It specifies the source (table or query) and extraction parameters like chunking and filtering.

Field Reference

Basic Configuration

Field Type Required Description
Source Name Data Yes Unique identifier for this data source
Source Kind Select Yes Type of data source (SQL Table, SQL Query, REST)
Data Connection Link Yes ETL Data Connection to use
Application Link Yes My Application this source belongs to
Active Check No Enable/disable this source (default: checked)

SQL Table Configuration

Available when Source Kind = "SQL Table"

Field Type Required Description
Schema Name Data No Database schema containing the table
Table Name Data Yes Name of the table to extract
Select Fields Long Text No Comma-separated field list (default: *)
Distinct Check No Add DISTINCT clause to query
Where Clause Code No SQL WHERE conditions without "WHERE" keyword
Order By Data No SQL ORDER BY clause without "ORDER BY" keyword
Limit Rows Int No Maximum number of rows to extract

SQL Query Configuration

Available when Source Kind = "SQL Query"

Field Type Required Description
SQL Query Code Yes Complete SELECT statement to execute

Security Note: Only SELECT statements are allowed. The system validates and blocks INSERT, UPDATE, DELETE, and other dangerous operations.

Runtime Configuration

Field Type Required Description
Chunk Size Int No Number of records per chunk (default: 1000)
Max Rows Int No Maximum total records to extract

SQL Query Generation

For SQL Table sources, the system generates queries in this format:

SELECT [DISTINCT] {select_fields}
FROM {schema_name}.{table_name}
WHERE {where_clause}
ORDER BY {order_by}
LIMIT {limit_rows}


Actions

Preview Data

Extracts a small sample (10 records) to verify the source configuration without creating a full import batch.

Extract to Staging

Begins the full extraction process:

  • Creates an ETL Import Batch
  • Extracts data in chunks based on Chunk Size
  • Stores raw data as JSONL in ETL Import Chunks
  • Returns the batch name for transformation

Usage Examples

SQL Table Example

  • Source Kind: SQL Table
  • Schema Name: public
  • Table Name: customers
  • Select Fields: customer_id, name, email, created_date
  • Where Clause: active = true AND created_date >= '2024-01-01'
  • Order By: created_date DESC
  • Chunk Size: 500

SQL Query Example

  • Source Kind: SQL Query
  • SQL Query:
SELECT c.customer_id, c.name, c.email, COUNT(o.order_id) as order_count
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
WHERE c.active = true
GROUP BY c.customer_id, c.name, c.email
HAVING COUNT(o.order_id) > 0

Performance Considerations

  • Chunk Size: Balance between memory usage and database round trips
  • Max Rows: Use for testing to limit data volume
  • Indexing: Ensure ORDER BY fields are indexed for better performance
  • Where Clauses: Add selective filters to reduce data volume
Discard
Save

On this page

Review Changes ← Back to Content
Message Status Space Raised By Last update on