ETL Import Chunk

ETL Import Chunk stores raw extracted data in manageable segments. Each chunk contains a portion of the total data from an ETL Import Batch, stored in JSONL (JSON Lines) format for efficient processing.

Note: This DocType is system-generated during data extraction. Chunks cannot be created or modified manually.

Field Reference

Chunk Information

| Field        | Type | Description                                       |
| ------------ | ---- | ------------------------------------------------- |
| Name         | Auto | System-generated unique identifier                |
| Import Batch | Link | Parent ETL Import Batch this chunk belongs to     |
| Seq No       | Int  | Sequential number of this chunk within the batch  |
| Row Count    | Int  | Number of data records in this chunk              |
| Bytes        | Int  | Size of the raw data in bytes                     |
| Checksum     | Data | Data integrity verification hash                  |

Data Storage

| Field         | Type       | Description                                         |
| ------------- | ---------- | --------------------------------------------------- |
| Raw JSONL     | Long Text  | Raw extracted data in JSON Lines format             |
| Processed     | Check      | Whether this chunk has been processed by transforms |
| Error Message | Small Text | Any errors encountered during chunk creation        |

JSONL Format

Raw data is stored as JSONL (JSON Lines) - one JSON object per line:

{"customer_id": 1001, "name": "Acme Corp", "email": "contact@acme.com"}
{"customer_id": 1002, "name": "Beta LLC", "email": "info@beta.com"}
{"customer_id": 1003, "name": "Gamma Inc", "email": "sales@gamma.com"}

Chunk Processing

During transformation, the system:

  1. Reads each chunk sequentially by Seq No
  2. Processes each JSONL line as a separate record
  3. Applies field mapping and business logic
  4. Creates/updates target DocType records
  5. Logs results in ETL Transform Events
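
For intuition, the loop looks roughly like the sketch below. It uses standard Frappe ORM calls (frappe.get_all, frappe.get_doc, frappe.db.commit); the fieldnames are assumed from the field labels above, and apply_mapping/upsert_target are hypothetical stand-ins for the actual transform logic:

import json
import frappe

def apply_mapping(record: dict) -> dict:
    """Hypothetical stand-in for the configured field mapping and business logic."""
    return record

def upsert_target(mapped: dict) -> None:
    """Hypothetical stand-in that would create or update the target DocType record."""
    pass

def process_batch(batch_name: str) -> None:
    """Walk a batch's chunks in Seq No order and process each JSONL line."""
    chunks = frappe.get_all(
        "ETL Import Chunk",
        filters={"import_batch": batch_name, "processed": 0},
        fields=["name"],
        order_by="seq_no asc",
    )
    for row in chunks:
        chunk = frappe.get_doc("ETL Import Chunk", row.name)
        for line in chunk.raw_jsonl.splitlines():
            if not line.strip():
                continue
            record = json.loads(line)   # one record per JSONL line
            upsert_target(apply_mapping(record))
        chunk.db_set("processed", 1)    # mark the chunk done
        frappe.db.commit()              # commit once per chunk, not per record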

Performance Considerations

Chunk Size Optimization

  • Small chunks (100-500 records): Lower memory usage, more database commits
  • Large chunks (2000-5000 records): Higher memory usage, fewer commits
  • Recommended: 1000 records per chunk for most use cases
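
For intuition, splitting extracted records into chunks amounts to something like this hypothetical helper, using the recommended 1000-record default:

import json
from typing import Iterable, Iterator

def make_chunks(records: Iterable[dict], chunk_size: int = 1000) -> Iterator[str]:
    """Group extracted records into JSONL payloads of up to chunk_size rows."""
    buffer = []
    for record in records:
        buffer.append(json.dumps(record))
        if len(buffer) >= chunk_size:
            yield "\n".join(buffer)   # one chunk's Raw JSONL content
            buffer = []
    if buffer:
        yield "\n".join(buffer)       # final, possibly smaller, chunk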

Memory Management

  • Chunks are processed sequentially to minimize memory usage
  • Raw JSONL is loaded into memory only during processing
  • Database connections are properly closed after each chunk

Viewing Chunk Data

To examine raw extracted data:

  1. Navigate to ETL Import Batch
  2. Click "View Chunks" to see all chunks for the batch
  3. Open individual chunks to view Raw JSONL content
  4. Use browser search to find specific records within chunks
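
Chunks can also be inspected programmatically, for example from a bench console. The snippet below uses standard Frappe ORM calls; the batch name is a placeholder and the fieldnames are assumed from the field labels above:

import frappe

# List a batch's chunks in Seq No order ("BATCH-0001" is a placeholder name)
chunks = frappe.get_all(
    "ETL Import Chunk",
    filters={"import_batch": "BATCH-0001"},
    fields=["name", "seq_no", "row_count", "bytes", "processed"],
    order_by="seq_no asc",
)

# Peek at the first few lines of the first chunk's raw data
doc = frappe.get_doc("ETL Import Chunk", chunks[0].name)
for line in doc.raw_jsonl.splitlines()[:5]:
    print(line)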

Troubleshooting

Common Issues

  • Empty chunks: Check source query filters and data availability
  • Large chunk sizes: May cause memory issues during transformation
  • Malformed JSON: Usually indicates source data encoding problems
  • Processing errors: Review ETL Transform Event logs for details

Data Validation

Before transformation, verify that:

  • Chunk row counts match the expected data volume
  • The JSONL format is valid (each line is proper JSON)
  • Required fields are present in source records
  • Data types match expected formats
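
A quick pre-flight pass over a single chunk might look like the following sketch. The checksum comparison assumes a SHA-256 hex digest of the raw JSONL; the algorithm actually used by the Checksum field may differ:

import hashlib
import json

def validate_chunk(raw_jsonl: str, expected_rows: int, expected_checksum: str = "") -> None:
    """Check row count, per-line JSON validity, and (optionally) the checksum."""
    lines = [line for line in raw_jsonl.splitlines() if line.strip()]
    if len(lines) != expected_rows:
        raise ValueError(f"Expected {expected_rows} rows, found {len(lines)}")
    for i, line in enumerate(lines, start=1):
        try:
            json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"Line {i} is not valid JSON: {exc}") from exc
    if expected_checksum:
        # Assumption: SHA-256 hex digest; the real Checksum field may use another algorithm
        digest = hashlib.sha256(raw_jsonl.encode("utf-8")).hexdigest()
        if digest != expected_checksum:
            raise ValueError("Checksum mismatch: possible data corruption")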

Related DocTypes

  • ETL Import Batch: Parent batch containing multiple chunks
  • ETL Transform Run: Processes chunks during transformation
  • ETL Transform Event: Logs individual chunk processing results