Ingestion: dlt + lambda
This example demonstrates a serverless data ingestion pipeline that:
- Fetches chess data from an external source and processes it with dlt
- Writes the data to S3
The pipeline runs as an AWS Lambda function packaged in a Docker container.
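A minimal sketch of the Lambda entry point, assuming a handler module and a chess source factory (both names are illustrative, not the repository's actual layout):

```python
# handler.py -- illustrative entry point; module, function, and source names are assumptions
import dlt
from chess_source import chess_source  # hypothetical source module


def handler(event, context):
    # The container image's CMD points Lambda at this function.
    pipeline = dlt.pipeline(
        pipeline_name="chess",
        destination="filesystem",  # writes files to S3 when configured with an s3:// bucket_url
        dataset_name="chess_data",
    )
    load_info = pipeline.run(chess_source())
    return {"status": "ok", "load_info": str(load_info)}
```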
The stack consists of the following components:

- AWS Lambda: Executes the ingestion code on demand
- Amazon ECR: Stores the Docker container image
- Amazon S3: Temporary storage for data files before loading to Snowflake
- AWS Secrets Manager: Stores credentials and configuration, read by the function at runtime (see the sketch below)
- Terraform: Provisions and manages all infrastructure
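A rough sketch of how the function could read those credentials at runtime with boto3 (the secret name is a placeholder):

```python
import json

import boto3


def load_secrets(secret_id="ingest/chess/credentials"):  # placeholder secret name
    """Fetch a JSON secret from AWS Secrets Manager and return it as a dict."""
    client = boto3.client("secretsmanager")
    response = client.get_secret_value(SecretId=secret_id)
    return json.loads(response["SecretString"])
```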
The pipeline follows these steps (sketched in code below):

1. Extraction: dlt extracts data from the source
2. Transformation: dlt performs basic transformations (typing, normalization)
3. Loading: dlt loads the data directly to S3
4. Schema Management: Table schemas are defined in YAML files and managed by the pipeline
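Spelled out with dlt's step-by-step pipeline methods, the flow looks roughly like this; the players resource is a simplified stand-in that pulls from chess.com's public API, not the project's actual source:

```python
import dlt
import requests


@dlt.resource(name="players", write_disposition="append")
def players(usernames=("magnuscarlsen", "hikaru")):
    # Extraction: yield raw JSON profiles from the public chess.com API.
    for username in usernames:
        yield requests.get(
            f"https://api.chess.com/pub/player/{username}", timeout=30
        ).json()


pipeline = dlt.pipeline(
    pipeline_name="chess",
    destination="filesystem",  # S3 when the bucket_url is an s3:// path
    dataset_name="chess_data",
)

pipeline.extract(players())  # 1. extraction
pipeline.normalize()         # 2. typing and normalization; infers/updates the schema
pipeline.load()              # 3. load the prepared files to the destination
```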
For rapid iteration without AWS resources, use DuckDB as the destination:
1. Create a .env.local file with the local DuckDB settings.
2. Run the pipeline locally.
3. Examine the results in the local .duckdb database (see the sketch below).
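To inspect the local results, open the generated database with DuckDB; the file name assumes dlt's default of <pipeline_name>.duckdb and the chess_data dataset name used above:

```python
import duckdb

# dlt writes <pipeline_name>.duckdb next to the pipeline by default.
con = duckdb.connect("chess.duckdb")
print(con.sql("SHOW ALL TABLES"))
print(con.sql("SELECT * FROM chess_data.players LIMIT 5"))
```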
To run the lambda with S3 as a temporary destination:
1. Configure .env.local with the S3 destination settings (see the destination-selection sketch below).
2. Run the pipeline with the same command as before.
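One way to keep a single entry point for both modes is to pick the destination from environment variables loaded from .env.local; the DESTINATION and BUCKET_URL variable names here are assumptions, not the project's actual settings:

```python
import os

import dlt
from dlt.destinations import filesystem


def make_destination():
    # DESTINATION and BUCKET_URL are illustrative names for values set in .env.local.
    if os.getenv("DESTINATION", "duckdb") == "s3":
        return filesystem(bucket_url=os.environ["BUCKET_URL"])  # e.g. s3://my-ingest-bucket/chess
    return "duckdb"


pipeline = dlt.pipeline(
    pipeline_name="chess",
    destination=make_destination(),
    dataset_name="chess_data",
)
```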
Once your code is deployed to AWS, you can invoke the Lambda directly.
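For example, with boto3 (the function name is a placeholder):

```python
import json

import boto3

client = boto3.client("lambda")
response = client.invoke(
    FunctionName="chess-ingest",       # placeholder function name
    InvocationType="RequestResponse",  # wait synchronously for the result
    Payload=json.dumps({}).encode(),
)
print(json.loads(response["Payload"].read()))
```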
For interactive debugging, add a launch configuration to .vscode/launch.json.
Snowflake landing table schemas are defined in YAML files following the pattern pipelines/ingest/<source-name>_source_schema.yml.
After running the pipeline locally, generate a source schema definition. This produces a chess_source_schema.yml file in the pipelines folder that defines the Snowflake tables.
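If the generation step relies on dlt's schema API, it might look roughly like this; the script below is illustrative, not the repository's actual command:

```python
import dlt

# Attach to the pipeline that was just run locally and dump its inferred schema to YAML.
pipeline = dlt.pipeline(pipeline_name="chess")
with open("pipelines/chess_source_schema.yml", "w") as f:
    f.write(pipeline.default_schema.to_pretty_yaml())
```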
Manual deployment follows this process:
- Builds the Docker image locally
- Pushes it to ECR
- Updates the Lambda to use the new image (see the sketch below)
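The final step of that process maps onto a single boto3 call; the function name and image URI below are placeholders:

```python
import boto3

lambda_client = boto3.client("lambda")
lambda_client.update_function_code(
    FunctionName="chess-ingest",  # placeholder function name
    ImageUri="123456789012.dkr.ecr.eu-west-1.amazonaws.com/chess-ingest:latest",  # placeholder ECR image
)
# Wait until the new image is active before invoking the function.
waiter = lambda_client.get_waiter("function_updated_v2")
waiter.wait(FunctionName="chess-ingest")
```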