Ingestion: dlt + Lambda
Overview
This example demonstrates a serverless data ingestion pipeline that:
Fetches chess data from an external source and processes it using dlt
Writes the data to S3
The pipeline runs as an AWS Lambda function packaged in a Docker container.
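The container image can be sketched with the standard AWS Lambda Python base image; the file layout and handler path below are assumptions for illustration, not taken from the repository:

```dockerfile
# Hypothetical Dockerfile sketch — adjust paths to the repo's layout
FROM public.ecr.aws/lambda/python:3.11

# Install dependencies into the Lambda task root
COPY requirements.txt .
RUN pip install -r requirements.txt --target "${LAMBDA_TASK_ROOT}"

# Copy the pipeline code
COPY pipelines/ ${LAMBDA_TASK_ROOT}/pipelines/

# Module.function that Lambda invokes (assumed handler location)
CMD ["pipelines.ingest.handler.handler"]
```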

How It Works
Infrastructure Components
AWS Lambda: Executes the ingestion code on demand
Amazon ECR: Stores the Docker container image
Amazon S3: Temporary storage for data files before loading to Snowflake
AWS Secrets Manager: Stores credentials and configuration
Terraform: Provisions and manages all infrastructure
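As a rough sketch, the ECR repository and the container-image Lambda can be wired together in Terraform like this (resource names, sizing, and the IAM role reference are assumptions):

```hcl
# Hypothetical Terraform sketch: container-image Lambda pulling from ECR
resource "aws_ecr_repository" "ingest" {
  name = "chess-ingest"
}

resource "aws_lambda_function" "ingest" {
  function_name = "chess-ingest"
  package_type  = "Image"
  image_uri     = "${aws_ecr_repository.ingest.repository_url}:latest"
  role          = aws_iam_role.ingest.arn # execution role defined elsewhere
  timeout       = 300
  memory_size   = 512
}
```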
Code Structure
Data Flow Process
The pipeline follows these steps:
Extraction: dlt extracts data from the source
Transformation: dlt performs basic transformations (typing, normalization)
Loading: dlt loads the data directly to S3
Schema Management: Table schemas are defined in YAML files and managed by the pipeline
Development Guide
1. Local Development with DuckDB
For rapid iteration without AWS resources, use DuckDB as the destination:
Create a .env.local file with the DuckDB destination settings
Run the pipeline locally
Examine the results in the local .duckdb database
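A hedged example of what the local configuration might look like; the variable name is an assumption, so adjust it to whatever keys this repo's config actually reads:

```shell
# .env.local — hypothetical key, adjust to the repo's configuration
DESTINATION_NAME=duckdb
```

Then run the pipeline's entry point, e.g. `python -m pipelines.ingest.chess` (the module path is an assumption).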
2. Local Development with S3
To run the lambda with S3 as a temporary destination:
Configure .env.local with the S3 destination settings
Run the pipeline with the same command
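A hedged sketch of the S3 variant; the bucket URL key follows dlt's environment-variable convention for the filesystem destination, but the exact keys and values are assumptions:

```shell
# .env.local — hypothetical keys, adjust to the repo's configuration
DESTINATION_NAME=s3
DESTINATION__FILESYSTEM__BUCKET_URL=s3://my-ingest-bucket/chess  # assumed bucket
AWS_PROFILE=dev                                                  # or explicit keys
```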
3. Testing on AWS
Once your code is deployed to AWS, you can invoke the Lambda from the command line.
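For example, with the AWS CLI (the function name here is an assumption):

```shell
# invoke the deployed function and print its response
aws lambda invoke \
  --function-name chess-ingest \
  --payload '{}' \
  --cli-binary-format raw-in-base64-out \
  response.json
cat response.json
```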
4. VSCode Debugging
For interactive debugging, add a launch configuration to .vscode/launch.json.
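A sketch of such a configuration, assuming the hypothetical module path used above; the `envFile` entry reuses the .env.local file from local development:

```json
{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Debug ingest pipeline",
      "type": "debugpy",
      "request": "launch",
      "module": "pipelines.ingest.chess",
      "envFile": "${workspaceFolder}/.env.local",
      "justMyCode": false
    }
  ]
}
```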
Schema Management
Snowflake landing table schemas are defined in YAML files at pipelines/ingest/&lt;source-name&gt;_source_schema.yml.
After running the pipeline locally, generate a source schema definition from the schema dlt inferred during the run.
This generates a schema file chess_source_schema.yml in the pipelines/ingest folder that defines the Snowflake tables.
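As an illustration only, such a schema file might look roughly like this; the table and column names are assumptions, not the actual generated output:

```yaml
# chess_source_schema.yml — illustrative sketch, not the real generated file
tables:
  games:
    columns:
      url:
        data_type: text
        nullable: false
      end_time:
        data_type: timestamp
      time_class:
        data_type: text
```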
Manual Deployment
Deploying manually involves three steps:
Builds the Docker image locally
Pushes it to ECR
Updates the Lambda to use the new image
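The three steps map onto commands roughly like these; the account ID, region, repository, and function names are placeholders:

```shell
# 1. build the image locally
docker build -t chess-ingest .

# 2. authenticate and push to ECR
aws ecr get-login-password --region eu-west-1 \
  | docker login --username AWS --password-stdin 123456789012.dkr.ecr.eu-west-1.amazonaws.com
docker tag chess-ingest:latest 123456789012.dkr.ecr.eu-west-1.amazonaws.com/chess-ingest:latest
docker push 123456789012.dkr.ecr.eu-west-1.amazonaws.com/chess-ingest:latest

# 3. point the Lambda at the new image
aws lambda update-function-code \
  --function-name chess-ingest \
  --image-uri 123456789012.dkr.ecr.eu-west-1.amazonaws.com/chess-ingest:latest
```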
Common Commands
Resources