# Ingestion: dlt + Lambda
## Overview
This example demonstrates a serverless data ingestion pipeline that:

- Fetches chess data from an external source and processes it using dlt
- Writes the data to S3
The pipeline runs as an AWS Lambda function packaged in a Docker container.

## How It Works
### Infrastructure Components
- **AWS Lambda**: Executes the ingestion code on demand
- **Amazon ECR**: Stores the Docker container image
- **Amazon S3**: Temporary storage for data files before loading to Snowflake
- **AWS Secrets Manager**: Stores credentials and configuration
- **Terraform**: Provisions and manages all infrastructure
### Code Structure
```
pipelines/
├── chess_lambda.tf               # Terraform creating the Lambda function and ECR repository
└── ingest/
    ├── chess_source_schema.yml   # Snowflake table schema definitions in YAML format
    └── chess-ingestion/          # Lambda function code
        ├── Dockerfile
        ├── lambda_handler.py     # Lambda code with dlt pipeline
        └── ...
```
### Data Flow Process
The pipeline follows these steps:

1. **Extraction**: dlt extracts data from the source
2. **Transformation**: dlt performs basic transformations (typing, normalization)
3. **Loading**: dlt loads the data directly to S3
4. **Schema Management**: Table schemas are defined in YAML files and managed by the pipeline
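For orientation, here is a minimal sketch of what the dlt pipeline inside `lambda_handler.py` could look like. It is not the repository's actual handler: the resource name, the chess.com endpoint, and the environment-variable handling are assumptions.

```python
import os

import dlt
from dlt.sources.helpers import requests


@dlt.resource(name="players_games", write_disposition="append")
def players_games(username: str = "magnuscarlsen"):
    """Yield a player's games from the public chess.com archives API."""
    archives = requests.get(
        f"https://api.chess.com/pub/player/{username}/games/archives"
    ).json()["archives"]
    for archive_url in archives[-1:]:  # most recent monthly archive only
        yield requests.get(archive_url).json()["games"]


def handler(event, context):
    # Extraction, normalization, and loading all happen inside pipeline.run().
    pipeline = dlt.pipeline(
        pipeline_name="chess",
        destination=os.environ.get("DESTINATION", "duckdb"),
        dataset_name="chess_raw",
    )
    load_info = pipeline.run(players_games())
    return {"status": "ok", "load_info": str(load_info)}
```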
## Development Guide
### 1. Local Development with DuckDB
For rapid iteration without AWS resources, use DuckDB as the destination:

1. Create a `.env.local` file with:

   ```
   DESTINATION=duckdb
   # Add any source-specific credentials here
   ```

2. Run the pipeline locally:

   ```bash
   make run-local
   ```

3. Examine the results in the local `.duckdb` database
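To poke at the output, you can open the DuckDB file directly. A quick sketch, assuming the pipeline is named `chess` (so dlt writes `chess.duckdb`) and the dataset and table are named as in the sketch above:

```python
import duckdb

# dlt names the database file after the pipeline; adjust if yours differs.
con = duckdb.connect("chess.duckdb")
print(con.sql("SHOW ALL TABLES"))
print(con.sql("SELECT * FROM chess_raw.players_games LIMIT 5"))
```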
### 2. Local Development with S3
To run the Lambda with S3 as a temporary destination:

1. Configure `.env.local` with:

   ```
   DESTINATION=filesystem
   AWS_REGION=<your-aws-region>
   S3_BUCKET_NAME=<your-s3-bucket-name>
   AWS_PROFILE=<your-aws-profile>
   # Add any source-specific credentials here
   ```

2. Run with the same command:

   ```bash
   make run-local
   ```
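How `DESTINATION=filesystem` becomes a dlt destination is an implementation detail of the handler; one plausible sketch, assuming the environment variable names above, is:

```python
import os

import dlt


def make_destination():
    # Switch between local DuckDB and the S3 filesystem destination via env vars.
    if os.environ.get("DESTINATION") == "filesystem":
        bucket_url = f"s3://{os.environ['S3_BUCKET_NAME']}"
        return dlt.destinations.filesystem(bucket_url=bucket_url)
    return dlt.destinations.duckdb()
```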
### 3. Testing on AWS
Once your code is deployed to AWS, you can run the Lambda with:

```bash
export AWS_PROFILE=<your_profile>
make run-lambda env=<your_environment>
```
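If you prefer to bypass the Makefile, the same invocation can be done from Python with boto3. This is only a sketch: the function name `chess-ingestion-dev` is an assumption, check the Terraform for the real one.

```python
import json

import boto3

client = boto3.client("lambda")
response = client.invoke(
    FunctionName="chess-ingestion-dev",   # assumed name; see chess_lambda.tf
    InvocationType="RequestResponse",
    Payload=json.dumps({}).encode(),
)
print(response["StatusCode"])
print(response["Payload"].read().decode())
```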
### 4. VSCode Debugging
For interactive debugging, add this to `.vscode/launch.json`:

```json
{
    "name": "Debug chess lambda",
    "type": "debugpy",
    "request": "launch",
    "program": "${workspaceFolder}/pipelines/ingest/chess-ingestion/lambda_handler.py",
    "console": "integratedTerminal",
    "cwd": "${workspaceFolder}/pipelines/ingest/chess-ingestion",
    "justMyCode": false
}
```
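Launching `lambda_handler.py` as a program only works if the module has a local entry point. If it does not already have one, a minimal guard like the following (assuming the handler function is called `handler`) is enough for the debugger to step through it:

```python
if __name__ == "__main__":
    # Invoke the handler with an empty event when run directly.
    handler(event={}, context=None)
```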
## Schema Management
Snowflake landing table schemas are defined in YAML, in the `pipelines/ingest/<source-name>_source_schema.yml` file.
After running the pipeline locally, generate a source schema definition:

```bash
cd pipelines/
uvx boringdata dlt get-schema chess
```

This will generate a schema file, `chess_source_schema.yml`, in the pipelines folder to define the Snowflake tables.
## Manual Deployment
For manual deployment:

```bash
# Set required environment variables
export AWS_PROFILE=<your_profile>

# Build and deploy
make deploy env=<your_environment>
```
This process:

1. Builds the Docker image locally
2. Pushes it to ECR
3. Updates the Lambda to use the new image
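The last step can also be performed directly, which is handy when debugging the deploy path. A hedged sketch in Python (`make deploy` most likely shells out to the AWS CLI instead; the function name and image URI below are placeholders):

```python
import boto3

lambda_client = boto3.client("lambda")
lambda_client.update_function_code(
    FunctionName="chess-ingestion-dev",  # placeholder; see chess_lambda.tf for the real name
    ImageUri="<account-id>.dkr.ecr.<region>.amazonaws.com/chess-ingestion:<tag>",
)
```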
## Common Commands

```bash
# Development
make run-local                      # Run locally with settings from .env.local
make run-lambda env=<environment>   # Execute on AWS Lambda

# Deployment
make build env=<environment>        # Build Docker image
make deploy env=<environment>       # Build and deploy to ECR

# Utilities
make help                           # Show all available commands
```