Ingestion: dlt + lambda

Overview

This example demonstrates a serverless data ingestion pipeline that:

  1. Fetches chess data from an external source and processes it using dlt

  2. Writes the data to S3

The pipeline runs as an AWS Lambda function packaged in a Docker container.

How It Works

Infrastructure Components

  • AWS Lambda: Executes the ingestion code on demand

  • Amazon ECR: Stores the Docker container image

  • Amazon S3: Temporary storage for data files before loading to Snowflake

  • AWS Secrets Manager: Stores credentials and configuration

  • Terraform: Provisions and manages all infrastructure

Code Structure

pipelines/
├── chess_lambda.tf           # Terraform that creates the Lambda function and ECR repository
└── ingest/
    ├── chess_source_schema.yml   # Snowflake table schema definitions in YAML format
    └── chess-ingestion/          # Lambda function code
        ├── Dockerfile
        ├── lambda_handler.py # Lambda handler running the dlt pipeline
        └── ...

Data Flow Process

The pipeline follows these steps (a minimal handler sketch follows the list):

  1. Extraction: dlt extracts data from the source

  2. Transformation: dlt performs basic transformations (typing, normalization)

  3. Loading: dlt loads the data directly to S3

  4. Schema Management: Table schemas are defined in YAML files and managed by the pipeline
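
For orientation, here is a minimal sketch of what a handler along these lines can look like. It is illustrative only: the actual source, resource, and function names in lambda_handler.py may differ, and the placeholder chess.com request stands in for the real source.

import dlt
import requests

@dlt.resource(name="players_games", write_disposition="append")
def chess_games(player: str = "magnuscarlsen"):
    # Placeholder extraction: one month of games from the public chess.com API
    url = f"https://api.chess.com/pub/player/{player}/games/2024/01"
    yield from requests.get(url, timeout=30).json().get("games", [])

def handler(event, context):
    # dlt types/normalizes the records and writes load packages to the
    # filesystem destination (the S3 bucket is configured via env vars / secrets)
    pipeline = dlt.pipeline(
        pipeline_name="chess",
        destination="filesystem",
        dataset_name="chess_raw",
    )
    load_info = pipeline.run(chess_games())
    return {"status": "ok", "load_info": str(load_info)}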

Development Guide

1. Local Development with DuckDB

For rapid iteration without AWS resources, use DuckDB as the destination:

  1. Create a .env.local file with:

    DESTINATION=duckdb
    # Add any source-specific credentials here
  2. Run the pipeline locally:

    make run-local
  3. Examine the results in the local .duckdb database (see the sketch below)
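
To peek at what the run produced, you can query the DuckDB file directly. A minimal sketch, assuming the pipeline wrote a chess.duckdb file in the working directory (the file, dataset, and table names here are assumptions):

import duckdb

con = duckdb.connect("chess.duckdb", read_only=True)            # assumed file name
print(con.sql("SHOW ALL TABLES"))                               # datasets/tables dlt created
print(con.sql("SELECT count(*) FROM chess_raw.players_games"))  # hypothetical table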

2. Local Development with S3

To run the pipeline locally with S3 as a temporary destination (see the destination sketch after the steps):

  1. Configure .env.local with:

    DESTINATION=filesystem
    AWS_REGION=<your-aws-region>
    S3_BUCKET_NAME=<your-s3-bucket-name>
    AWS_PROFILE=<your-aws-profile>
    # Add any source-specific credentials here
  2. Run with the same command:

    make run-local
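
The DESTINATION variable is what lets the same code target either DuckDB or S3. A sketch of how the handler might pick the destination at runtime (an assumption; the real handler's logic may differ):

import os
import dlt

def make_destination():
    # DESTINATION=duckdb -> local file; anything else -> S3 via the filesystem destination
    if os.getenv("DESTINATION") == "duckdb":
        return dlt.destinations.duckdb("chess.duckdb")          # assumed local file name
    bucket = os.environ["S3_BUCKET_NAME"]
    return dlt.destinations.filesystem(bucket_url=f"s3://{bucket}")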

3. Testing on AWS

Once your code is deployed to AWS, you can invoke the Lambda with:

export AWS_PROFILE=<your_profile>
make run-lambda env=<your_environment>
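
For reference, invoking the deployed function directly with boto3 looks roughly like this; the function name and region are placeholders, and the real name is set by the Terraform in chess_lambda.tf:

import json
import boto3

client = boto3.client("lambda", region_name="eu-west-1")  # use your region
response = client.invoke(
    FunctionName="chess-ingestion",                       # placeholder; check chess_lambda.tf
    InvocationType="RequestResponse",
    Payload=json.dumps({}).encode(),
)
print(response["Payload"].read().decode())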

4. VSCode Debugging

For interactive debugging, add this configuration to the configurations array in .vscode/launch.json:

{
    "name": "Debug chess lambda",
    "type": "debugpy",
    "request": "launch",
    "program": "${workspaceFolder}/pipelines/ingest/chess-ingestion/lambda_handler.py",
    "console": "integratedTerminal",
    "cwd": "${workspaceFolder}/pipelines/ingest/chess-ingestion",
    "justMyCode": false
}
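
This configuration runs lambda_handler.py as a plain script, which assumes the module has a local entry point, for example something along these lines:

if __name__ == "__main__":
    # Assumed local entry point; the real module may load its env vars differently
    from dotenv import load_dotenv
    load_dotenv(".env.local")
    print(handler(event={}, context=None))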

Schema Management

Snowflake landing table schemas are defined in YAML, one file per source, at pipelines/ingest/<source-name>_source_schema.yml.

After running the pipeline locally, generate a source schema definition:

cd pipelines/
uvx boringdata dlt get-schema chess

This generates a chess_source_schema.yml schema file in the pipelines folder, defining the Snowflake landing tables.
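
If you want to see the raw schema dlt itself inferred during the local run (presumably the information the boringdata helper builds its YAML from), you can attach to the local pipeline state; the pipeline name here is an assumption:

import dlt

pipeline = dlt.attach(pipeline_name="chess")     # assumed pipeline name
print(pipeline.default_schema.to_pretty_yaml())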

Manual Deployment

For manual deployment:

# Set required environment variables
export AWS_PROFILE=<your_profile>

# Build and deploy
make deploy env=<your_environment>

This process:

  1. Builds the Docker image locally

  2. Pushes it to ECR

  3. Updates the Lambda to use the new image

Common Commands

# Development
make run-local                        # Run locally with settings from .env.local
make run-lambda env=<environment>     # Execute on AWS Lambda

# Deployment
make build env=<environment>          # Build Docker image
make deploy env=<environment>         # Build and deploy to ECR

# Utilities
make help                             # Show all available commands

Resources

  • dlt Documentation

  • Snowflake SQL API Documentation