Add a New Pipeline


This guide explains how to add a new data pipeline to the template.

The pipeline architecture includes:

  1. Data ingestion using serverless functions (AWS Lambda) and an ELT tool (dlt)

  2. Data lake storage in cloud object storage (AWS S3)

  3. Data transformation using an SQL transformation engine (Amazon Athena) and dbt

The boringdata CLI automates many steps along the way.

Before you start, make sure you have installed the boringdata CLI:

uv tool install git+ssh://git@github.com/boringdata/boringdata-cli.git --python 3.12
uv tool install git+https://github.com/boringdata/boringdata-cli.git --python 3.12

You can then use the boringdata CLI from any directory:

uvx boringdata --help

Step 1: Add a New Data Source

Let's start by adding a new data source for ingestion.

The template uses dlt as the ingestion framework. Check the dlt ecosystem to find the connector you want.

You can then generate a full ingestion pipeline for this connector by running:

cd pipelines && uvx boringdata dlt add-source <connector_name> --destination iceberg

This command will create the following files:

pipelines/<source_name>-lambda.tf = serverless function infrastructure

pipelines/ingest/<source_name>-ingestion/* = ingestion code embedded in a serverless function

Boringdata also performs a few helpful operations:

  • Set up a Python virtual environment and install necessary dependencies

  • Copy .env.example to .env.local

  • Initialize the data connector

  • Parse required secrets from configuration files and update both environment variables and infrastructure configurations

Example using the Notion API as a source:

cd pipelines && uvx boringdata dlt add-source notion --destination iceberg

You can assign a different name to your source than the connector name.

To do so, add the CLI option: --source-name <source_name>

Step 2: Configure Secrets

If your source requires secrets (for example, an API key), update the .env.example file.

Example for Notion integration:

The following lines should be present in the .env file:

SOURCES__NOTION__API_KEY="your_api_key_here"

After deployment, update these secrets manually in AWS Secrets Manager if needed.
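If you want to confirm that dlt can actually resolve the secret, the short sketch below can help. It relies only on dlt's standard config resolution (double underscores in the env var name map to nested config sections) and assumes the variables from your env file are exported in the shell; the key path matches the Notion example above.

# Minimal sketch: dlt maps SOURCES__NOTION__API_KEY to the config path
# sources.notion.api_key (double underscores become nesting separators).
import dlt

# Resolves the secret from dlt's config providers (environment variables,
# secrets.toml, ...). Raises if the key cannot be found.
api_key = dlt.secrets["sources.notion.api_key"]
print("API key loaded:", bool(api_key))  # avoid printing the secret itself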

Step 3: Customize the Ingestion Logic

Edit pipelines/ingest/<source_name>-ingestion/lambda_handler.py

pipelines/ingest/<source_name>-ingestion/lambda_handler.py
# Add missing imports
from <source_name> import <source_functions>
...

# Update the scope of data to be loaded
load_data = ...

Example for Notion integration:

from notion import notion_databases
...

# Update the scope of data to be loaded
load_data = notion_databases(database_ids=["your_database_id"])

Use the <connector_name>_pipeline.py file generated by dlt as inspiration.
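For orientation, here is a hypothetical sketch of how a handler like this typically wires load_data into a dlt pipeline run. The handler generated by the template may look different, and the pipeline_name, destination, staging, and dataset_name values below are assumptions, not the template's actual settings.

# Hypothetical sketch only; the generated lambda_handler.py may differ.
import dlt
from notion import notion_databases

# Scope of data to be loaded (as configured above)
load_data = notion_databases(database_ids=["your_database_id"])

def lambda_handler(event, context):
    # Assumed setup: Athena over Iceberg tables in S3, with the staging
    # bucket and credentials provided through environment variables.
    pipeline = dlt.pipeline(
        pipeline_name="notion_ingestion",
        destination="athena",
        staging="filesystem",
        dataset_name="notion",
    )
    load_info = pipeline.run(load_data, table_format="iceberg")
    return {"statusCode": 200, "body": str(load_info)}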

Step 4: Test the Ingestion Function Locally

To verify your changes, run the function locally (using DuckDB as a local target):

cd pipelines/ingest/<source_name>-ingestion/ && make run-local

This step allows you to test the function and inspect the output data format.
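To inspect the loaded rows directly, you can open the local DuckDB database with Python. The file name and schema/table names below are assumptions; check the output of make run-local for the actual values.

# Hypothetical inspection of the local test load; adjust the database file
# name and the schema/table names to what make run-local actually produced.
import duckdb

con = duckdb.connect("notion_ingestion.duckdb")  # assumed file name
con.sql("SHOW ALL TABLES").show()
con.sql("SELECT * FROM notion.notion_databases LIMIT 5").show()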

Step 5: Generate the Source Schema

After running the pipeline locally (see above), generate a source schema definition:

cd pipelines/
uvx boringdata dlt get-schema <source_name> \
    --engine iceberg \
    --output-folder ingest

Step 6: Create Transformation Models

Based on the schema files generated in Step 5, boringdata can automatically generate corresponding SQL transformation models for each of the tables using Amazon Athena:

cd pipelines/transform
uvx boringdata dbt import-source \
    --source-yml ../ingest/<source_name>-schema/

Step 7: (Optional) Add Workflow Automation

To coordinate the ingestion and transformation steps, add workflow automation using AWS Step Functions:

cd pipelines
uvx boringdata aws step-function lambda-dbt \
    --source-name <source_name>

Step 8: Deploy the Infrastructure

Finally, deploy the project from the root directory:

export AWS_PROFILE=your_aws_profile
export ENVIRONMENT=dev
make deploy
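As an optional smoke test once the deploy finishes, you can invoke the ingestion Lambda directly, for example with boto3. The function name below is an assumption based on the <source_name>-lambda.tf naming; check the Terraform outputs or the AWS console for the real name.

# Hypothetical post-deploy smoke test; replace the function name with the
# one actually created by <source_name>-lambda.tf.
import json
import boto3

client = boto3.client("lambda")
response = client.invoke(
    FunctionName="notion-ingestion",       # assumed function name
    InvocationType="RequestResponse",
    Payload=json.dumps({}).encode("utf-8"),
)
print(response["StatusCode"])
print(response["Payload"].read().decode("utf-8"))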
