Add a New Pipeline
This guide explains how to add a new data pipeline to the template.
The pipeline architecture includes:
Data ingestion using serverless functions (AWS Lambda) and an ELT tool
Staging in cloud object storage
Automated data loading into landing tables
Data transformation using SQL models in the data warehouse
The boringdata CLI automates many steps along the way.
Before you start, make sure you have installed the boringdata CLI. Once installed, you can use it from any directory.
Let's start by adding a new data source for ingestion.
The template ships with an ingestion framework for extracting data from external sources. Check the framework's connector catalog to find the connector you want.
You can then generate a full ingestion pipeline for this connector by running the corresponding boringdata CLI command.
This command will create the following files:
pipelines/<source_name>_lambda.tf = the serverless function (AWS Lambda) infrastructure
pipelines/ingest/<source_name>-ingestion/* = the Lambda's dockerized code
Boringdata will also run some helpful operations:
Set up a Python virtual environment and install the necessary dependencies
Copy .env.example to .env.local
Parse required secrets from configuration files and update both environment variables and infrastructure configurations
If your source requires secrets (for example, an API key), add them to .env.example.
Example for a Notion integration: the Notion API key must be present in the .env file.
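As a sketch only, such an entry could look like the line below. The exact variable name depends on what your connector and handler expect; NOTION_API_KEY is an assumed name, not necessarily what the template generates.

```
# Assumed variable name -- use whatever key your connector expects
NOTION_API_KEY=<your-notion-api-key>
```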
Next, initialize the data connector by editing pipelines/ingest/<source_name>-ingestion/lambda_handler.py. Use the <connector_name>_pipeline.py generated by the ingestion framework as inspiration. An example for a Notion integration is sketched below.
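For the Notion example, a handler might be shaped roughly like the following. This is an illustration, not the template's generated code: the NOTION_API_KEY and STAGING_BUCKET variables and the run_notion_ingestion helper are hypothetical names standing in for whatever the generated pipeline provides.

```python
# Hypothetical sketch of pipelines/ingest/notion-ingestion/lambda_handler.py.
# The generated file will differ; the shape is what matters: read secrets from
# the environment, run the connector, and stage the extracted data.
import json
import os


def run_notion_ingestion(api_key: str, destination_bucket: str) -> int:
    """Placeholder for the connector call produced by the ingestion framework."""
    raise NotImplementedError("replace with the generated pipeline invocation")


def lambda_handler(event, context):
    api_key = os.environ["NOTION_API_KEY"]         # secret declared in .env.local / deployment config
    staging_bucket = os.environ["STAGING_BUCKET"]  # cloud object storage used for staging

    rows_staged = run_notion_ingestion(api_key=api_key, destination_bucket=staging_bucket)

    return {"statusCode": 200, "body": json.dumps({"rows_staged": rows_staged})}
```

Whatever the handler ends up looking like, the important contract is that the extracted data lands in the staging bucket, where the automated loading step picks it up and loads it into the landing tables.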
To verify your changes, run the function locally, using a local target. This step lets you test the function and inspect the output data format.
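Independent of the template's local-target setup, one quick way to smoke-test the handler is to call it directly from the ingestion folder. This assumes the hypothetical sketch above and that the variables from .env.local are exported in your shell.

```python
# Minimal local smoke test: call the handler the same way AWS would.
# Run from pipelines/ingest/<source_name>-ingestion/.
from lambda_handler import lambda_handler

if __name__ == "__main__":
    # A scheduled ingestion function typically does not need a real event payload.
    response = lambda_handler(event={}, context=None)
    print(response)
```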
Next, generate a YAML file that defines your source's data structure; it is used to create the corresponding tables in the data warehouse.
Based on that YAML file, boringdata can automatically generate corresponding SQL transformation models for each of the tables.
To coordinate the ingestion and transformation steps, add workflow automation.
Finally, deploy the project.
After deployment, update the pipeline's secrets manually if needed.