# Add a New Pipeline

This guide explains how to add a new data pipeline to the template.

The pipeline architecture includes:

1. Data ingestion using serverless functions (AWS Lambda) and an ELT tool ([dlt](https://dlthub.com/docs/intro))
2. Staging in cloud object storage ([Amazon S3](https://aws.amazon.com/s3/))
3. Automated data loading into [Snowflake](https://www.snowflake.com/en/) landing tables
4. Data transformation using SQL analytics with [dbt](https://docs.getdbt.com/)

The boringdata CLI automates many steps along the way.

Before you start, make sure you have installed the boringdata CLI:

{% tabs %}
{% tab title="SSH GitHub auth" %}
{% code overflow="wrap" %}

```bash
uv tool install git+ssh://git@github.com/boringdata/boringdata-cli.git --python 3.12
```

{% endcode %}
{% endtab %}

{% tab title="HTTPS GitHub auth" %}
{% code overflow="wrap" %}

```bash
uv tool install git+https://github.com/boringdata/boringdata-cli.git --python 3.12
```

{% endcode %}
{% endtab %}
{% endtabs %}

You can then use the boringdata CLI from any directory:

```bash
uvx boringdata --help
```

## Step 1: Add a New Data Source

Let's start by adding a new data source for ingestion.

The template uses [dlt](https://dlthub.com/docs/intro) as the ingestion framework. Check the [dlt ecosystem](https://dlthub.com/docs/dlt-ecosystem/verified-sources/) to find the connector you want.

You can then generate a full ingestion pipeline for this connector by running:

```bash
cd pipelines && uvx boringdata dlt add-source <connector_name>
```

This command will create the following files:

* `pipelines/<source_name>_lambda.tf`: the serverless function (AWS Lambda) infrastructure
* `pipelines/ingest/<source_name>-ingestion/*`: the Lambda's Dockerized code

Boringdata will also run some helpful operations:

* Set up a Python virtual environment and install the necessary dependencies
* Copy `.env.example` to `.env.local`
* Initialize the [dlt](https://dlthub.com/docs/intro) data connector
* Parse required secrets from configuration files and update both environment variables and infrastructure configurations

Example using the [Notion API](https://developers.notion.com/) as a source:

```bash
cd pipelines && uvx boringdata dlt add-source notion
```

{% hint style="info" %}
You can give your source a different name than the connector's name.

To do so, add the CLI option `--source-name <source_name>`.
{% endhint %}

## Step 2: Configure Secrets

If your source requires secrets (for example, an API key), update the `.env.example` file.

After deployment, update these secrets manually in [AWS Secrets Manager](https://aws.amazon.com/secrets-manager/) if needed.

Example for Notion integration:

The following line should be present in your `.env` file:

```bash
SOURCES__NOTION__API_KEY="your_api_key_here"
```
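The double underscores in the variable name follow dlt's configuration convention: each `__` separates one nesting level, so `SOURCES__NOTION__API_KEY` resolves to the config path `sources.notion.api_key`. A minimal sketch of that mapping:

```python
# Sketch of dlt's env-var naming convention: double underscores
# separate the nesting levels of the configuration path.

def env_var_to_config_path(name: str) -> str:
    """Convert a dlt-style environment variable name to a dotted config path."""
    return ".".join(part.lower() for part in name.split("__"))

print(env_var_to_config_path("SOURCES__NOTION__API_KEY"))
# sources.notion.api_key
```

This is why renaming your source (Step 1's `--source-name` option) also changes the environment variable names it expects.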

## Step 3: Customize the Ingestion Logic

Edit `pipelines/ingest/<source_name>-ingestion/lambda_handler.py`

{% code title="pipelines/ingest/\<source\_name>-ingestion/lambda\_handler.py" %}

```python
# Add the missing imports
from <source_name> import <source_functions>
...

# Update the scope of data to be loaded
load_data = <source_function>(...)
```

{% endcode %}

Example for Notion integration:

```python
from notion import notion_databases
...

# Update the scope of data to be loaded
load_data = notion_databases(database_ids=["your_database_id"])
```

{% hint style="info" %}
Use the <kbd>\<connector\_name>\_pipeline.py</kbd> file generated by [dlt](https://dlthub.com/docs/intro) as inspiration.
{% endhint %}

## Step 4: Test the Ingestion Function Locally

To verify your changes, run the function locally (using [DuckDB](https://duckdb.org/) as a local target):

```bash
cd pipelines/ingest/<source_name>-ingestion/ && make run-local
```

This step allows you to test the function and inspect the output data format.

## Step 5: Generate the Source Schema

{% hint style="info" %}
[Why do you need to generate a YAML file for each source?](https://docs.boringdata.io/template-aws-snowflake/guides/pages/mvzXlWASNIL69qnoS81s#what-are-less-than-source-greater-than-source_schema.yml-files)
{% endhint %}

Generate a YAML file that defines your source's data structure (used to create data warehouse tables in [Snowflake](https://www.snowflake.com/en/)):

```bash
uvx boringdata dlt get-schema <source_name> \
    --engine snowflake \
    --output-folder pipelines/ingest/
```

## Step 6: Create Transformation Models

Based on the YAML file generated in Step 5, boringdata can automatically generate a corresponding SQL transformation model for each table:

```bash
uvx boringdata dbt import-source \
    --source-yml pipelines/ingest/<source_name>_source_schema.yml \
    --output-folder pipelines/transform
```
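For reference, a staging model in dbt typically selects from the landing table via dbt's `source()` macro. The model below is a hand-written illustration (the table and column names are hypothetical), not the exact output of the command:

```sql
-- Hypothetical staging model for a Notion landing table.
-- source() resolves to the landing table declared in the generated source YAML.
select
    id,
    title,
    _dlt_load_id  -- load-tracking column added by dlt
from {{ source('notion', 'notion_databases') }}
```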

## Step 7: (Optional) Add Workflow Automation

To coordinate the ingestion and transformation steps, add workflow automation using [AWS Step Functions](https://aws.amazon.com/step-functions/):

```bash
uvx boringdata aws step-function lambda-dbt \
    --output-folder pipelines \
    --source-name <source_name>
```

## Step 8: Deploy the Infrastructure

Finally, deploy the project:

```bash
export AWS_PROFILE=your_aws_profile
export SNOWFLAKE_PROFILE=your_snowflake_profile
export ENVIRONMENT=dev
make deploy
```


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.boringdata.io/template-aws-snowflake/guides/add-a-pipeline.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
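The question must be URL-encoded before it is placed in the `ask` query parameter. A minimal sketch in Python (the question text is just an example):

```python
from urllib.parse import quote

PAGE_URL = "https://docs.boringdata.io/template-aws-snowflake/guides/add-a-pipeline.md"

def build_ask_url(question: str) -> str:
    """URL-encode a natural-language question into the `ask` query parameter."""
    return f"{PAGE_URL}?ask={quote(question)}"

url = build_ask_url("How do I rename a source after it has been created?")
print(url)
```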
