# Transformation: dbt

## Overview

This directory contains a data transformation pipeline that:

1. Takes data from Snowflake landing tables
2. Transforms it using [dbt](https://www.getdbt.com/) (data build tool)
3. Creates analytics-ready tables in staging and mart schemas

The pipeline runs as an AWS ECS Fargate task using a Docker container.

## How It Works

### Infrastructure Components

* **Snowflake**: Data warehouse for both source and transformed data
* **Amazon ECS**: Orchestrates the dbt container execution
* **Amazon ECR**: Stores the dbt docker container image
* **Terraform**: Provisions and manages all infrastructure

### Project Structure

```
pipelines/
├── transform/                     # dbt project root
│   ├── Dockerfile
│   ├── dbt_project.yml            # dbt project configuration
│   ├── sources/
│   │   ├──<source_name>.yml       # List all landing tables for a source
│   ├── models/
│   │   ├── staging/               # Staging models (first transformation layer)
│   │   └── mart/                  # Final business-ready models
│   └── ...
└── ecs_task_dbt.tf                # Terraform creating the ECS task
```

### Data Transformation Flow

The pipeline follows these transformation layers:

1. **Sources**: Raw data from landing tables created by ingestion pipelines
2. **Staging**: Initial cleaning, type conversion, deduplication and renaming
3. **Mart**: Final models organized by business domain, ready for analytics and reporting

## Sources

Sources are defined in the `models/sources/` folder and reference the landing tables created by the ingestion pipelines:

{% code title="models/sources/\<source\_name>.yml" %}

```yaml
sources:
  - name: <source_name>
    schema: <landing_schema>
    tables:
      - name: <source_name>__dlt_version
      - name: <source_name>__dlt_loads
      ...
```

{% endcode %}

You can generate this file automatically using the BoringData CLI:

```bash
cd pipelines/transform
uvx boringdata dbt import-sources --source ../ingest/<source_name>_source_schema.yml
```

## Models Structure

The dbt models follow a layered architecture pattern:

* Each folder in the `models` directory corresponds to a distinct schema in Snowflake
* `models/staging/` ➡️ `STAGING` schema in Snowflake
* `models/mart/` ➡️ `MART` schema in Snowflake

This behavior is configured in the `macros/schema_name.yml` and `dbt_project.yml` files.

## Development Guide

### Option 1: Execute dbt Locally

For rapid development with local dbt execution:

1. **Setup your environment**:

   ```bash
   uv venv --python=python3.12
   uv pip install -r requirements.txt
   uv run dbt deps
   ```
2. **Configure dbt profile**:\
   Create or update `~/.dbt/profiles.yml` with:

   ```yaml
   local:
     target: <environment>
     outputs:
       <environment>:
         type: snowflake
         account: <organization>-<account_name> # stored in AWS Parameter Store /<environment>/SNOWFLAKE/HOST
         user:
         password:
         role: <environment>_TECH_USER_ROLE # Upper case!
         database: <environment>_DB         # Upper case!
         schema: LANDING
         warehouse: <environment>_WH        # Upper case!
   ```
3. **Run dbt commands**:

   ```bash
   export DBT_PROFILE=local

   # Run a specific model
   uv run dbt run --select model_name

   # Run with Makefile shortcut
   make run-local cmd="run --select model_name"
   ```

### Option 2: Execute in AWS ECS Fargate

Once your template is deployed to AWS you can run dbt in the cloud environment:

```bash
export AWS_PROFILE=<your_profile>
export ENVIRONMENT=<your_environment>
make run cmd="run"
```

This will trigger an ECS Fargate task to execute the specified dbt command and store results in Snowflake.

## Deployment

For manual deployment:

```bash
# Set required environment variables
export AWS_PROFILE=<your_profile>
export ENVIRONMENT=<your_environment>
cd pipelines/transform

# Build and deploy
make deploy
```

This process:

1. Builds the Docker image locally
2. Pushes it to ECR

The next time you trigger an ECS task, it will use the latest image.

## Common Commands

```bash
# Development
make run-local cmd="run"              # Run dbt locally with specified command
make run-local cmd="test"             # Run dbt tests locally
make run-local cmd="docs generate"    # Generate dbt documentation

# Cloud Execution
make run cmd="run"                    # Run dbt in ECS Fargate
make run cmd="test"                   # Run tests in ECS Fargate

# Deployment
make build                            # Build Docker image
make deploy                           # Build and deploy to ECR
```

## Resources

* [dbt Documentation](https://docs.getdbt.com/)
* [Snowflake User Guide](https://docs.snowflake.com/)
* [BoringData CLI Guide](https://docs.boringdata.io/)


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.boringdata.io/template-aws-snowflake/project-structure/pipelines/transform.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
