# Transformation: dbt

## Overview

This directory contains a data transformation pipeline that:

1. Takes data from Iceberg tables in the landing zone
2. Transforms it using [dbt](https://www.getdbt.com/) (data build tool)
3. Creates analytics-ready tables in staging and mart schemas

The pipeline runs as an AWS ECS Fargate task using a Docker container.

## How It Works

### Infrastructure Components

* **AWS Athena**: SQL query engine for data transformation
* **Amazon S3**: Stores the data and metadata files of both the source and transformed Iceberg tables
* **AWS Glue**: Provides the catalog for Iceberg tables
* **Amazon ECS**: Orchestrates the dbt container execution
* **Amazon ECR**: Stores the dbt Docker image
* **Terraform**: Provisions and manages all infrastructure

### Project Structure

```
pipelines/
├── transform/                     # dbt project root
│   ├── Dockerfile
│   ├── dbt_project.yml            # dbt project configuration
│   ├── sources/
│   │   └── <source_name>.yml      # Lists all landing tables for a source
│   ├── models/
│   │   ├── staging/               # Staging models (first transformation layer)
│   │   └── mart/                  # Final business-ready models
│   └── ...
└── ecs_task_dbt.tf                # Terraform creating the ECS task
```

### Data Transformation Flow

The pipeline follows these transformation layers:

1. **Sources**: Raw data from landing tables created by ingestion pipelines
2. **Staging**: Initial cleaning, type conversion, deduplication and renaming
3. **Mart**: Final models organized by business domain, ready for analytics and reporting
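
To illustrate the layering, a staging model typically selects from a landing table with dbt's `source()` function and applies the cleanup described above. The sketch below is only an example: the model, table, and column names (including the ordering column used for deduplication) are hypothetical, not the template's actual models.

```sql
-- models/staging/stg_<source_name>__orders.sql (hypothetical example)
with source as (

    select * from {{ source('<source_name>', 'orders') }}

),

deduplicated as (

    select
        *,
        row_number() over (partition by order_id order by _dlt_load_id desc) as row_num
    from source

)

select
    cast(order_id as varchar)   as order_id,      -- type conversion
    cast(order_ts as timestamp) as ordered_at,    -- renaming
    lower(status)               as order_status   -- cleaning
from deduplicated
where row_num = 1                                 -- deduplication
```

A mart model would then build on this output with `{{ ref('stg_<source_name>__orders') }}` rather than querying the landing table directly.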

## Sources

Sources are defined in the `sources/` folder and reference the landing tables created by the ingestion pipelines:

{% code title="sources/\<source\_name>.yml" %}

```yaml
sources:
  - name: <source_name>
    schema: <landing_schema>
    tables:
      - name: <source_name>__dlt_version
      - name: <source_name>__dlt_loads
      ...
```

{% endcode %}

You can generate this file automatically using the BoringData CLI:

```bash
cd pipelines/transform
uvx boringdata dbt import-source --source ../ingest/<source_name>-schema/
```

## Models Structure

The dbt models follow a layered architecture pattern:

* Each folder in the `models` directory corresponds to a distinct schema in Athena:
  * `models/staging/` ➡️ `<environment>_staging` schema
  * `models/mart/` ➡️ `<environment>_mart` schema
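
This mapping is driven by the model configuration in `dbt_project.yml`. A minimal sketch is shown below; the project name, the materializations, and the exact schema-name resolution (which depends on the project's `generate_schema_name` macro) are assumptions, not the template's actual values.

```yaml
# dbt_project.yml (sketch; names and materializations are assumptions)
models:
  transform:
    staging:
      +schema: staging        # resolves to <environment>_staging in Athena
      +materialized: view
    mart:
      +schema: mart           # resolves to <environment>_mart in Athena
      +materialized: table
```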

## Development Guide

### Option 1: Execute dbt Locally

For rapid development with local dbt execution:

1. **Setup your environment**:

   ```bash
   uv venv --python=python3.12
   uv pip install -r requirements.txt
   uv run dbt deps
   ```
2. **Configure dbt profile**:\
   Create or update `~/.dbt/profiles.yml` with:

   ```yaml
   local:
     target: <environment>
     outputs:
       <environment>:
         type: athena
         database: awsdatacatalog
         region_name: "{{ env_var('AWS_REGION') }}"
         schema: "<environment>_staging"
         s3_staging_dir: "s3://<environment>-<region>-staging-bucket/athena"
         s3_data_dir: "s3://<environment>-<region>-staging-bucket/data"
         s3_tmp_table_dir: "s3://<environment>-<region>-staging-bucket/tmp"
   ```
3. **Run dbt commands**:

   ```bash
   export DBT_PROFILE=local
   export AWS_PROFILE=<your_profile>
   export AWS_REGION=<your_region>

   # Run a specific model
   uv run dbt run --select model_name

   # Run with Makefile shortcut
   make run-local cmd="run --select model_name"
   ```

### Option 2: Execute in AWS ECS Fargate

Once your template is deployed to AWS, you can run dbt in the cloud environment:

```bash
export AWS_PROFILE=<your_profile>
export ENVIRONMENT=<your_environment>
make run cmd="run"
```

This triggers an ECS Fargate task that executes the specified dbt command and stores the results in Iceberg tables.
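
Under the hood, `make run` starts a one-off Fargate task from the task definition created by `ecs_task_dbt.tf`, presumably equivalent to an `aws ecs run-task` call along these lines (the cluster, task definition, network, and container names are placeholders, not the template's actual values):

```bash
aws ecs run-task \
  --cluster <cluster_name> \
  --task-definition <environment>-dbt \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={subnets=[<subnet_id>],securityGroups=[<sg_id>],assignPublicIp=ENABLED}" \
  --overrides '{"containerOverrides": [{"name": "dbt", "command": ["run"]}]}'
```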

## Deployment

For manual deployment:

```bash
# Set required environment variables
export AWS_PROFILE=<your_profile>
export ENVIRONMENT=<your_environment>
cd pipelines/transform

# Build and deploy
make deploy
```

This process:

1. Builds the Docker image locally
2. Pushes it to ECR

The next time you trigger an ECS task, it will use the latest image.
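
The `make deploy` target presumably wraps the standard ECR workflow, roughly equivalent to the following (the account ID, region, and repository name are placeholders):

```bash
# Authenticate Docker against the ECR registry
aws ecr get-login-password --region <region> \
  | docker login --username AWS --password-stdin <account_id>.dkr.ecr.<region>.amazonaws.com

# Build the dbt image and push it to the repository
docker build -t <account_id>.dkr.ecr.<region>.amazonaws.com/<repository>:latest .
docker push <account_id>.dkr.ecr.<region>.amazonaws.com/<repository>:latest
```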

## Common Commands

```bash
# Development
make run-local cmd="run"              # Run dbt locally with specified command
make run-local cmd="test"             # Run dbt tests locally
make run-local cmd="docs generate"    # Generate dbt documentation

# Cloud Execution
make run cmd="run"                    # Run dbt in ECS Fargate
make run cmd="test"                   # Run tests in ECS Fargate

# Deployment
make build                            # Build Docker image
make deploy                           # Build and deploy to ECR
```

## Resources

* [dbt Documentation](https://docs.getdbt.com/)
* [Amazon Athena](https://aws.amazon.com/athena/)
* [Apache Iceberg Documentation](https://iceberg.apache.org/)
* [BoringData CLI Guide](https://docs.boringdata.io/)
