Transformation: dbt
This directory contains a data transformation pipeline that:
Takes data from Iceberg tables in the landing zone
Transforms it using dbt (data build tool)
Creates analytics-ready tables in staging and mart schemas
The pipeline runs as an AWS ECS Fargate task using a Docker container.
The pipeline relies on the following services and tools:
AWS Athena: SQL query engine for data transformation
Amazon S3: Hosts the Iceberg tables for both source and transformed data
AWS Glue: Provides the catalog for Iceberg tables
Amazon ECS: Orchestrates the dbt container execution
Amazon ECR: Stores the dbt Docker container image
Terraform: Provisions and manages all infrastructure
pipelines/
├── transform/                  # dbt project root
│   ├── Dockerfile
│   ├── dbt_project.yml         # dbt project configuration
│   ├── sources/
│   │   └── <source_name>.yml   # Lists all landing tables for a source
│   ├── models/
│   │   ├── staging/            # Staging models (first transformation layer)
│   │   └── mart/               # Final business-ready models
│   └── ...
└── ecs_task_dbt.tf             # Terraform creating the ECS task
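The ECS task itself is declared in ecs_task_dbt.tf. The actual resources depend on the template, but a minimal Fargate task definition for the dbt container could look like the following sketch (the resource names, CPU/memory sizes, IAM roles and variables are illustrative assumptions, not the template's real values):
# Sketch only: names, sizes, roles and variables below are assumptions.
resource "aws_ecs_task_definition" "dbt" {
  family                   = "dbt-transform"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = "1024"
  memory                   = "2048"
  execution_role_arn       = aws_iam_role.dbt_task_execution.arn # assumed IAM role
  task_role_arn            = aws_iam_role.dbt_task.arn           # assumed IAM role

  container_definitions = jsonencode([
    {
      name      = "dbt"
      image     = "${aws_ecr_repository.dbt.repository_url}:latest" # assumed ECR repository
      essential = true
      command   = ["run"] # overridden at run time with the requested dbt command
      environment = [
        { name = "ENVIRONMENT", value = var.environment }
      ]
      logConfiguration = {
        logDriver = "awslogs"
        options = {
          awslogs-group         = "/ecs/dbt-transform"
          awslogs-region        = var.aws_region
          awslogs-stream-prefix = "dbt"
        }
      }
    }
  ])
}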
The pipeline follows these transformation layers, sketched in the example below:
Sources: Raw data from landing tables created by ingestion pipelines
Staging: Initial cleaning, type conversion, deduplication and renaming
Mart: Final models organized by business domain, ready for analytics and reporting
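As an illustration of the layering, here is a hedged sketch of one staging model and one mart model. The table and column names (an orders table) are hypothetical, and table_type='iceberg' assumes the dbt-athena adapter's Iceberg support; the real models depend on your source data.
-- models/staging/stg_<source_name>__orders.sql (hypothetical example)
-- Staging: rename, type-cast and deduplicate columns from the landing table
{{ config(materialized='table', table_type='iceberg') }}

select distinct
    cast(order_id as bigint)            as order_id,
    lower(customer_email)               as customer_email,
    cast(order_total as decimal(18, 2)) as order_total,
    cast(created_at as timestamp)       as created_at
from {{ source('<source_name>', 'orders') }}

-- models/mart/orders_daily.sql (hypothetical example)
-- Mart: aggregate the staging model into a business-ready table
{{ config(materialized='table', table_type='iceberg') }}

select
    date_trunc('day', created_at) as order_date,
    count(*)                      as order_count,
    sum(order_total)              as revenue
from {{ ref('stg_<source_name>__orders') }}
group by 1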
Sources are defined in the sources/ folder and reference the landing tables created by the ingestion pipelines:
sources:
  - name: <source_name>
    schema: <landing_schema>
    tables:
      - name: <source_name>__dlt_version
      - name: <source_name>__dlt_loads
      ...
You can generate this file automatically using the BoringData CLI:
cd pipelines/transform
uvx boringdata dbt import-source --source ../ingest/<source_name>-schema/
The dbt models follow a layered architecture pattern. Each folder in the models directory corresponds to a distinct schema in Athena:
models/staging/ ➡️ <environment>_staging schema in Athena
models/mart/ ➡️ <environment>_mart schema in Athena
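This folder-to-schema mapping is typically driven by the model configuration in dbt_project.yml. A minimal sketch, assuming the dbt project is named transform and that the template resolves the <environment>_ prefix (for example via a custom generate_schema_name macro):
# dbt_project.yml (sketch; the project name 'transform' is an assumption)
models:
  transform:
    staging:
      +schema: staging   # resolved to <environment>_staging in Athena
    mart:
      +schema: mart      # resolved to <environment>_mart in Athena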
For rapid development with local dbt execution:
Set up your environment:
uv venv --python=python3.12
uv pip install -r requirements.txt
uv run dbt deps
Configure your dbt profile:
Create or update ~/.dbt/profiles.yml with:
local:
  target: <environment>
  outputs:
    <environment>:
      type: athena
      database: awsdatacatalog
      region_name: "{{ env_var('AWS_REGION') }}"
      schema: "<environment>_staging"
      s3_staging_dir: "s3://<environment>-<region>-staging-bucket/athena"
      s3_data_dir: "s3://<environment>-<region>-staging-bucket/data"
      s3_tmp_table_dir: "s3://<environment>-<region>-staging-bucket/tmp"
Run dbt commands:
export DBT_PROFILE=local
export AWS_PROFILE=<your_profile>
export AWS_REGION=<your_region>
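# Optionally validate the profile and Athena connection first
uv run dbt debug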
# Run a specific model
uv run dbt run --select model_name
# Run with Makefile shortcut
make run-local cmd="run --select model_name"
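The usual dbt node selection syntax can be combined with either invocation; for example (model and folder names are placeholders):
# Run a model together with everything downstream of it
uv run dbt run --select model_name+
# Run all models in the staging folder
uv run dbt run --select staging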
Once your template is deployed to AWS, you can run dbt in the cloud environment:
export AWS_PROFILE=<your_profile>
export ENVIRONMENT=<your_environment>
make run cmd="run"
This will trigger an ECS Fargate task to execute the specified dbt command and store results in Iceberg.
For manual deployment:
# Set required environment variables
export AWS_PROFILE=<your_profile>
export ENVIRONMENT=<your_environment>
cd pipelines/transform
# Build and deploy
make deploy
This process builds the Docker image locally and pushes it to ECR. The next time you trigger an ECS task, it will use the latest image.
# Development
make run-local cmd="run" # Run dbt locally with specified command
make run-local cmd="test" # Run dbt tests locally
make run-local cmd="docs generate" # Generate dbt documentation
# Cloud Execution
make run cmd="run" # Run dbt in ECS Fargate
make run cmd="test" # Run tests in ECS Fargate
# Deployment
make build # Build Docker image
make deploy # Build and deploy to ECR