Transformation: dbt

Overview

This directory contains a data transformation pipeline that:

  1. Takes data from Iceberg tables in the landing zone

  2. Transforms it using dbt (data build tool)

  3. Creates analytics-ready tables in staging and mart schemas

The pipeline runs in a Docker container as an Amazon ECS task on AWS Fargate.

How It Works

Infrastructure Components

  • Amazon Athena: SQL query engine for data transformation

  • Amazon S3: Hosts the Iceberg tables for both source and transformed data

  • AWS Glue: Provides the catalog for Iceberg tables

  • Amazon ECS: Orchestrates the dbt container execution

  • Amazon ECR: Stores the dbt Docker container image

  • Terraform: Provisions and manages all infrastructure

Project Structure

Data Transformation Flow

The pipeline follows these transformation layers:

  1. Sources: Raw data from landing tables created by ingestion pipelines

  2. Staging: Initial cleaning, type conversion, deduplication and renaming

  3. Mart: Final models organized by business domain, ready for analytics and reporting
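To make the staging step more concrete, the sketch below mimics in plain Python what a staging model typically does in SQL: rename columns, cast types, and keep one row per key. The column names (`id`, `Amount`, `updated_at`), the output name `order_id`, and the "most recent row wins" rule are assumptions chosen for illustration. The actual models in this project are dbt SQL executed by Athena.

```python
from datetime import datetime


def stage_rows(raw_rows):
    """Rename, cast, and deduplicate raw landing rows (illustrative only)."""
    latest_by_id = {}
    for row in raw_rows:
        cleaned = {
            "order_id": int(row["id"]),        # rename + cast to integer
            "amount": float(row["Amount"]),    # normalise the name + cast to float
            "updated_at": datetime.fromisoformat(row["updated_at"]),  # cast to timestamp
        }
        current = latest_by_id.get(cleaned["order_id"])
        # Deduplicate: keep only the most recently updated row per key.
        if current is None or cleaned["updated_at"] > current["updated_at"]:
            latest_by_id[cleaned["order_id"]] = cleaned
    return sorted(latest_by_id.values(), key=lambda r: r["order_id"])
```

A mart model would then aggregate or join such cleaned rows by business domain.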

Sources

Sources are defined in the sources/ folder and reference the landing tables created by the ingestion pipelines.

You can generate the source definition file automatically with the BoringData CLI.

Models Structure

The dbt models follow a layered architecture pattern:

  • Each folder in the models directory corresponds to a distinct schema in Athena

  • models/staging/ ➡️ <environment>_staging schema in Athena

  • models/mart/ ➡️ <environment>_mart schema in Athena
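The folder-to-schema convention above can be expressed as a small helper. This is a minimal sketch of the naming rule only, assuming the schema name is always `<environment>_<first folder under models/>`; the function name and the example file name `stg_orders.sql` are hypothetical, and the real mapping lives in the dbt project configuration.

```python
from pathlib import PurePosixPath


def athena_schema_for(model_path, environment):
    """Return the Athena schema a model lands in, based on its folder under models/."""
    parts = PurePosixPath(model_path).parts
    if len(parts) < 3 or parts[0] != "models":
        raise ValueError(f"expected models/<layer>/<file>, got {model_path!r}")
    layer = parts[1]  # first folder under models/, e.g. "staging" or "mart"
    return f"{environment}_{layer}"
```

For example, `athena_schema_for("models/staging/stg_orders.sql", "dev")` returns `"dev_staging"`.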

Development Guide

Option 1: Execute dbt Locally

For rapid development with local dbt execution:

  1. Set up your environment:

  2. Configure dbt profile: Create or update ~/.dbt/profiles.yml with:

  3. Run dbt commands:
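As a rough illustration of the last step, the sketch below wraps the dbt CLI in Python. It uses only standard dbt subcommands and flags (`debug`, `run`, `test`, `--select`, `--target`); the target name `dev` and the selector `staging` in the usage note are assumptions, and the helper names are hypothetical rather than part of this project.

```python
import shlex
import subprocess


def dbt_command(subcommand, select=None, target=None):
    """Build the argument list for a dbt CLI invocation."""
    args = ["dbt", subcommand]
    if select:
        args += ["--select", select]
    if target:
        args += ["--target", target]
    return args


def run_dbt(subcommand, **options):
    """Run dbt in the current project directory; raises if dbt exits non-zero."""
    args = dbt_command(subcommand, **options)
    print("$", shlex.join(args))
    subprocess.run(args, check=True)
```

A typical local session might call `run_dbt("debug")` to check the connection, then `run_dbt("run", select="staging", target="dev")` followed by `run_dbt("test", select="staging", target="dev")`.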

Option 2: Execute in AWS ECS Fargate

Once your template is deployed to AWS, you can run dbt in the cloud environment:

This triggers an ECS Fargate task that executes the specified dbt command and stores the results in Iceberg tables.
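For readers who want to see what such a trigger involves, here is a hedged sketch using boto3's ECS `run_task` call with a container command override. The cluster, task definition, container name, subnets, and security groups are placeholders you would take from your own Terraform outputs. Prefixing the command with `dbt` is an assumption about the image's entrypoint, and this is not necessarily how the project's own tooling starts the task.

```python
def build_run_task_request(cluster, task_definition, container_name,
                           dbt_args, subnets, security_groups):
    """Build keyword arguments for boto3's ECS `run_task` call."""
    return {
        "cluster": cluster,
        "taskDefinition": task_definition,
        "launchType": "FARGATE",
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": list(subnets),
                "securityGroups": list(security_groups),
            }
        },
        "overrides": {
            "containerOverrides": [
                # Replace the container's default command with the dbt command to run.
                # Assumes the image does not already set `dbt` as its entrypoint.
                {"name": container_name, "command": ["dbt", *dbt_args]}
            ]
        },
    }


def trigger_dbt_task(**kwargs):
    """Start the Fargate task; requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the request builder works without boto3

    response = boto3.client("ecs").run_task(**build_run_task_request(**kwargs))
    return [task["taskArn"] for task in response["tasks"]]
```

Keeping the request builder separate from the API call makes it easy to inspect or unit-test the payload before anything is launched.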

Deployment

For manual deployment:

This process:

  1. Builds the Docker image locally

  2. Pushes it to ECR

The next time you trigger an ECS task, it will use the latest image.
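The two steps above usually come down to a Docker login against ECR, a build, and a push. The sketch below only assembles those shell commands so they can be reviewed before running. The account ID, region, and repository name are placeholders, the `latest` tag is an assumption based on the note about the latest image, and the project's own deployment script may differ.

```python
def deployment_commands(account_id, region, repository, tag="latest"):
    """Return the shell commands that build an image and push it to ECR."""
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    image = f"{registry}/{repository}:{tag}"
    return [
        # Authenticate the local Docker client against the ECR registry.
        f"aws ecr get-login-password --region {region} | "
        f"docker login --username AWS --password-stdin {registry}",
        # Build the image from the Dockerfile in the current directory.
        f"docker build -t {image} .",
        # Push the tagged image to the ECR repository.
        f"docker push {image}",
    ]
```

Printing the returned list, one command per line, gives a script you can paste into a terminal from the directory that contains the Dockerfile.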

Common Commands

Resources
