Key Concepts
Understand the template's structure
This section explains the core concepts and architecture of this template.
Code Structure
The template's code is organized into three main components:
📁
├── 📁 pipelines/ # Data pipelines:
│ ├── 📁 ingest/ # Data ingestion layer
│ ├── 📁 transform/ # Data transformation layer
│ └── 📁 orchestrate/ # Workflow orchestration layer
│
├── 📁 base/ # Cloud infrastructure
│ ├── 📁 aws/ # Cloud provider resources (VPC, IAM, etc.)
│ └── 📁 snowflake/ # Data warehouse resources
│
└── 📁 live/ # Environment-specific deployment configuration
Each component is documented separately here:
pipelines/
base/aws/
base/snowflake/
live/
Data Flow
Serverless functions ingest data into S3
Snowpipes copy data from S3 into tables in Snowflake (landing tables)
dbt applies SQL transformations to create staging and mart tables
Data Pipeline Architecture
Our data platform follows a layered architecture:
1. Data Ingestion Layer
For each source, the ingestion layer is structured as follows:
📁 pipelines/
├── 📁 ingest/
│ ├── 📁 <source>-ingestion/ # Core ingestion logic
│ │
│ └── <source>_source_schema.yml # Table schema definitions (YAML)
│
└── <source>_*.tf # Infrastructure definition (serverless functions, containers, etc.)
Each source has:
A folder pipelines/ingest/<source>-ingestion/ containing the core ingestion logic packaged in a container
Infrastructure as Code files in pipelines/*.tf for deploying this ingestion container as serverless functions (AWS Lambda) or container tasks (Amazon ECS)
A YAML file pipelines/<source>_source_schema.yml for the management of the data warehouse tables
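As a minimal sketch of this naming convention (not part of the template), the Python helper below derives the expected locations for one source. The function name `ingestion_layout` and the source name `my_api` are hypothetical, and the schema file is returned as a bare file name, without assuming its parent folder.

```python
from pathlib import PurePosixPath


def ingestion_layout(source: str) -> dict[str, str]:
    """Return the conventional locations for one source, relative to the repo root."""
    pipelines = PurePosixPath("pipelines")
    return {
        # Folder holding the containerized ingestion logic
        "logic_dir": str(pipelines / "ingest" / f"{source}-ingestion"),
        # Glob matching the Terraform files that deploy the container
        "terraform_glob": str(pipelines / f"{source}_*.tf"),
        # File name of the YAML table schema definitions
        "schema_file": f"{source}_source_schema.yml",
    }


# "my_api" is a hypothetical source name used only for illustration
for name, location in ingestion_layout("my_api").items():
    print(f"{name}: {location}")
```

A helper like this can be handy when scaffolding a new source, since all three pieces share the same `<source>` prefix.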
The template comes with an example data ingestion pipeline deployed as a serverless function using dlt; more details here:
Ingestion: dlt + lambda
2. Data Transformation Layer
The transformation layer is a SQL-based dbt project that transforms the data into analytics-ready tables.
This project is located in the pipelines/transform folder:
📁 pipelines/
├── 📁 transform/ # SQL transformation project
│ ├── 📁 models/
│ │ ├── 📁 staging/ # Raw table connections
│ │ └── 📁 marts/ # Transformations
│ │
│ ├── dbt_project.yml
│ └── Dockerfile # For container deployment
│
└── ecs_task_dbt.tf # Infrastructure for transformation tasks
This transformation project runs on container infrastructure (Amazon ECS Fargate) and connects directly to Snowflake.
More details on how this transformation project is structured here:
Transformation: dbt
3. Workflow Orchestration Layer
The orchestration layer coordinates the execution of the ingestion and transformation layers using workflow automation.
The template includes an example orchestration using AWS Step Functions:
📁 pipelines/
├── 📁 orchestrate/
│ └── <source>_step_function.json # Workflow definition
│
└── <source>_step_function.tf # Creates an orchestration workflow in [AWS Step Functions](https://aws.amazon.com/step-functions/)
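To illustrate what such a workflow definition might contain, the Python sketch below generates an Amazon States Language definition that invokes an ingestion Lambda and then runs the dbt task on ECS. The state names, the `build_state_machine` helper, and the placeholder ARNs are assumptions for illustration, not the template's actual `<source>_step_function.json`. A real Fargate task would typically also need a `NetworkConfiguration` block.

```python
import json


def build_state_machine(lambda_arn: str, cluster_arn: str, task_definition_arn: str) -> dict:
    """Build a two-step workflow definition: ingest first, then transform."""
    return {
        "Comment": "Run ingestion, then dbt transformations",
        "StartAt": "Ingest",
        "States": {
            "Ingest": {
                "Type": "Task",
                # Service integration that invokes a Lambda function
                "Resource": "arn:aws:states:::lambda:invoke",
                "Parameters": {"FunctionName": lambda_arn},
                "Next": "Transform",
            },
            "Transform": {
                "Type": "Task",
                # The ".sync" suffix makes the workflow wait for the ECS task to finish
                "Resource": "arn:aws:states:::ecs:runTask.sync",
                "Parameters": {
                    "LaunchType": "FARGATE",
                    "Cluster": cluster_arn,
                    "TaskDefinition": task_definition_arn,
                },
                "End": True,
            },
        },
    }


# Placeholder identifiers; replace them with the ARNs of your own resources
definition = build_state_machine(
    "<ingestion-lambda-arn>", "<ecs-cluster-arn>", "<dbt-task-definition-arn>"
)
print(json.dumps(definition, indent=2))
```

Chaining the states with `Next` ensures the transformation only starts once ingestion has completed.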

Deployment
This template is ready to be deployed.
The stack is deployed in two steps:
First, the infrastructure modules (base/ and pipelines/) are deployed using Terragrunt
Then, the containers for the ingestion and transformation layers are built and pushed to the container registry (Amazon ECR)
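The ordering of these two steps can be sketched in Python as a list of commands. This is only an illustration under stated assumptions: the exact commands the template runs are not shown here, so the Terragrunt invocation, the `deployment_plan` helper, and the image URI are hypothetical. The sketch only builds the command list and does not execute anything.

```python
def deployment_plan(images: dict[str, str]) -> list[list[str]]:
    """Return the commands for the two deployment steps, in order.

    `images` maps a Docker build context folder to the image URI to push.
    """
    # Step 1: deploy the infrastructure modules (assumed Terragrunt invocation)
    plan = [["terragrunt", "run-all", "apply"]]
    # Step 2: build and push each container image
    for context_dir, image_uri in images.items():
        plan.append(["docker", "build", "-t", image_uri, context_dir])
        plan.append(["docker", "push", image_uri])
    return plan


# Hypothetical image URI; a real one would point at your Amazon ECR repository
for command in deployment_plan({"pipelines/transform": "<ecr-registry>/dbt:latest"}):
    print(" ".join(command))
```

Infrastructure comes first in this sketch on the assumption that the container registry is created by the infrastructure modules and must exist before images can be pushed.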

If you want to get started quickly and deploy the template from your machine, follow this guide:
Get Started
To deploy from GitHub Actions CI/CD instead, head here:
CI Deployment
Makefile
The template includes many Makefiles that provide utility commands.
Here are some examples:
make deploy in the root folder deploys the template from your machine
make build in a folder with a Dockerfile builds the container
make local-run runs the code locally
etc.
Wherever you see a Makefile, run make to list the available actions.