Key Concepts
Understand the template's structure
This section explains the core concepts and architecture of this template.
The template's code is organized into three main components:
📁
├── 📁 pipelines/ # Data pipelines:
│ ├── 📁 ingest/ # Data ingestion layer
│ ├── 📁 transform/ # Data transformation layer
│ └── 📁 orchestrate/ # Workflow orchestration layer
│
├── 📁 base/ # Cloud infrastructure (VPC, roles, users, compute cluster, etc.)
│
└── 📁 live/ # Environment-specific deployment configuration
Each component is documented separately here:
pipelines/
base/aws/
live/
Data Pipeline Architecture
Our data platform follows a layered architecture:
1. Data Ingestion Layer
For each source, the ingestion layer consists of:
A folder pipelines/ingest/<source>-ingestion/ containing the core ingestion logic packaged in a container
Infrastructure as Code files in pipelines/*tf for deploying this ingestion container, either as a serverless function (AWS Lambda) or as a container task (Amazon ECS)
A folder for the management of the landing tables (<source>-schema/)
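To make the "core ingestion logic" item more concrete, here is a minimal sketch of what a Lambda entry point loading data with dlt could look like. It is not the template's actual code: the `fetch_rows` source, the pipeline, dataset and table names, the `LANDING_DATASET` environment variable, and the choice of the Athena destination with filesystem staging are all assumptions made for illustration.

```python
import os
from typing import Any, Dict, Iterator


def fetch_rows() -> Iterator[Dict[str, Any]]:
    """Hypothetical source: yields a couple of hard-coded records."""
    yield {"id": 1, "name": "alpha"}
    yield {"id": 2, "name": "beta"}


def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """AWS Lambda entry point: load the source rows into a landing table."""
    # Imported inside the handler so fetch_rows() can be exercised
    # on a machine where dlt is not installed.
    import dlt

    pipeline = dlt.pipeline(
        pipeline_name="example_ingestion",  # hypothetical name
        destination="athena",               # assumption: Athena destination
        staging="filesystem",               # assumption: files staged in a bucket
        dataset_name=os.environ.get("LANDING_DATASET", "landing"),  # hypothetical
    )
    # Credentials and bucket settings are expected to come from dlt's
    # own configuration (environment variables or config files).
    load_info = pipeline.run(fetch_rows(), table_name="example_rows")
    return {"status": "ok", "load_info": str(load_info)}
```

Keeping the source as a plain generator separates "what to fetch" from "where to load it", so the fetch logic can be tested without any cloud access.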
The template comes with an example data ingestion pipeline deployed as a serverless function (AWS Lambda) using dlt; more details here:
Ingestion: dlt + lambda
2. Data Transformation Layer
The transformation layer is a dbt project that transforms the data into Iceberg staging tables using the SQL query engine Amazon Athena.
This project is located in the pipelines/transform folder.
It runs on container infrastructure (Amazon ECS on Fargate).
More details on how this transformation project is structured here:
Transformation: dbt
3. Workflow Orchestration Layer
The orchestration layer coordinates the execution of the ingestion and transformation layers using workflow automation.
This template provides an example orchestration using AWS Step Functions.
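As an illustration of how such an orchestration chains the two layers, the sketch below builds a two-step state machine definition in Amazon States Language: it invokes the ingestion Lambda, then runs the transformation task on ECS and waits for it to finish. The state names and the function, cluster, and task definition identifiers are hypothetical, and the template's real definition may differ.

```python
import json
from typing import Any, Dict


def build_state_machine(
    function_name: str, cluster_arn: str, task_definition_arn: str
) -> Dict[str, Any]:
    """Return a state machine definition: ingest first, then transform."""
    return {
        "Comment": "Ingest with Lambda, then transform with dbt on ECS",
        "StartAt": "Ingest",
        "States": {
            "Ingest": {
                "Type": "Task",
                # Service integration that invokes a Lambda function.
                "Resource": "arn:aws:states:::lambda:invoke",
                "Parameters": {"FunctionName": function_name},
                "Next": "Transform",
            },
            "Transform": {
                "Type": "Task",
                # ".sync" makes the step wait until the ECS task completes.
                "Resource": "arn:aws:states:::ecs:runTask.sync",
                "Parameters": {
                    "LaunchType": "FARGATE",
                    "Cluster": cluster_arn,
                    "TaskDefinition": task_definition_arn,
                    # A real Fargate task also needs a NetworkConfiguration
                    # (subnets, security groups); omitted in this sketch.
                },
                "End": True,
            },
        },
    }


if __name__ == "__main__":
    definition = build_state_machine(
        "example-ingestion",        # hypothetical function name
        "example-cluster-arn",      # hypothetical cluster
        "example-dbt-task-arn",     # hypothetical task definition
    )
    print(json.dumps(definition, indent=2))
```

Because the transformation step only starts once the ingestion step succeeds, dbt models always run against freshly loaded landing tables.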
This template is ready to be deployed.
The stack deployment is structured in 3 steps:
First, the infrastructure modules (base/ and pipelines/) are deployed using Terragrunt
Then, the containers for the ingestion and transformation layers are built and pushed to the container registry
Finally, the schema evolution scripts of the Iceberg landing tables are run
If you want to get started quickly and deploy the template from your machine, follow this guide:
Get Started
To deploy from GitHub Actions CI/CD, head there:
CI Deployment
The template includes many Makefiles that provide utilities.
Here are some examples:
make deploy in the root folder will deploy the template from your machine
make build in a folder with a Dockerfile will build the container
make local-run in a serverless function folder will test the function locally
Wherever you see a Makefile, run make in that folder to list the available actions
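Since the Makefiles are spread across the repository, a small helper can show where they live. The sketch below is not part of the template; the function name is hypothetical, and it simply walks a checkout and lists every folder that contains a Makefile.

```python
from pathlib import Path
from typing import List


def folders_with_makefile(root: str) -> List[str]:
    """Return the sorted folders under root (root included) holding a Makefile."""
    root_path = Path(root)
    return sorted(
        str(makefile.parent.relative_to(root_path))
        for makefile in root_path.rglob("Makefile")
    )


if __name__ == "__main__":
    # Assumes the script is run from the root of the repository.
    for folder in folders_with_makefile("."):
        print(folder)
```

The root folder itself is reported as `.`; running make in any of the printed folders then lists the actions available there.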