
Key Concepts

Understand the template's structure

This section explains the core concepts and architecture of this template.

Code Structure

The template's code is organized into three main components:

📁
├── 📁 pipelines/             # Data pipelines
│   ├── 📁 ingest/                      # Data ingestion layer
│   ├── 📁 transform/                   # Data transformation layer
│   └── 📁 orchestrate/                 # Workflow orchestration layer
│
├── 📁 base/                  # Cloud infrastructure
│   ├── 📁 aws/                         # Cloud provider resources (VPC, IAM, etc.)
│   └── 📁 snowflake/                   # Data warehouse resources
│
└── 📁 live/                  # Environment-specific deployment configuration

Each component is documented on its own page:

  • pipelines/

  • base/aws/

  • base/snowflake/

  • live/

Data Flow

  1. Serverless functions ingest data into Amazon S3.

  2. Snowpipe copies the data from S3 into Snowflake tables (the landing tables).

  3. SQL transformations written in dbt turn the landing data into staging and mart tables.
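To make the hand-off between the first two steps concrete, here is a minimal in-memory sketch. It is not the template's code: a Python dict stands in for the S3 bucket, a list stands in for a Snowflake landing table, and the function names (`ingest`, `copy_into_landing`) and the `<source>/<batch_id>.jsonl` key layout are hypothetical. The loader skips files it has already loaded, loosely mimicking how Snowpipe avoids reloading the same file. The dbt step is left out because it runs as SQL inside Snowflake.

```python
import json

# Hypothetical stand-ins: a dict plays the S3 bucket (key -> file body)
# and a list plays the Snowflake landing table.


def ingest(bucket: dict, source: str, batch_id: str, records: list) -> str:
    """Step 1: write one newline-delimited JSON file per batch under <source>/."""
    key = f"{source}/{batch_id}.jsonl"  # assumed key layout, for illustration only
    bucket[key] = "\n".join(json.dumps(record) for record in records)
    return key


def copy_into_landing(bucket: dict, landing: list, loaded_files: set, source: str) -> int:
    """Step 2: append rows from files not loaded yet; return how many files were loaded."""
    new_files = 0
    for key in sorted(bucket):
        if not key.startswith(f"{source}/") or key in loaded_files:
            continue  # another source's file, or already loaded: skip it
        for line in bucket[key].splitlines():
            # Keep the raw record plus the file it came from.
            landing.append({"raw": json.loads(line), "source_file": key})
        loaded_files.add(key)
        new_files += 1
    return new_files


if __name__ == "__main__":
    bucket, landing, loaded = {}, [], set()
    ingest(bucket, "chess", "2024-01-01", [{"username": "Alice", "rating": 1500}])
    print(copy_into_landing(bucket, landing, loaded, "chess"))  # 1
    print(copy_into_landing(bucket, landing, loaded, "chess"))  # 0: file already loaded
```

Tracking loaded files makes the copy step safe to re-run, which is what lets ingestion and loading run independently of each other.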

Data Pipeline Architecture

Our data platform follows a layered architecture:

1. Data Ingestion Layer

For each source, the ingestion layer consists of:

  • A folder pipelines/ingest/<source>-ingestion/ containing the core ingestion logic, packaged as a container

  • Infrastructure as Code files in pipelines/*tf that deploy this ingestion container either as a serverless function (AWS Lambda) or as a container task (Amazon ECS)

  • A YAML file pipelines/<source>_source_schema.yml used to manage the source's tables in the data warehouse


Schema management is handled through these YAML files, which makes it easy to define and evolve table structures. More information is available in the FAQ.
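As an illustration of "define and evolve", the sketch below turns a parsed schema file into Snowflake DDL. The structure of `CHESS_SCHEMA` (a `tables` list with `name`/`columns` entries) is a hypothetical stand-in for the result of loading pipelines/<source>_source_schema.yml (for example with PyYAML's `yaml.safe_load`); the template's real keys, and the tool that actually applies them, may differ. Evolution is shown as `ALTER TABLE ... ADD COLUMN` for columns the warehouse does not have yet.

```python
# Hypothetical parsed form of pipelines/<source>_source_schema.yml;
# the template's real YAML keys may differ.
CHESS_SCHEMA = {
    "tables": [
        {
            "name": "players",
            "columns": [
                {"name": "username", "type": "VARCHAR"},
                {"name": "rating", "type": "NUMBER"},
            ],
        }
    ]
}


def qualified(database: str, schema: str, table: dict) -> str:
    """Fully qualified Snowflake name: DATABASE.SCHEMA.TABLE."""
    return f"{database}.{schema}.{table['name']}"


def create_statements(spec: dict, database: str, schema: str) -> list[str]:
    """Define: one CREATE TABLE IF NOT EXISTS per table in the spec."""
    statements = []
    for table in spec["tables"]:
        columns = ",\n  ".join(f"{c['name']} {c['type']}" for c in table["columns"])
        statements.append(
            f"CREATE TABLE IF NOT EXISTS {qualified(database, schema, table)} (\n  {columns}\n);"
        )
    return statements


def add_column_statements(table: dict, existing: set[str], database: str, schema: str) -> list[str]:
    """Evolve: ADD COLUMN for every column in the spec that the table lacks."""
    return [
        f"ALTER TABLE {qualified(database, schema, table)} ADD COLUMN {c['name']} {c['type']};"
        for c in table["columns"]
        if c["name"] not in existing
    ]


if __name__ == "__main__":
    # "RAW" and "CHESS" are hypothetical database and schema names.
    for statement in create_statements(CHESS_SCHEMA, "RAW", "CHESS"):
        print(statement)
```

Keeping the YAML as the single source of truth means adding a column is a one-line change to the file rather than a hand-written migration.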

The template comes with an example data ingestion pipeline built with dlt and deployed as a serverless function. More details:

Ingestion: dlt + lambda

2. Data Transformation Layer

The transformation layer is a SQL-based dbt project that turns the ingested data into analytics-ready tables. It is located in the pipelines/transform folder, runs on container infrastructure (Amazon ECS Fargate), and connects directly to Snowflake.
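The layering can be sketched with SQLite standing in for Snowflake. In dbt, each model is a SELECT statement and dbt wraps it in `CREATE VIEW` or `CREATE TABLE` according to the model's materialization, resolving `{{ ref(...) }}` to the upstream relation. Here that wrapping and the dependency order are written out by hand. The table and model names (`landing_players`, `stg_players`, `mart_player_best`) are hypothetical, not the template's models.

```python
import sqlite3

# Staging model (materialized as a view): clean and rename landing columns.
# In dbt this would be a plain SELECT; the CREATE VIEW wrapper is written by hand here.
STG_PLAYERS = """
CREATE VIEW stg_players AS
SELECT lower(username) AS player, CAST(rating AS INTEGER) AS rating
FROM landing_players
"""

# Mart model (materialized as a table): aggregate the staging model.
# In dbt, "FROM stg_players" would be written as {{ ref('stg_players') }}.
MART_PLAYER_BEST = """
CREATE TABLE mart_player_best AS
SELECT player, MAX(rating) AS best_rating
FROM stg_players
GROUP BY player
"""


def build_models(landing_rows: list[tuple[str, str]]) -> sqlite3.Connection:
    """Load raw rows into a landing table, then build staging and mart in order."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE landing_players (username TEXT, rating TEXT)")
    con.executemany("INSERT INTO landing_players VALUES (?, ?)", landing_rows)
    # Dependency order matters: the mart selects from the staging view.
    for sql in (STG_PLAYERS, MART_PLAYER_BEST):
        con.execute(sql)
    return con


if __name__ == "__main__":
    con = build_models([("Alice", "1500"), ("alice", "1550"), ("Bob", "1400")])
    print(con.execute("SELECT player, best_rating FROM mart_player_best ORDER BY player").fetchall())
    # [('alice', 1550), ('bob', 1400)]
```

Staging keeps one cleaned row per landing row, while the mart holds the aggregated, analytics-ready result; dbt derives the build order from the `ref` calls instead of a hand-written list.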

More details on how the transformation project is structured:

Transformation: dbt

3. Workflow Orchestration Layer

The orchestration layer coordinates the execution of the ingestion and transformation layers using workflow automation.

The template provides an example orchestration built with AWS Step Functions:

[Figure: Chess pipeline workflow in AWS Step Functions]
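A Step Functions workflow is defined in the Amazon States Language (ASL), a JSON document of named states. The minimal sketch below builds one that runs ingestion on Lambda and then the dbt container on ECS Fargate. It assumes Snowpipe loads files on its own as they land, so no state waits for it. The state names, retry policy, network settings, and all ARNs are hypothetical; the template's actual workflow is the one shown in the figure and may contain more states. The integration ARNs `arn:aws:states:::lambda:invoke` and `arn:aws:states:::ecs:runTask.sync` are standard Step Functions service integrations.

```python
import json


def build_state_machine(lambda_arn: str, cluster_arn: str,
                        task_definition_arn: str, subnets: list[str]) -> str:
    """Return an Amazon States Language definition as a JSON string."""
    definition = {
        "Comment": "Sketch: run ingestion on Lambda, then dbt on ECS Fargate",
        "StartAt": "Ingest",
        "States": {
            "Ingest": {
                "Type": "Task",
                # Lambda service integration: invokes the function and waits for its result.
                "Resource": "arn:aws:states:::lambda:invoke",
                "Parameters": {"FunctionName": lambda_arn},
                # Illustrative retry policy, not taken from the template.
                "Retry": [{
                    "ErrorEquals": ["States.TaskFailed"],
                    "IntervalSeconds": 10,
                    "MaxAttempts": 2,
                    "BackoffRate": 2.0,
                }],
                "Next": "Transform",
            },
            "Transform": {
                "Type": "Task",
                # ".sync" makes Step Functions wait until the ECS task stops.
                "Resource": "arn:aws:states:::ecs:runTask.sync",
                "Parameters": {
                    "LaunchType": "FARGATE",
                    "Cluster": cluster_arn,
                    "TaskDefinition": task_definition_arn,
                    "NetworkConfiguration": {
                        "AwsvpcConfiguration": {
                            # Assumes private subnets with outbound access to Snowflake.
                            "Subnets": subnets,
                            "AssignPublicIp": "DISABLED",
                        }
                    },
                },
                "End": True,
            },
        },
    }
    return json.dumps(definition, indent=2)


if __name__ == "__main__":
    # All ARNs and IDs below are placeholders.
    print(build_state_machine(
        "arn:aws:lambda:eu-west-1:123456789012:function:chess-ingestion",
        "arn:aws:ecs:eu-west-1:123456789012:cluster/data-platform",
        "arn:aws:ecs:eu-west-1:123456789012:task-definition/dbt:1",
        ["subnet-0123456789abcdef0"],
    ))
```

Chaining the states with `Next` encodes the dependency (transform only after ingestion succeeds), and the `.sync` integration lets the workflow report the dbt run's success or failure instead of just its launch.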

Deployment

This template is ready to be deployed.

Deploying the stack takes two steps:

  • First, the infrastructure modules (base/ and pipelines/) are deployed with Terragrunt

  • Then, the container images for the ingestion and transformation layers are built and pushed to the container registry (Amazon ECR)
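The two-step order can be made concrete as a list of shell commands. This is only a sketch: the supported path is the template's own `make deploy`, and the working directory (a hypothetical `live/<env>` folder), image names, registry, and tag are assumptions. `terragrunt run-all apply` is the classic multi-module command; newer Terragrunt releases replace it with `terragrunt run --all apply`, so check the command for your version. The ECR login pipeline and `docker build`/`docker push` are standard AWS CLI and Docker commands.

```python
import shlex


def deployment_plan(live_dir: str, images: dict[str, str], registry: str,
                    region: str, tag: str) -> list[str]:
    """Return shell commands: infrastructure first, then container images."""
    # Step 1: apply every Terragrunt module under the (assumed) environment folder.
    steps = [f"cd {shlex.quote(live_dir)} && terragrunt run-all apply"]
    # Step 2a: authenticate Docker against the ECR registry.
    steps.append(
        f"aws ecr get-login-password --region {shlex.quote(region)}"
        f" | docker login --username AWS --password-stdin {shlex.quote(registry)}"
    )
    # Step 2b: build and push each image (name -> build context directory).
    for name, context in images.items():
        ref = f"{registry}/{name}:{tag}"
        steps.append(shlex.join(["docker", "build", "-t", ref, context]))
        steps.append(shlex.join(["docker", "push", ref]))
    return steps


if __name__ == "__main__":
    # Hypothetical values; print the plan instead of running it.
    for command in deployment_plan(
        "live/dev",
        {"chess-ingestion": "pipelines/ingest/chess-ingestion", "dbt": "pipelines/transform"},
        "123456789012.dkr.ecr.eu-west-1.amazonaws.com",
        "eu-west-1",
        "v1",
    ):
        print(command)
```

Infrastructure comes first because it creates the resources the second step depends on, such as the ECR repositories the images are pushed to.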

If you want to get started quickly and deploy the template from your machine, follow this guide:

Get Started

To deploy from GitHub Actions CI/CD instead, see:

CI Deployment

Makefile

The template includes many Makefiles that provide utility commands.

Here are some examples:

  • make deploy in the root folder deploys the template from your machine

  • make build in a folder with a Dockerfile builds the container

  • make local-run runs the code locally

  • etc.

In any folder that contains a Makefile, run make to list the available actions.
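A common way to make a bare `make` print its targets is the "self-documenting Makefile" convention: each target line carries a `## description` comment, and the default target scans the Makefile for those comments. Whether the template's Makefiles use exactly this convention is an assumption; the sketch below only shows the parsing idea in Python, with a hypothetical Makefile excerpt.

```python
import re

# Matches "target: [prerequisites] ## description" at the start of a line.
# Recipe lines start with a tab, so they never match.
HELP_LINE = re.compile(r"^([A-Za-z0-9_.-]+):.*?## (.*)$")


def list_targets(makefile_text: str) -> list[tuple[str, str]]:
    """Return (target, description) pairs for every annotated target."""
    return [
        (m.group(1), m.group(2).strip())
        for line in makefile_text.splitlines()
        if (m := HELP_LINE.match(line))
    ]


def format_help(targets: list[tuple[str, str]]) -> str:
    """Align descriptions in a column, as a help target would print them."""
    width = max((len(name) for name, _ in targets), default=0)
    return "\n".join(f"{name.ljust(width)}  {desc}" for name, desc in targets)


if __name__ == "__main__":
    # Hypothetical Makefile excerpt, not copied from the template.
    sample = (
        "deploy: ## Deploy the template from your machine\n"
        "\t@echo deploying\n"
        "local-run: build ## Run the code locally\n"
    )
    print(format_help(list_targets(sample)))
```

In a real Makefile the same scan is usually done with `grep` or `awk` inside a `help` target declared first, so that running `make` with no arguments prints the list.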

