Key Concepts
Understand the template's structure
This section explains the core concepts and architecture of this template.
The template's code is organized into three main components:
Each component is documented separately here:
Serverless functions ingest data into S3
Snowpipes copy data from S3 into tables in Snowflake (landing tables)
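The first step of this flow (a function writing a batch of records to S3, where Snowpipe can pick it up) can be sketched as follows. This is a minimal illustration, not the template's actual ingestion code: the key layout, the newline-delimited JSON format, and the names build_object_key, to_ndjson and ingest are all assumptions made for the example. The put_object argument is any callable that accepts Bucket, Key and Body keyword arguments, as the boto3 S3 client's put_object does.

```python
import json
from datetime import datetime, timezone
from typing import Any, Callable, Dict, Iterable, Optional


def build_object_key(source: str, ts: datetime) -> str:
    """Return a date-partitioned object key for one batch (hypothetical layout)."""
    return f"{source}/{ts:%Y/%m/%d}/{source}_{ts:%Y%m%dT%H%M%SZ}.ndjson"


def to_ndjson(records: Iterable[Dict[str, Any]]) -> bytes:
    """Serialize records as newline-delimited JSON, one object per line."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in records).encode("utf-8")


def ingest(
    records: Iterable[Dict[str, Any]],
    source: str,
    bucket: str,
    put_object: Callable[..., Any],
    now: Optional[datetime] = None,
) -> str:
    """Write one batch of records to the bucket and return the key that was used."""
    ts = now or datetime.now(timezone.utc)
    key = build_object_key(source, ts)
    # put_object is injected so the function can be exercised without AWS access.
    put_object(Bucket=bucket, Key=key, Body=to_ndjson(records))
    return key
```

In a real function, put_object could be boto3.client("s3").put_object. A Snowpipe configured on the bucket would then copy each new file into the matching landing table.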
Our data platform follows a layered architecture:
For each source, the ingestion layer is structured as follows:
Each source has:
A folder pipelines/ingest/<source>-ingestion/ containing the core ingestion logic packaged in a container
A YAML file pipelines/<source>_source_schema.yml for the management of the data warehouse tables
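The naming convention above can be expressed as a small helper that derives both paths from a source name. This is only an illustration of the convention; the SourceLayout class and the source_layout function are hypothetical and are not part of the template.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceLayout:
    """Paths that belong to one ingestion source, relative to the repository root."""
    ingestion_dir: str
    schema_file: str


def source_layout(source: str) -> SourceLayout:
    """Derive the ingestion folder and schema file for a source name."""
    return SourceLayout(
        ingestion_dir=f"pipelines/ingest/{source}-ingestion/",
        schema_file=f"pipelines/{source}_source_schema.yml",
    )
```

For a source named "orders", this gives pipelines/ingest/orders-ingestion/ and pipelines/orders_source_schema.yml.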
More details on how this transformation project is structured here:
The orchestration layer coordinates the execution of the ingestion and transformation layers using workflow automation.
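Conceptually, this coordination means running the ingestion step before the transformation step and stopping if a step fails. The sketch below shows that idea in plain Python. It does not represent the orchestration tool used by the template, and the run_pipeline function and the step names are assumptions made for the example.

```python
from typing import Callable, List, Sequence, Tuple

# A step is a name plus a zero-argument callable that raises on failure.
Step = Tuple[str, Callable[[], None]]


def run_pipeline(steps: Sequence[Step]) -> List[str]:
    """Run steps in order and return the names of the steps that completed."""
    completed: List[str] = []
    for name, action in steps:
        # An exception raised here propagates, so later steps never run.
        action()
        completed.append(name)
    return completed
```

A workflow tool adds scheduling, retries and monitoring on top of this ordering, which is why the template delegates it to a dedicated orchestration layer.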
This template is ready to be deployed.
The stack deployment is structured in two steps:
If you want to get started quickly and deploy the template from your machine, follow this guide:
The template includes many Makefiles that provide utilities.
Here are some examples:
make deploy in the root folder will deploy the template from your machine
make build in a folder with a Dockerfile will build the container
make local-run will run the code locally
etc.
Wherever you see a Makefile, run make to list the available actions.
SQL transformations are applied to create staging and mart tables
Infrastructure as Code files in pipelines/*tf for deploying this ingestion container, either as serverless functions (AWS Lambda) or as container tasks
Schema management is handled through YAML files, making it easy to define and evolve table structures.
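To illustrate how a declarative schema can drive table definitions, the sketch below turns a schema description into Snowflake DDL. The schema format shown here (a table name plus a list of name/type columns) is an assumption and may differ from the template's actual YAML files. A plain dict stands in for the parsed YAML so the example has no dependencies. The function names are hypothetical.

```python
from typing import Dict, List

Column = Dict[str, str]


def create_table_sql(table: str, columns: List[Column]) -> str:
    """Render a CREATE TABLE statement from a list of column definitions."""
    cols = ",\n".join(f"    {c['name']} {c['type']}" for c in columns)
    return f"CREATE TABLE IF NOT EXISTS {table} (\n{cols}\n);"


def add_columns_sql(table: str, current: List[Column], desired: List[Column]) -> List[str]:
    """Render ALTER TABLE statements for columns present in desired but not in current."""
    existing = {c["name"] for c in current}
    return [
        f"ALTER TABLE {table} ADD COLUMN {c['name']} {c['type']};"
        for c in desired
        if c["name"] not in existing
    ]


# Hypothetical schema, shaped like the content of a parsed YAML file.
schema = {
    "table": "landing.orders",
    "columns": [
        {"name": "id", "type": "NUMBER"},
        {"name": "payload", "type": "VARIANT"},
    ],
}
```

Calling create_table_sql(schema["table"], schema["columns"]) produces the initial definition. When a column is later added to the schema, add_columns_sql returns only the statements needed to evolve the existing table.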
The template comes with an example data ingestion pipeline deployed as a serverless function; more details here:
The transformation layer is a SQL-based project that transforms the data into analytics-ready tables:
This project is located in the pipelines/transform folder:
This transformation project runs on container infrastructure (Fargate) and connects directly to the data warehouse.
This template provides an example orchestration:
First, the infrastructure modules (base/ and pipelines/) are deployed
Then, the containers for the ingestion and transformation layers are built and pushed to the container registry
To get started deploying from CI/CD, head there: