# Key Concepts

This section explains the core concepts and architecture of this template.

## Code Structure

The template's code is organized into three main components:

```
📁
├── 📁 pipelines/             # Data pipelines:
│   ├── 📁 ingest/                      # Data ingestion layer
│   ├── 📁 transform/                   # Data transformation layer
│   └── 📁 orchestrate/                 # Workflow orchestration layer
│
├── 📁 base/                  # Cloud infrastructure (VPC, roles, users, compute cluster, etc.)
│
└── 📁 live/                  # Environment-specific deployment configuration
```

Each component is documented separately here:

{% content-ref url="/pages/G2khJSaacMix8oYa7iVH" %}
[pipelines/](/template-aws-iceberg/project-structure/pipelines.md)
{% endcontent-ref %}

{% content-ref url="/pages/qo9vW9N1VXDI9z43Fdr5" %}
[base/aws/](/template-aws-iceberg/project-structure/aws.md)
{% endcontent-ref %}

{% content-ref url="/pages/txxF4gycrydceZI6n15j" %}
[live/](/template-aws-iceberg/project-structure/live.md)
{% endcontent-ref %}

## Data Flow

1. Source data is ingested into [Apache Iceberg](https://iceberg.apache.org/) landing tables: code in `pipelines/ingest/<source_name>-*/`
2. Data transformations are applied to create staging tables using the SQL engine [Amazon Athena](https://aws.amazon.com/athena/): code in `pipelines/transform/`

## Data Pipeline Architecture

Our data platform follows a layered architecture:

### 1. Data Ingestion Layer

For each source, the ingestion layer is structured as follows:

```
📁 pipelines/
├── 📁 ingest/
│   ├── 📁 <source>-ingestion/      # Core ingestion logic
│   │
│   └── 📁 <source>-schema/         # Iceberg Table schema definitions
│       ├── <table_name>.py
│       └── ...
│
└── <source>_*.tf                   # Infrastructure definition (serverless functions, containers, etc.)
```

Each source has:

* A folder `pipelines/ingest/<source>-ingestion/` containing the core ingestion logic packaged in a container
* Infrastructure as Code files in `pipelines/<source>_*.tf` for deploying this ingestion container, either as a serverless function (AWS Lambda) or as a container task ([Amazon ECS](https://aws.amazon.com/ecs/))
* A folder for the management of the landing tables (`<source>-schema/`)
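The naming convention above can be sketched as a small helper that resolves the expected locations for a given source. This is only an illustration: the function `ingestion_layout` and its dictionary keys are hypothetical and not part of the template, and `chess` is used as the source name because it is the example pipeline shipped with the template.

```python
from pathlib import PurePosixPath


def ingestion_layout(source: str, root: str = "pipelines") -> dict:
    """Return the locations the convention implies for one source.

    Hypothetical helper for illustration only; the template does not
    ship this function.
    """
    base = PurePosixPath(root)
    return {
        # Core ingestion logic, packaged in a container
        "ingestion_code": str(base / "ingest" / f"{source}-ingestion"),
        # Iceberg landing table schema definitions (<table_name>.py files)
        "schema_definitions": str(base / "ingest" / f"{source}-schema"),
        # Glob pattern for the Terraform files that deploy the source
        "terraform_glob": str(base / f"{source}_*.tf"),
    }


layout = ingestion_layout("chess")
for role, path in layout.items():
    print(f"{role}: {path}")
```

Keeping every source on the same layout means a new source can be added by copying the folders of an existing one and renaming the prefix.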

{% hint style="info" %}
For more information about landing table schema evolution, see the [FAQ](/template-aws-iceberg/help/faq.md#iceberg-landing-table-schema-evolution).
{% endhint %}

The template comes with an example data ingestion pipeline that uses [dlt](https://dlthub.com/docs/intro) and is deployed as a serverless function (AWS Lambda). More details here:

{% content-ref url="/pages/QmSXv61cHRK1XAaEgwKO" %}
[Ingestion: dlt + lambda](/template-aws-iceberg/project-structure/pipelines/chess-ingestion.md)
{% endcontent-ref %}

### 2. Data Transformation Layer

The transformation layer is a [dbt](https://docs.getdbt.com/) project that transforms the data into Iceberg staging tables using the SQL query engine [Amazon Athena](https://aws.amazon.com/athena/).

This project is located in the `pipelines/transform` folder:

```
📁 pipelines/
├── 📁 transform/                   # SQL transformation project
│   ├── 📁 models/
│   │   ├── 📁 staging/            # Raw table connections
│   │   └── 📁 marts/              # Transformations
│   │
│   ├── dbt_project.yml
│   └── Dockerfile                  # For container deployment
│
└── ecs_task_dbt.tf                 # Infrastructure definition for running dbt container
```

This transformation project runs on container infrastructure ([Amazon ECS](https://aws.amazon.com/ecs/) Fargate).

For more details on how this transformation project is structured, see:

{% content-ref url="/pages/vC7jupqVP6pcjq6OCJ4f" %}
[Transformation: dbt](/template-aws-iceberg/project-structure/pipelines/transform.md)
{% endcontent-ref %}

### 3. Workflow Orchestration Layer

The orchestration layer coordinates the execution of the ingestion and transformation layers.

The template includes an example orchestration built with [AWS Step Functions](https://aws.amazon.com/step-functions/):

```
📁 pipelines/
├── 📁 orchestrate/
│   └── <source>_step_function.json  # Workflow definition
│
└── <source>_step_function.tf        # Creates an orchestration workflow in AWS Step Functions
```

<div align="center"><img src="/files/gWaz2PlyVokT92tbGOuQ" alt="Chess Pipeline Workflow" width="375"></div>
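As a rough illustration of what such a workflow definition can contain, the sketch below builds a two-state Amazon States Language document in which an ingestion function runs first and a transformation container runs second. It is not the template's actual `<source>_step_function.json`: the state names, the function `build_state_machine`, and all argument values are placeholders chosen for this example.

```python
import json


def build_state_machine(function_name: str, cluster: str, task_definition: str) -> dict:
    """Build a minimal ingest-then-transform workflow definition.

    Hypothetical sketch; the state names and structure are assumptions,
    not the template's real workflow.
    """
    return {
        "Comment": "Illustrative ingest-then-transform workflow",
        "StartAt": "Ingest",
        "States": {
            "Ingest": {
                # Invoke the ingestion serverless function
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke",
                "Parameters": {"FunctionName": function_name},
                "Next": "Transform",
            },
            "Transform": {
                # Run the dbt container and wait for it to finish (.sync).
                # A real Fargate task also needs a NetworkConfiguration
                # block, omitted here for brevity.
                "Type": "Task",
                "Resource": "arn:aws:states:::ecs:runTask.sync",
                "Parameters": {
                    "LaunchType": "FARGATE",
                    "Cluster": cluster,
                    "TaskDefinition": task_definition,
                },
                "End": True,
            },
        },
    }


# Placeholder identifiers, not real resources
definition = build_state_machine(
    "example-ingestion-function",
    "example-cluster",
    "example-dbt-task",
)
print(json.dumps(definition, indent=2))
```

Chaining the states with `Next` ensures the transformation only starts once ingestion has completed successfully.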

## Deployment

This template is ready to be deployed.

The stack deployment is structured in three steps:

* First, the infrastructure modules (`base/` and `pipelines/`) are deployed using [Terragrunt](https://terragrunt.gruntwork.io/)
* Then, the containers for the ingestion and transformation layers are built and pushed to the container registry
* Finally, the schema evolution scripts of the Iceberg landing tables are run

<figure><img src="/files/W8wRIXYoBiBa1pLwWWpi" alt=""><figcaption></figcaption></figure>
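The ordering of these steps matters: each one relies on what the previous one produced. A minimal sketch of that sequencing is shown below, assuming each step is a callable that returns `True` on success. The function `run_deployment` and the stand-in lambdas are hypothetical; in the template the real work is done by the Makefiles and Terragrunt, not by this code.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], bool]]


def run_deployment(steps: List[Step]) -> List[str]:
    """Run steps in order and stop at the first failure.

    Returns the names of the steps that completed successfully.
    Hypothetical sketch, not the template's deployment code.
    """
    completed = []
    for name, action in steps:
        if not action():
            break  # later steps depend on this one, so do not continue
        completed.append(name)
    return completed


# Stand-in actions; each lambda represents a real deployment command
steps: List[Step] = [
    ("deploy infrastructure (base/ and pipelines/)", lambda: True),
    ("build and push containers", lambda: True),
    ("run landing table schema evolution scripts", lambda: True),
]

completed = run_deployment(steps)
print(completed)
```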

If you want to get started quickly and deploy the template from your machine, follow this guide:

{% content-ref url="/pages/Sp2fdZp3rxZePnTg8vMP" %}
[Get Started](/template-aws-iceberg/introduction/get-started.md)
{% endcontent-ref %}

To deploy from [GitHub Actions](https://github.com/features/actions) CI/CD instead, see:

{% content-ref url="/pages/MMbEuLtK7JyjMXkDAboW" %}
[CI Deployment](/template-aws-iceberg/guides/production-deployment.md)
{% endcontent-ref %}

## Makefile

The template includes several Makefiles that provide utility commands.

Here are some examples:

* `make deploy` in the root folder deploys the template from your machine
* `make build` in a folder with a Dockerfile builds the container
* `make local-run` in a serverless function folder tests the function locally

Wherever you see a Makefile, run `make` to list the available actions.


