Build your first ETL pipeline
In this tutorial, you'll build a full ETL pipeline with Dagster that:
- Imports data into DuckDB using Sling
- Transforms data into reports with dbt
- Runs scheduled reports automatically
- Generates one-time reports on demand
- Visualizes the data with Evidence
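The stages above follow the classic extract, transform, load pattern. As a conceptual sketch only (using the standard library's sqlite3 in place of DuckDB, and plain functions standing in for Sling and dbt), the shape of the pipeline looks like this:

```python
import sqlite3

# Conceptual ETL sketch. In the tutorial itself, Sling handles
# extract/load into DuckDB and dbt handles the transform step.

def extract():
    # Pretend this came from a CSV file or an API.
    return [("2024-01-01", 120.0), ("2024-01-02", 80.5)]

def load(conn, rows):
    conn.execute("CREATE TABLE IF NOT EXISTS sales (day TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

def transform(conn):
    # A "report": total sales, like a dbt model would produce.
    (total,) = conn.execute("SELECT SUM(amount) FROM sales").fetchone()
    return total

conn = sqlite3.connect(":memory:")
load(conn, extract())
print(transform(conn))  # → 200.5
```

Dagster's job is to model each of these stages as an asset, wire up their dependencies, and run them on a schedule or on demand.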
You will learn to:
- Set up a Dagster project with the recommended project structure
- Integrate Dagster with other tools
- Create and materialize assets and dependencies
- Ensure data quality with asset checks
- Create and materialize partitioned assets
- Automate the pipeline
- Create and materialize assets with sensors
Prerequisites
To follow the steps in this guide, you'll need:
- Python 3.9+ installed on your system (refer to the Installation guide for more information)
- Familiarity with Python and SQL
- A basic understanding of data pipelines and the extract, transform, and load (ETL) process
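To confirm that your interpreter meets the version requirement, you can run a quick check (a minimal sketch using only the standard library):

```python
import sys

# Dagster requires Python 3.9 or newer; fail fast if the
# active interpreter is older.
assert sys.version_info >= (3, 9), (
    f"Python 3.9+ required, found {sys.version.split()[0]}"
)
print("Python version OK:", sys.version.split()[0])
```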
Set up your Dagster project
- Open your terminal and scaffold a new project with uv:
  uvx create-dagster project etl_tutorial
- Change into the project directory:
  cd etl_tutorial
- Activate the project virtual environment:
  - MacOS: source .venv/bin/activate
  - Windows: .venv\Scripts\activate
- To make sure Dagster and its dependencies were installed correctly, start the Dagster webserver:
  dg dev
  In your browser, navigate to http://127.0.0.1:3000.
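Putting the steps together, the full setup sequence looks like this (a sketch for macOS/Linux; Windows uses the activation command noted above):

```shell
# Sketch of the full setup sequence (macOS/Linux).
uvx create-dagster project etl_tutorial  # scaffold the project with uv
cd etl_tutorial
source .venv/bin/activate                # on Windows: .venv\Scripts\activate
dg dev                                   # start the webserver, then open http://127.0.0.1:3000
```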
At this point, the project is empty, but we will add to it throughout the tutorial.
Next steps
- Continue this tutorial by creating and materializing assets