Ensure data quality with asset checks
Data quality is critical in data pipelines. Checking individual assets catches data quality issues before they spread through the rest of the pipeline.
In Dagster, you define asset checks like you define assets. Asset checks run when an asset is materialized. In this step, you will:
- Define an asset check
- Execute that asset check in the UI
1. Define an asset check
The asset check can go in the `assets.py` file next to the asset we just defined. An asset check can contain any logic we want. In our case, we query the `joined_data` table created by our asset and ensure that `customer_id` is never null:
```python
# These imports are already at the top of assets.py from earlier steps,
# and `joined_data` is the asset defined earlier in this file.
import dagster as dg
from dagster_duckdb import DuckDBResource


@dg.asset_check(
    asset=joined_data,
    description="Check if there are any null customer_ids in the joined data",
)
def missing_dimension_check(duckdb: DuckDBResource) -> dg.AssetCheckResult:
    table_name = "jaffle_platform.main.joined_data"
    with duckdb.get_connection() as conn:
        query_result = conn.execute(
            f"""
            select count(*)
            from {table_name}
            where customer_id is null
            """
        ).fetchone()

    count = query_result[0] if query_result else 0
    return dg.AssetCheckResult(
        passed=count == 0, metadata={"customer_id is null": count}
    )
```
The asset check uses the same `DuckDBResource` resource we defined for the asset. Resources can be shared across all objects in Dagster.
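As a minimal sketch of that sharing, the resource only needs to be registered once; any asset or asset check whose parameter name matches the resource key receives the same instance. The database path below is an illustrative assumption, not the tutorial's configured value.

```python
import dagster as dg
from dagster_duckdb import DuckDBResource

# Sketch: register the DuckDB resource once under the key "duckdb".
# Both joined_data and missing_dimension_check declare a `duckdb`
# parameter, so both receive this same instance at runtime.
# NOTE: the database path is an assumption for illustration.
defs = dg.Definitions(
    resources={"duckdb": DuckDBResource(database="/tmp/jaffle_platform.duckdb")},
)
```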
In the Dagster UI, you can now see that an asset check is associated with the `joined_data` asset.
TODO: Screenshot
Asset checks run automatically when an asset is materialized, but you can also execute them manually in the UI (a sketch for invoking the check in a Python test follows these steps):
- Reload your Definitions.
- Navigate to the Asset Details page for the `joined_data` asset.
- Select the "Checks" tab.
- Click the Execute button for `missing_dimension_check`.
TODO: Screenshot
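Asset checks are also plain Python functions, so a quick way to exercise the check outside the UI is to invoke it directly with the resource it needs. This is a sketch: the import path assumes the module layout shown in the summary below, and the database path is an illustrative assumption.

```python
import dagster as dg
from dagster_duckdb import DuckDBResource

# Assumes the package layout shown in the summary below.
from etl_tutorial.defs.assets import missing_dimension_check


def test_missing_dimension_check():
    # Directly invoke the check, supplying the resource it declares.
    # NOTE: the database path is an assumption; use the path configured
    # in resources.py.
    result = missing_dimension_check(
        duckdb=DuckDBResource(database="/tmp/jaffle_platform.duckdb")
    )
    assert isinstance(result, dg.AssetCheckResult)
    assert result.passed
```

Run it with `pytest` from the project root.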
Summary
The structure of the `etl_tutorial` module has remained the same:
```
src
└── etl_tutorial
    ├── __init__.py
    └── defs
        ├── __init__.py
        ├── ingest_files
        │   ├── defs.yaml
        │   └── replication.yaml
        ├── jdbt
        │   └── defs.yaml
        ├── assets.py
        └── resources.py
```
But there is now an asset check on the `joined_data` asset to help ensure the quality of the data in our pipeline.
Next steps
- Continue this tutorial by creating and materializing partitioned assets