Airflow vs Dagster: A Practical Comparison After Running Both in Production
We have operated Airflow for three years and migrated two clients to Dagster. Here is when we pick each, with real config examples from production deployments.
We get asked this question on almost every engagement: "Should we use Airflow or Dagster?"
The honest answer is that both work. We have run Airflow in production for three years across eight clients. We have migrated two clients to Dagster and started two greenfield projects on it. Neither tool is broken. The question is which one fits your team, your existing stack, and how you think about data pipelines.
This is not a feature matrix copied from docs. This is what we actually experienced running both.
Airflow wins when the team already knows it and the pipelines are mostly task-based workflows — "run this script, then that script, then load this file."
A mid-size company with 30-50 DAGs, a team of 3-5 data engineers, and pipelines that call external APIs, run Python scripts, and trigger dbt. They have been running Airflow for a year. It works. They want us to clean it up, add monitoring, and make it production-grade.
Here is what a well-structured Airflow DAG looks like in our projects:
```python
# dags/finance__stripe__daily_sync.py
from datetime import timedelta

from airflow.decorators import dag, task
from airflow.providers.common.sql.operators.sql import SQLExecuteQueryOperator
from pendulum import datetime


@dag(
    dag_id="finance__stripe__daily_sync",
    schedule="0 6 * * *",
    start_date=datetime(2026, 1, 1),
    catchup=False,
    tags=["finance", "stripe", "tier-1"],
    default_args={
        "owner": "data-eng",
        "retries": 2,
        "retry_delay": timedelta(minutes=5),
    },
)
def finance_stripe_daily():
    @task()
    def extract_charges():
        """Pull yesterday's charges from the Stripe API."""
        from stripe_client import get_charges

        charges = get_charges(days_back=1)
        return charges  # passed downstream via XCom serialization

    @task()
    def load_to_raw(charges: list):
        """Load raw charge records into raw_stripe.charges."""
        from loaders import bulk_insert

        bulk_insert("raw_stripe.charges", charges)

    trigger_dbt = SQLExecuteQueryOperator(
        task_id="trigger_dbt_run",
        conn_id="snowflake_default",
        sql="CALL staging.run_dbt_tag('stripe')",
    )

    charges = extract_charges()
    load_to_raw(charges) >> trigger_dbt


finance_stripe_daily()
```
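One caveat worth flagging: returning raw records from `extract_charges` means the whole payload round-trips through XCom, which lives in the Airflow metadata database. For extracts beyond a few megabytes, a common pattern is to stage the records to a file or object store and pass only a reference downstream. A minimal sketch (the helper names here are illustrative, not part of the DAG above):

```python
import json
from pathlib import Path


def stage_records(records: list, staging_dir: str) -> str:
    """Write records to a staging file and return its path.

    Only the path goes through XCom, so the metadata database
    stays small no matter how large the extract is.
    """
    path = Path(staging_dir) / "charges.json"
    path.write_text(json.dumps(records))
    return str(path)


def load_staged(path: str) -> list:
    """Read staged records back in the downstream task."""
    return json.loads(Path(path).read_text())
```

In the DAG above, `extract_charges` would return the staged path and `load_to_raw` would read from it; in production the staging location is usually a bucket prefix rather than local disk.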
The naming convention {domain}__{source}__{pipeline_name} matters more than you think. When you have 80 DAGs and something fails at 3 AM, finding the right one fast is the difference between a 5-minute fix and a 40-minute scramble.
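A cheap way to keep the convention honest is a unit test over the `dags/` folder. Here is a sketch that assumes a three-part snake_case rule; adjust the pattern to whatever your team actually enforces:

```python
import re

# Each part of {domain}__{source}__{pipeline_name}: snake_case,
# starting with a letter
PART = re.compile(r"[a-z][a-z0-9_]*")


def is_valid_dag_id(dag_id: str) -> bool:
    """Check a DAG id against the {domain}__{source}__{pipeline_name} convention."""
    parts = dag_id.split("__")
    return len(parts) == 3 and all(PART.fullmatch(p) for p in parts)
```

Run it in CI against every DAG file so a misnamed pipeline never reaches the scheduler.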
Dagster wins on greenfield projects and teams that think in terms of data assets rather than tasks. The mental model shift is real: instead of "run this pipeline at 6 AM," you define "I want this table to be fresh every day" and Dagster figures out what to run.
A company building a new data platform from scratch, or a team whose Airflow setup has become unmanageable (200+ DAGs, no testing, broken backfills). They want something that makes local development, testing, and iteration faster.
Here is the same pipeline in Dagster:
```python
# assets/finance/stripe.py
from pathlib import Path

from dagster import asset, AssetExecutionContext
from dagster_dbt import DbtCliResource, dbt_assets

# Path to the dbt project's compiled manifest; adjust to your layout
dbt_manifest_path = Path("dbt_project/target/manifest.json")


@asset(
    group_name="finance",
    kinds={"python", "snowflake"},
    description="Raw charge records from Stripe API",
)
def raw_stripe_charges(context: AssetExecutionContext):
    """Pull charges from Stripe and load to raw_stripe.charges."""
    from stripe_client import get_charges
    from loaders import bulk_insert

    charges = get_charges(days_back=1)
    bulk_insert("raw_stripe.charges", charges)
    context.log.info(f"Loaded {len(charges)} charges")


@dbt_assets(manifest=dbt_manifest_path)
def dbt_models(context: AssetExecutionContext, dbt: DbtCliResource):
    yield from dbt.cli(["build"], context=context).stream()
```
The difference is subtle in a small example, but at scale it changes how the team works: assets can be materialized and tested individually, the lineage graph shows what feeds what, and a stale table points you straight at the asset that produces it.
Honestly, this is where Dagster pulls ahead the most. With the dagster-dbt integration, every dbt model becomes a Dagster asset. You get per-model lineage in the asset graph, per-model run status and logs, and the ability to materialize a single model from the UI.
In Airflow, dbt is a black box. You trigger dbt run and hope for the best. You can use Cosmos to break it into individual tasks, but it is a workaround, not a native integration.
Two clients asked us to migrate from Airflow to Dagster. Both times we did it incrementally, moving pipelines over in stages rather than attempting a big-bang rewrite.
The migration itself is not technically difficult. The hard part is retraining the team's mental model from tasks to assets. Budget time for that.
Pick Airflow when the team already knows it, the pipelines are mostly task-based "run this script, then that script" workflows, and the existing setup works well enough to be worth hardening rather than replacing.
Pick Dagster when you are starting greenfield, the stack is dbt-centric, or the team thinks in terms of data assets and wants fast local development, testing, and iteration.
We have no allegiance to either tool. We pick whatever makes the client's team more productive. But if you asked us what we would choose for a new engagement with a modern dbt-centric stack — it would be Dagster. The developer experience gap is real.