Migrate from Apache Airflow to Dagster

Complete Step-by-Step Guide (2026)

Migrating from Apache Airflow to Dagster represents a strategic shift toward a modern, asset-oriented data orchestration platform. While both are open-source orchestrators, Dagster offers superior data lineage tracking, asset management, and testing capabilities. This guide covers the practical process of rewriting DAGs as Dagster jobs, restructuring dependencies, and validating data quality during the transition.

Why Migrate to Dagster?

Teams migrate from Airflow to Dagster primarily for better data asset management and observability. Dagster's asset-driven model (as opposed to Airflow's task-centric DAGs) makes data lineage explicit: each asset declares its upstream assets, typically as function parameters, and Dagster builds the dependency graph from those declarations. Dagster also makes testing easier (assets and ops are plain Python functions that can be unit tested or materialized in-process) and provides richer data type definitions. Its sensor framework and dynamic orchestration can be more flexible than Airflow's for complex pipelines. The tradeoff is a smaller ecosystem (fewer integrations) and a steeper learning curve for teams deeply invested in Airflow's patterns. Migration is worthwhile if you have outgrown Airflow's lineage visibility, struggle with DAG fragmentation, or need stronger data quality enforcement.

Step-by-Step Migration Process

1. Assess Your Airflow Codebase

Estimated time: 8-16 hours

Audit every DAG in your production Airflow instance. Document task count, dependencies, frequency, and integration points. Identify which operators you rely on (BashOperator, PythonOperator, SQL operators, etc.). Note any dynamic task generation, branching logic, or conditional execution. This inventory determines migration scope and complexity.

⚠️ Watch Out For:

  • DAG files nested in subdirectories of the DAGs folder are easy to miss, so scan the folder recursively
  • Complex branching and XCom dependencies are harder to trace than simple linear DAGs
  • Airflow plugins and custom operators may not have Dagster equivalents
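
The audit described above can be partly automated. The following is a minimal sketch, not a complete inventory tool: it walks a DAGs folder recursively and counts calls to classes whose names end in Operator or Sensor, using only the Python standard library. The function names `operators_in_source` and `inventory` are hypothetical, and the name-suffix heuristic is an assumption; it will not see TaskFlow `@task` functions or operators created through factory helpers.

```python
import ast
from collections import Counter
from pathlib import Path


def operators_in_source(source: str) -> Counter:
    """Count calls to names ending in 'Operator' or 'Sensor' in one Python file."""
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            # BashOperator(...) is an ast.Name; module.BashOperator(...) is an ast.Attribute.
            name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", "")
            if name.endswith(("Operator", "Sensor")):
                counts[name] += 1
    return counts


def inventory(dags_folder: str) -> dict:
    """Map each DAG file (including files in subdirectories) to its operator counts."""
    report = {}
    for path in sorted(Path(dags_folder).rglob("*.py")):
        try:
            counts = operators_in_source(path.read_text(encoding="utf-8"))
        except SyntaxError:
            continue  # skip files that do not parse; review them by hand
        if counts:
            report[str(path)] = counts
    return report
```

Calling `inventory("/path/to/dags")` returns a dictionary that can be dumped to CSV or JSON as the starting point for the migration scope document. Because the files are parsed rather than imported, Airflow does not need to be installed where the script runs.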

2. Design Asset Hierarchy in Dagster

Estimated time: 4-8 hours

Rather than task-by-task mapping, redesign your pipelines around data assets. Identify source systems, transformations, and outputs as distinct assets. Define asset dependencies (asset A produces input for asset B). Sketch the asset lineage graph. This conceptual shift from tasks to assets is the core of the migration.

⚠️ Watch Out For:

  • Airflow's task-centric mindset doesn't map 1:1 to Dagster's asset-oriented model—rethink the architecture
  • Circular dependencies are errors in both tools; Dagster rejects a cyclic asset graph when the code location loads, so resolve any cycle in your sketch before writing code
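
Before writing any Dagster code, the lineage sketch itself can be checked for cycles and put into a build order. This is a small illustration using the standard library's `graphlib`; the asset names and the `build_order` helper are hypothetical and stand in for whatever assets your own design identifies.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical lineage sketch: each asset maps to the set of assets it reads from.
LINEAGE = {
    "raw_orders": set(),
    "raw_customers": set(),
    "cleaned_orders": {"raw_orders"},
    "customer_orders": {"cleaned_orders", "raw_customers"},
    "daily_revenue": {"customer_orders"},
}


def build_order(lineage: dict) -> list:
    """Return assets in an order where every upstream asset comes first."""
    try:
        return list(TopologicalSorter(lineage).static_order())
    except CycleError as err:
        # err.args[1] holds the list of nodes that form the cycle.
        raise ValueError(f"cycle in asset sketch: {err.args[1]}") from err


if __name__ == "__main__":
    print(build_order(LINEAGE))
```

The same dictionary doubles as documentation of asset A producing input for asset B, and a failing `build_order` call points at the assets whose boundaries need rethinking.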

3. Set Up Dagster Development Environment

Estimated time: 1-2 hours

Install Dagster and create a project structure (poetry init, pyproject.toml with dependencies). Set up a local Dagster instance with SQLite backend for testing. Configure Dagster's code location to point to your assets. Ensure the Dagster UI runs locally without errors.

⚠️ Watch Out For:

  • Dagster core and its dagster-* integration libraries are released in lockstep, so pin compatible versions together rather than upgrading one package alone
  • Code location reloading in Dagster can be slow compared to Airflow's dynamic DAG parsing

4. Rewrite First Migration DAG as Dagster Job

Estimated time: 4-8 hours

Select a simple, non-critical DAG to be your pilot. Rewrite it as Dagster assets and ops. Define inputs/outputs explicitly. Use Dagster's @op and @asset decorators. Implement the same business logic but structured around asset production. Test locally with the Dagster UI.

⚠️ Watch Out For:

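  • Airflow's XCom pulls feel magical; Dagster requires explicit input/output definitions, which is more verbose but clearer
  • Error handling differs: an Airflow task fails on an uncaught exception (or an explicit AirflowFailException); Dagster ops and assets also fail on uncaught exceptions, and can raise Failure to attach structured metadata or RetryRequested to ask for a retry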

5. Implement Scheduling and Sensors

Estimated time: 2-4 hours

For scheduled assets, define a job with define_asset_job (or the @job decorator for op graphs) and attach a schedule using ScheduleDefinition or the @schedule decorator with a cron expression. If your Airflow DAG uses sensors (e.g., S3KeySensor), rewrite them as functions decorated with @sensor. Test schedules and sensors in local development from the Dagster UI before relying on them.

⚠️ Watch Out For:

  • Dagster's @sensor requires more explicit code than Airflow's sensor operators
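  • Timezone handling differs: set execution_timezone explicitly on each Dagster schedule, and remember that an Airflow run for a given data interval starts when that interval ends, so confirm the actual run times line up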

6. Set Up Data Quality Tests

Estimated time: 4-6 hours

Define Dagster asset checks (tests that verify properties of your assets). Write Python functions that validate data (row count checks, NULL counts, freshness checks). Integrate these as part of asset definitions. This replaces ad-hoc Great Expectations tests in Airflow with first-class Dagster checks.

⚠️ Watch Out For:

  • Airflow's Great Expectations integration was loose; Dagster's asset checks are tightly integrated but require learning the API
  • Deciding which checks to automate vs. which to alert on takes iteration

7. Migrate Remaining DAGs Incrementally

Estimated time: 2-4 hours per DAG (varies by complexity)

Rewrite remaining DAGs one by one, starting with simple ones. Test each migration job locally with Dagster's UI. Validate outputs match expectations. Document any Airflow patterns that don't translate (e.g., multiple-outputs branching). Accumulate confidence as you go.

⚠️ Watch Out For:

  • Don't try to migrate all DAGs at once—prioritize by criticality and complexity
  • Dynamic task generation in Airflow (e.g., DAG from list) requires different patterns in Dagster: DynamicOut/DynamicOutput in op graphs, or partitions and generated asset definitions for assets

8. Deploy Dagster to Production

Estimated time: 4-8 hours

Deploy Dagster to your production environment (Kubernetes, cloud VM, or a managed service such as Dagster+, formerly Dagster Cloud). Set up the same data warehouse connections, secrets management, and resource configurations as Airflow. Configure a production code location that points to your migrated jobs. Set up monitoring and alerting.

⚠️ Watch Out For:

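  • Secrets management differs: Dagster has no built-in Variables or Connections store, so supply sensitive values through environment variables or your secrets manager and reference them from resources
  • Executor selection (in-process, multiprocess, or Kubernetes) affects how jobs run, so test thoroughly before production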

9. Run Parallel Validation

Estimated time: 6-12 hours (over 1-2 weeks)

Keep Airflow running in parallel with Dagster for 1-2 full run cycles. Compare outputs from both orchestrators for the same source data. Verify that Dagster's runs complete at expected times and produce identical results. Use Dagster's run event logs and run failure sensors to catch failures early.

⚠️ Watch Out For:

  • Timing differences between Airflow and Dagster can cause test data misalignment—ensure data freshness matches
  • Partial failures in Dagster expose data lineage gaps not visible in Airflow—fix these before full cutover
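
Comparing outputs from the two orchestrators can be done with an order-insensitive fingerprint of each result set. The sketch below assumes both outputs can be loaded as lists of dictionaries (for example, from query results); the function names are hypothetical, and for very large tables the same idea is better pushed down into SQL.

```python
import hashlib
import json


def fingerprint(rows) -> str:
    """Digest of a list of dict rows that ignores row order and key order."""
    row_digests = sorted(
        hashlib.sha256(
            json.dumps(row, sort_keys=True, default=str).encode("utf-8")
        ).hexdigest()
        for row in rows
    )
    return hashlib.sha256("".join(row_digests).encode("utf-8")).hexdigest()


def compare_outputs(airflow_rows, dagster_rows) -> dict:
    """Summarize whether two result sets agree on row count and content."""
    return {
        "row_count_match": len(airflow_rows) == len(dagster_rows),
        "content_match": fingerprint(airflow_rows) == fingerprint(dagster_rows),
    }


if __name__ == "__main__":
    a = [{"id": 1, "total": 10.0}, {"id": 2, "total": 7.5}]
    d = [{"total": 7.5, "id": 2}, {"id": 1, "total": 10.0}]
    print(compare_outputs(a, d))
```

Running the comparison against snapshots taken from the same source data addresses the timing caveat above: if the two pipelines read the source at different moments, a mismatch may reflect data freshness rather than a migration bug.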

10. Cutover and Decommission Airflow

Estimated time: 1-2 hours

Once Dagster passes full validation, remove Airflow from the critical path by pausing its DAGs. Keep the Airflow UI and metadata database available read-only for 2 more weeks for reference, then remove Airflow's scheduler from your infrastructure. Notify the team of the change and update documentation. Archive Airflow's DAGs for historical reference.

⚠️ Watch Out For:

  • Team members may still reference Airflow documentation—update wikis and runbooks early
  • Airflow DAG runs may still be in progress during cutover; verify that all of them have completed before stopping the scheduler

Feature Mapping: Apache Airflow → Dagster

| Apache Airflow Feature | Dagster Equivalent | Notes |
|---|---|---|
| DAG (tasks + dependencies) | Asset graph or job (assets + ops) | Fundamental model difference. DAGs are task-centric; Dagster definitions are asset-centric, and a job selects assets or composes ops. |
| Task operators (PythonOperator, BashOperator, etc.) | @op and @asset functions | Ops and assets are more explicit about inputs/outputs. Fewer built-in operators; more customization needed. |
| XCom (task outputs) | Op outputs and asset dependencies | Dagster's type system and I/O managers replace XCom's implicit serialization with explicit inputs and outputs. |
| Sensors | @sensor decorators | Dagster sensors are more powerful but require more code than Airflow's operator-based sensors. |
| Branching (BranchPythonOperator) | Optional outputs (Out(is_required=False)) and graph composition | Downstream ops run only when their input is emitted, which is explicit and testable. DynamicOutput covers runtime fan-out rather than branching. |
| SLA and alerting | Run status sensors and op hooks | Dagster's sensor-based alerting is more flexible than Airflow's static SLAs. |
| Retry logic | RetryPolicy on ops, assets, or jobs | Both support retries; Dagster's can be set per op or asset. |
| Documentation and metadata | Op/asset descriptions and tags | Dagster encourages metadata and descriptions as first-class citizens. |

Key Gotchas to Watch

Architecture Shift

⚠️ Migrating from Airflow's task-centric model to Dagster's asset-centric model requires rethinking how you structure your pipelines. A 1:1 task-to-op mapping often misses optimization opportunities.

Mitigation: Spend time in the design phase (Step 2) to rethink asset boundaries. Don't just translate DAGs—redesign for asset lineage clarity. This upfront effort saves rework later.

Ecosystem Size

⚠️ Dagster has a smaller ecosystem of integrations than Airflow, which ships a large catalog of provider packages. Some specialized operators (e.g., cloud-native tools) may require custom implementation.

Mitigation: Check Dagster's integrations library early. For unsupported tools, plan to write custom ops or use HTTP/API calls via @op functions.

Learning Curve

⚠️ Dagster's asset-oriented model, type system, and testing philosophy differ significantly from Airflow. Team ramp-up time is longer than a typical Airflow upgrade.

Mitigation: Dedicate 1-2 weeks for team training. Use Dagster's documentation and tutorials. Plan pair programming sessions for the first few migrations.

Secrets and Configuration

⚠️ Airflow's Variable and Connection system has no direct counterpart in Dagster, which relies on environment variables and Resources instead. Migration requires careful handling of sensitive data.

Mitigation: Map Airflow Variables to environment variables or run configuration, and keep secret values in your secrets manager. Use Dagster's Resource pattern for connections (databases, APIs). Test secrets access thoroughly in staging.

Performance and Scalability

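⚠️ Dagster's local executors (in-process and multiprocess) are simpler than Airflow's Celery or Kubernetes executor setups, but they run every step on a single machine. For large-scale workloads, the Kubernetes executor or Dagster+ (formerly Dagster Cloud) resources must be sized and tuned separately.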

Mitigation: Test executor choice in staging. Conduct load testing with realistic job counts and parallelism. Monitor performance metrics before and after migration.

Migration Timeline

⚠️ Migrating a large Airflow instance (100+ DAGs) typically takes 2-3 months, not weeks. Underestimating this pushes back the production cutover.

Mitigation: Plan a phased approach: migrate high-volume, low-complexity DAGs first to build team confidence. Allocate dedicated engineering time; don't squeeze migration into standard sprint work.
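
The 2-3 month figure can be sanity-checked against the per-step estimates in this guide. The short calculation below sums the one-time steps and adds the per-DAG range from Step 7 for every DAG after the pilot. The 40-hour engineer-week used to convert hours into weeks is an assumption, and training time and the elapsed 1-2 week parallel run are not included.

```python
# (low, high) hour estimates copied from the steps in this guide.
FIXED_STEP_HOURS = {
    "1. Assess codebase": (8, 16),
    "2. Design asset hierarchy": (4, 8),
    "3. Set up environment": (1, 2),
    "4. Pilot DAG": (4, 8),
    "5. Scheduling and sensors": (2, 4),
    "6. Data quality tests": (4, 6),
    "8. Deploy to production": (4, 8),
    "9. Parallel validation": (6, 12),
    "10. Cutover": (1, 2),
}
PER_DAG_HOURS = (2, 4)  # Step 7, applied to every DAG after the pilot


def estimate_hours(dag_count: int) -> tuple:
    """Return (low, high) total hours for migrating dag_count DAGs."""
    remaining = max(dag_count - 1, 0)  # the pilot DAG is covered by Step 4
    low = sum(lo for lo, _ in FIXED_STEP_HOURS.values()) + remaining * PER_DAG_HOURS[0]
    high = sum(hi for _, hi in FIXED_STEP_HOURS.values()) + remaining * PER_DAG_HOURS[1]
    return low, high


if __name__ == "__main__":
    low, high = estimate_hours(100)
    hours_per_week = 40  # assumption: one full-time engineer
    print(f"{low}-{high} hours, about {low / hours_per_week:.1f}-{high / hours_per_week:.1f} engineer-weeks")
```

For 100 DAGs this comes to 232-462 hours, or roughly 6 to 12 engineer-weeks of hands-on work, which is consistent with a 2-3 month calendar timeline once training, reviews, and the parallel run are added.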

Last updated: Jun 17, 2026