Schema Drift

Schema drift occurs when the structure of data changes unexpectedly—new columns appear, types change, or fields are removed—breaking downstream pipelines that expect a fixed schema.

Definition

Schema drift is the nightmare scenario where your data source adds a new column, renames a field, or changes a type without warning. A pipeline designed to expect exactly 50 columns suddenly receives 51. Your transformation logic assumes a date field, but the source now sends timestamps. The failure can be silent (a new field is dropped or a value is misread, and nothing errors) or catastrophic (the job fails outright). Drift is endemic in SaaS integrations, where third-party vendors push updates on their own schedule, and in operational databases, where developers add columns as the application evolves. Managing it takes three things: detection (monitoring for schema changes), communication (alerting your team), and resilience (writing transformations that tolerate new fields).

How It Works

1. Source changes: A new field is added to the source table.
2. Ingestion: CDC or ETL ingests the new field (or fails, depending on the tool).
3. Detection: Your schema monitor detects the change and alerts.
4. Reaction: Teams either adapt their pipeline or reach out to the source owner.
5. Prevention: Governance processes discourage unannounced changes.
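
The detection step can be sketched as a comparison between the schema a pipeline expects and the schema it actually observes. This is a minimal illustration, not any particular tool's API: the names SchemaDiff and diff_schemas, the column names, and the type labels are all hypothetical, and schemas are assumed to be plain dictionaries mapping column name to a type string.

```python
from dataclasses import dataclass, field


@dataclass
class SchemaDiff:
    """Differences between an expected and an observed schema (hypothetical helper)."""
    added: dict = field(default_factory=dict)    # column -> observed type
    removed: dict = field(default_factory=dict)  # column -> expected type
    retyped: dict = field(default_factory=dict)  # column -> (expected type, observed type)

    @property
    def has_drift(self) -> bool:
        return bool(self.added or self.removed or self.retyped)


def diff_schemas(expected: dict, observed: dict) -> SchemaDiff:
    """Compare two {column: type} mappings and report what changed."""
    diff = SchemaDiff()
    for column, new_type in observed.items():
        if column not in expected:
            diff.added[column] = new_type
        elif expected[column] != new_type:
            diff.retyped[column] = (expected[column], new_type)
    for column, old_type in expected.items():
        if column not in observed:
            diff.removed[column] = old_type
    return diff


# Assumed example schemas: the source added a column and changed a type.
expected = {"order_id": "int", "created": "date", "amount": "decimal"}
observed = {"order_id": "int", "created": "timestamp", "amount": "decimal", "coupon": "string"}

drift = diff_schemas(expected, observed)
if drift.has_drift:
    # A real monitor would send an alert here instead of printing.
    print("schema drift detected:", drift)
```

Note that a comparison like this sees a renamed field as one removed column plus one added column; telling a rename apart from an unrelated add and drop needs extra information from the source owner.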

When to Use It

Always monitor for schema changes in production pipelines. Use schema validation tools to catch drift early. Design transformations to be resilient to new fields: SELECT * is risky because every new source column flows straight into downstream logic, so prefer explicit column lists. When you own the source, communicate schema changes to downstream teams in advance. When you don't, as with SaaS APIs, build in schema detection and alerting.
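
The advice about explicit column lists applies outside SQL too. The sketch below shows one way a transformation might project each record onto the columns it actually needs, so that new fields are ignored while missing fields fail loudly. The function name project, the REQUIRED_COLUMNS constant, and the column names are assumptions made for illustration.

```python
# Hypothetical list of the columns this transformation depends on.
REQUIRED_COLUMNS = ("order_id", "created", "amount")


def project(record: dict, columns=REQUIRED_COLUMNS) -> dict:
    """Keep only the named columns; ignore extras, fail on anything missing."""
    missing = [c for c in columns if c not in record]
    if missing:
        # A removed or renamed field should stop the pipeline, not pass silently.
        raise KeyError(f"record is missing expected columns: {missing}")
    return {c: record[c] for c in columns}


# The source has started sending an extra "coupon" field.
row = {"order_id": 1, "created": "2026-06-17", "amount": "9.99", "coupon": "SPRING"}
clean = project(row)  # "coupon" is dropped; the three expected columns pass through
```

The trade-off is deliberate: added fields are tolerated, but nothing downstream uses them until someone adds them to the list, which pairs naturally with an alert on newly detected columns.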

Last updated: Jun 17, 2026