Change Data Capture

Change Data Capture (CDC) is a technique that identifies and captures changes made to data in source systems, enabling only the modified rows to be replicated rather than the entire dataset.

Definition

Change Data Capture (CDC) monitors a source database, records every insert, update, and delete operation, and streams those changes to downstream systems. Instead of re-querying an entire database on each sync (a wasteful full refresh), CDC captures only the rows that changed and sends them incrementally. The volume of data moved then scales with the number of changes rather than the size of the table, which reduces network bandwidth, storage, and processing time. The savings matter most for large tables. CDC can be implemented via database transaction logs, query-based polling, or dedicated CDC tools, and it is foundational to real-time data pipelines, replication, and streaming analytics.
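As a rough illustration of the query-based polling approach, the sketch below keeps a high-water mark and asks only for rows modified since the previous sync. The customers table, its updated_at column, and the fetch_changes helper are hypothetical names chosen for this example, and the timestamps are plain integers to keep it self-contained.

```python
import sqlite3


def fetch_changes(conn, last_seen):
    """Return rows modified after last_seen, plus the new high-water mark."""
    rows = conn.execute(
        "SELECT id, name, updated_at FROM customers "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_seen,),
    ).fetchall()
    # Advance the mark only when something changed.
    new_mark = rows[-1][2] if rows else last_seen
    return rows, new_mark


# Hypothetical source table held in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ada", 100), (2, "Grace", 100), (3, "Linus", 100)],
)

# The first sync starts from mark 0, so it returns every row.
first_batch, mark = fetch_changes(conn, 0)

# One row changes; the next sync returns only that row.
conn.execute("UPDATE customers SET name = 'Ada L.', updated_at = 200 WHERE id = 1")
second_batch, mark = fetch_changes(conn, mark)
```

Polling of this kind cannot see hard deletes, and it reports only the latest state of a row that changed several times between polls. Reading the transaction log avoids both limitations.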

How It Works

1. Monitor: CDC tools watch the database transaction log (for example the MySQL binlog or the PostgreSQL WAL) or query the source for changed rows.
2. Capture: New, updated, or deleted rows are identified, typically by log position in log-based CDC or by a timestamp column in query-based CDC.
3. Stream: Changes are pushed to a message queue (such as Kafka) or directly to the destination.
4. Apply: The destination system (warehouse, cache, analytics platform) applies the changes.
5. Resume: CDC records the last position it processed, so after a disconnect it resumes from that position instead of starting over.
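The capture, apply, and resume steps above can be sketched with an in-memory list standing in for the transaction log or message queue. ChangeEvent, apply_changes, and the integer position field are hypothetical names for this illustration, not the API of any particular CDC tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeEvent:
    position: int               # offset of the event in the source log
    op: str                     # "insert", "update", or "delete"
    key: int                    # primary key of the affected row
    row: Optional[dict] = None  # new row contents; None for deletes


def apply_changes(log, replica, checkpoint):
    """Apply events after checkpoint to replica and return the new checkpoint."""
    for event in log:
        if event.position <= checkpoint:
            continue  # already applied in an earlier run
        if event.op == "delete":
            replica.pop(event.key, None)
        else:
            # Inserts and updates both become an upsert on the destination.
            replica[event.key] = event.row
        checkpoint = event.position
    return checkpoint


log = [
    ChangeEvent(1, "insert", 1, {"name": "Ada"}),
    ChangeEvent(2, "insert", 2, {"name": "Grace"}),
]
replica = {}
checkpoint = apply_changes(log, replica, 0)

# More changes arrive later; only events past the checkpoint are applied.
log.append(ChangeEvent(3, "update", 1, {"name": "Ada L."}))
log.append(ChangeEvent(4, "delete", 2))
checkpoint = apply_changes(log, replica, checkpoint)
```

Because events at or before the checkpoint are skipped, replaying the same log after a reconnect leaves the replica unchanged. A real pipeline would also need to persist the checkpoint durably alongside the applied data.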

When to Use It

Use CDC for real-time replication of large tables, for high-frequency updates you want to capture incrementally, or for streaming pipelines where full refreshes are prohibitively expensive. It matters most for operational analytics, real-time BI, and high-performance streaming architectures. CDC is less essential for small, infrequently changing tables or for batch jobs, where a periodic full refresh is usually cheap enough.

Last updated: Jun 17, 2026