Data Quality
Data quality is the degree to which data is accurate, complete, consistent, and timely enough to be fit for its intended use in business decisions and operations.
Definition
Data quality is the fitness of data for its purpose. High-quality data is accurate (no wrong values), complete (no unexplained nulls), consistent (the same value is represented the same way everywhere), timely (fresh enough to be relevant), and valid against business rules (prices are positive, dates are valid). Poor data quality cascades through pipelines: incorrect inputs produce incorrect reports, which in turn lead to poor decisions. Common sources of quality issues include typos from manual entry, schema mismatches in integrations, null values from incomplete systems, and duplicates from multiple source systems. Managing data quality requires monitoring (automated checks), governance (rules and standards), and resilience (pipelines that catch and quarantine bad data rather than silently propagating it).
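As a rough illustration of how these dimensions can be turned into numbers, the sketch below scores a small batch on completeness, validity, timeliness, and uniqueness. The field names (`id`, `name`, `price`, `updated_at`), the sample records, and the one-day freshness window are all assumptions made for the example, not part of any particular tool.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical sample batch; field names and values are illustrative only.
NOW = datetime(2026, 1, 2, tzinfo=timezone.utc)
records = [
    {"id": 1, "name": "Widget", "price": 9.99, "updated_at": NOW - timedelta(hours=1)},
    {"id": 2, "name": None, "price": 4.50, "updated_at": NOW - timedelta(hours=2)},
    {"id": 3, "name": "Gadget", "price": -1.00, "updated_at": NOW - timedelta(days=3)},
    {"id": 3, "name": "Gadget", "price": -1.00, "updated_at": NOW - timedelta(days=3)},
]


def quality_summary(rows, now, max_age=timedelta(days=1)):
    """Return the share of rows that pass each simple quality dimension."""
    total = len(rows)
    if total == 0:
        return {}
    complete = sum(1 for r in rows if r["name"] is not None)          # no null names
    valid = sum(1 for r in rows if r["price"] is not None and r["price"] > 0)  # prices > 0
    fresh = sum(1 for r in rows if now - r["updated_at"] <= max_age)  # recent enough
    unique_ids = len({r["id"] for r in rows})                         # duplicate ids lower this
    return {
        "completeness": complete / total,
        "validity": valid / total,
        "timeliness": fresh / total,
        "uniqueness": unique_ids / total,
    }


summary = quality_summary(records, NOW)
print(summary)
```

Each score is a fraction between 0 and 1, which makes it straightforward to compare batches or set alert thresholds.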
How It Works
1. Define: Establish quality rules (prices > 0, no null names).
2. Monitor: Run automated checks on incoming and transformed data.
3. Detect: Flag records or batches that violate rules.
4. Investigate: Find the root cause when violations occur.
5. Remediate: Fix the issue at the source or quarantine bad data, and prevent recurrence.
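The define, monitor, and detect steps above can be sketched in a few lines of plain Python. This is a minimal sketch rather than a reference implementation: the rule names, the `RULES` registry, and the `check_batch` helper are hypothetical, and the two rules are the examples given in step 1.

```python
# Define: each rule is a named predicate that returns True when a record passes.
RULES = {
    "name_not_null": lambda r: r.get("name") not in (None, ""),
    "price_positive": lambda r: isinstance(r.get("price"), (int, float)) and r["price"] > 0,
}


def violations(record):
    """Detect: return the names of all rules this record fails."""
    return [name for name, rule in RULES.items() if not rule(record)]


def check_batch(batch):
    """Monitor: split a batch into clean records and quarantined ones."""
    clean, quarantined = [], []
    for record in batch:
        failed = violations(record)
        if failed:
            # Keep the failed rule names with the record to support investigation.
            quarantined.append({"record": record, "failed_rules": failed})
        else:
            clean.append(record)
    return clean, quarantined


batch = [
    {"name": "Widget", "price": 9.99},
    {"name": None, "price": 4.50},
    {"name": "Gadget", "price": -1},
]
clean, quarantined = check_batch(batch)
for item in quarantined:
    print(item["failed_rules"], item["record"])
```

Storing the failed rule names alongside each quarantined record gives the investigate step something concrete to start from.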
When to Use It
Invest in data quality from day one: preventing bad data is cheaper than fixing the problems it causes downstream. Use dbt tests, Great Expectations, or similar tools to encode quality rules. Monitor quality trends over time, and degrade gracefully: drop bad rows or flag them for review rather than failing the entire pipeline.
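One way to combine graceful degradation with trend monitoring is sketched below: bad rows are set aside for review instead of raising an error, and each run's pass rate is appended to a history that can be charted or alerted on. The function name `load_with_degradation`, the `alert_below` threshold of 0.9, and the sample rows are assumptions for illustration; a real pipeline would more likely express the rules in a tool such as dbt tests or Great Expectations.

```python
import logging

logger = logging.getLogger("quality")


def is_valid(row):
    """Hypothetical rule set: name must be present and price must be positive."""
    price = row.get("price")
    return row.get("name") is not None and isinstance(price, (int, float)) and price > 0


def load_with_degradation(rows, history, alert_below=0.9):
    """Keep good rows, flag bad rows for review, and record the pass rate."""
    good = [r for r in rows if is_valid(r)]
    review = [r for r in rows if not is_valid(r)]
    pass_rate = len(good) / len(rows) if rows else 1.0
    history.append(pass_rate)  # quality trend across runs
    if pass_rate < alert_below:
        # Warn rather than raise, so the good rows still flow downstream.
        logger.warning(
            "pass rate %.2f is below %.2f; %d rows flagged for review",
            pass_rate, alert_below, len(review),
        )
    return good, review


history = []
rows = [
    {"name": "Widget", "price": 9.99},
    {"name": "Gadget", "price": 4.50},
    {"name": "Gizmo", "price": 12.00},
    {"name": None, "price": 3.25},
]
good, review = load_with_degradation(rows, history)
print(len(good), len(review), history)
```

The pipeline continues with the three valid rows, while the flagged row and the recorded pass rate leave a trail for later review.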
Last updated: Jun 17, 2026