Snowflake Schema

A snowflake schema is a normalized variation of the star schema where dimension tables are further normalized into sub-dimensions, reducing redundancy at the cost of slightly more complex queries.

Definition

The snowflake schema extends the star schema by normalizing dimension tables. Instead of a single customer dimension with denormalized country and region attributes, a snowflake splits it: a customer dimension with a foreign key to a geography dimension, which in turn has region and country attributes. This approach reduces data redundancy (you store each region name once, not once per customer) but requires more joins to reconstruct full dimensions. The result looks like a snowflake in diagrams: the fact table at the center, dimensions branching out, and sub-dimensions branching from them. Snowflakes are less common than stars in modern data warehousing because cloud warehouse storage is cheap and queries are fast enough to handle denormalized stars.

How It Works

1. Start with a star schema. 2. Identify repeated attributes in dimensions (e.g., many customers have the same region). 3. Extract into sub-dimensions (create a separate region table). 4. Add foreign keys: customer.region_id → region.id. 5. Queries now require more joins but use less storage.

When to Use It

Snowflake schemas are less common today than stars. Use snowflake if storage is a severe constraint or if you're modeling a highly normalized source system. Modern practice favors denormalized stars in cloud warehouses, as storage is inexpensive and denormalized data is easier to query. Snowflakes are more common in older, on-premises data warehouse implementations.

Last updated: Jun 17, 2026