Thursday, March 05, 2009

Considerations for using Snowflake schema : Star schema vs Snowflake schema

I'm reading: Considerations for using Snowflake schema : Star schema vs Snowflake schemaTweet this !
In Dimensional modeling parlance, star schema and snowflake schema are the fundamental blocks of modeling a relational data warehouse. Generally the modeling starts with a draft of star schema and in almost a majority of cases, it ends up in a snowflake model. As per my views, any medium to large sized datawarehouse (where there are approximately 10+ fact tables and 20+ dimension tables) cannot suffice its implementation in a star schema model.

Star schema : The simplest relational schema for querying a data warehouse databases is the star schema. The star schema has a center, represented by a fact table, and the points of the star, represented by the dimension tables. From a technical perspective, the advantage of a star schema is that joins between the dimensions and the fact tables are simple, performance, ability to slicing and easy understanding of data

Snowflake schema : A variation of a star schema is called a snowflake. In a snowflake, the dimension tables are normalized. From a reporting perspective, there is no logical difference because all the tables of a snowflake dimension can be logically joined into a single star dimension. From a performance perspective, the snowflake may result in slower queries because of the additional joins required.

There remains a trade-off between performance and business requirements and/or technical limitations while making a decision for star or snowflake schema. A majority of the IT parade that works on BI projects are mere code-workers and do not have an understanding of why the schema is modeled in a particular way, when theories like normalization already exists and have been in practice in OLTP systems.

Two basic reasons why a snowflake schema is preferred over star schema though there would be a slight trade-off in performance are as below:

1) Size and Storage : Consider a snowflake schema to reduce storage requirements for a large dimension tables. For example, Storing the information about the manufacturer in a product dimension table would make a product dimension unnecessarily large because one manufacturer may provide multiple products. Larger the dimension, the more time it will take to process while processing a cube.

2) Integrity : Consider snowflaking a dimension if one entity is referenced by multiple dimensions. For example, Customers, Resellers, and Manufacturers may all have a Geography component. Extracting geography information into a normalized table allows for consistency between geographical definitions for the different dimensions. A snowflaked dimension accessed from multiple dimensions is called a reference dimension.

This article is of a very fundamental level, but should help those fresh and curious brains who keep on wondering about why the relational data warehouse was modeled against the normalization theory that they have seen getting practiced in regular OLTP systems.

2 comments:

Anonymous said...

Namaste,

Can you give me an example where you have used a view to go from a snowflake to a star schema?

Thanks
Kirtan

Siddharth Mehta said...

Check out the SSAS Tutorial on mssqltips.com, which has been authored by me. You should get answer for your queries on SSAS from there.

Related Posts with Thumbnails