Recently Todd posted about Cross Join in SSIS, and he explained in great detail about how to do a cross join in SSIS, which can be read from here. It's important to understand the how part, but the "Why" part is also equally important in my views. If someone is able to answer how to do a cross join in SSIS, I would also like to know some real life scenarios where you would want to perform a cross-join using SSIS in a Data Flow. There are two real life scenarios where performing cross joins in SSIS would be beneficial, and of course, these are such scenarios where either you are not concerned about performance at all, or your intention is to test performance.
1) Populating Junk Dimensions: Junk dimensions are often formed with unrelated columns (i.e. attributes) in a single dimension table. Generally such tables contains very few records and can be easily populated using a script. But as a part of the policy, sometimes you may face situations where you need to populate Junk dimensions or even dimensions like "Date" using SSIS. In such situations you may find the need to use SSIS packages to populate such dimensions. Often such dimension populating is a one time process, and you would not use these packages often.
2) Performance testing SSIS packages: I personally feel that cross joining is a great way to generate huge test data with very less effort. Often when you need to test packages for performance or even other factors, the first thing that is required is test data. Say if I have 100 records in one data stream and 100 records in another data stream, by cross joining I can easily create 100 * 100 = 10000 records. This is some pretty quick test data !!
There can be many other scenarios, but these are the scenarios where I would use SSIS to create a cross join / Cartesian product, instead of using the database engine for the same purpose.
Cross Join is one such thing that is required in your MDX queries too. When you have two different hierarchies, for ex. say Products and Colors and you want to use it on the same axis i.e. you want a pivot kind of view where you want to see all products of all colors on the row axis / columns axis itself, you cannot achieve the same in MDX without cross joining these two different hierarchies on the same axis. This is one typical use where you would need to use Cross Join in SSAS / MDX.
In MDX, you can cross join different hierarchies by three different syntax or ways.
1) CrossJoin function: CrossJoin (Hierarchy1.level.Members , Hierarchy2.level.Members)
2) * operator: This is one of the shorthand for cross joining. Hierarchy1.level.Members * Hierarchy2.level.Members
3) () round brackets: To specify a set you would use curly braces {}. But if you intend to cross join, you can simply use the round brackets. (Hierarchy1.level.Members , Hierarchy2.level.Members).
Keep in view the I have included hierarchy.level.members for clarity. You can also just use hierarchy.level and that would bring all the members accordingly. In summary, in my views when you prepare for your interviews for questions like this, keep in mind that "why would you do something" is equally important to knowing "how would you do something".
No comments:
Post a Comment