I have been attending a number of meeting these days, and I am encountering a common question everywhere - What are the layers that you plan to incorporate in the architecture design of a data warehouse development life-cycle? I found this question rather interesting and thought of sharing my views on the same.
Speaking from a high level, from a DW development perspective, firstly one needs to figure out the boundaries of development. Generally the extreme boundary starts from OLTP and ends at Dashboards. After having these boundaries considered, the following are the layers / development arenas one can consider for a DW development from scratch.
1) Delta Detection - This is the first exercise that you would plan with your OLTP system. This is a very important exercise, as this would decide a few other exercises in the life-cycle.
2) Staging Area - Based on the requirements and considering the complexity of delta detection, a temporary / permanent staging area development would be required.
3) Master Data Management - Do not confuse it with the standard MDM practice, which is more towards modeling. Here MDM means how you would manage your MDM in the staging area / in the delta, as delta applies only to transactional data. Master data do not change that often, and you need master data for your ETL processing. This has to be managed at the facade layer you would build for delta detection / in the staging area, but it has to be planned along with the points 1 and 2.
4) Dimensional Modeling - This is the exercise where you start modeling your dimensions using the Kimball / Inmon methodology.
5) Data Mart Design & Development - Only after the point 4, one can start developing a data mart which lays down the base for the next exercise of ETL development.
6) ETL Design & Development - Points 1 - 2 - 3 are to serve the ETL processing. ETL basically serves as the processing engine to transform your relational data to suit the model of your data mart. One can consider the above points in E and this is T + L. Until you Data Mart is in place, one does not have any idea about the target schema, so ETL development makes sense only at this level.
Speaking from a high level, from a DW development perspective, firstly one needs to figure out the boundaries of development. Generally the extreme boundary starts from OLTP and ends at Dashboards. After having these boundaries considered, the following are the layers / development arenas one can consider for a DW development from scratch.
1) Delta Detection - This is the first exercise that you would plan with your OLTP system. This is a very important exercise, as this would decide a few other exercises in the life-cycle.
2) Staging Area - Based on the requirements and considering the complexity of delta detection, a temporary / permanent staging area development would be required.
3) Master Data Management - Do not confuse it with the standard MDM practice, which is more towards modeling. Here MDM means how you would manage your MDM in the staging area / in the delta, as delta applies only to transactional data. Master data do not change that often, and you need master data for your ETL processing. This has to be managed at the facade layer you would build for delta detection / in the staging area, but it has to be planned along with the points 1 and 2.
4) Dimensional Modeling - This is the exercise where you start modeling your dimensions using the Kimball / Inmon methodology.
5) Data Mart Design & Development - Only after the point 4, one can start developing a data mart which lays down the base for the next exercise of ETL development.
6) ETL Design & Development - Points 1 - 2 - 3 are to serve the ETL processing. ETL basically serves as the processing engine to transform your relational data to suit the model of your data mart. One can consider the above points in E and this is T + L. Until you Data Mart is in place, one does not have any idea about the target schema, so ETL development makes sense only at this level.
7) Cube Design & Development - This exercise can be done in parallel with point 6. Once you have your data mart, you can start with this exercise. In fact, you can start your exercise even before your data mart is in place, but if your data mart is not ready means your dimensional modeling is still too volatile / evolving. So better start this exercise after you have some concrete model ready for your data mart.
8) Operational and Analytical Report Design - Cube is generally refreshed at regular intervals and only exception are real-time cubes. For those refresh windows where cube does not have the data from OLTP that is loaded after the latest cube refresh cycle, operational reporting needs to be provided. And analytical reports would serve as the constituent for scorecards / webparts that would be used in Dashboard development.
9) Dashboard Development - After the above phases are ready, you have the engine ready and it's time to give a face to your machine. Generally dashboards are fuelled from cubes to a major extent, and this is generally the final phase of a DW development life-cycle.
I have tried to describe the various development cycles that form a DW development life-cycle from a very high level. Feel free to add to it.
3 comments:
Siddharth, this approach is a dimensional DWH one, which has several issues in the long term. Take a look at SQLBI Methodology (http://www.sqlbi.com/sqlbimethodology.aspx), which makes a distinction between DWH (variation from Inmon approach, enabling incremental structural changes) and DM (traditiona dimensional Kimball design).
I'd like to get your feedback on that.
Marco Russo
http://sqlblog.com/blogs/marco_russo
Sure Marco, this would be the topic of my next post.
Ok nice!
Post a Comment