Thursday, January 06, 2011

SQLBI Methodology - Review

Marco Russo recently gave me an opportunity to provide my feedback on the SQLBI Methodology, an architecture designed, to the best of my knowledge, by Marco Russo and Alberto Ferrari. This well-documented architecture can be read from here.

Below are my views and feedback after analysing the architecture document. To better understand them, please read the architecture document before reading the points below.

Size of BI solution vs Complexity: In my view and experience, the volume of data that needs to be processed (right from the OLTP source until it gets stored in MOLAP), combined with the size of the BI solution sponsor, is directly proportional to the adoption of a BI solution development methodology.
In simple words, if SMBs can manage their BI solution development using a SaaS model rather than developing a DW after buying software licenses, most would choose the former approach. Businesses that adopt a full refresh of the DW every time would not care much for a methodical approach, as change management and incremental loads are not their concern at all due to the short-lived historical state of the DW. Organizations with large DWs, often measured in TBs, would definitely care for a methodical approach, as the magnitude of impact of any change is quite huge.

Components of a BI Solution:

1) Source OLTP database - OLTP has known issues, which mainly affect delta detection and the accessibility of the DB to the DW development team. I completely agree with this part. Here a concept called "Mirror OLTP" is introduced.

If Mirror OLTP is used as a facade, it does not make much sense: if you can create database objects in another database and then fire a cross-db query, then logically speaking the same facade should be allowed in the source OLTP itself, isolated behind a logical boundary such as a schema.
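
A minimal sketch of what such a schema-isolated facade inside the source OLTP could look like (the schema, table and account names here are purely illustrative, not taken from the paper):

-- Hypothetical read-only facade inside the source OLTP, isolated in its
-- own schema so that nothing in the dbo (application) schema is touched.
CREATE SCHEMA etl AUTHORIZATION dbo;
GO
-- A view exposing only the columns the DW load actually needs
CREATE VIEW etl.vw_SalesOrderHeader
AS
SELECT SalesOrderID,   -- business key
       OrderDate,
       CustomerID,
       TotalDue,
       ModifiedDate    -- used later for delta detection
FROM   dbo.SalesOrderHeader;
GO
-- The DW-side ETL account gets SELECT permission on the etl schema only
GRANT SELECT ON SCHEMA::etl TO dw_etl_reader;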

If Mirror OLTP is considered a snapshot, which is almost a clone of the original DB, one can exercise full control over that copy, but this is not as easy to implement as it sounds. Consider a source DB that lies on a SAN and is horizontally partitioned across geographies, and you are trying to replicate the same. For this Mirror OLTP you would need to constantly maintain another DB, which increases the TCO of the solution. It would also need to pass information security policies and guidelines, as audits like BS7799 / SOX etc. would require strict compliance.

Instead of developing a Mirror OLTP, one option can be to script out views / SPs / any other database objects, create them just-in-time (which would also be easier to get approved by DBAs, who would be happier with this temporary gateway than with a permanent cross-db gateway), use them for delta detection and then purge them. These scripts can be deployed using VSTS 2010 / VSTS DB edition / any other change management tool you have at your disposal.
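
A minimal sketch of such a just-in-time delta-detection object, assuming the source table carries a ModifiedDate column and that the ETL passes in the timestamp of the last successful load (all names are illustrative):

-- Hypothetical just-in-time procedure: created right before the ETL run,
-- used for delta detection, then dropped once the load completes.
CREATE PROCEDURE etl.usp_GetSalesOrderDelta
    @LastLoadDate datetime
AS
BEGIN
    SET NOCOUNT ON;
    SELECT SalesOrderID, OrderDate, CustomerID, TotalDue, ModifiedDate
    FROM   dbo.SalesOrderHeader
    WHERE  ModifiedDate > @LastLoadDate;   -- only rows changed since the last load
END
GO
-- Purge the temporary gateway after the load
DROP PROCEDURE etl.usp_GetSalesOrderDelta;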

In the worst case, where this option cannot be worked out, we can opt for what I call a Permanent Staging area, completely suited to the purpose, with data and metadata aligned towards facilitating ETL for DW loads. To me, Mirror OLTP seems to be a compound of the same.
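
A minimal sketch of a table in such a permanent staging area, with load metadata columns added to ease the DW ETL (the structure is my own illustration):

-- Hypothetical permanent staging table: keeps the source shape but adds
-- metadata so the DW load can reason about each batch and each source.
CREATE TABLE stg.Customer
(
    CustomerID    int            NOT NULL,  -- business key from the source
    CustomerName  nvarchar(100)  NULL,
    CountryCode   char(2)        NULL,
    SourceSystem  varchar(20)    NOT NULL,  -- which OLTP the row came from
    LoadBatchID   int            NOT NULL,  -- ties the row to an ETL run
    LoadDate      datetime       NOT NULL DEFAULT (GETDATE())
);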

2) Configuration DB - This seems like a facade opened up for users to configure the Mirror OLTP / Staging area / DW / Data Marts, with some built-in configuration settings / logic for each layer.
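
A minimal sketch of the kind of table such a Configuration DB could hold, assuming parameters are stored per layer (this structure is my own illustration, not taken from the paper):

-- Hypothetical configuration table driving the behaviour of each layer.
CREATE TABLE cfg.ParameterValue
(
    LayerName       varchar(30)   NOT NULL,  -- 'MirrorOLTP', 'Staging', 'DWH', 'DataMart'
    ParameterName   varchar(50)   NOT NULL,  -- e.g. 'LastLoadDate', 'FullRefresh'
    ParameterValue  varchar(200)  NOT NULL,
    CONSTRAINT PK_ParameterValue PRIMARY KEY (LayerName, ParameterName)
);

-- Example: the ETL reads the last load date configured for the staging layer.
SELECT ParameterValue
FROM   cfg.ParameterValue
WHERE  LayerName = 'Staging' AND ParameterName = 'LastLoadDate';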

3) Staging area - This is identified as a temporary staging area. It is mentioned that one cannot store persistent data in staging, and I beg to differ from this theory. For managing master data from different source systems, which would not contain a delta every time but would still be required for ETL processing due to the ER model design, a permanent staging area can exist. A temporary staging area is also required, and that part of the section is completely alright with me.

4) Data warehouse and Data Marts - The details mentioned in this section follow almost the Inmon methodology, where you develop a DB containing your consolidated data from ETL and then build marts that can be thought of as a limited compound of the DW DB. This is roughly synonymous with what perspectives are to cubes. Data Marts are basically crafted here to compartmentalize the different functional areas of the DW. Here you would be required to create what I term a "Master Data Mart", and the other data marts would be based on functional areas. Maintaining and populating these data marts can be quite challenging.
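
To illustrate the split this implies (my own simplified sketch, not taken from the paper), the DW keeps the business key while the mart dimension adds a surrogate key:

-- Hypothetical DW-level entity, keyed on the business key (Inmon-style).
CREATE TABLE dwh.Customer
(
    CustomerCode  varchar(20)    NOT NULL PRIMARY KEY,  -- business key from the source
    CustomerName  nvarchar(100)  NULL,
    CountryCode   char(2)        NULL
);

-- Hypothetical data-mart dimension, keyed on a surrogate key (Kimball-style).
CREATE TABLE dm.DimCustomer
(
    CustomerKey   int IDENTITY(1,1) NOT NULL PRIMARY KEY,  -- surrogate key
    CustomerCode  varchar(20)    NOT NULL,                 -- business key kept for traceability
    CustomerName  nvarchar(100)  NULL,
    CountryCode   char(2)        NULL
);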

5) OLAP Cubes - You would create cubes containing data from one functional data mart + master data mart.
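
A minimal sketch of the kind of query a cube's data source view could be built on under this split, assuming a Sales functional mart and conformed dimensions coming from the master data mart (schema and table names are illustrative):

-- Hypothetical data source view query for a Sales cube: the fact comes from
-- the functional data mart, the conformed dimension from the master data mart.
SELECT f.OrderDateKey,
       f.CustomerKey,
       f.SalesAmount,
       c.CustomerName,
       c.CountryCode
FROM   dm_sales.FactSales    AS f
JOIN   dm_master.DimCustomer AS c
       ON c.CustomerKey = f.CustomerKey;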

6) SSRS Reports - To me this department seems to be struggling, due to the design of the data marts. Reporting requirements can be extremely volatile and complex enough to induce a change that would require manipulating ETL -> DW -> Data Mart. There can also be cases where you might introduce another small ETL layer between DW -> Data Mart. Operational reporting would be done against the Data Marts and not the DW, as this architecture is an adoption of Inmon's view to a greater extent.
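
A minimal sketch of what that extra DW-to-mart ETL step could look like, reusing the illustrative tables above (insert-only, ignoring slowly changing dimension handling for brevity):

-- Hypothetical DW-to-mart load: bring any new customers from the DW entity
-- into the mart dimension, letting IDENTITY assign the surrogate key.
INSERT INTO dm.DimCustomer (CustomerCode, CustomerName, CountryCode)
SELECT d.CustomerCode, d.CustomerName, d.CountryCode
FROM   dwh.Customer AS d
WHERE  NOT EXISTS (SELECT 1
                   FROM   dm.DimCustomer AS dc
                   WHERE  dc.CustomerCode = d.CustomerCode);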

7) Client Tools - This section is okay with me.

8) Operational Data Store - This section clearly identifies that ODS should be used with care.

Summary: In my view, this architecture can be perceived to act like a prism. You have a ray of light, and after passing through the prism, it splits into different colors. And you can catch the color you need.

One big issue that I see with this architecture is Lineage Analysis. In this architecture lineage analysis becomes very complex, as deriving the lineage of data from a dashboard all the way back to the OLTP source is highly challenging. In addition to the configuration DB layer, there should be one more vertical layer where the lineage of the data is tracked.
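
A minimal sketch of what such a vertical lineage layer could record, assuming every ETL step logs the object-to-object movement it performs (entirely my own illustration):

-- Hypothetical lineage log: one row per object-to-object movement per ETL run,
-- so a figure on a dashboard can be walked back layer by layer to the OLTP source.
CREATE TABLE lineage.DataMovement
(
    MovementID    int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    LoadBatchID   int          NOT NULL,  -- ties back to the ETL run
    SourceLayer   varchar(30)  NOT NULL,  -- 'OLTP', 'Staging', 'DWH', 'DataMart', 'Cube'
    SourceObject  sysname      NOT NULL,
    TargetLayer   varchar(30)  NOT NULL,
    TargetObject  sysname      NOT NULL,
    RowsMoved     int          NULL,
    LoadDate      datetime     NOT NULL DEFAULT (GETDATE())
);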

Considering a practical example, say you have a corporation that consists of subsidiary companies; for example, CitiGroup has child companies like CitiCorp, CitiBank, CitiSecurities, CitiMortgage etc. When data is intended to flow from the CitiGroup level down to the CitiBank level, this architecture can hold good, as each child company is a different business unit and model with its own level of complexity. And each child company's analysis would have a dependency on data from other child companies to a certain level. This architecture seems effective at this scale.

If I were to implement this only at the CitiBank level, I would clearly go for the Kimball methodology. But this is my understanding, analysis and choice. There are a lot of scenarios in the sea of business models and requirements, and I am sure this architecture, with certain modifications (which is my personal preference), can help in a very effective manner.

2 comments:

Jalpesh Vadgama said...

Congratulations!!! Siddarth for your MVP title. Very well deserved one!!

Marco Russo said...

Thank you for reading and commenting on the paper, feedback is welcome!
Just a few notes:
- The OLTP Mirror is a logical layer that we usually implement in a physical way by creating snapshots for entities (customer, product, ...) and by using views for transactions (which are not supposed to change over time)
- The DWH has an Inmon design (it uses business keys) and the DM has a Kimball design (star schema and surrogate keys). This is a very important part and it is fundamental to solve many issues of maintenance over time, when different requests need to be implemented on the DM side
