Monday, November 22, 2010
I'm reading: Lineage Analysis in SSISTweet this !
In the recently released video / webcast of "What's new in SSIS Denali", SSIS team talks a little about Lineage analysis and the efforts they are making in this direction. The purpose that seems behind these efforts are to provide a transparent tracking of data flow end-to-end right from it's source where the data comes into the scope of tracking till the data makes to the end-user territory i.e. dashboards or excel. This can potentially open the scope of a new business stream in itself, but still that is far.
Lineage analysis is quite important even in the ETL development methodology. In SSIS, lineage of a data stream or I can say lineage of each field is tracked using a LineageID property. Understanding of SSIS architecture is quite important to understand where and why LineageID changes. Wherever the transformation is asynchronous, or precisely speaking whenever the input buffer and output buffers from the transformation are different due to the processing that a transformation applies on the buffer, the source of data generation can be considered as the new lap in the race.
This can been seen like a relay race where one runner hands over the baton to the next runner, and here the baton is the data. But when the race ends, one needs to know that how many runners exchanged the baton and which runner kept the baton for what amount of time. This is logically what we mean by lineage analysis. Whenever an asynchronous transformation is used, it creates a closing door for the logical flow before it, and an opening door for the logical flow after it. Such logical flows are classified as execution trees in SSIS. Support for lineage analysis is quite limited out-of-box in SSIS, and you can use Metadata Toolkit 2008 upto an extent for the same.
Taking this lineage analysis to a higher level, the same kind of analysis across the end-to-end BI lifecycle right from OLTP till Dashboards is very difficult to envision, implement, track and analyze. This would require some service contracts between MS BI services i.e. SSIS - SSAS - SSRS at least. When this data would make it to dashboarding platform which is as of now Performancepoint Services in Sharepoint 2010, it would again have to have a new kind of integration with Sharepoint which is out of SQL Server territory. Still there seems to be a way out as Sharepoint feeds on SQL Server. This is probably what we can expect to see in future.