Thursday, November 25, 2010
How SSIS Performance can be optimized using Change Data Capture and/or Change Tracking , MERGE , and Filtered IndexesI'm reading: How SSIS Performance can be optimized using Change Data Capture and/or Change Tracking , MERGE , and Filtered IndexesTweet this !
Delta detection, source data extraction and smooth sourcing of input data from OLTP and/or Relational data warehouse is a part of the overall ETL process. Though the performance optimization boundary is outside the regime of SSIS, but these are the fuelling stations of SSIS. OLTP and relational data warehouse are the two main data stations from which SSIS fetches and/or loads data, and relational data warehouse (i.e. data marts / ODS, whatever be the case) is often used to read as well as write data, and optimizing these relational sources can contribute to performance optimization of SSIS packages. In case if you are using SQL Server 2008 platform, some of those that can be effectively used for optimizing data access especially by SSIS packages are as below:
1) Change Data Capture (CDC) and/or Change Tracking: I would not go into the details of the difference between Change Data Capture and Change Tracking. But this is one of the effective means of delta detection. Actually more than performance, it provides a controlled and automated infrastructure for delta detection. Depending upon the kind of algorithm followed in the master ETL package that handles delta detection and extraction, CDC / Change Tracking can be used. A whitepaper that explain in depth methodology of integrating CDC with SSIS can be downloaded from here.
2) MERGE statement: MERGE is one of the smart means to off load processing of data for Slowly Changing Dimensions. A nice brief article from kimballgroup explaining the same concept can be read from here.
3) Filtered Indexes: Mostly everyone is aware of two types of indexes in SQL Server - Clustered and Non-Clustered. But Filtered Index is a new advancement and is a very useful weapon to optimize OLTP as well as relational data warehouse for optimizing data access without impacting the overall system or impacting only the system for which SSIS needs a helping hand from the relational sources for faster data access. Again I would not go into the details of filtered indexes, and a thorough section that explains the same can be read on MSDN from here.
Mostly we talk of how to optimize SSIS package performance, but the bigger picture is how do we optimize ETL process where SSIS is the driving ETL tool, and these pointers can provide some direction in an effort to optimize SSIS package performance, though indirectly.