Thursday, April 30, 2009

BI xPress - SSIS Auditing, Notification, Deployment and Management Solution

I'm reading: BI xPress - SSIS Auditing, Notification, Deployment and Management SolutionTweet this !
While surfing the new updates on SSIS, I just came across this new framework. Initially, the functionality it offers for SSIS administration impressed me a lot. This UI it offers is also nice and the development team has done a smart job by integrating the functionality in BIDS.

Now the time comes for the improvement areas or I would like to use the word "limitations". I searched the entire site of BI xPress, but I was not able to find out any technical details of how this functionality is implemented. No architectural detail, no higher level implementation details, and the entire site sings of it's power packed functionality it offers but there's no discussion of any performance related issues.

I think in very simple terms. If I want to notify someone of some event, either I would be subscribing events to the SSIS framework or on SQL Server, or the second option can be intercept the communication/events. And there is no technical detail provided for how the framework fits in the ETL picture.

This tool is on my evaluation list, but definitely by the way the tool is being promoted and brought before the BI community, I feel that it has long way to go before senior technical audiance can feel positive vibes for using this tool.

Excerpt from the BI xPress site:

Auditing Framework:

You can audit the following information using reports provided with Auditing Framework (Note: Use Report Viewer Application to view auditing data.)

  • Which packages are currently running and which task is running inside the package
  • Historical package execution detail for selected date range (i.e. Run time, Errors, Warnings etc.)
  • Error and Warning by Task and Package
  • Run time by Task and Package
  • Extracted and Loaded Records along with their source and target information (e.g. Table/View, Sql Query, File Name, Component Name, Data Flow Name, Connection String etc.).
  • Run time Trend for several days/weeks/years by Package and Task
  • Error/Warning/ Trend by Package and Task
  • Extract/Load Trend by Package, DataFlow
  • Extract/Load Trend by Data Object (e.g. File, Table/View or Sql Query)

SSIS Package Deployment

  • Deploy SSIS Packages to SQL Server, SSIS Store or File System (Bi-Directional Transfer supported using Left/Right arrows) .
  • Deploy Config files associated with packages along with packages.
  • Change Config File location for target packages during deployment.
  • Change Protection Level for target packages during deployment.
  • Open selected Packages in Visual Studio using single click without creating Projects.
  • Use keyboard shortcuts to copy/paste packages/config files from source->target or target->source
  • Visual Studio Integration (deploy single or multiple packages for Project or Solution).

Notification Framework

  • Configure email notification for single or multiple recipients.
  • Configure Text Messaging for Cell/PDA/Blackberry for single or multiple phone (Major wireless service providers around the globe supported).
  • Configure alerts for one or more events which includes OnError, OnWarning, OnSuccess or OnCompletion.
  • Filter certain warnings/errors.
  • All settings are stored in config file so you can easily change Notification settings without modifying packages.
  • Visual Studio Integration (enable/disable notification for single or multiple packages of Project or Solution).


SQL Server Proxy Accounts Feature : Use of proxy accounts in SSIS

I'm reading: SQL Server Proxy Accounts Feature : Use of proxy accounts in SSISTweet this !
A classic problem in SSIS and DTS is that a package may work in the design environment but not work once scheduled. Typically, this is because you have connections that use Windows Authentication. At design time, the package uses your credentials, and when you schedule the package, it uses the SQL Server Agent service account by default. This account may not have access to a file share or database server that is necessary to successfully run the package. Proxy accounts in SQL Server 2008 allow you to circumvent this problem.

With a proxy account, you can assign a job to use an account other than the SQL Server Agent account. Creating a proxy account is a two - step process. First, you must create a credential that will allow a user to use an Active Directory account that is not their own, Then, you specify how that account may be used.

How to do this ?

To first create a credential, open Management Studio and right - click Credentials under the Security tree and select New Credential. For this example, you ’ ll create a credential called Admin Access. The credential will allow users to temporarily gain administrator access. For the Identity property, type the name of an administrator account or an account with higher rights. Lastly, type the password for the Windows account, and click OK.

The next step is to specify how the credential can be used. Under the SQL Server Agent tree, right - click Proxies and select New Proxy, which opens the New Proxy Account dialog box. Type Admin Access Proxy for the Proxy Name property, and Admin Access as the Credential Name. Check SQL Server Integration Services Package for the subsystem type allowed to use this proxy.

Optionally, you can go to the Principals page in the New Proxy Account dialog box to state which roles or accounts can use your proxy from SSIS. You can explicitly grant server roles, specific logins, or members of given msdb roles rights to your proxy. Click Add to grant rights to the proxy one at a time.

You can now click OK to save the proxy. Now if you create a new SSIS job step, you ’ ll be able to use the new proxy by selecting the Admin Access Proxy from the Run As drop - down box. Any connections that use Windows Authentication will then use the proxy account instead of the standard account.

Thursday, April 16, 2009

Delta Detection Techniques in ETL Architecture

I'm reading: Delta Detection Techniques in ETL ArchitectureTweet this !
Log of transactions: For this approach, you need a log of all the changes to the source tables. This approach is most commonly used for transaction records, and for changes to a master table when the table is very large.

Snapshot of source system: This approach is to copy the entire content of tables to staging and then determine the changed/added records. This is a time consuming process, particularly if the source tables are huge. This is a preferred method for small tables, such as some master tables. This approach does not capture intermediate changes.

Snapshot of changed records: This approach is to capture the changed/added records directly from master or transactional tables. You do not need a separate log of changes. But this process does require the modification date and time of the changes. This approach does not capture intermediate changes.

There are various delta detection techniques to choose from based on the capabilities of the source system and business requirements. If the source system creates change logs, then detecting extracting the delta is relatively straightforward. If the source system does not help with delta detection and you cannot make changes to the source, then the extraction process has to bring in all the transactional records to staging and determine the changed records. This is time consuming and for huge OLTP data volumes it is not operationally feasible. Hence choosing appropriate delta detection technique becomes important for efficient and effective extraction process.

Reference: BI Architecture Design Whitepaper

Tuesday, April 07, 2009

Replacement of SCD Task : Kimball Method SSIS Slowly Changing Dimension Component

I'm reading: Replacement of SCD Task : Kimball Method SSIS Slowly Changing Dimension ComponentTweet this !
The credit of this post goes to the one who made an anonymous comment on my previous post on which I wrote about SCD Task optimization. Below are the features that KimballSCD SSIS Component offers as per its home page. Thou I have not used it yet, but it's definitely on my evaluation list now.


Features:

  • One component on the design surface that can be edited without adverse effects on the rest of the Data Flow. (As opposed to the SCD Wizard that creates multiple components that are destroyed and rebuilt if the Wizard is run again.)
  • Insane performance - measured to be 100x superior by use of multiple threads, sort optimization, and implied outcome determination. (As opposed to using the SCD Wizard with single-threaded uncached row-by-row lookups.)
  • Surrogate Key management. (As opposed to zero support in the SCD Wizard.)
  • "Special" (unknown) member support, per Kimball Method best practices. (As opposed to zero support in the SCD Wizard.)
  • Includes a "row change reason" output column on all (except Unchanged) outputs, per Kimball Method best practices. (As opposed to zero support for debugging why a row was directed to a particular output in the SCD Wizard.)
  • Supports simple and advanced styles of Row Auditing for inserts and updates, per Kimball Method best practices. (As opposed to zero auditing support in the SCD Wizard.)
  • Flexible column comparisons: case (in)sensitive and space (in)sensitive as desired, plus culture-sensitivity. (As opposed to zero support in the SCD Wizard.)
  • Flexible SCD 2 "current row" handling - permits specification of the date "endpoints". (As opposed to the SCD Wizard requiring the "expiry" date be NULL only.)
  • Flexible SCD 2 date handling - permits specification of what date expired and new rows get marked with. (As opposed to the SCD Wizard leaving that up to a Derived Column component that will get destroyed when the Wizard is run again to add/fix other properties.)
  • Reads the existing dimension from the Data Flow, not a Connection Manager - allowing the package designer to cache the existing dimension table as they see fit. (As opposed to the SCD Wizard that only supports some OLE DB Connection Managers.)

Monday, April 06, 2009

SCD Task Performance Optimization

I'm reading: SCD Task Performance OptimizationTweet this !

  • Create an index on your dimension table for the business key, followed by the current row identifier. If a clustered index does not already exist, create this index as a clustered index, because this will prevent a query plan lookup from getting the underlying row. This will help the lookup that happens in the SCD as well as all of the updates.

  • The row - by - row updates can be changed to set - based updates. To do this, you will need to change the OLE DB command to a Destination adapter to stage the records to a temporary table, then in the Control Flow, add an Execute SQL Task to perform the set - based update.


  • If you remove all the OLE DB command transformations, then you can also change the Insert Destination to use fast load and essentially bulk insert the data, rather than performing row - at - a - time inserts.

Saturday, April 04, 2009

Drawbacks of using SCD Task in your SSIS package : SCD Task Usage Considerations

I'm reading: Drawbacks of using SCD Task in your SSIS package : SCD Task Usage ConsiderationsTweet this !
  • For each row in the input, a new lookup is sent to the relational engine to see if changes have happened. In other words, the dimension table is not cached in memory. That is expensive! If you have tens of thousands of dimension source records or more, this can be a limiting feature of the SCD.

  • For each row in the source that needs to be updated, a new update statement is sent to the dimension table (and updates are used by the changing output, historical output, and inferred member output). If you have a lot of updates happening every time your dimension package runs, this will cause your package to run slow.

  • The Insert Destination is not set to fast - load. This is because deadlocks can occur between the updates and the inserts. When the insert runs, each row is added one at a time, which can be very expensive.

  • The SCD works well for historical, changing, and fixed dimension attributes, and changes can be made to the downstream transformations. However, if you open up the SCD wizard again and make a change to any part of the wizard, you will automatically lose your changes.

Friday, April 03, 2009

Tools for SQL Server Bundle : Toad for SQL Server V4.1

I'm reading: Tools for SQL Server Bundle : Toad for SQL Server V4.1Tweet this !
I was going through a Review of Toad for SQL Server article on SQLMag, and the product seems to be definitely promising. I have not yet used the product, but just by getting to know the features it offers, one can say that this product deserves an evaluation.

By reading the below excerpt of the original article, one can easily say that this tool is a bundle of features that SSMS, DB Pro GDR, Internals Viewer for SQL Server, and BIDS Helper offers collectively and still some features would remain unique to this tool. The only issue with this product is the price, which seems to be a little bit high than reasonable.

"There are three versions of Toad for SQL Server. The Professional Version costs $595. Toad for SQL Server Xpert, $1,045, adds T-SQL tuning wizards and capabilities, and Toad for SQL Server Development Suite, $1,295, adds benchmarking capabilities.

Toad's object explorer fairly closely matches the functionality provided by SSMS' object explorer, but double-clicking an object such as a stored procedure or table instantly loads the script for the selected object along with an extremely impressive bevy of context-sensitive metadata. The data includes object sizes, constraints, permissions, and dependencies. This metadata display represents one of Toad's greatest strengths, as it shows exactly the kind of things that DBAs and other serious database professionals would want to see when interacting with objects.

Toad also provides other features beyond its responsive and customizable UI. For example, all versions of Toad include log reader functionality (which allows you to review SQL Server's log files and do undo and replay operations), schema and data comparison and synchronization tools, and custom reporting and automation engines or frameworks. These tools alone can almost justify the cost of the Professional version in some environments.

Of course, Toad comes with the basics that you need to manage and develop databases. There's an ER diagramming tool, a visual query builder, support for backup and restore operations, and the ability to view and manage jobs. In short, Toad offers everything you need to replace SSMS, plus an assortment of other tools and utilities."

As per the product features page, it offers the following features and benefits:

Group Execute – Toad for SQL Server provides the ability to execute SQL scripts across multiple SQL server instances streamlining administration and development requirements for implementing database changes in a whole sale manner.

Compare and Synchronize – Compare and synchronize data, schemas and servers in Toad for SQL Server’s compare and synch tool in a few simple clicks. This allows SQL Server DBA’s and developers to work within maintenance windows in the face of growing datasets, easing the pain of meeting SLAs and reducing potential for failure.

Transaction Log Reader – Toad for SQL Server allows you to recover data stored in transaction log files, rolling back operations and reconstructing transactions.

Xpert Tuning – Completely automate the process of resolving SQL performance issues. Xpert tuning will identify, tune and benchmark problematic SQL in your SQL Server environment, regardless of where your SQL resides. Xpert tuning also provides the ability to investigate indexing strategies through automated index generation.

Recall SQL Scripts – Toad for SQL Server automatically saves every SQL statement executed against the database environment. These statements can be searched and recalled allowing users to quickly locate statements for review and can also be saved for long-term use.

Database Administration – Toad for SQL Server simplifies database administration tasks for creating, altering and managing database objects, SQL code, users, logins, and security across multiple servers.

Version Control - Toad for SQL Server provides integration with many popular version control vendors including: Visual Source Safe, Subversion and CVS.

Intellisense – Automate the coding process with intellisense by completing database code activity in the ‘smart’ SQL Editor window.

SQL Performance Optimization – Optimal SQL performance is ensured through an automated process of identifying, optimizing and testing SQL statements so that the best performing SQL is deployed in the environment.

Difference viewer – Difference Viewer helps you to identify discrepancies between files, data and objects, then edit duplicate data; you can also export data to Excel with linked queries as well as add your own notes to database objects.

Query Builder – Toad for SQL Server facilitates the process of coding SQL by letting you quickly drag and drop tables to create queries, from simple to complex.

Grid Customization – Customize the data grid styles visible in the toolset by applying skins to the views – quickly modifiable in the Configuration wizard.

Reporting – Report Writer in Toad for SQL Server allows you to quickly build a report from the console, facilitating documentation of database objects with the schema reporting. It also includes support for pivot grids, and a chart designer.

Visual Data Modeling – Visualize tables, dependencies and database relationships in your SQL Server environment in a data modeling view.


Thursday, April 02, 2009

BI Interview Question : Types of Dimension

I'm reading: BI Interview Question : Types of DimensionTweet this !
Slowly Changing Dimensions: Attributes of a dimension that would undergo changes over time. It depends on the business requirement whether particular attribute history of changes should be preserved in the data warehouse. This is called a Slowly Changing Attribute and a dimension containing such an attribute is called a Slowly Changing Dimension.

Rapidly Changing Dimensions: A dimension attribute that changes frequently is a Rapidly Changing Attribute. If you don’t need to track the changes, the Rapidly Changing Attribute is no problem, but if you do need to track the changes, using a standard Slowly Changing Dimension technique can result in a huge inflation of the size of the dimension. One solution is to move the attribute to its own dimension, with a separate foreign key in the fact table. This new dimension is called a Rapidly Changing Dimension.

Junk Dimensions: A junk dimension is a single table with a combination of different and unrelated attributes to avoid having a large number of foreign keys in the fact table. Junk dimensions are often created to manage the foreign keys created by Rapidly Changing Dimensions.

Inferred Dimensions: While loading fact records, a dimension record may not yet be ready. One solution is to generate an surrogate key with Null for all the other attributes. This should technically be called an inferred member, but is often called an inferred dimension.

Conformed Dimensions: A Dimension that is used in multiple locations is called a conformed dimension. A conformed dimension may be used with multiple fact tables in a single database, or across multiple data marts or data warehouses.

Degenerate Dimensions: A degenerate dimension is when the dimension attribute is stored as part of fact table, and not in a separate dimension table. These are essentially dimension keys for which there are no other attributes. In a data warehouse, these are often used as the result of a drill through query to analyze the source of an aggregated number in a report. You can use these values to trace back to transactions in the OLTP system.

Role Playing Dimensions: A role-playing dimension is one where the same dimension key — along with its associated attributes — can be joined to more than one foreign key in the fact table. For example, a fact table may include foreign keys for both Ship Date and Delivery Date. But the same date dimension attributes apply to each foreign key, so you can join the same dimension table to both foreign keys. Here the date dimension is taking multiple roles to map ship date as well as delivery date, and hence the name of Role Playing dimension.

Shrunken Dimensions: A shrunken dimension is a subset of another dimension. For example, the Orders fact table may include a foreign key for Product, but the Target fact table may include a foreign key only for ProductCategory, which is in the Product table, but much less granular. Creating a smaller dimension table, with ProductCategory as its primary key, is one way of dealing with this situation of heterogeneous grain. If the Product dimension is snowflaked, there is probably already a separate table for ProductCategory, which can serve as the Shrunken Dimension.

Static Dimensions: Static dimensions are not extracted from the original data source, but are created within the context of the data warehouse. A static dimension can be loaded manually — for example with Status codes — or it can be generated by a procedure, such as a Date or Time dimension.

Reference: Ralph Kimball Articles

Wednesday, April 01, 2009

Tool to plan security requirements of any BI Solution

I'm reading: Tool to plan security requirements of any BI SolutionTweet this !
While going through a book, I just skimmed through a chapter that described various measures to plan for security requirements of a Business Intelligence solution. One of the recommendation was use of MSAT tool. So I thought of mentioning it on this blog. This tool is not designed to work with SSIS, SSAS and SSRS security in specific. But seems to be a useful tool in environments where compliance is a big consideration, for example - security consideration is a big concern on any project associated with a financial domain client. I am not sure how useful it would be for BI projects, but this seems to be a useful document to produce for quality and audit perspective, especially security audits like BS7799.

Following is an excerpt from the Microsoft Security Assessment Tool home page:

MSAT consists of over 200 questions covering infrastructure, applications, operations, and people. The questions, associated answers, and recommendations are derived from commonly accepted best practices, standards such as ISO 17799 and NIST-800.x, as well as recommendations and prescriptive guidance from the Microsoft Trustworthy Computing Group and other external security sources.

The assessment is designed to identify the business risk of your organization and the security measures deployed to mitigate risk. Focusing on common issues, the questions have been developed to provide a high-level security risk assessment of the technology, processes, and people that supports your business.

Beginning with a series of questions about your company's business model, the tool builds a Business Risk Profile (BRP), measuring your company’s risk of doing business due to the industry and business model defined by BRP. A second series of questions are posed to compile a listing of the security measures your company has deployed over time. Together, these security measures form layers of defense, providing greater protection against security risk and specific vulnerabilities. Each layer contributes to a combined strategy for defense-in-depth. This sum is referred to as the Defense-in-Depth Index (DiDI). The BRP and DiDI are then compared to measure risk distribution across the areas of analysis (AoAs)—infrastructure, applications, operations, and people.

In addition to measuring the alignment of security risk and defenses, this tool also measures the security maturity of your organization. Security maturity refers to the evolution of strong security and maintainable practices. At the low end, few security defenses are employed and actions are reactive. At the high end, established and proven processes allow a company to be more proactive, and respond more efficiently and consistently when needed.

Risk management recommendations are suggested for your environment by taking into consideration existing technology deployment, current security posture, and defense-in-depth strategies. Suggestions are designed to move you along a path toward recognized best practices.
Related Posts with Thumbnails