Wednesday, December 29, 2010

SSAS Engine, SSIS Engine and SSRS Engine

Speaking about the Microsoft Business Intelligence stack, i.e. SSIS / SSAS / SSRS, there are different engines associated with each service. If you are unaware of these engines, you are probably not well placed to design the architecture of a solution using the respective services. The major engines that come into consideration when you are using the MS BI stack are as below:

1) SSIS Runtime Engine - This engine takes care of the administration and execution side of SSIS. In developer terms, I would consider it the Control Flow + SSMS of SSIS.

2) SSIS Data Flow Engine - This engine can be considered the Buffer Manager of the SSIS in-memory architecture.

3) SSAS Formula Engine - This engine takes care of resolving the retrieval of members on any axis of an MDX query. Tuning the performance of this engine is largely a matter of MDX tuning.

4) SSAS Storage Engine - This engine can be considered the Data Manager of SSAS, which decides what data needs to be fetched and from where. If you overload the Formula Engine, there is a good possibility that the load will cascade to the Storage Engine, which deals directly with aggregations. The annotated query after this list illustrates points 3 and 4.

5) SSRS Service Endpoint - Technically this cannot be considered an engine; most people would argue that rendering / authentication / processing are engines, but I consider these extensions rather than engines. This endpoint takes care of the administration side of SSRS. Anything that you can do with Report Manager is by virtue of this endpoint.

6) SSRS Execution Endpoint - This is the endpoint one would like to award the medal of being an engine. It takes care of executing the report, right from processing the RDL through to rendering the report.
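To make points 3 and 4 concrete, below is a simple MDX query annotated with which engine resolves which part. The cube and measure names assume the Adventure Works sample, and the query is held in a Python string purely so the sketch stays self-contained; you would normally run it from SSMS or any MDX client.

# A minimal, annotated illustration of the Formula Engine / Storage Engine
# split; the cube and measure names assume the Adventure Works sample cube.
mdx = """
SELECT
    { [Measures].[Internet Sales Amount] } ON COLUMNS, -- data / aggregations fetched by the Storage Engine
    { [Date].[Calendar Year].Members }     ON ROWS     -- axis members resolved by the Formula Engine
FROM [Adventure Works]
"""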

You can read more about each of these on MSDN as well as in different books and blogs. But in my opinion, until you thoroughly understand the function of these engines, you should not feel confident about the architecture you design.

Tuesday, December 28, 2010

SSIS Architecture Design Document

When I was quite junior in the IT industry, I used to wonder how one designs an architecture document, and I used to collect templates of these documents from different sources. Today, having matured enough to design one myself with my own sense and understanding of systems, I feel that a basic idea of the sections an SSIS architecture design document should contain would be of interest to all those minds who are growing up as future architects.

Delta Detection - This section would contain details about incremental load and Change Data Capture.
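As an illustration, a minimal sketch of CDC-based delta detection follows. It assumes CDC is already enabled on a dbo.Orders table with a dbo_Orders capture instance, and that the pyodbc package is available; all table, column and connection names are illustrative assumptions.

# A minimal sketch of delta detection with SQL Server Change Data Capture.
# Assumes CDC is enabled on dbo.Orders (capture instance dbo_Orders); all
# names here are illustrative.
import pyodbc

conn = pyodbc.connect("DRIVER={SQL Server};SERVER=SrcServer;DATABASE=SalesOLTP;Trusted_Connection=yes")
cursor = conn.cursor()

# Fetch only the rows changed within the available LSN window; a real ETL run
# would persist the last processed LSN and use it as the lower bound.
cursor.execute("""
    DECLARE @from_lsn BINARY(10) = sys.fn_cdc_get_min_lsn('dbo_Orders');
    DECLARE @to_lsn   BINARY(10) = sys.fn_cdc_get_max_lsn();
    SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_Orders(@from_lsn, @to_lsn, N'all');
""")
for row in cursor.fetchall():
    pass  # hand each changed row to the staging / transformation step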

Extraction & Staging - This section would contain details regarding the treatment of extracted data and whether permanent or temporary staging is being used. If a staging area is used, it requires further elaboration here.

Facades - Interfaces and contracts are a vital part of any architecture design. This section would contain the views / stored procedures that you would create in your OLTP system; even your CDC stored procedures can act as your facade. This section should also describe whether you are using a push or a pull model.

Trigger mechanism - Many solutions contain an application which triggers the ETL cycle. Whether you would use a scheduler such as Autosys or SQL Agent should be described here.

Process Control Structure - Each category of packages, classified by functionality, should log which package was doing what and when. This provides a monitoring layer and control over the entire ETL execution, as sketched below.
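A minimal sketch of such a process-control log follows; the etl.PackageExecutionLog table and its columns are illustrative assumptions, not a standard.

# A minimal sketch of a process-control log. The etl.PackageExecutionLog
# table and column names are illustrative assumptions.
from datetime import datetime

def log_package_event(cursor, package_name, event, row_count=None):
    # Each package writes a row on start and on completion so the support
    # team can see which package was doing what, and when.
    cursor.execute(
        "INSERT INTO etl.PackageExecutionLog (PackageName, EventName, RowsProcessed, LoggedAt) "
        "VALUES (?, ?, ?, ?)",
        package_name, event, row_count, datetime.now(),
    )

# Usage (cursor obtained from a pyodbc connection, as in the CDC sketch above):
# log_package_event(cursor, "LoadDimCustomer", "PackageStart")
# log_package_event(cursor, "LoadDimCustomer", "PackageEnd", row_count=15230)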

Environment configuration - All details about environment variables, package configurations, and the storage locations of those configurations should be described in this section.
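By way of example, a package whose configuration has been externalized can be run per environment via dtexec; in the sketch below, the package and configuration file paths are illustrative assumptions.

# A sketch of executing a package with an environment-specific configuration
# file through dtexec; package and configuration paths are illustrative.
import subprocess

subprocess.run(
    [
        "dtexec",
        "/File", r"D:\ETL\Packages\LoadSales.dtsx",
        "/ConfigFile", r"D:\ETL\Config\PROD.dtsConfig",  # per-environment values
    ],
    check=True,  # raise an error if the package execution fails
)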

Traceability - Logging details should be mentioned in this section. Auditing can be of huge importance to support teams, so this section should be documented and designed with support teams in mind.

Transaction Handling - Details about transaction handling within and across packages belong to this section.

Error Handling - This section is very important to almost all teams screening your architecture document. Also, if your error handling is weak, your transaction handling might suffer, and this can be catastrophic for any kind of data loading.

Automation - Particularly in data warehousing projects, automation is one of the regular requirements. The trigger mechanism falls within the automation umbrella to an extent.

Scalability - The more loosely you couple your package structure to take advantage of parallelism, and the more you design your logic with memory in mind (synchronous vs. asynchronous transformations), the more you gain in scalability. These details should be mentioned in this section.

Hardware Configuration - Server configuration details, right from the number of cores to the amount of RAM that you intend to use, belong to this section.

Change Control & Deployment - The deployment methodology and location should be elaborated in this section. Though change control is part of the configuration exercise, it is linked with how you would manage deployment, so it makes more sense to elaborate the change control methodology in this section.

Feel free to add your comments to make this section more complete.

Monday, December 20, 2010

Tool to create / support BUS architecture based data warehouse design

However powerful SSAS may be, when it comes to starting a fresh dimensional modeling exercise, using SSAS is the last step in the process, i.e. data warehouse implementation. Dimensional modeling starts with understanding how the clients want to analyse their business, which implicitly involves identifying the ER of the targeted business models. From there, one needs to develop a BUS matrix (provided you are following the Kimball methodology and BUS architecture), followed by a Data Map / Data Dictionary.

Once you have the blueprint ready, the artifacts required to build the anatomy of the data warehouse need to be built, and two of the major ones are:
1) ETL routines to shape your data so that it complies with the data mart design.
2) The relational data warehouse / data mart, i.e. dimension and fact tables and other database objects that will hold the data transformed by ETL.

The process sounds crystal clear, but when you are developing from scratch, and your data warehouse and dimensional model are still evolving, there is one tool which can be very instrumental in designing them. The wonderful part is that this tool / template comes for free, courtesy of the Kimball Group, and it's called the Dimensional Modeling Spreadsheet.

Dimensional Modeling Spreadsheet: This template spreadsheet can help you create the entire data dictionary / data map for your dimensional model, and it contains samples of some of the basic dimensions used in almost any dimensional model. The unique thing about this spreadsheet is that once you have keyed in your design, it has the option to generate SQL out of your model. You can run this SQL script in your database to create the dimension and fact tables right out of it, which means that your data mart / relational data warehouse is ready to store the data; an illustrative sample of such DDL is sketched below. This spreadsheet can also form the base for your ETL routines. The only other tool to my knowledge which comes near to this functionality is WhereScape RED, and of course it's not free, as it serves a lot more than just this.
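The sketch below gives a flavour of the kind of DDL such a generated script might contain; the table and column names are made up for illustration, and the real output follows whatever design you key in.

# An illustrative, simplified example of generated dimension and fact table
# DDL; all names and columns are made-up assumptions, executed here via pyodbc.
import pyodbc

ddl = """
CREATE TABLE dbo.DimDate (
    DateKey      INT      NOT NULL PRIMARY KEY,  -- surrogate key
    FullDate     DATETIME NOT NULL,
    CalendarYear SMALLINT NOT NULL
);
CREATE TABLE dbo.FactSales (
    DateKey     INT   NOT NULL REFERENCES dbo.DimDate (DateKey),
    ProductKey  INT   NOT NULL,
    SalesAmount MONEY NOT NULL
);
"""
with pyodbc.connect("DRIVER={SQL Server};SERVER=DWServer;DATABASE=SalesDW;Trusted_Connection=yes") as conn:
    conn.execute(ddl)  # creates the data mart tables from the generated script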

You can read more about this spreadsheet in the book "The Microsoft Data Warehouse Toolkit: With SQL Server 2005 and the Microsoft Business Intelligence Toolset". For those who are new to dimensional modeling concepts, read this article to gain a basic idea of the life-cycle.

Sunday, December 19, 2010

Developing different types of reports in SSRS 2008 R2

SSRS 2008 R2 has advanced reporting functionality a long way since the days of SSRS 2000. There are many different types of reports that can be created using SSRS, and each serves a different purpose, but each of them can still be categorized into a few common groups. The different types of reports that can be created using SSRS are as below:
  • Parameterized reports
  • Linked reports
  • Snapshot reports
  • Cached reports
  • Ad hoc reports
  • Clickthrough reports
  • Drilldown reports
  • Drillthrough reports
  • Subreports

The categories of reports that I see from a development / production perspective are as below:

  • Business Users authored reports: Ad hoc reports and Clickthrough reports can be classified in this category of reports.

  • Performance Boosters: Cached reports and Snapshot reports can be classified in this category of reports.

  • Data / Problem Decomposition: Drill-down and Drill-through reports can be classified in this category of reports. These are probably the reports of maximum value for business users.

  • Business As Usual: Parameterized reports are the most common kind of reports, and can be classified in BAU category.

  • Reusable / Versioned reports: Subreports and Linked reports can be classified in this category. These are probably the most focused category of reports from a development perspective.

A very detailed explanation of the types of reports in SSRS 2008 R2 can be read on MSDN from here.

Tuesday, December 14, 2010

Zip and Email SSRS reports requires use of SSIS - True or False ?

Recently, one of my blog readers posted me a question where the requirement was to zip and email SSRS reports to users. The reader wanted to check with me whether this can be done using subscriptions and any kind of delivery method. The two main delivery methods available are file share and email delivery, and to the best of my knowledge, neither of them includes zipping of files.

When report files are huge in size, to the magnitude of several MBs, emailing those files is quite a big challenge, and it is also not an advisable option. In my view, the SSRS team should make an option available to zip the file into at least one compressed file format when files are delivered using the file share or email delivery methods. These are some of the very common reporting-related requirements that SSRS would have to incorporate to gain acceptance as a serious, enterprise-class reporting platform.

The order of this process can be as follows:

1) Create the report output stream using subscriptions, with the output in the requested file format.
2) Zip the output stream into a compressed file format, at least the .cab format, which Windows has supported since its early days.
3) Deliver the file using the file share / email delivery methods.

As of now, there are no out-of-box options to directly deliver zipped reports to a file share or email them, but SSIS can be used for this purpose. In the whitepaper that I have authored, I have explained how to generate SSRS reports in bulk using SSIS and SSRS 2008 R2. With a little more coding in the Script Task, you can zip the created files and email them if required; a sketch of those two steps follows below. Two of the articles that demonstrate how to implement the zipping functionality can be read from here and here.
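Below is a minimal sketch of steps 2 and 3 in Python (inside an SSIS Script Task you would write the equivalent in C# / VB.NET); the file paths, addresses and SMTP host are illustrative assumptions.

# A minimal sketch of zipping a rendered report and emailing it; all paths,
# addresses and the SMTP host are illustrative assumptions.
import smtplib
import zipfile
from email.message import EmailMessage

report_path = r"\\fileshare\reports\SalesSummary.pdf"  # output of the subscription (step 1)
zip_path = r"\\fileshare\reports\SalesSummary.zip"

# Step 2: compress the rendered report.
with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.write(report_path, arcname="SalesSummary.pdf")

# Step 3: deliver the zipped report by email.
msg = EmailMessage()
msg["Subject"] = "Sales Summary report"
msg["From"] = "etl@example.com"
msg["To"] = "users@example.com"
msg.set_content("Please find the zipped report attached.")
with open(zip_path, "rb") as f:
    msg.add_attachment(f.read(), maintype="application", subtype="zip", filename="SalesSummary.zip")
with smtplib.SMTP("smtp.example.com") as smtp:
    smtp.send_message(msg)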

Wednesday, December 08, 2010

Achieve high availability of cubes using SSAS and SSIS

Recently, I was faced with two different questions at two different events, and both questions were directly or indirectly linked to the high availability of cubes for querying. Those two questions were:

1) What is the difference between scale out and scale up?
2) How would you ensure 24 x 7 availability of a cube, considering that users are accessing the cube globally and the cube should always remain available for querying?

The answer to the first question: scaling out means that when you need to achieve parallelism for concurrency in querying or processing, you distribute / replicate processing operations and/or data across multiple nodes. Scaling up usually means that you increase the capacity of the host to enable the server to cater for the incoming load. Where the capacity for scaling up ends, scaling out steps in.

The next question was quite interesting, and the challenge was that I had to instantly think of a design and answer the query. I answered this question correctly, and to my delight, I later found this whitepaper, which describes exactly what I answered. Such moments bring a lot of happiness and confidence that my knowledge has not gone stale and that I can continue to provide consulting in the MS BI business.

The presentation layer is coupled with the SSAS query server. Data is read from the relational engine and the cube is processed on a separate server, which can be considered another layer altogether. After the cube is processed, the query server and the processing server are synchronized; for multi-server synchronization, SSIS is used, as sketched below. Two diagrams in the whitepaper demonstrate the same, and the entire whitepaper can be read from here.
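For illustration, the synchronization step boils down to an XMLA Synchronize command, which SSIS would typically submit through an Analysis Services Execute DDL Task; the server and database names in the sketch below are illustrative assumptions.

# A sketch of the XMLA Synchronize command behind the synchronization step;
# in SSIS it would typically be submitted via an Analysis Services Execute
# DDL Task. Server and database names are illustrative assumptions.
xmla_synchronize = """
<Synchronize xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Source>
    <ConnectionString>Provider=MSOLAP;Data Source=ProcessingServer;Integrated Security=SSPI</ConnectionString>
    <Object>
      <DatabaseID>SalesCube</DatabaseID>
    </Object>
  </Source>
  <SynchronizeSecurity>SkipMembership</SynchronizeSecurity>
  <ApplyCompression>true</ApplyCompression>
</Synchronize>
"""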


Tuesday, December 07, 2010

MS BI RFP - Key differentiators to stay ahead in the league

Right from the role of an associate architect to the role of an account manager, and right from pre-sales to account management, I have seen that at every stage one remains faced with the challenge of providing a key differentiator. This challenge can be with your competitors, or even between different teams competing for excellence. Talking about the Requests For Proposal that clients invite from multiple solution providers, there are many factors that act as differentiators and help the client in their decision making. In my view, an RFP is not just science, it's art + science, where you blend a solution such that the client is able to accept, digest and accommodate it. Please keep in view that the below points are from the eyes of an architect, not a typical sales person / project management professional.

1) Robustness, outsourcing, design patterns, certified professionals, company size and more. Sales folks would swear by these words, but in my view these are not the differentiators. They can bring a good level of confidence in the engagement economics of a project, but not in the design of the solution. All reputed solution providers would have more or less the same traditional design, the same outsourcing pyramid, and more or less the same quality of staff. I have seen huge multinational clients give projects to small companies on the philosophy that "It's better to be the top valued client of the 10th best solution provider, than the 50th valued client of the top ranked solution provider."

2) Outsourcing might provide a cheaper solution development cost, but if the solution design has the effect of raising TCO (Total Cost of Ownership), then clients would definitely shy away from it. This works well for support projects, but not in the case of fresh solution design and development. One should always pay attention to the technology roadmaps shared by technology providers like Microsoft, Oracle and others, so that the designed solution is ensured to sustain for an acceptable duration of time with reasonable support.

3) The choice of technology platform and components is vital. Technology is highly driven by innovation, and if your solution completely lacks innovation, even a weak competitor may land a big blow with an innovative solution that appeals to the client in a brilliant manner. Cloud and SaaS are two such areas. When a solution can be provided using MS BI and Amazon EC2, a solution proposed using even SQL Server Standard Edition with the biggest range of outsourcing would fall flat. An innovative solution using a mix of cloud and on-premise technology components can defy even an extremely low cost traditional on-premise solution, as license costs would be the killer.

4) The fabrication speed of solution design and development can help you quote a very aggressive "time to market". This can be powered by a very well planned Center of Excellence practice within an organization, which gives the dual advantage of readiness to deal with any technical challenges and speed of execution. Whenever a client visits a production plant, they are shown the machinery and the production units as part of a confidence building exercise. Listing out in-house developed components / frameworks ready to be plugged into the proposed solution can be a very big catalyst in convincing clients of an aggressive time to market proposal.

5) Pluggability. I know there is no such word, but what I mean is how flexible the solution is in terms of to-and-fro integration with external systems, along with its reusability. The architecture should be consciously designed such that each logical unit can be consumed by any external subscriber with the least possible complexity, which is generally buzzed about under the term Service Oriented Architecture.

Different stakeholders of project acquisition and management would have different views on this. But being an architect, one needs to contribute directly or indirectly to pre-sales activities, and the above views can help save your shoulders from the blame if the architecture design is questioned. A solution and architecture might be perfect from a technology perspective, but that is not the perspective from which clients look at it. There are a lot of factors that contribute to a winning bid, but considering the scope and role of an architect in the RFP, I feel the above points sound fair. If you have more to add, please feel free to add your comments or email me to share your views.

Thursday, December 02, 2010

Limitations / Disadvantages of using Calculated Measures / Calculated Members in SSAS

In my view, designing is a process that is driven entirely by impact analysis. However trivial a product / tool / technology feature may be, if it's used without thoroughly analyzing the impact it can have on the overall system / solution to be developed, you have just signed up for a guaranteed future roadblock.

Calculated measures / members seem like a very straightforward design feature, and whenever you feel the need for, or a shortage of, some members / measures, calculated ones seem to be a very easily available option from a development perspective. But there is another side to them, and these points should be kept in consideration before you make the decision to go with calculated members / measures.

1) Drillthrough does not operate on calculated measures or on any other calculations that reference calculated measures / calculated members. This means, for example, that if you have created a calculated measure on the cube which the user might opt to use as part of a drillthrough dataset, you are now stuck and need to find a workaround. A small illustration follows after these points.

2) Calculated measures cannot be secured using dimension security in a straightforward manner; in fact, they are not listed at all in the Dimension Data tab of the role where dimension security is defined. Also, when security is applied on regular members and, because of it, those members are not available to a calculated member, the calculation will fail, i.e. when such measures are browsed in client tools like Excel, the value displayed is an error value like #VALUE.
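For illustration, the sketch below defines a calculated measure in MDX, held in a Python string so the sketch stays consistent with the other examples; the cube and measure names assume the Adventure Works sample cube.

# A minimal illustration of point 1, assuming the Adventure Works sample cube.
# [Measures].[Profit] is a calculated measure, so a DRILLTHROUGH request
# against it will fail, whereas drilling through on the regular
# [Internet Sales Amount] measure works.
mdx = """
WITH MEMBER [Measures].[Profit] AS
    [Measures].[Internet Sales Amount] - [Measures].[Internet Total Product Cost]
SELECT { [Measures].[Profit] } ON COLUMNS
FROM [Adventure Works]
"""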

Based on the above two points, calculated measures / members should be scanned against drillthrough and security requirements, so that a trivial oversight in design doesn't translate into a showstopper issue at a later stage.