Saturday, December 19, 2009
Though Microsoft has not directly come out with any product that integrates Silverlight into the MS BI stack or its related technologies, third-party vendors have already started exploring Silverlight's visualization potential. Also, to extend the reach of Silverlight content, Microsoft partnered with Novell to create Moonlight, a plugin that lets Linux users access Silverlight content.
In June 2007, Dundas announced that it had licensed to Microsoft the source code for part of Dundas Advanced Visualizations, which included Dundas Chart and Dundas Gauge. And we saw a really professional set of charting and visualization capabilities in SSRS 2008. The first step for Silverlight on the MS BI landscape is on the delivery end of a BI solution, i.e. dashboarding, and it has been taken by Dundas in the form of Dundas Dashboard, developed using Silverlight. Its dashboard visualizations are very impressive, and in my view it definitely has the potential to outperform the visualization capabilities of PerformancePoint Server 2007. Though PerformancePoint Services for SharePoint has now improved in this area, and seems to have digested the ProClarity codebase to a better extent, in my view the visualizations provided by Dundas Dashboard are still quite competitive.
Wednesday, December 09, 2009
Pivoting has made its journey over a period of time in steps that are easy to trace. At first it was mostly limited to the pivot tables in Excel. This slowly became one of the key features that hooked business users on Microsoft Excel. The Office Web Components (OWC), which also offered this functionality, became very popular and started to find a place in stand-alone and distributed applications.
Sensing the need for the same in database development, Microsoft introduced the PIVOT and UNPIVOT operators in T-SQL with SQL Server 2005. This made queries much easier for developers; what used to be a lengthy and complex piece of code for creating the result sets typically consumed by cross-tab reports became a single operator. SQL Server Analysis Services 2005 (SSAS) cube browsing was also facilitated by OWC.
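To make this concrete, here is a minimal sketch of the two operators against a hypothetical Sales table (Product, OrderYear, Amount); the table and column names are illustrative, not taken from any particular sample database.

```sql
-- PIVOT: rotate OrderYear values into columns, aggregating Amount with SUM.
SELECT Product, [2007], [2008], [2009]
FROM
(
    SELECT Product, OrderYear, Amount
    FROM Sales
) AS src
PIVOT
(
    SUM(Amount) FOR OrderYear IN ([2007], [2008], [2009])
) AS pvt;

-- UNPIVOT: rotate the year columns of a hypothetical cross-tab table
-- SalesByYear (Product, [2007], [2008], [2009]) back into rows.
SELECT Product, OrderYear, Amount
FROM SalesByYear
UNPIVOT
(
    Amount FOR OrderYear IN ([2007], [2008], [2009])
) AS unpvt;
```

Before SQL Server 2005, the same cross-tab shape required a block of CASE expressions under a GROUP BY, which is exactly the lengthy code these operators replaced.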
Sensing pivoting's ability to aggregate huge amounts of information, project Gemini was started, which finally resulted in what we know today as PowerPivot. It can pivot and analyse huge volumes of data from a variety of sources, using the SSAS engine with the Microsoft Excel interface, and it looks very promising with its charting and analysis capabilities. It also seems Microsoft plans to go big in this direction, as it has set up a dedicated research lab for pivoting known as Microsoft Live Labs Pivot.
The importance of pivoting is recognized not just by Microsoft; other industry vendors are also making moves in this direction to get their slice of the business. Infragistics has announced the release of its Silverlight Data Visualization CTP, which consists of two basic controls: an OLAP Pivot Grid and a Data Chart. The OLAP Pivot Grid fetches data from Analysis Services using ADOMD, and the visualizations it generates can be compared to those of Dundas Chart and others.
Tuesday, December 01, 2009
Even to my surprise, there is a huge difference between the two options when data of this size is accessed over the network. So when you expect huge result sets to be returned, you might want to think twice after looking at the results in the image below (image courtesy: SQLServerPedia.com). It's not that the XML option is bad; it has its own place and its own set of requirements that it caters to. But while designing your architecture, this piece of comparative study is worth considering, and it can also prove helpful to carry out a proof of concept (POC) before committing to this option in your low-level designs.
Monday, November 23, 2009
One does not always need to be a super expert oneself. Knowledge is power, there's no doubt about it, but when you work in a team you can't expect every member to have the same level of expertise or the same speed of development. So, remaining within the boundaries of cost, a team lead needs to educate and equip both himself and the team with tools that can boost the development process and reduce the time spent tackling known challenges.
The Microsoft Business Intelligence Project Booster Kit is my first attempt to share my experiences with a basic set of tools and documents / documenting methods, which I term a "Booster Kit", that any development team can use in their day-to-day work to move their project along faster, directly or indirectly, in the shorter or longer term, at one phase or another of the project life cycle.
If you came across my blog through my article on SQLServerCentral.com, please feel free to drop your valuable feedback as a comment on this post. Also feel free to reach out to me at firstname.lastname@example.org.
Thursday, November 19, 2009
Download instructions and links to the download locations for the different flavours of PowerPivot can be accessed from here.
Those who are not able to download and install PowerPivot (as it also requires installation of the Office 2010 beta in one way or another) need not be disappointed. A virtual lab, PowerPivot for Excel 2010 Introduction, is available from Microsoft. Aspirants can get hands-on with this lab and get a feel for PowerPivot without bothering with any download or installation.
This Virtual Lab can be accessed from here.
Wednesday, November 18, 2009
Let's start one-by-one.
SharePoint Server 2010 contains PerformancePoint Services for SharePoint, which is Microsoft's candidate for a performance measurement, or dashboarding, solution. I also anticipate that it will contain PowerPivot Services for SharePoint (PSS).
SharePoint Server 2010 has the following preliminary requirements:
- SharePoint Server 2010 will be 64-bit only.
- SharePoint Server 2010 will require 64-bit Windows Server 2008 or 64-bit Windows Server 2008 R2.
- SharePoint Server 2010 will require 64-bit SQL Server 2008 or 64-bit SQL Server 2005.
From the above set of requirements it's easy to make out that if one wants PerformancePoint Services and/or PowerPivot Services for SharePoint, or even just SharePoint Server 2010 to host one's dashboarding solution, the entire landscape of Microsoft BI products used in the solution, including the operating systems, needs to be 64-bit. The hardware needs to be 64-bit compliant as well.
If the above is not enough, there's still more. I recently came to know from one of the threads on the SSIS forum that Master Data Services, Microsoft's new offering for Master Data Management in SQL Server 2008 R2 (expected to be released in the coming quarter), is also going to be 64-bit only.
So aspirants who are keen on getting hands-on with the new products are left with fewer options to try out all these features, as most of us do not have 64-bit operating systems or hardware within reach. The only hope is that Microsoft releases a new Microsoft Business Intelligence VPC loaded with all this software, like the previous ones.
Friday, November 13, 2009
The functions listed in the Time Intelligence section look very much aligned with the structure and usage that we find in a Date/Time dimension. I wonder why all this information is not available on MSDN yet; or at least, I am not able to locate it there.
Thursday, November 12, 2009
Below is a summary of what I was able to extract from the article that I felt was of interest:
- The PowerPivot processing engine is called VertiPaq
- The VertiPaq engine uses AMO and ADOMD.Net for internal processing
- The PowerPivot add-in sends requests to this engine using different transport protocols depending upon the provider; transports like HTTP and TCP/IP are supported.
- The PowerPivot add-in is developed using C# and other managed libraries of the .NET Framework. .NET folks can be proud now, as they have reserved a seat in this space.
- All the components of the PowerPivot architecture, i.e. Excel, PowerPivot, AMO and ADOMD.Net, are implemented and work in-process. This means a crash of any component involved in the architecture would crash all of them. A VertiPaq crash being an Excel crash makes sense to me, but an Excel crash being a VertiPaq crash is hard for me to digest. As of now I am not sure whether this is a mole-sized or a mountain-sized limitation, but it is a limitation for sure.
- PowerPivot System Service (PSS) is probably the SharePoint version of the PowerPivot implementation. Pairing this service with PerformancePoint Services would make SharePoint a big player in the Microsoft BI implementation toolset. This definitely has the potential to bring SharePoint into the league of SSMS and BIDS, though some would probably argue that SharePoint is already in that league.
Use this link to read the original article.
Wednesday, November 11, 2009
This task wraps the nice functionality of generating report output behind the scenes, without requiring any programming knowledge. It has a nice user interface, with an option to view the report during task execution. All the options are self-explanatory, and the author has provided a nice example of how to use the task on its home page.
This component collects all the data from the options selected by the user and sets its properties. The values of these properties are in turn passed on to an object of the ReportService2005 endpoint. Finally, this object is passed as a parameter to the ReportExecution2005 endpoint, and the report is generated programmatically using the web service.
Those who are new to the concept of web services and endpoints in the context of Reporting Services can start from this link to learn about them. After that, the next step can be to learn more about the Report Server web service endpoints and how to create a client for them. Finally, once you have mastered those, the source code of this task is the last lesson on the best way to implement it. The source code of this task is available for download on CodePlex.
SyncToy 2.1 is a free application that synchronizes files and folders between locations. Typical uses include sharing files with other computers and creating backup copies of files and folders.
In project environments where data feeds are received in the form of files, such tools are quite helpful during development. For general content management too, having one such tool, which is also free, is a big boon. Below is an excerpt from the download page:
In some cases it is necessary to regularly get copies of files from another location to add to primary location; in other cases there is a need to keep two storage locations exactly in sync. Some users manage files manually, dragging and dropping from one place to another and keeping track of whether the locations are synchronized in their heads. Other users may use two or more applications to provide this functionality.
Now there is an easier way. SyncToy, a free PowerToy for Microsoft Windows, is an easy to use, highly customizable program that helps users to do the heavy lifting involved with the copying, moving, and synchronization of different directories. Most common operations can be performed with just a few clicks of the mouse, and additional customization is available without additional complexity. SyncToy can manage multiple sets of folders at the same time; it can combine files from two folders in one case, and mimic renames and deletes in another case. Unlike other applications, SyncToy actually keeps track of renames to files and will make sure those changes get carried over to the synchronized folder.
Monday, November 09, 2009
New Features in PerformancePoint Services 2010, Difference between PerformancePoint Server 2007 and PerformancePoint Services 2010
With the advent of PerformancePoint Services in SharePoint Server 2010, it seems there are a lot of new improvements from a user interface and design point of view. One feature that was very strange in the earlier version was that the available graph had only two colours, and there was no option to add more. We struggled quite a lot to find a solution for this, without any success.
Looking at the new features available in PerformancePoint Services, the graph control looks much more mature now, with the pie chart that was missing in the earlier version. There are also new visualizations available for the representation of data, more detailed representation of KPIs, and much more. It seems PerformancePoint has now digested ProClarity to a great extent.
For a complete list of new features in PerformancePoint Services 2010, check out this article.
Sunday, November 08, 2009
PerformancePoint Monitoring and Analytics: Dashboard basics I
PerformancePoint Monitoring and Analytics: Dashboard basics II
Friday, November 06, 2009
What is Excel Services ?
Excel Services 2007 shipped in Microsoft Office SharePoint Server 2007 as part of the Enterprise CAL. Excel Services 2010 provides real-time, interactive, Excel-based reporting and dashboard capabilities, and ships as part of SharePoint Server 2010. It also includes APIs that enable rich business application development; one such API is the Representational State Transfer (REST) API.
What is REST API ?
A RESTful web service (also called a RESTful web API) is a simple web service implemented using HTTP and the principles of REST. Such a web service can be thought of as a collection of resources, and its definition as comprising three aspects:
- The base URI for the web service
- The MIME type of the data supported by the web service. This is often JSON, XML or YAML but can be any other valid MIME type.
- The set of operations supported by the web service using HTTP methods (e.g., POST, GET, PUT or DELETE).
More about REST can be read on Wikipedia.
What is Excel Services 2010 REST API ?
The Excel Services 2010 REST API is a new programmability framework that allows for easy discovery of and access to data and objects within a spreadsheet. The data, including charts, that is returned by the REST API is not static – it’s live and up-to-date.
With the REST API, any changes in the workbook are reflected in the data that is returned. This includes the latest edits made to the workbook, functions that have recalculated (including User Defined Functions), and external data that is refreshed. The REST API can also push values into the workbook, recalculate based on those changes, and return the range or chart you requested after the effects of the change have been calculated.
By crafting the proper URI, the REST API allows you to:
- Discover the items that exist in a workbook, such as tables, charts and named ranges
- Retrieve the actual items in the workbook in one of the following formats: Image, HTML, ATOM feed, Excel workbook
- Set values in the workbook and recalculate the workbook before retrieving items
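As a sketch of the URI crafting described above: the documented shape is the server, the ExcelRest.aspx entry point, the workbook path, then /Model followed by the resource to retrieve. The server name, document library and workbook below are hypothetical.

```
http://myserver/_vti_bin/ExcelRest.aspx/Docs/SampleWorkbook.xlsx/Model
http://myserver/_vti_bin/ExcelRest.aspx/Docs/SampleWorkbook.xlsx/Model/Ranges('Sheet1!A1|C10')?$format=html
http://myserver/_vti_bin/ExcelRest.aspx/Docs/SampleWorkbook.xlsx/Model/Charts('Chart 1')?$format=image
```

The first URI performs discovery, returning an ATOM feed of the items in the workbook; the second retrieves a range as HTML; the third retrieves a chart as an image.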
The content above is a re-draft of the original article on the Microsoft Excel blog, re-drafted for simplicity of understanding.
Thursday, November 05, 2009
Agile Microsoft Business Intelligence Implementation - My next presentation at London SQL Server Users Group
The MS BI Project Booster is a kit consisting of a collection of tools and utilities (freeware) and documents and/or documenting methods that is good to have installed, or available in advance, and that can help any MS BI team tackle issues or cater to requirements faster at various stages of a project development life cycle. It's a resource kit, usable at various stages of a project, and by using it I believe a project can gain the benefit of agile BI development from an implementation perspective.
I have also submitted an article on http://SQLServerCentral.com on the same topic, so I will not publish a separate write-up here. The article should soon be published and available for public reading.
Those of you who would like to make it to the event, please register for your seat. Full details about the agenda, speakers and registration can be accessed at this URL:
The content in this book is a copy of selected content in the Office SharePoint Server technical library (http://go.microsoft.com/fwlink/?LinkId=84739) as of the date above. For the most current content, see the technical library on the Web.
Download this book as a Microsoft Word document (.docx) (http://go.microsoft.com/fwlink/?LinkId=168786).
Download this book as a PDF file (http://go.microsoft.com/fwlink/?LinkId=168793).
Sunday, November 01, 2009
SSRS / BIDS 2008 Reports cannot be consumed by the report rendering control in SSMS 2008, but only SSRS / BIDS 2005 Reports
Develop any regular report using BIDS 2008 and SSRS 2008. If you use the same report in SSMS 2008 via the Custom Reports functionality, the report won't render, however simple it is. I was confused by this behaviour and started investigating the issue on the web. The error I kept getting was: "The report definition is not valid. Details: The report definition has an invalid target namespace 'http://schemas.microsoft.com/sqlserver/reporting/2008/01/reportdefinition".
After spending quite some time, I was able to find an issue on Connect confirming that this is a bug in SSMS. The bug description is as follows:
When I try to run a custom report in SSMS created with BIDS 2008 it fails with the following error message:
The report definition is not valid. Details: The report definition has an invalid target namespace 'http://schemas.microsoft.com/sqlserver/reporting/2008/01/reportdefinition' which cannot be upgraded. (Microsoft.ReportViewer.Common)
As of now this bug has no workaround, and the only possible solution is to install BIDS 2005, and probably even SSRS 2005, side by side with SQL Server 2008 to develop the report (unless someone has a tool that works with the RDL 2005 specification). I can't believe all this is needed just to develop a simple SSRS 2005 report that can be consumed by the dumb report control in SSMS 2008.
Thursday, October 29, 2009
As documented on the product site, "Reporting Services Scripter is a .NET Windows Forms application that enables scripting and transfer of all Microsoft SQL Server Reporting Services catalog items to aid in transferring them from one server to another. It can also be used to easily move items on mass from one Reporting Services folder to another on the same server. Depending on the scripting options chosen, Reporting Services Scripter can also transfer all catalog item properties such as Descriptions, History options, Execution options (including report specific and shared schedules), Subscriptions (normal and data driven) and server side report parameters."
It has a very nice GUI and the options are pretty self-explanatory. There are a few steps one needs to follow to create a deployment setup out of this utility.
Step 1: Configure the below options
a) Point to the correct path of the RS.exe location on the Globals tab
b) The target report server URL where you intend to deploy your SSRS reports
c) The SSRS solution or project file under which all the SSRS reports are added, or select the individual reports you want to target. You can also click the Get Catalog button to list the reports on the server you have configured.
d) The output location where you want the setup files to be created
e) Select Generate Report Script on the Report tab
f) Select the Specify New Parent Folder / Create Folder Hierarchy option on the General tab as per your requirement.
Step 2: Click the Script button to create the setup files
Step 3: Execute the setup batch file created; this deploys all the reports.
Behind the scenes, the batch file executes the VBScript files created for each SSRS report. These script files contain code that creates a corresponding object from the SSRS report model, and the RS.exe utility is run with these scripts as parameters. The utility sets all the properties of these objects that were collected through the user interface.
If there is still someone left who is a fan of coding, all the information on how to create VBScript scripts to deploy each different SSRS object can be found on this page.
Thursday, October 22, 2009
It's said that "a picture is worth a thousand words", and I say that "a dashboard is worth a thousand queries". You can get all the definitions and pictures of a dashboard from Wikipedia or from books. But if you want a real feel of what a dashboard looks like, there are two great places to take lessons from:
1) MS BI VPC: This one is a good example for BI developers, as it comes with great examples of everything that can be developed using the Microsoft BI stack of technologies. It also includes a dashboard for AdventureWorks Cycles, created using the PerformancePoint Monitoring module. This VPC comes loaded with evaluation versions of all the software, from SQL Server and MS Excel to PerformancePoint, that has one Business Intelligence feature or another.
2) Real-time example: This one is among the best ways to learn what a dashboard looks like and how it helps management take strategic decisions based on information presented in the form of a decision support system. Someone who is fresh might not be able to dive quickly into BI tools, but there's a very nice site where one can see a dashboard in live action.
This website is called the Federal IT Dashboard. It is a magnificent website, and the dashboard presentation is majestic. The website is owned by the US government to publish to the public their IT spending across different portfolios. Whether you are a seasoned BI developer or a novice BI aspirant, I recommend taking a look at this website to get a real feel of the difference a dashboard can make, and of what an interactive dashboard can look like.
Tuesday, October 20, 2009
Below is an excerpt from the post where they announced this:
Historically, business intelligence has been a specialized toolset used by a small set of users with little ad-hoc interactivity. Our approach is to unlock data and enable collaboration on the analysis to help everyone in the organization get richer insights. Excel Services is one of the popular features of SharePoint 2007 as people like the ease of creating models in Excel and publishing them to server for broad access while maintaining central control and one version of the truth. We are expanding on this SharePoint 2010 with new visualization, navigation and BI features. The top five investment areas:
1. Excel Services – Excel rendering and interactivity in SharePoint gets better with richer pivoting, slicing and visualizations like heatmaps and sparklines. New REST support makes it easier to add server-based calculations and charts to web pages and mash-ups.
2. Performance Point Services – We enhanced scorecards, dashboard, key performance indicator and navigation features such as decomposition trees in SharePoint Server 2010 for the most sophisticated BI portals.
3. SQL Server – The SharePoint and SQL Server teams have worked together so SQL Server capabilities like Analysis Services and Reporting Services are easier to access from within SharePoint and Excel. We are exposing these interfaces and working with other BI vendors so they can plug in their solutions as well.
4. “Gemini” – “Gemini” is the name for a powerful new in memory database technology that lets Excel and Excel Services users navigate massive amounts of information without having to create or edit an OLAP cube. Imagine an Excel spreadsheet rendered (in the client or browser) with 100 million rows and you get the idea. Today at the SharePoint Conference, we announced the official name for “Gemini” is SQL Server PowerPivot for Excel and SharePoint.
5. Visio Services – As with Excel, users love the flexibility of creating rich diagrams in Visio. In 2010, we have added web rendering with interactivity and data binding including mashups from SharePoint with support for rendering Visio diagrams in a browser. We also added SharePoint workflow design support in Visio.
Monday, October 19, 2009
Data Analysis Tool / Add-In to extract and develop Business Intelligence using Excel and SQL Server : PowerPivot
PowerPivot for Excel 2010 is a data analysis tool that delivers unmatched computational power directly within the application users already know and love—Microsoft Excel. It provides users with the ability to analyze mass quantities of data and IT departments with the capability to monitor and manage how users collaborate by integrating seamlessly with Microsoft SharePoint Server 2010 and Microsoft SQL Server 2008 R2.
BI Offerings from PowerPivot:
Give users the best data analysis tool available: Build on the familiarity of Excel to accelerate user adoption. Expand the existing capabilities with column-based compression and an in-memory BI engine, virtually unlimited data sources, and new Data Analysis Expressions (DAX) in familiar formula syntax.
Facilitate knowledge sharing and collaboration on user-generated BI solutions: Deploy SharePoint 2010 to provide the collaboration foundation with all essential capabilities, including security, workflows, version control, and Excel Services. Install SQL Server 2008 R2 to enable support for ad-hoc BI applications in SharePoint, including automatic data refresh, data processing with the same performance as in Excel, and the PowerPivot Management Dashboard. Your users can then access PowerPivot workbooks in the browser without having to download workbooks and data to every workstation.
Increase BI management efficiency: Use the PowerPivot Management Dashboard to manage performance, availability, and quality of service. Discover mission-critical applications and ensure that proper resources are allocated.
Provide reliable access to trustworthy data: Take advantage of SQL Server Reporting Services data feeds to encapsulate enterprise systems and reuse shared PowerPivot workbooks as data sources in new analyses.
Wednesday, October 14, 2009
This article demonstrates the process of debugging performance issues of SSRS reports at the client end. When everything seems okay, from the execution of the stored procedures related to the report to the network and report server configuration, one needs to look at what's going over the wire, i.e. what's going out of and coming into the client network. This issue is not specific to SSRS reporting: in any web-based server-side application development, when the web application needs to be debugged for performance issues, or benchmarked against certain parameters, this method is used. The method is placing an HTTP proxy or HTTP sniffer.
This tool can be installed on any of the client machines, and any HTTP activity caused by report execution on the client machine and network can be logged and investigated. Its most promising features: firstly, it's freeware, and it comes with several add-ons well suited to analyzing the content and requests that flow over the wire, with all the minute details that can possibly be captured. It's a very nice tool for performance debugging, and in my view BIDS should have shipped with this kind of tool. It is developed using Microsoft .NET and also provides the flexibility to extend the code.
Thumbs up for this tool, and for the article, which is itself a great tutorial explaining the methodology of performance-debugging SSRS reports from a client network.
Reference: Tom Mills Blog
Tuesday, October 13, 2009
I have seldom seen Data Source Views (DSVs) used in an SSIS solution. The biggest trait of a DSV is that, once it's defined, it doesn't refresh by itself. In one way this is desirable, and in another it is not. It is harmful in that, if the underlying source structure has changed, the package gets no notification of it until you explicitly hit the refresh button. But once refreshed, it catches all the changes from the underlying data source and updates the view.
The main advantage of using a DSV in an SSIS solution is centralizing the connection logic and the scope of the database objects to be used. All the packages in the solution can use the same DSV and connect to objects limited to its scope. In case of a change in logic, it is easily cascaded to the packages using the DSV. Also, one DSV always runs under one connection object, which means that by enforcing the use of DSVs, uncontrolled creation of private database connections is implicitly controlled. At times it also helps continue package development even when the server is offline or disconnected.
And if one needs to connect to database objects that one would not wish to include in the DSV, a connection can always be created using the shared data source created for the DSV. This is a direct connection to the database, without the filtering of database objects that a DSV provides. All in all, a DSV is a good way to centralize and control connection usage in an SSIS solution.
Tuesday, October 06, 2009
2) When you change an existing formula, the data type is automatically changed by this component to Unicode, unless it is specifically cast to ANSI. This sudden change can break components further down the flow, such as Sort, which detects any metadata change. A better solution can be to edit the formula in the Advanced Editor to get around this buggy behaviour of the transform. This is especially the case in SSIS 2008.
3) While developing a formula, many get confused by the return types. Make sure that across the formula the return type is the same data type, irrespective of the condition in the formula. Even if the intention is to return NULL, cast the NULL to the specific data type that the other branch of the logic returns.
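To illustrate point 3 with a hypothetical Derived Column expression over a column named Amount: both branches, including the NULL, are cast to the same type (here DT_STR of length 10, code page 1252).

```
Amount > 0 ? (DT_STR,10,1252)"POSITIVE" : (DT_STR,10,1252)NULL(DT_STR,10,1252)
```

Without the explicit cast on the NULL branch, the two branches of the conditional resolve to different data types and the transform complains of a data type mismatch.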
Sunday, October 04, 2009
When is Data Cleansing required ?
Generally, data cleansing is required in a systems-integration flavour of project: for example, when systems that are logically related, like back-office and front-office systems, are integrated, or when unrelated systems, like different business applications each independent in its own respect but functioning under a common umbrella, are brought together. In both examples one thing is common: these systems operate differently and are subject to integration.
What is Data Cleansing ?
Data in the above-mentioned systems is not always as expected, or is not of the quality required to be fed into the new system. For example, there can be OLTP applications or front-office systems where data is not entered precisely, due to operator error or limited system design. These data issues need to be fixed, mostly in the host or an intermediate system, to meet the requirements of the new system where data of a particular quality level is required.
How to do Data Cleansing, using SQL Server in specific ?
There are different ways of doing data cleansing, and to the best of my knowledge there is no particular theory or formula for it. But one thing applies universally to any data cleansing exercise: the data should be profiled thoroughly. The quality of data cleansing is directly proportional to the thoroughness of data profiling. Mostly the hard part is the profiling rather than the cleansing; it takes business knowledge combined with technical analysis of the data. Once the issues to fix in the data are catalogued, it's generally not that hard to fix them using an ETL tool like SSIS. SSIS ships with a dedicated Data Profiling task and a dedicated viewer for the profiled output.
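As a rough illustration of what basic profiling can look like in plain T-SQL (the SSIS Data Profiling task automates much of this), the query below collects row counts, NULL counts, blank values and length statistics. The table dbo.StgCustomer and its columns are hypothetical names chosen only for this sketch:

```sql
-- Minimal profiling sketch against a hypothetical staging table.
-- Table and column names are illustrative, not from any specific project.
SELECT
    COUNT(*)                                                AS TotalRows,
    COUNT(DISTINCT CustomerCode)                            AS DistinctCodes,
    SUM(CASE WHEN CustomerCode IS NULL THEN 1 ELSE 0 END)   AS NullCodes,
    SUM(CASE WHEN LTRIM(RTRIM(CustomerName)) = ''
             THEN 1 ELSE 0 END)                             AS BlankNames,
    MIN(LEN(Phone))                                         AS MinPhoneLen,
    MAX(LEN(Phone))                                         AS MaxPhoneLen
FROM dbo.StgCustomer;
```

Queries of this shape, run per column of interest, give the catalogue of issues that the cleansing step then fixes.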
Monday, September 28, 2009
All the different ways of importing data, with the caveats of each method, are listed in an article that I found on lessthanadot.com. I had the opportunity to raise the issue of date conversion in the Import/Export wizard with other database professionals, and this post is one of the best places for that issue to get its deserved attention. The article would be of most use in an SSIS data migration project, where you need to put all the options for importing data from a source into SQL Server on the table as part of the migration strategy.
It's a useful post and can come in handy while weighing the import options during the technical design planning of any solution. The article also provides statistics from test results for various volumes of data across all the import methodologies followed. Had the author specified the hardware configuration used, and with a small change of format, it would have been a nice little whitepaper in itself. Still, it's a nice article, worth reading along with the comments.
Thursday, September 24, 2009
I feel that, while designing the Import/Export wizard, equal importance was not given to every data type. For example, there are properties to adjust the precision and scale when numeric data types are selected, but when the datetime data type is selected, there is no option to specify the style to use for the conversion during the import process. Because this feature is absent, when importing from text files, for example, dates are converted to mm/dd/yyyy format, i.e. the US standard (provided the installation language is US English). If the source data contains dates in dd/mm/yyyy format, i.e. the British standard, there is no means within the wizard to specify what format or style should be used for the conversion. This can be a serious shortcoming at times.
The workaround is to use another tool, or to import the column as varchar and convert it using the CONVERT system function before using the field. I know it's not a great solution, and because of this limitation on date format conversion the wizard becomes a SISO show (Shit In Shit Out). Still, it's not that bad, and the solution is workable. I hope that Microsoft realizes this shortcoming and adds the feature in a service pack, instead of waiting for the next release.
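To make the varchar workaround concrete, here is a minimal sketch of the conversion step. Style 103 tells CONVERT to read the string as dd/mm/yyyy (British/French); style 101 would read it as mm/dd/yyyy (US):

```sql
-- Import the date column as varchar first, then convert explicitly.
-- Style 103 = British/French dd/mm/yyyy; style 101 = US mm/dd/yyyy.
SELECT CONVERT(datetime, '25/12/2009', 103) AS OrderDate;
-- Reading the same string under a US default would fail,
-- since there is no 25th month.
```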
BTW, if anyone is aware of a better solution or would like to correct me on this, I would be glad to learn it. I might not be completely up to date on the SQL Server tools, and it may be my ignorance, but to the best of my knowledge this shortcoming exists in the SQL Server 2008 Import/Export wizard.
Wednesday, September 23, 2009
Free Ebooks, Study Guide and Interview Questions on SQL Server Security, Performance and Administration
Ebook on SQL Server Execution Plans (Dissecting SQL Server Execution Plans.pdf)
Takeaway: An in-depth technical explanation of SQL Server query execution plans
Ebook on SQL Server XML Schema Collections (The Art of XSD.pdf)
Takeaway: Technical content to master your skills in the usage of XML and XSD with SQL Server
Ebook on SQL Server Interview Questions (SSC Stumper Vol 5.pdf)
Ebook on SQL Server Security and Encryption (Protecting SQL Server Data.pdf)
Tuesday, September 22, 2009
This marketing strategy keeps the company close to the community of SQL Server professionals, and it also shows how much reach and insight the company has into the product. Though I have not used many of their SQL Server tools, as they are targeted more at a DBA or developer profile while I work more on the Business Intelligence side of SQL Server, I am confident that they are one of the healthy competitors in the market.
Below is one of the ebook bundles available for free from Red Gate.
Free Ebook on DBA Best Practices (DBA_Best_Practices_ebook.pdf)
Free Ebook on SQL Server 2008 New Features (Brads Sure Guide to SQL Server 2008.pdf)
Free Ebook on Mastering SQL Server 2005 or SQL Server 2008 - Profiler Tool (Mastering Profiler eBook.pdf)
Free Ebook on How to be an Exceptional DBA by Brad McGehee (Exceptional DBA 2ndEd.pdf)
Free Ebook on SQL Server Tools and How to do with SQL Server (SQL_Server_Tacklebox_Ebook.pdf)
Download courtesy: www.red-gate.com
Thursday, September 17, 2009
HP has come out with a new set of whitepapers and a tool which they claim can help make recommendations for the hardware configuration required for data warehousing and ETL. Based on the features it claims to provide, it seems a nice tool that can really give informed recommendations about hardware configuration, and the best part is that it comes from the experts in hardware. On the product page, HP claims the tool models configurations for:
1. ETL (Extract, Transformation, and Load) layer using SQL Server Integration Services
Following downloads are available from the HP site:
HP Business Intelligence Sizer for Microsoft SQL Server 2005/2008
HP Whitepapers on SQL Server 2008 Data Warehousing / Business Intelligence
Reference: Benjamin Wright-Jones' blog
Monday, September 14, 2009
Monday, September 07, 2009
1) C2 Audit compliance: This is a US compliance standard, and it is quite strict in its requirements. Everything executed on SQL Server is audited and written to a trace file. The not-so-pleasant part is that if, for whatever reason, SQL Server is unable to write to the trace, the SQL Server service is stopped. Once this auditing standard is enabled, trace recording is triggered and the file is saved in the default directory. It can be enabled simply by using the sp_configure stored procedure.
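As a sketch of how C2 audit mode is enabled through sp_configure (it is an advanced option, and the service must be restarted for it to take effect):

```sql
-- 'c2 audit mode' is an advanced option, so expose advanced options first.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'c2 audit mode', 1;
RECONFIGURE;
-- Restart the SQL Server service for the setting to take effect.
```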
2) Common Criteria compliance: This is a European compliance standard and is considered a superset of C2 audit compliance. It is more flexible, or I should say more free-form, in the requirements it lays down. A very nice article on this compliance can be found on SSQA.net.
3) PCI compliance: PCI comes into play on projects in almost any CMM level 5 organisation that genuinely practices quality assurance. SQL Server has support for this too: a nice webcast and whitepaper are available that explain the how and what of PCI compliance with SQL Server 2008.
Those who need more information on compliance can download the SQL Server 2008 Compliance Guide for detailed information.
Friday, August 28, 2009
But I was really surprised to find that this is not the case. The author used a Visual J# library (vjslib.dll), which ships as part of the Visual J# Redistributable Package, to read the zip file. A couple of years ago I worked in Java for a few weeks to help out a project, and from that experience I knew that the Java runtime comes with this kind of library for reading .jar files.
It's a brilliant discovery and serves a great purpose. With this functionality, one can download archived data from the web, unzip it and process it to its destination. Downloading data using a web service and processing it is very much possible with SSIS, but the missing bridge was unzipping files in SSIS without 3rd-party components. And now the bridge is ready!!
The library comes for free, and the author has published a nice step by step tutorial describing the implementation of the same. The entire article can be read on this link.
Thursday, August 27, 2009
Articles on the Kimball University website, or related sites such as Intelligent Enterprise, are some of the best sources for those who wish to start building or improving their knowledge of dimensional modeling. That even applies to me. I believe it's a matter of experience by which your skill improves in this area, not just reading a book or watching a webcast.
In dimensional modeling, we generally work with at least some of the common entities that exist in any regular business application, for example: Materials, Customers, Employees, Payroll, HR, etc.
There's a good article, recently published by Kimball University on IntelligentEnterprise.com, which can serve as a good tutorial on how to architect a dimensional model for the Employee entity. I believe the article presumes some level of data warehousing knowledge and understanding from the reader.
I suggest that anyone wanting to take a first step into dimensional modeling should read this article. Thereafter, being an MS BI professional, I would implement the model using SQL Server 2008 and SSAS, populate the relational data warehouse and the cube, and experiment with different permutations and combinations in the model to experience the various technicalities of dimensional modeling.
Tuesday, August 25, 2009
1) It has very strong integration with Microsoft Excel, and that is the primary runway from which Gemini takes its flight into the Business Intelligence space. It is also said to have nice integration with SharePoint and other report authoring and rendering tools.
2) It is said to be an in-memory form of Analysis Services, i.e. another incarnation of Analysis Services. In fact, data stored in Excel via the Gemini add-in for Excel is stored in the form of an Analysis Services database, which means it can be consumed by any application that can consume data from Analysis Services. I may not be precise in the above statement, but the summary is that the data Gemini projects to its host applications can be, or rather would be, in the form of an Analysis Services database.
3) The Gemini add-in for Excel is a fantastic way for business users to get the most out of Excel while still having capabilities like those of data warehouse analysis hosted in a dashboarding environment such as PerformancePoint. Here Gemini is the analysis tool, Excel stores the data in place of a data warehouse, and Excel charts and graphs act as the dashboard or visualization hosts. The Gemini add-in also makes use of Silverlight for its visualizations.
4) As heard and discussed among industry experts, this tool is Microsoft's answer to products like QlikView in the self-service, or as I would call it "portable", business intelligence arena for business (non-technical) users. The advantage Gemini has is the immense reach of Microsoft Excel in the industry, its added popularity among business users, and Gemini's tight integration with it.
5) Below is a link to a few videos which provide a preview of project Gemini:
Tuesday, August 18, 2009
1) An SSIS package designed for IDocs is much simpler to understand and design than one designed for screen-recorded sessions.
2) Always sort the data after reading it (or use the IsSorted property setting), and sort it again before writing it to the file output.
3) Audit the data to a necessary and sufficient level. Too much auditing breaks the developer's shoulders; too little breaks the functional / technical SAP consultant's.
4) Never let your main package read data directly from the file input; stage the data into relational storage such as a table. Also, never read directly from the table: always keep an interface, such as a stored procedure or a view, from which your package reads. This design offers more resistance to change, which is the theme of any such project.
5) While designing the architecture, always keep the main controls in the database, e.g. configuration settings, loop iterator collections, file paths, etc.
6) Generally, data objects like Materials, Customers, Vendors, Finance, Payroll, Employee, etc. are the kind of objects that get designed in a SAP system. The first three are the foundation stones, to the best of my SAP knowledge to date.
7) Try to use a small variety of data types; the rule of the best-fit data type does not apply to data migration projects. The more synchronization you have in your relational staging area, the less pain you will have while designing your packages.
8) Be well versed with casting operators and functions, as you will generally need to use them a lot in such projects.
9) The Import/Export wizard can be a nuisance if the text file you are importing from has duplicate column names. If you are working with text files as input sources, make sure you do at least a basic profiling of the source files.
There are many more points worth mentioning, but for now this is just a quick starter.
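Point 4 above can be sketched roughly as follows; the table and view names (dbo.StgMaterial, dbo.vStgMaterial) are hypothetical, chosen only for illustration:

```sql
-- Hypothetical staging table; the package never reads this directly.
CREATE TABLE dbo.StgMaterial
(
    MaterialNumber varchar(18) NOT NULL,
    Description    varchar(40) NULL,
    LoadBatchId    int         NOT NULL  -- audit column, per point 3
);
GO
-- The package reads only through this view, so renames, filters or
-- derived columns in staging require no package change.
CREATE VIEW dbo.vStgMaterial
AS
SELECT MaterialNumber, Description
FROM dbo.StgMaterial;
```

When the SAP extract changes shape, only the view definition is updated; the SSIS data flow metadata stays stable.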
Monday, August 10, 2009
If there are many errors on the package due to incomplete configuration of controls, many errors are reported in the error list window, and this list is refreshed at every change in the package. For example, say there are 6-10 flat file destinations that are not connected to an input stream and have been placed on the package for future use. Dozens of errors would be reported on the package, visible in the error list window. Now suppose a Sort transformation is placed on the package: BIDS will validate the entire package again and the entire error list will be refreshed. This makes sense, as BIDS is trying to find out whether the change has impacted or rectified any errors present on the package.
So in essence, more time is taken, and it is taken by this entire validation process, not just by the placing of a control on the package, though it might seem that just placing the control is what takes so long.
The resolution is to disable all control flow elements that have errors due to logical reasons or incomplete configuration. As there is unfortunately no disable functionality in the data flow, remove all data flow tasks that have an error for known reasons. Save the package and work with the designer again. Now the package designer in BIDS should run much faster than before, provided you had quite a few erroneous tasks on the package.
Friday, August 07, 2009
Coming to the point, there is a new tool out called Entity Developer Express. It is not exactly a tool for MS BI or SQL Server development or maintenance, but I am listing it because its foundations are laid on SQL Server. While working on projects involving .NET, SQL Server and/or Microsoft Business Intelligence, this kind of tool makes a developer's life easy.
I have neither evaluated this tool, nor am I going to, as I am presently working on a systems integration project involving SAP and SSIS. So, to be honest: no time, and little interest, to evaluate it.
The description on the product page is quite sufficient to describe itself, which is as below:
Entity Developer is a free of charge powerful modeling and code generation tool for ADO.NET Entity Framework and LINQ to SQL. You can design an entity model from scratch or reverse-engineer an existing database, in which case you will have the classes and the relations with just several clicks. The model is used to generate C# or Visual Basic code with flexible predefined code templates. You can also customize the templates to fit all your needs. Entity Developer for SQL Server uses standard Microsoft SqlClient, LINQ to SQL, and Entity Framework, all of them bundled with .NET Framework 3.5 SP1.
- ADO.NET Entity Framework support
- LINQ to SQL support
- Visual schema modelling
- Customizable and intuitive user interface
- Reverse engineering (database object to entity model)
- Well-thought predefined templates for class generation
- GUI for predefined templates
- State-of-art diagram with zooming
- Diagram overview and printing
- Undo/Redo support
- One-click help system
- Visual Studio is not required
- Free of charge
Monday, August 03, 2009
This add-in is "MapPoint Add-in for SQL Server". It's a very innovative way to present some key business analysis KPIs. For presenting MIS-related information, as of this draft, I cannot think of anything better. Just use the data already stored in SQL Server, connect it to maps (available for the US and Europe only), and it's good to go for analysis.
This would definitely require the data to be in a specific structure to make it usable by the add-in. It also requires a MapPoint installation.
From a developer's perspective, this is one of the best tools for practicing with, and envisioning the results of, your spatial data. The add-in has great support for spatial query development and analysis. Generally some front end is required to consume and display spatial data, which typically represents geometry, and what better consumer than a real map itself.
If nothing else, I would use it for getting a good hold of, correcting, and analysing my spatial query skills and spatial data analysis.
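For those wanting a starting point for the spatial queries mentioned above, here is a minimal T-SQL sketch on SQL Server 2008; the two points are arbitrary illustrative coordinates:

```sql
-- Two points in the WGS 84 coordinate system (SRID 4326);
-- WKT for the geography type lists longitude first, then latitude.
DECLARE @a geography = geography::STGeomFromText('POINT(-122.33 47.61)', 4326);
DECLARE @b geography = geography::STGeomFromText('POINT(-122.12 47.67)', 4326);
SELECT @a.STDistance(@b) AS DistanceInMeters;  -- great-circle distance in meters
```

Results of queries like this are exactly what a map front end such as the MapPoint add-in can then render visually.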
Friday, July 31, 2009
Monday, July 27, 2009
Monday, July 20, 2009
SSIS Performance, SSIS Debugging, SSIS Development, SSIS Tools, SSAS Tools, SSRS Tools, MS BI Tools - BIDS Helper
BIDS Helper is such a great tool that it needs no introduction for MS BI professionals; rather, I would like to use the word "product" for it, to honour its value. I feel that sooner or later Microsoft should incorporate the functionality it provides into BIDS, or should flag it as an officially recommended add-in and ship it with a service pack.
Below are some of the very useful functionalities it provides for the SSIS environment, those in particular that I have found useful. I don't mean to say that the rest of the features are not useful, just that I have not needed them in my development work so far.
1) dtsConfig File Formatter: Many of us hate the way configuration data is written to the dtsConfig file and keep wondering why there's no reader or a micro-IDE of sorts for it. The file can be opened in Visual Studio, which will format the XML, but the file formatter in this tool does it all for the developer.
2) Expression and Configuration Highlighter: This is a brilliant feature, and very necessary in large projects where you develop tens of packages every week. It is especially helpful when setting package configurations in a master-child package scenario, where child packages read their configuration settings from the master package. The feature works as a proof-reader, giving a quick visual check on whether the configurations and expressions for controls have been implemented.
3) Expression List: This is a much-needed report for any package. Combined with the previous feature, it gives nice proof-reading (verification) and reporting of the expressions implemented in a package. It also comes in handy to have a list of the expressions used in the package, as it gives a better track of the points where changes need to be made.
4) Non-Default Properties Report: This is another brilliant report that becomes almost indispensable in a maintenance project, and is also helpful in development projects for documentation and reconciliation purposes. As the name suggests, it provides a report of all the relevant properties that hold non-default values. This helps verify that no property has been changed inadvertently.
5) Pipeline Component Performance Breakdown: This is one of the key debugging aids needed in a complex project. Though from the theory and definition on the product page I am quite convinced it would help, I am still skeptical about the precision of its report. After all, it is just a program with one mechanism for measuring performance, and it can't be more analytical than a human brain!! If it were possible to combine the replay-from-trace feature of SSIS Log Analyzer with this feature, it would have been a silver bullet in the arena of SSIS debugging.
6) Reset GUIDs: This feature is a developer's hammer, and saves quite some time when a package is copied within or across environments, where duplication of a package is done for templating purposes.
7) Sortable package properties report: This is another nice report that the tool draws out of an SSIS solution. It's a good report to attach to a build, but it's not comprehensive. It would get better if it provided options to select the properties required in the report, which would be useful for real-world reporting in a complex project. Also, I believe this works only for the standard controls that ship with SSIS, not for custom or third-party controls. To summarize, this report is good to attach to any build, though just as a formality.
8) SSIS Performance Visualization: This is a nice analytical, pictorial representation of the execution tree logs available in SSIS. At first glance the symbolic language got me confused about what it all means, and it requires a thorough reading of how to interpret the charts. But it's a Swiss Army knife if used with a deep understanding. It's not a developer feature; it's a feature a performance analyst would appreciate and love to use. I like this performance visualization ten times more than what the SSIS Log Analyzer tool shows in the form of pie charts, with almost no technical detail on how they were derived or the stats behind them.
9) SSIS Variables Window extensions: This is again a developer's hammer, making your vision deeper and sharper into the package composition. I am less impressed with the feature for copying variables across tasks and containers than with the one that highlights the icons of variables affected by package configurations and expressions.
All in all, I feel it's a must-have utility after installing BIDS for any serious SSIS development project, as sooner or later every SSIS developer or performance analyst finds a use for one or more features of this tool.
Monday, July 13, 2009
A new utility recently floated on CodePlex caught my attention, as it's related to SSIS. This utility is known as SSIS Log Analyzer, and it works for both SQL Server 2005 and SQL Server 2008. It offers different features (listed below), but the one I like is "Rerun (debug) the entire sequence of events which happened during package execution showing the flow of control in graphical form, changes in runtime values for each task like execution duration etc". This is similar to what Profiler offers while recording a trace: the trace file can then be replayed in Profiler to measure and analyze database performance.
The features that are listed for this utility on it's product page can be classified into two basic categories:
1) A reasonable user interface to analyze, or rather "view", the log in an organized fashion instead of viewing it in the execution results pane.
2) The analysis this utility claims to provide, in terms of both package analysis and log analysis.
The features that the product page lists are as below:
- Option to analyze the logs manually by applying row and column filters over the log data or by using queries to specify more complex criterions.
- Automated Performance Analysis which provides a quick graphical look on which tasks spent most time during package execution.
- Rerun (debug) the entire sequence of events which happened during package execution showing the flow of control in graphical form, changes in runtime values for each task like execution duration etc .
- Support for Auto Analyzers to automatically find out issues and provide suggestions for problems which can be figured out with the help of SSIS logs and/or package.
- Option to analyze just log file or log and package together.
- Provides a lightweight environment to have a quick look at the package. Opening it in BIDS takes some time as being an authoring environment it does all sorts of validations resulting in some delay.
It's an interesting tool with good potential to mature in terms of the package analysis it provides. Used together with DTLoggedExec, it can be a very useful and handy toolset, and a free one at that.