Monday, January 31, 2011
SQL Server AlwaysOn and SQL Azure Data Sync for HADR
SQL Azure Data Sync is another such service on the Azure platform that can be seen as a HADR provider for the cloud. Of course, data centers take care of the disaster recovery part of data stored in the cloud, but in situations where data is exchanged on-premise and off-premise across data consumers, Data Sync can play a very active role in the high availability part of data stored in the cloud.
Thursday, January 27, 2011
Azure Books - Windows Azure Book and SQL Azure Book : Review
Update:
I have been very late in sharing this review, as my new job keeps me quite busy and I have not been able to spend time on it. Below is my review of the book "Microsoft Azure: Enterprise Application Development".
This book has much the same flavor as the SQL Azure book. Each chapter explains a different topic, followed by a working example with a step-by-step explanation. Topics like Web roles, Worker roles, monitoring and analytics, and Table and Blob storage are very important from a Windows Azure perspective, and they are reasonably well covered. I would have liked more depth in these chapters, but each chapter provides further references for curious minds who wish to dig deeper into a particular topic. All in all, I feel this is the best start-up handbook for professionals who are new to the Windows Azure environment and intend to come up to speed on it.
Tuesday, January 25, 2011
MS BI Appliance Solution - HP Business Decision Appliance
Monday, January 24, 2011
I joined Accenture
Thursday, January 20, 2011
Selecting the best Business Intelligence software / service provider for your BI needs
Tuesday, January 18, 2011
Self Service Dashboard Development using Analyzer as Reporting Solution
Self-service BI
Self-service BI is a buzzword, and sales folks generally use it as one of their Unique Selling Propositions (USPs) to market their solutions. In my view there is a difference between self-service BI and managed self-service BI. Easy authoring and controlled utilization are two of the most important factors for a self-service BI solution. If the authoring environment is not easily adaptable, there is a great chance that your solution will not get utilized at all. If utilization is not controlled, usage becomes explosive and unorganized, as report users treat the reporting solution as a lab to experiment with reports in a free-flow manner. In simple language, considering a reporting solution like Analyzer, the report authoring environment should be easy enough that business users can create their dashboards with ease. Also, user- and role-based security should be available, so that report authoring and utilization can be managed. In this article, I intend to share the report authoring experience.
Before one makes a decision about using a product, any CEO / CIO / SVP / Analyst would want to check out certain fundamental details about the product, which generally fall into two categories: Capital Expenditure (CAPEX) and Operational Expenditure (OPEX). Let's glide through such details in brief.
Licensing (CAPEX): Analyzer comes in different licensing flavors, and the major classifications are:
OEM - Use this edition if you intend to integrate Analyzer into your own application by using the features of this solution as a web service.
SaaS - In cases where you intend to offer a reporting solution on your own cloud-based platform, give this edition a try. This version is used by companies that host BI in their own cloud and (usually) charge their customers for access to a set of pre-built reports and dashboards and the data they contain, along with the ability to interact with that data.
Deployment (OPEX): Analyzer is a zero-footprint installation. This term often confuses professionals, so I will elaborate on it a bit. Analyzer can be installed on a central BI / DB server that has IIS installed, or the IIS machine can be separate from the BI / DB server. Analyzer needs to be installed on an IIS server (one or more), as the solution is developed using .NET and DHTML, and it also needs access to a SQL Server 2005 / 2008 / R2 database engine, as it creates a database for its internal functioning, such as metadata storage. Workstations connect to Analyzer using just a browser, which means you do not need to install anything on client machines except a browser. This is true regardless of the user's role, such as Admin, Report Designer, or End User.
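Purely as a rough sketch of the database-side prerequisite (the database name, login, and permissions below are hypothetical placeholders, not Analyzer's actual defaults, which may differ), provisioning such a metadata database on SQL Server 2005 / 2008 could look like this:

-- All names below are hypothetical placeholders for illustration only.
CREATE DATABASE AnalyzerMetadata;
GO
CREATE LOGIN AnalyzerSvc WITH PASSWORD = 'Str0ng!Passw0rd1';
GO
USE AnalyzerMetadata;
GO
CREATE USER AnalyzerSvc FOR LOGIN AnalyzerSvc;
GO
-- The application account needs enough rights to create and maintain its own metadata tables.
EXEC sp_addrolemember 'db_owner', 'AnalyzerSvc';
GO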
Now let's focus on the beginner-level recipe to create a dashboard using Analyzer. I call this recipe "Zero to Dashboard in 60 Minutes".
Scenario: The Sales Head of a company needs to create a quick, last-minute dashboard to present at the quarterly board meeting. The company has a cube created using SSAS, and for the sake of this demo we will use the cube from the AdventureWorks SSAS project that ships with SQL Server.
Hardware Setup: Most companies have contracts with hardware maintenance vendors, and in such environments end-user terminals are equipped with only the hardware they need, as contracts can be pay-per-use. I intentionally used a machine with 1 GB RAM, 60 GB of free hard disk space, and a 1.77 GHz Intel processor. This is a typical configuration for a low-end laptop that should be sufficient for folks who just need to use MS Office and Outlook on their machines.
Requirements: The target audience of the dashboard is the senior management of the organization, and the Sales Head is authoring the report. Such a dashboard / report can be expected to contain a few of the commonly used constituents of a dashboard.
1) A ScoreCard containing KPIs, which can be hosted in the cube
2) Strategy Map showing at least some basic kind of process flow
3) Geospatial Reporting, which is one of the best presentation forms for a senior level business audience
4) Matrix Reporting, for a detailed level study of aggregated figures
5) Filters, which are necessary to analyze the details in isolated scopes
6) Drill-Down functionality, as problem decomposition and study are carried out in a hierarchical manner.
Report Authoring: I had Analyzer and the AdventureWorks cube on the same machine. Once you start Analyzer and create a new report, you will find the interface shown in the screenshot below. To author the report, the entire functionality is available from the toolbar or from context-sensitive menus. Plotting data on controls is a matter of drag-and-drop from the data tab visible on the left side.
1) KPI Viewer - This control can be used for creating a scorecard hosting KPIs. You will also find some very interesting out-of-the-box columns like "Importance", which can take quite an effort to create in PerformancePoint Services.
2) Process Diagram - This control can be used to create a basic-level strategy map. Though this strategy map is not as appealing as one created from a data-driven diagram in Visio, it is still fine for a last-minute dashboard, and it can host my KPIs too.
3) Intelligent Map - This can be considered analogous to what the Bing Maps control is to SSRS. It is completely configurable and contains a wide variety of maps, ranging from a world map to area-specific maps.
4) Pivot Table - This control is perfectly suitable for OLAP reporting in a grid based UI.
5) Filters - Report filters have a very different UI than the traditional drop-down. Though they occupy more screen real estate, they make the report more appealing, so it's worth it. Considering the present scope of this report, I chose to place the filters at the bottom of the page instead of at the top.
You can see at the bottom of these screenshots that each report or dashboard in Analyzer can contain multiple sheets (no limit) each of which can contain its own combination of controls such as pivot tables, maps, charts, etc. In this example we are only using one sheet.
Check out the context menus of these controls to see what options are available with each one. On selecting "Discover Children" at the "Alabama" level in the pivot table, a different sheet opens up with this wonderful report and UI, as shown in the screenshot below.
Summary: With out-of-the-box controls and drag-and-drop functionality, a very decent-looking report can be created in less than 60 minutes, targeting the senior-most audience of an organization, who expect a report that supports decision making with its analytical capabilities. Provided your cube is ready with all the data structures like KPIs, named sets, hierarchies, and measures, reporting is almost taken care of if Analyzer is at your disposal. A phrase that suits this summary is "keep your ducks in a row": have your cube in proper shape to support your reporting. Using Analyzer, below is the result I was able to achieve in less than an hour, with very little prior experience with the tool. A more experienced Analyzer user could no doubt build this kind of report even faster.
Monday, January 17, 2011
SSIS and StreamInsight Integration
SSIS is gradually evolving into a rich platform that can facilitate much more than the very definition of what we term ETL.
1) The Script transform is the first window connecting SSIS to the managed .NET / unmanaged COM world. We already have established examples where this transform is used to consume services like Bing Maps, SSRS, and others.
2) The Data Correction transform is another upcoming transform, one that connects SSIS to SSDQS in SQL Server Denali. With this transform, coupled with the Data Profiling Task and Data Profile Viewer, SSIS has sufficient ammunition to target corporate data cleansing assignments.
3) The transform I discussed above, the "StreamInsight Multi Stream Transform", is a transform developed in a POC carried out by the SQL CAT team. The entire explanation of the to-and-fro integration of SSIS and StreamInsight is wrapped up in a whitepaper, which can be read from here. The whitepaper sits under the umbrella of SQL Server 2008 R2, but one of the screenshots showing the SSIS implementation has icons that are only available in SSIS Denali CTP1, which suggests it might have been tested in SSIS Denali. So my forensic sense tells me that it should be very well possible in SSIS Denali too, and logically speaking it should be.
Sunday, January 16, 2011
Contained Databases in SQL Server Denali
Friday, January 14, 2011
Farewell to Capgemini
Tuesday, January 11, 2011
SSIS Denali CTP1 Designer Enhancements and Improvements
Some of the features that I still miss are coloring of icons when they are configured using expressions, the ability to specify the data type in the Derived Column transform from the editor itself without having to change it in the Advanced Editor, and exporting the package design to JPEG / PDF formats. As of now it looks like BIDS is here to stay, but it's too early to comment.
Monday, January 10, 2011
Data Quality Services and SSIS in SQL Server Denali
Saturday, January 08, 2011
Book on SQL Azure - Reviewed by Siddharth Mehta
Thursday, January 06, 2011
SQLBI Methodology - Review
Below are my views / feedback after analyzing the architecture document. To better understand my views, please read the architecture document before reading the points below.
Size of BI solution vs. Complexity: In my view and experience, the volume of data that needs to be processed (right from OLTP until it gets stored in MOLAP), combined with the size of the BI solution sponsor, is directly proportional to the adoption of a BI solution development methodology.
In simple words, if SMBs can manage their BI solution development using a SaaS approach rather than building a DW after buying software licenses, most would take the former route. Businesses that do a full refresh of the DW every time would not care much for a methodical approach, as change management and incremental loads are not their concerns at all due to the short-lived historical state of the DW. Organizations with a large DW, often measured in terabytes, would definitely care for a methodical approach, as the magnitude of impact of any change is quite huge.
Components of a BI Solution:
1) Source OLTP database - OLTP has known issues, which mainly affect delta detection and the accessibility of the DB to the DW development team. I completely agree with this part. Here a concept called "Mirror OLTP" is introduced.
If the Mirror OLTP is used as a facade, it does not make much sense: if you can create database objects in another database and then fire a cross-database query, then logically speaking the same facade should be allowed in the source OLTP, isolated by a logical boundary such as a schema.
If the Mirror OLTP is considered a snapshot, which is almost a clone of the original DB, one can exercise full control over the source DB, but this is not as easy to implement as it sounds. Consider a source DB that lies on a SAN and is horizontally partitioned across geographies, and you are trying to replicate it. For this Mirror OLTP you would need to constantly maintain another DB, which increases the TCO of the solution. It would also need to pass information security policies and guidelines, as audits like BS7799 / SOX require strict compliance.
Instead of developing a Mirror OLTP, one option is to script views / SPs / other database objects and create them just-in-time (which is also easier to get approved by DBAs, who are happier with this temporary gateway opening than a permanent cross-database gateway), use them for delta detection, and then purge them. These scripts can be deployed using VSTS 2010 / VSTS Database Edition / any other change management tool at your disposal; a sketch of this approach appears after this list.
In the worst case, where this option cannot be made to work, we can opt for what I call a permanent staging area, completely suited to the purpose, with data and metadata aligned towards facilitating ETL for DW loads. To me, the Mirror OLTP seems to be a compound of the same.
2) Configuration DB - This seems like a facade opened up for users to configure the Mirror OLTP / Staging area / DW / Data Marts, with some built-in configuration settings / logic for each layer.
3) Staging area - This is identified as a temporary staging area. It is mentioned that one cannot store persistent data in staging, and I beg to differ with this theory. For managing master data from different source systems, which would not contain a delta every time but would still be required for ETL processing due to the ER model design, a permanent staging area can exist. A temporary staging area is also required, and that part of the section is completely fine with me.
4) Data warehouse and Data Marts - The details mentioned in this section seem almost like the Inmon methodology, where you develop a DB containing your consolidated data from ETL and then build marts that can be thought of as a limited compound of the DW DB. This can be thought of as synonymous to what perspectives are to cubes. Data marts are basically crafted here to compartmentalize different functional areas of the DW. Here you would be required to create what I term a "Master Data Mart", and the other data marts would be based on functional areas. Maintaining and populating these data marts can be quite challenging.
5) OLAP Cubes - You would create cubes containing data from one functional data mart + master data mart.
6) SSRS Reports - To me this department seems to struggle, due to the design of the data marts. Reporting requirements can be extremely volatile and complex enough to induce a change that requires manipulating ETL -> DW -> Data Mart. There can also be cases where you might introduce another small ETL layer between the DW and the Data Marts. Operational reporting would be done against the Data Marts and not the DW, as this architecture largely adopts Inmon's view.
7) Client Tools - This section is okay with me.
8) Operational Data Store - This section clearly identifies that ODS should be used with care.
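Purely as an illustration of the just-in-time option described under point 1, here is a minimal T-SQL sketch; the table, view, and column names (SalesOrder, LastModifiedDate, StagingDB, etc.) are hypothetical and do not come from the SQLBI document.

-- Assumption: the source table dbo.SalesOrder carries a LastModifiedDate column,
-- and the staging side keeps the last extraction timestamp in etl.LoadWatermark.

-- 1. Create the facade object just-in-time in the source OLTP database.
CREATE VIEW dbo.vw_SalesOrder_Delta
AS
SELECT SalesOrderID, CustomerID, OrderDate, TotalDue, LastModifiedDate
FROM dbo.SalesOrder;
GO

-- 2. Use it for delta detection, landing the changed rows in the staging database.
DECLARE @LastLoad DATETIME;
SELECT @LastLoad = LastLoadDate
FROM StagingDB.etl.LoadWatermark
WHERE SourceName = 'SalesOrder';

INSERT INTO StagingDB.stg.SalesOrder_Delta (SalesOrderID, CustomerID, OrderDate, TotalDue, LastModifiedDate)
SELECT SalesOrderID, CustomerID, OrderDate, TotalDue, LastModifiedDate
FROM dbo.vw_SalesOrder_Delta
WHERE LastModifiedDate > @LastLoad;
GO

-- 3. Purge the facade once the extraction window is over.
DROP VIEW dbo.vw_SalesOrder_Delta;
GO

The point of the sketch is that the facade exists only for the duration of the extraction window, which is exactly what makes it easier to get past the DBAs and the security audits.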
Summary: In my view, this architecture can be perceived to act like a prism. You have a ray of light, and after passing through the prism it splits into different colors, and you can catch the color you need.
One big issue that I see with this architecture is lineage analysis. In this architecture lineage analysis becomes very complex, as deriving the lineage of data from a dashboard back to OLTP is highly challenging. In addition to the configuration DB layer, there should be one more vertical layer where the lineage of the data is tracked.
Consider a practical example: say you have a corporation consisting of subsidiary companies; for example, CitiGroup has child companies like CitiCorp, CitiBank, CitiSecurities, CitiMortgage, etc. When data spans from the CitiGroup level down to the CitiBank level, this architecture can hold good, as each child company is a different business unit and model with its own level of complexity, and each child company's analysis would depend on data from other child companies to a certain extent. This architecture seems effective at this scale.
If I were to implement this only at the CitiBank level, I would clearly go for the Kimball methodology. But this is my understanding, analysis, and choice. There are a lot of scenarios in the sea of business models and requirements, and I am sure this architecture, with certain modifications (which is my personal preference), can help in a very effective manner.
Wednesday, January 05, 2011
Dashboards to monitor data warehouse development and execution
Monday, January 03, 2011
Data warehouse development life-cycle
Speaking at a high level, from a DW development perspective, one first needs to figure out the boundaries of development. Generally the extreme boundary starts at OLTP and ends at dashboards. With these boundaries in place, the following are the layers / development arenas one can consider for a DW development from scratch.
1) Delta Detection - This is the first exercise that you would plan with your OLTP system. It is a very important exercise, as it decides a few of the other exercises in the life-cycle (a combined sketch of this step and the later loading steps follows this list).
2) Staging Area - Based on the requirements and considering the complexity of delta detection, a temporary / permanent staging area development would be required.
3) Master Data Management - Do not confuse this with the standard MDM practice, which leans more towards modeling. Here MDM means how you manage your master data in the staging area / in the delta, as the delta applies only to transactional data. Master data does not change that often, but you need it for your ETL processing. This has to be managed at the facade layer you build for delta detection / in the staging area, and it has to be planned along with points 1 and 2.
4) Dimensional Modeling - This is the exercise where you start modeling your dimensions using the Kimball / Inmon methodology.
5) Data Mart Design & Development - Only after point 4 can one start developing a data mart, which lays the base for the next exercise, ETL development.
6) ETL Design & Development - Points 1, 2, and 3 serve the ETL processing. ETL basically acts as the processing engine that transforms your relational data to suit the model of your data mart. One can consider the points above as the E, and this step as the T + L. Until your data mart is in place, one has no idea about the target schema, so ETL development makes sense only at this stage.
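To tie steps 1, 3, and 6 together, here is a minimal T-SQL sketch under stated assumptions: a hypothetical source table SourceOLTP.dbo.Sales with a LastModifiedDate column, a permanent staging table stg.Customer for master data, and dim.Customer / fact.Sales built during steps 4 and 5. None of these names are prescribed by the life-cycle itself; they only illustrate how the pieces hand off to each other.

-- All object names below are hypothetical placeholders.

-- Step 1: delta detection using a watermark on a last-modified column.
DECLARE @LastLoad DATETIME;
SELECT @LastLoad = LastLoadDate FROM etl.LoadWatermark WHERE SourceName = 'Sales';

INSERT INTO stg.Sales_Delta (SalesID, CustomerCode, SaleDate, Amount)
SELECT SalesID, CustomerCode, SaleDate, Amount
FROM SourceOLTP.dbo.Sales
WHERE LastModifiedDate > @LastLoad;

-- Step 3: master data kept in a permanent staging table and upserted, not reloaded.
MERGE stg.Customer AS tgt
USING (SELECT CustomerCode, CustomerName, Country FROM SourceOLTP.dbo.Customer) AS src
    ON tgt.CustomerCode = src.CustomerCode
WHEN MATCHED THEN
    UPDATE SET tgt.CustomerName = src.CustomerName, tgt.Country = src.Country
WHEN NOT MATCHED THEN
    INSERT (CustomerCode, CustomerName, Country)
    VALUES (src.CustomerCode, src.CustomerName, src.Country);

-- Step 6 (T + L): transform the delta and load it into the fact table,
-- looking up surrogate keys from the dimension modeled in steps 4 and 5.
INSERT INTO fact.Sales (CustomerKey, DateKey, Amount)
SELECT d.CustomerKey,
       CONVERT(INT, CONVERT(CHAR(8), s.SaleDate, 112)) AS DateKey,
       s.Amount
FROM stg.Sales_Delta AS s
JOIN dim.Customer AS d
    ON d.CustomerCode = s.CustomerCode;

-- Advance the watermark only after the load succeeds.
UPDATE etl.LoadWatermark
SET LastLoadDate = GETDATE()
WHERE SourceName = 'Sales';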