Sunday, September 23, 2012
Hadoop tools for SSIS, SSRS and SSAS like Integration, Reporting and Analytics
Data hosting, processing and reporting are changing dramatically and now happen on a wider variety of very different platforms than ever before. With the emergence of NoSQL and Big Data, systems such as Hadoop host enormous volumes of data. Google is soon to hit 1 billion Android device activations. The US and China together account for almost 300 million iOS and Android activations. Sourcing data from systems like Hadoop, mashing it up with relational data sources, and provisioning reporting and analytics on fast-growing platforms like Android and iOS is not an easy job, even before counting the complexity, cost and skills involved in the process.
Recently I have seen quite a few SQL Server and MS BI related blogs writing about how to write code for HBase, Pig and other similar sources. The industry matures in terms of developer productivity and user friendliness much faster than one realizes. Talend, an open source provider of tools for managing Big Data, provides a tool called Talend Open Studio for Big Data. It's a GUI-based data integration tool like SSIS. Behind the scenes, this tool generates code for the Hadoop Distributed File System (HDFS), Pig, HBase, Sqoop and Hive. Tools of this kind take Hadoop and Big Data to an extremely wide user base.
Once you have a highway to a mountain of source data, the immediate need is to make sense of that data. One of the front-runners in data visualization and analytics, Tableau, provides a way to create ad-hoc visualizations either from extracts of data taken from Hadoop clusters or from a live connection to the clusters. Tableau supports both approaches: creating visualizations from in-memory data, and staging extracts of data from Hadoop clusters into relational databases and creating visualizations from those.
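The "stage an extract, then report on it" approach can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Tableau or any other tool works internally: the tab-separated extract, the table names `stg_activations` and `dim_country`, and all of the figures are made up, and an in-memory SQLite database stands in for the relational store.

```python
import csv
import io
import sqlite3

# Hypothetical extract: tab-separated rows as they might be exported from a
# Hive table or an HDFS file. The figures are invented for illustration.
HADOOP_EXTRACT = (
    "device_os\tcountry\tactivations\n"
    "Android\tUS\t120\n"
    "iOS\tUS\t80\n"
    "Android\tChina\t70\n"
    "iOS\tChina\t30\n"
)


def stage_extract(conn, extract_text):
    """Load the tab-separated extract into a relational staging table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stg_activations "
        "(device_os TEXT, country TEXT, activations INTEGER)"
    )
    reader = csv.DictReader(io.StringIO(extract_text), delimiter="\t")
    rows = [(r["device_os"], r["country"], int(r["activations"])) for r in reader]
    conn.executemany("INSERT INTO stg_activations VALUES (?, ?, ?)", rows)
    return len(rows)


def load_reference_data(conn):
    """Create a small relational lookup table (assumed, not from Hadoop)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dim_country (country TEXT PRIMARY KEY, region TEXT)"
    )
    conn.executemany(
        "INSERT INTO dim_country VALUES (?, ?)",
        [("US", "Americas"), ("China", "Asia")],
    )


def activations_by_region(conn):
    """Mash up the staged extract with the relational lookup table."""
    cursor = conn.execute(
        """
        SELECT d.region, s.device_os, SUM(s.activations)
        FROM stg_activations AS s
        JOIN dim_country AS d ON d.country = s.country
        GROUP BY d.region, s.device_os
        ORDER BY d.region, s.device_os
        """
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
staged = stage_extract(conn, HADOOP_EXTRACT)
load_reference_data(conn)
report = activations_by_region(conn)
for row in report:
    print(row)
```

A reporting or visualization tool would then point at the staged table instead of the cluster, which is the trade-off behind working from extracts: the data is only as fresh as the last load, but queries run against a store the BI stack already understands.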
Other vendors like SnapLogic and Pentaho also provide tools for working with Hadoop clusters that do not require developers to write code. Microsoft has an integrated platform for integration, reporting and analytics (in-memory/OLAP) and an IDE in the form of SSDT (formerly BIDS).
If tools similar to Talend and Tableau were integrated into SSIS, SSAS, SSRS, the DB Engine and SSDT, Microsoft would be among the vendors best positioned to take Hadoop to a wide audience in mainstream business. If platforms like Azure Data Market, Data Quality Services, Master Data Management, StreamInsight, SharePoint etc. were to join hands with tool and technology support integrated with SQL Server, it would be an unmatched way to extract intelligence out of Hadoop. Connectors for Hadoop have been the first baby step in this direction. A lot of maturity in this area is still awaited.
Until then, look to the existing leaders in this area, such as Cloudera, MapR, Hortonworks, Apache and Greenplum, for Hadoop distributions and implementation. For Hadoop tools, software vendors like Talend, Tableau, SnapLogic and Pentaho can provide the required toolset.