Thursday, February 07, 2013
I'm reading: Building Social Analytics with MS BITweet this !
Every form of analysis needs data, but it's not possible that one might have that data generated and stored in organizational repository. Many forms of analysis depends upon data from third-party, and platforms like Windows Azure Marketplace are based on the same principle.
Social Analytics is widely used to forecast the impact on the business and extract insights to counter the same. The interesting question here is, what is the data source that can be used to calculate / derive sentiments of customers related to the respective business ? A majority of this data would come from social / professional / collaboration forums. Examples of such sources are Facebook, YouTube, Twitter, LinkedIn, PInterest, IMDb, Blogs etc. Anyone would agree that the analytics derived from unstructured data created by the public interaction on social media can be expected to be much more close to precision than even any data mining algorithm. But the big question here is, the amount of data - very very very big data. On a daily basis, there are 400 million Tweets, 2.7 billion Facebook Likes, and 2 billion YouTube views. Even these figures might have been outdated today.
Say an organization is influenced by Sharepoint 2013 enhancements related to social media collaboration, and intends to add an ability to derive sentiment analysis in their client offering. Let's say that as a starting source, Twitter is selected as the source of data, and all the public tweets for a particular product would be analyzed and the results would be stored for future use.
The first challenge is that according to a study, Twitter generates approximately 1 billion tweets in less than 3 days. So how to deal with processing such a huge amount of unstructured data and just consider the kind of infrastructure required to handle this processing. To proceed with the case study, let's say that we live in the age of cloud and we just signed up on AWS and have beefed up a fat Amazon EMR that uses Hadoop and HBase NoSQL database.
The second challenge in this case is how to get access to Twitter Firehose - an API that provides streaming access to Twitter public tweets. One needs to partner with Twitter and pay millions of dollars to get licensed access to it's sea of unfiltered dataset. Also you would need rights to publicly sell this dataset to your end-clients. Considering this complexity any organization would give up the idea of implementing it for own use.
Sometimes the answer to the problem is not technology but it's partner technology. Only three publicly known companies have licensed rights to Twitter's Firehose - Topsy, Datasift, and Gnip. These companies have established partnership with hundreds and thousands of social media platforms, established a web scale and google inspired flavor of infrastructure based on Hadoop clustering methodology, and also have been maintaining a huge archive of historical social data. On the top of it, these providers provide real time access to live stream of social media and also provides social analytics using intelligent methods. An interesting case study of how Datasift manages infrastructure for huge processing, storage and analytics can be read from here.
How MS BI is related to it ?
Even if one selects to sign-up with any of these providers and source analyzed data from them, one would have to keep storing the results. These providers have pay-per-use pricing model depending upon the selected source. After intelligently extracting analyzed data from different sources through these providers, one would have to warehouse the same to avoid paying repeatedly for the same data. Considering the volume of data, even if analyzed data from these social media providers is warehoused, it would easily create a huge warehouse of data.
Microsoft have two different flavors of analysis models (Tabular mode SSAS and OLAP mode SSAS) under the BISM umbrella and a very strong set of end user collaboration platforms including Sharepoint and Excel. Analyzing the warehoused data from social analytics providers with MS BI and including the same in solution offerings can be a deal breaker than implementing complex data mining algorithms or such methods.
I would really like to hear what Microsoft thinks about my idea around social analytics with ms bi. Anyone reading this post is interested in sharing their thoughts about this idea, I would be more than happy to receive the same.