Siddharth Mehta's Blog

Monday, August 29, 2011

Azure Design Patterns

I have been very busy these days with personal matters and have not been able to find time for blogging; I apologize to my readers. The Microsoft Azure cloud environment is a rapidly growing platform. With the release of Hadoop connectors for SQL Server, support for Hadoop implementations, and Azure Table Storage, Azure provides reasonable support for unstructured data, which is going to be a need of the future. Alongside this, one more area is growing fast and churns out a lot of business, and in turn a lot of data too - social media.

Microsoft has recently released version 1.0 of the Windows Azure Toolkit for Social Games. Integration with social media has become a necessity for almost every sizable organization today. LinkedIn is one of the best examples, where organizations are trying to tap the social connections of employees for more intelligent recruitment. Going social is one of the best moves by the Azure team.

Azure is usually classified by compute and storage criteria, but from a development perspective it would make sense to categorize it in more detail. When you speak of development, design patterns eventually come into the picture, though they are used less in the database world than in the application world. Buck Woody has put up a very nice website for studying these design patterns, and I feel it's definitely worth checking out. It's called Azure Design Patterns.

Sunday, August 21, 2011

Using Graph Database on Windows Azure

Unstructured data is the newest and most vibrant source of data that organizations want to mine for rich business intelligence. SQL Server and SharePoint are the front runners from the Microsoft platform in the field of BI. SQL Server, being an RDBMS, is not the right choice to contain unstructured data, and SharePoint itself can generate and contain huge volumes of unstructured data. New categories of databases such as document databases, key-value stores and graph databases are suited to this purpose, and this is where the territory of the NoSQL movement begins.

The Microsoft Azure platform supports Hadoop implementations, SQL Server interoperability drivers for Hadoop have been announced, and Microsoft Research is developing project Dryad; all of these are moves towards building capabilities to support unstructured data. Graph databases are one of the prominently used database types in the world of unstructured data. Many would ask why a relational database cannot be used to achieve what graph databases are used for. Here is one of the answers. Graph databases apply graph theory, and once you understand it you will see why an RDBMS cannot cater to what graph databases can. Neo4j is one of the leaders in this area. Industry leaders like Google also have their own graph processing system, known as Pregel.
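To make the graph theory argument a little more concrete, here is a minimal sketch in plain Python of the kind of question graph databases are built for: "who is within N hops of this person?". The sample graph and the function name `within_hops` are made up for illustration, and this is not how Neo4j or any other product works internally. In a typical relational design the same question usually needs one more self-join on a connections table for every extra hop, whereas a graph traversal simply keeps following edges.

```python
from collections import deque

# Hypothetical social graph stored as an adjacency list:
# each person maps to the people they are directly connected to.
GRAPH = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice", "dave"],
    "dave": ["bob", "carol", "erin"],
    "erin": ["dave"],
}


def within_hops(graph, start, max_hops):
    """Return {person: hop count} for everyone reachable within max_hops of start."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if distances[person] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(person, []):
            if neighbor not in distances:
                # First visit in a breadth-first search is the shortest path.
                distances[neighbor] = distances[person] + 1
                queue.append(neighbor)
    del distances[start]  # the starting person is not their own connection
    return distances


if __name__ == "__main__":
    print(within_hops(GRAPH, "alice", 2))
```

Changing the hop limit only changes one argument here; the traversal code itself stays the same, which is the flexibility the graph model is meant to give.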

sones GraphDB is one of the graph databases of choice for Microsoft professionals, as it is developed using .NET and is easily supported on the Azure platform. Huge volumes of unstructured data need flexible compute and storage platforms like the Azure cloud platform, and because sones GraphDB uses the .NET framework behind the scenes, it is ideal for hosting on Azure. You can access the technical datasheet from here and below is the architecture diagram of the same. Interestingly, this brings a new query language for DB professionals: GraphQL!

Sunday, August 14, 2011

Unstructured data in SQL Server Denali

Microsoft seems to be rolling out support for unstructured data in bits and pieces, which strengthens its position against the upcoming challenges posed by BIG and unstructured data. RDBMS is gradually going obsolete, and BI professionals are almost abandoning plain vanilla RDBMS, which I have explained in my latest article. Many would think that an RDBMS is the de facto data container, but the world of data is changing with the exponential growth in organizational data. NoSQL, the CAP theorem, BASE, distributed databases and the like are opening up a new Pandora's box of unstructured data management, and many IT front runners have already started exploring this part of the database world and harnessing its benefits.

SQL Server Denali is adding the following new features in the DB Engine to support storage and management of unstructured data:

1) Lots of performance and scale work in Full-Text Search!

2) Customizable NEAR in FTS

3) The ability to search only within document properties instead of the full document

4) Semantic Similarity Search between documents. This provides you the ability to answer questions such as: "Find documents that talk about the same thing as this other document!"

5) Better scalability and performance for FILESTREAM data, including the ability to store the data in multiple containers

6) Full Win32 application compatibility for unstructured data stored in a new kind of table called FileTable. You create a FileTable and can drag and drop your documents into the database and run your favorite Windows applications on them (e.g., Office, Windows Explorer).

These capabilities are definitely steroid-level additions for front-end applications that manage unstructured data, but I still feel that the Hadoop connectors, which would facilitate interoperability between SQL Server and Hadoop, are worth more than the entire DB engine. Petabyte-scale analytics over structured and unstructured data is still not any RDBMS engine's cup of tea. Even so, these features for managing unstructured data within the RDBMS world are good value additions. You can learn about these features from this webcast.
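The semantic similarity search in item 4 of the list is easier to picture with a toy example. The sketch below is plain Python and only illustrates the general idea of "find documents that talk about the same thing", using word counts and cosine similarity. It is an assumption-laden illustration, not the algorithm SQL Server Denali uses, and the document names and function names are hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lowercase the text and count how often each word occurs."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(query_doc, corpus):
    """Return (name, score) pairs sorted from most to least similar."""
    query = term_vector(query_doc)
    scored = [
        (name, cosine_similarity(query, term_vector(text)))
        for name, text in corpus.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical documents, invented for this example.
CORPUS = {
    "hadoop_notes": "hadoop connectors move data between hadoop and sql server",
    "recipe": "mix flour sugar and butter then bake",
}

if __name__ == "__main__":
    for name, score in most_similar("sql server connectors for hadoop", CORPUS):
        print(name, round(score, 3))
```

A real semantic search engine would go well beyond raw word counts, but the shape of the question is the same: score every document against a reference document and return the closest ones.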

Tuesday, August 09, 2011

MS BI and Hadoop Integration using Hadoop Connectors for SQL Server and Parallel Data Warehouse to analyze structured and unstructured data

Not-only SQL (NoSQL) is ruling the world of unstructured data for storage, warehousing and analytics, with Hadoop being the most successful and widely used technology. There are two choices you can make when something is gaining immense acceptance: either you ignore it and keep competing within your own league, or you partner with it and extend your reach. Microsoft is without doubt one of the leaders in database management, data warehousing and analytics, alongside IBM, Oracle and Teradata, but on structured data only. Microsoft Research is trying to churn out its own set of products to deal with BIG data and unstructured data challenges, using federated databases capable of MPP. But Hadoop has already earned a proven reputation and acceptance in this world of unstructured data.

The good news is that Microsoft is slowly embracing Hadoop environments and adopting a symbiotic policy. No organization has exclusively structured or exclusively unstructured data; it is always a combination of both. The Azure platform already supports Hadoop implementations. Recently Microsoft announced an upcoming CTP release of two new Hadoop connectors, one for SQL Server and one for Parallel Data Warehouse. Many visionary DW players already offer a hybrid BI implementation that allows MapReduce (used to query data from Hadoop environments) and SQL to be used together. With the release of the Hadoop connector for SQL Server, it's highly probable that SQL Server becomes a source for Hadoop environments rather than vice versa, as the ocean of unstructured data sits in Hadoop environments and is far beyond what SQL Server can accommodate.
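For readers who have not worked with MapReduce, the following is a minimal single-process sketch of the programming model in plain Python, using the classic word count. It does not use Hadoop or any Hadoop API; the function names are hypothetical and the "shuffle" step only imitates what the framework does between the map and reduce phases on a real cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, imitating the step between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values emitted for one key into a single total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big unstructured data"]))
```

On a cluster the map calls run in parallel next to the data and the reduce calls run per key, which is what lets the same two small functions scale out over very large inputs.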

Still, the interoperability facilitated by this connector would empower SQL Server to extract data of interest from the ocean of data hosted in Hadoop environments, making the MS BI stack even more powerful. Database engines, ETL tools and OLAP engines would see bigger challenges than ever when clients start using Hadoop as a source for SQL Server, but my view is that it would mostly work the other way round. These connectors open the door to SQL Server databases and data warehouses being used in combination with Hadoop and MapReduce, effectively creating new opportunities for the entire database ecosystem, from clients to technicians.

It's too early to know the taste of the food before you actually taste it, but you can predict the taste from the aroma, and that's what I am trying to do for now. You can read the announcement about these connectors from here.

Sunday, August 07, 2011

Columnar Databases and SQL Server Denali : Marathon towards being world's fastest analytical database

Have you ever heard of columnar databases? You might be wondering whether this is something new; the answer is no and yes. Columnstore is not a new technology that has evolved suddenly and is making waves in the database community. It has been in the industry for quite some time. Generally, databases store data in the form of records that reside in tables. This storage layout is typically known as rowstore, as records are physically stored in a row-based format. It has its advantages with OLTP systems and its limitations with OLAP systems. A columnstore instead stores the values of each column together, and its main advantages are better compression, reduced IO during data access and, as a result, a huge gain in data access speed. Scaling data warehouse computing resources by adding memory and using massively parallel processing does not fit every business, due to budgetary and architectural constraints. Columnstore seems to be a breakthrough technology that can act as a catalyst in analyzing enormous amounts of data, on the scale of billions of records, from enterprise data warehouses.
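The difference between the two layouts, and why a column layout compresses well and reads less data, can be sketched in a few lines of plain Python. This is only a toy illustration with made-up sample data and hypothetical function names; real columnstore engines use far more sophisticated encodings, and run-length encoding is just one simple technique that benefits from storing similar values side by side.

```python
from itertools import groupby

# Hypothetical fact table in row-oriented form: one dict per record.
ROWS = [
    {"order_id": 1, "region": "East", "amount": 120},
    {"order_id": 2, "region": "East", "amount": 80},
    {"order_id": 3, "region": "East", "amount": 200},
    {"order_id": 4, "region": "West", "amount": 50},
    {"order_id": 5, "region": "West", "amount": 150},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list of values per column."""
    return {name: [row[name] for row in rows] for name in rows[0]}


def run_length_encode(values):
    """Compress consecutive repeated values into (value, run length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


if __name__ == "__main__":
    columns = to_columns(ROWS)
    # An aggregate touches only the one column it needs, not every full record.
    print(sum(columns["amount"]))
    # Repeated values sitting next to each other compress very well.
    print(run_length_encode(columns["region"]))
```

An analytical query such as "total amount" reads a single list here, while the row layout would have to walk through every record and skip the fields it does not need, which is the IO saving described above.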

One of the best examples of columnar database success stories is ParAccel, one of the world's fastest analytical database vendors. Gartner, in its latest report, has positioned ParAccel in the visionaries category of the magic quadrant. You can get a deeper view of how ParAccel harnesses the power of columnar storage from its datasheet and a success story.

Microsoft seems to have started its marathon by adding nitro to SQL Server to boost data access speeds for OLAP engines. SQL Server Denali is introducing a new feature called columnstore indexes, known as Project Apollo, and you can read more about this from here. This is just the first spark in the race to be one of the world's fastest analytical databases, a market into which IBM, Greenplum, Kognitio, ParAccel and others plunged quite some time back. An in-memory processing engine like VertiPaq combined with columnstore indexes can yield some blazing speeds in data warehousing environments. Time will tell what Microsoft's strategy is for incorporating this concept in SQL Server and how the SQL Server community reacts to it. Whatever the case, it's welcome news for end clients as of now.
