Sunday, August 14, 2011

Unstructured data in SQL Server Denali

Microsoft seems to be rolling out support for unstructured data in bits and pieces, which improves its readiness for the upcoming challenges posed by big and unstructured data. The RDBMS is gradually becoming obsolete, and BI professionals are all but abandoning the plain vanilla RDBMS, as I have explained in my latest article. Many still think of the RDBMS as the de facto data container, but the world of data is changing with the exponential growth in organizational data. NoSQL, the CAP theorem, the BASE consistency model, distributed databases, and the like are shaping a new landscape of unstructured data management, and many IT organizations have already started exploring this part of the database world and reaping its benefits.

SQL Server Denali is adding the following new features in the DB Engine to support storage and management of unstructured data:

1) Lots of performance and scale work in Full-Text Search (FTS)!

2) A customizable NEAR operator in FTS

3) The ability to search only within document properties instead of the full document

4) Semantic Similarity Search between documents. This lets you answer questions such as: "Find documents that talk about the same thing as this other document!"

5) Better scalability and performance for FILESTREAM data, including the ability to store the data in multiple containers

6) Full Win32 application compatibility for unstructured data stored in a new kind of table called a FileTable. You create a FileTable, then drag and drop your documents into the database and run your favorite Windows applications on them (e.g., Office, Windows Explorer).
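
To make item 2 a little more concrete, here is a toy Python sketch of what a proximity test with a configurable maximum distance and optional term ordering might look like. It is only an illustration of the idea, not SQL Server's implementation or syntax: the function name `near_match`, its parameters, and the simple word-splitting rule are all assumptions made for this example.

```python
import re


def near_match(text, first, second, max_distance, match_order=False):
    """Toy proximity test between two search terms (hypothetical helper).

    Returns True when both terms occur in `text` with at most
    `max_distance` other words between them. With match_order=True,
    `first` must also appear before `second`.
    """
    # Naive tokenizer: lowercase runs of word characters (an assumption).
    words = re.findall(r"\w+", text.lower())
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]

    for i in first_positions:
        for j in second_positions:
            if i == j:
                continue  # one word cannot satisfy both terms
            if match_order and i > j:
                continue  # ordering requested, but `second` comes first
            # Number of words strictly between the two matched positions.
            if abs(i - j) - 1 <= max_distance:
                return True
    return False


doc = "FileTable exposes FILESTREAM data to Windows applications"

print(near_match(doc, "filestream", "windows", 2))        # True: 2 words in between
print(near_match(doc, "filestream", "windows", 1))        # False: too far apart
print(near_match(doc, "windows", "filestream", 2, True))  # False: wrong order
```

The point of the sketch is the two knobs: how far apart the terms may be, and whether their order matters. Those are the kind of settings a customizable proximity search exposes.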

These capabilities definitely put front-end applications on steroids when it comes to managing unstructured data, but I still feel that Hadoop connectors, which would facilitate interoperability between SQL Server and Hadoop, are worth more than the entire DB engine. Petabyte-scale analytics over structured and unstructured data is still not any RDBMS engine's cup of tea. Even so, these features for managing unstructured data are good value additions in the RDBMS world. You can learn about these features from this webcast.

1 comment:

john4you said...

Thanks for providing such useful content. Multi-Tenant Cloud Storage is an ideal solution for Unstructured Data Storage management. Please share more useful thoughts with us.

