Sunday, January 27, 2013

Database Sharding, SQL Azure Federation, SQL Server 2012 and Sequence

I'm reading: Database Sharding, SQL Azure Federation, SQL Server 2012 and SequenceTweet this !
Almost everyone who is a SQL Server professional directly or indirectly knows the newly introduced Sequence object in SQL Server 2012. This post is not about explaining how to use it. Just google what most famous blogger have to say about Sequence object, and you would find that the only content in the post is how to use it. I was hoping that someone would see it from the angle that I see it, but I didn't find any. So I thought of sharing my viewpoint on Sequence object.

One can manage generating incremental keys/ids using the IDENTITY property. Is Sequence just another means to flexibly manage automated incremental ID generation ? I would say it's one of the means, but I don't see it as the ultimate purpose of it.

Applications enabling answer to complex queries often needs data of a huge size. How easily would you get to read data, depends on strategically how-and-where you write data. Network bandwidth, computing capacity and memory availability are some of the driving factors in read-write operations. When apps face performance pressure, the first step that is typically followed is scaling up. After scale up reaches a limit, scale out is the next measure. Partitioning, Replication, Federation are different approaches to scale out databases. If data is distributed and stored in a way that the query is likely to fetch all the data from a single server / partition, this strategy might work till each database size remains in gigabyte sized limits. If each partition / database is multi terabyte sized, even if entire data to be returned to database is figured out, it would face a bottleneck due to network bandwidth available to a single server due to the amount of data to be returned.

Database sharding is partitioning data horizontally across many servers on commodity hardware. Some areas where sharding is used include:

  • NoSQL databases running on Hadoop and MapReduce based infrastructure with database sharding have been able to solve database scalability related challenges on some of the worlds biggest and most aggressively growing datasets. 
  • SQL Azure Federation on Windows Azure platform is one of the awaited features in SQL Azure, that would allow multi tenant databases with sharding. 
  • Cloud databases often have to be multi-tenant / federated, due to the limits on database sizes as a result of cloud based storage architecture.
The interesting part is, database sharding on relational databases is not available out-of-box, at least in SQL Server. Consider a scenario where the need is to insert a couple of records in order but in different databases. One would need a key/id generation mechanism that is independent of tables, unlike IDENTITY property.

Sequence is independent of tables, which makes it much more scalable. From the feature set it seems that Sequence can help to generate a ordered set of keys for storing data across shards in a sorted format. This can be confirmed only with a PoC, but for now I am hopeful that when SQL Azure Federation  would go RTM, Sequence would definitely make a place in the T-SQL features for the same.

If you feel that database sharding for relational OLTP databases is not a popular practice, then you might want to check out this company that just runs its shop on database sharding - DBShards.com. An interesting whitepaper on database sharding can be read from here.

No comments:

Related Posts with Thumbnails