<span style="font-family: Verdana, sans-serif;"><b>Siddharth Mehta's Blog</b></span><br />
This blog is home to share my experiences, views, learning and findings on Big Data, MongoDB, Elasticsearch, Hadoop, D3, SQL Server, SQL Azure, MS BI - SSIS, SSAS, SSRS, MDX, Visual BI methods, Excel Services, Visio Services, PPS, Powerpivot. I am co-author of the SQL Server MVP Deep Dives - Volume 2 and have reviewed several other books. Feel free to involve me in your projects. I would be happy to help. You can contact me at contactsidmehta@gmail.com<br />
Author: Siddharth Mehta (http://www.blogger.com/profile/02870225483799389119). Feed last updated 2024-02-28.<br />
<br />
<span style="font-family: Verdana, sans-serif;"><b>SSAS Online Training - SQL Server Analysis Services, Data Mining and Analytics</b></span><br />
Posted by Siddharth Mehta on 2016-09-11<br />
<div dir="ltr" style="text-align: left;" trbidi="on">
<span style="font-family: Verdana, sans-serif;">I have published an online course on SQL Server Analysis Services 2016, Data Mining and Analytics. A promo video is embedded below. You can enroll in the course using the following link. Suggestions and feedback are welcome.</span><br />
<span style="font-family: Verdana, sans-serif;"><br /></span>
<span style="font-family: Verdana, sans-serif;"><a href="https://www.udemy.com/ssas-sql-server-analysis-services-2016-mdx-training/?couponCode=SIDBLOG75" target="_blank">https://www.udemy.com/ssas-sql-server-analysis-services-2016-mdx-training/?couponCode=SIDBLOG75</a></span><br />
<div style="text-align: center;">
<br /></div>
<div style="text-align: center;">
<iframe allowfullscreen="" frameborder="0" height="315" src="https://www.youtube.com/embed/3g_W2z_alfw" width="560"></iframe>
</div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
</div>
<span style="font-family: Verdana, sans-serif;"><b>Fast Track SSAS and MDX Training using SQL Server 2016</b></span><br />
Posted by Siddharth Mehta on 2016-07-20<br />
<div dir="ltr" style="text-align: left;" trbidi="on">
<span style="font-family: Verdana, sans-serif;">It has been a long time since I last blogged, as I have been very busy with my authoring assignments and my regular day job.</span><br />
<span style="font-family: Verdana, sans-serif;"><br /></span>
<span style="font-family: Verdana, sans-serif;">I have created a fast-track course for learning SQL Server Analysis Services (SSAS) and MDX using SQL Server 2016.</span><br />
<span style="font-family: Verdana, sans-serif;"><br /></span>
<span style="font-family: Verdana, sans-serif;">In case you would like to subscribe to the course, here's the link:</span><br />
<span style="font-family: Verdana, sans-serif;"><br /></span>
<span style="font-family: Verdana, sans-serif;">https://www.udemy.com/ssas-sql-server-analysis-services-2016-mdx-training/?couponCode=PROMO50</span><br />
<span style="font-family: Verdana, sans-serif;"><br /></span>
<span style="font-family: Verdana, sans-serif;">Using this link, my blog readers can get 50% off the course price until the end of July. I hope you find the course useful.</span></div>
<span style="font-family: Verdana, sans-serif;"><b>SQL Server vs MongoDB vs MySQL</b></span><br />
Posted by Siddharth Mehta on 2014-06-16<br />
<div dir="ltr" style="text-align: left;" trbidi="on">
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="font-family: Verdana, sans-serif;">Microsoft SQL Server is one of the mainstream databases used in most operational systems built on the Microsoft technology stack. One of its biggest shortcomings is the lack of built-in support for horizontal scaling / sharding. The next logical choice, and the one closest to SQL Server, would be MySQL.</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="font-family: Verdana, sans-serif;">If you are looking for horizontal scaling / sharding, you are probably gearing up to deal with Big Data. MongoDB is arguably the first logical step into the NoSQL world for anyone considering experimenting with NoSQL to handle Big Data.</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: justify;">
<span style="font-family: Verdana, sans-serif;">At this stage, one needs to compare all these databases. Below is a quick comparison of them, with limitations highlighted in red and product strengths in blue.</span></div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjCxNAQYD7hj938T3EbmdA7Mw3Ohjfx4s7N23C9nn7Lv9It2dCtl27_ToDGi3CWVObC8ojX-6uvHXp7-cEtyqgTO-Mo57Xg_O5Mm_wEl8FPZVWDT6dH5Z-XUIF1UKmsqOxqdirVDg/s1600/Compare.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjCxNAQYD7hj938T3EbmdA7Mw3Ohjfx4s7N23C9nn7Lv9It2dCtl27_ToDGi3CWVObC8ojX-6uvHXp7-cEtyqgTO-Mo57Xg_O5Mm_wEl8FPZVWDT6dH5Z-XUIF1UKmsqOxqdirVDg/s1600/Compare.png" /></a></div>
<br />
<div style="text-align: right;">
<span style="font-family: Verdana, sans-serif; font-size: x-small;">Reference: DB-Engines.com</span></div>
</div>
<span style="font-family: Verdana, sans-serif;"><b>Elasticsearch vs Solr vs Endeca vs Sharepoint FAST vs Google Search Appliance ( GSA ) vs Autonomy vs Semaphore</b></span><br />
Posted by Siddharth Mehta on 2014-06-14<br />
<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">Enterprise Search is a huge market. Fortunately, only a handful of products cater to this market; unfortunately, none of them is a one-product-fits-all solution.</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">Certain categories of features are expected from an enterprise search product, and they determine which requirements the product suits. Some of them are listed below:</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><b>1) Crawling</b></span></div>
<ul style="text-align: left;">
<li style="text-align: justify;"><span style="font-family: Verdana, sans-serif;">Web Crawling: An enterprise keeps most of its content on portals in the form of HTML and media documents. A crawler is the basic means of building an index from this content.</span></li>
</ul>
<ul style="text-align: left;">
<li style="text-align: justify;"><span style="font-family: Verdana, sans-serif;">DB Crawling: Data stored in databases often needs to be crawled or imported into the search inventory.</span></li>
</ul>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><b>2) Taxonomy</b></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">Taxonomy is the logical organization of content in the enterprise content management system. Some call it the metadata, structure or term store of the index maintained in the system. It is a way of framing structure around the content, so that information can be retrieved more effectively and precisely.</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">For example, a very simple way of implementing taxonomy can be the ability to tag content using a set of keywords defined centrally at the organization level.</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><b>3) Specialized OOB Search</b></span></div>
<div>
<ul style="text-align: left;">
<li style="text-align: justify;"><span style="font-family: Verdana, sans-serif;">Faceted search (as on Amazon, where a set of categories appears on the left side)</span></li>
<li style="text-align: justify;"><span style="font-family: Verdana, sans-serif;">Dictionary based search (where you look for a word and its synonyms)</span></li>
<li style="text-align: justify;"><span style="font-family: Verdana, sans-serif;">Auto-suggest (for example, when you type terms into Google and it suggests a few phrases)</span></li>
</ul>
</div>
<span style="font-family: Verdana, sans-serif; text-align: justify;"><b>4) Pluggability</b></span><br />
<ul style="text-align: left;">
<li style="text-align: justify;"><span style="font-family: Verdana, sans-serif;">Ability to index an SMTP server</span></li>
<li style="text-align: justify;"><span style="font-family: Verdana, sans-serif;">Ability to index an LDAP server</span></li>
<li style="text-align: justify;"><span style="font-family: Verdana, sans-serif;">Out-of-box ability to index other such external systems</span></li>
</ul>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">Systems like Google Search Appliance, <a href="http://www.oracle.com/us/solutions/business-analytics/business-intelligence/endeca/overview/index.html" target="_blank">Oracle Endeca</a>, <a href="http://www.autonomy.com/" target="_blank">HP Autonomy</a>, <a href="http://msdn.microsoft.com/en-us/library/office/jj163300(v=office.15).aspx" target="_blank">Microsoft Sharepoint search</a>, and <a href="http://lucene.apache.org/solr/" target="_blank">Solr</a> are the leaders in this category. Products like <a href="http://www.smartlogic.com/home/products/products-overview" target="_blank">Smartlogic Semaphore</a> add a value-added layer on top of them.</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">But the big question is: where do products like <a href="http://www.elasticsearch.org/" target="_blank">Elasticsearch</a> fit in?</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">While these products have their strengths in providing the features mentioned above, they also have some downsides / limitations, and that is where Elasticsearch or even Solr steps in.</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">1) None of these products is inexpensive. For example, HP Autonomy is said to have a base price of more than half a million dollars. Not every enterprise has the budget to afford it.</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">2) Some products do not support database indexing easily. For example, GSA does not make it easy to use complex delta-detection queries when indexing data from databases.</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">3) Most of these products are not horizontally scalable. Apart from appliance solutions, products like Endeca are resource intensive and, because of their scaling architecture, not suitable for managing Big Data volumes.</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">4) Custom development for extending the product using APIs is not as easy as it is with open source products.</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">Custom search for applications is inevitable. Though the enterprise search platform market may be dominated by these products, products like Elasticsearch and Solr will continue to find their space in powering custom applications that manage big data using specialized search functionality (for example, ecommerce sites like amazon.com and others).</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">The limitation of products like Elasticsearch is that they lack enterprise-scale features, for example OOB crawlers, the information visualization and reporting layers required for e-discovery and reporting, and they offer very limited taxonomy support, which is crucial for an enterprise search platform. But as the product is still very young and evolving, these features can hopefully be expected over the next couple of years.</span></div>
</div>
<span style="font-family: Verdana, sans-serif;"><b>Elasticsearch with .NET : NEST Library Code Example</b></span><br />
Posted by Siddharth Mehta on 2014-06-12<br />
<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">Elasticsearch can be used from a number of programming platforms, one of them being Microsoft .NET. Two clients are available: <a href="https://github.com/elasticsearch/elasticsearch-net" target="_blank">Elasticsearch.NET</a> (low level client) and <a href="http://nest.azurewebsites.net/" target="_blank">NEST</a> (high level client).</span> </div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">NEST is a strongly typed wrapper around the Elasticsearch.NET API, and allows for a fully object oriented programming approach to interfacing with Elasticsearch. It also has good documentation for learning the APIs.</span> </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">The first program that I generally want to write is one that indexes a structured document into elasticsearch using C# code and the NEST APIs. One only needs any version of Visual Studio with the <a href="http://www.nuget.org/packages/NEST.Signed/" target="_blank">NEST NuGet package</a> installed. Below is the very first console application I wrote to test the .NET integration with Elasticsearch.</span>
<span style="font-family: Verdana, sans-serif;">Let me know whether you liked the code, whether it worked for you, and whether you need any help with programming.</span>
</div>
<span style="font-family: Verdana, sans-serif;"><br /></span>
<br />
<pre class="brush: csharp">using System;
using Nest;
using Nest.Domain.Connection;

namespace ESConsole
{
    class Program
    {
        static void Main(string[] args)
        {
            // Connect to a local node and use "contacts" as the default index.
            var uri = new Uri("http://localhost:9200");
            var settings = new ConnectionSettings(uri).SetDefaultIndex("contacts");
            var client = new ElasticClient(settings);

            // Check cluster health first to confirm that the node is reachable.
            if (client.Health(HealthLevel.Cluster).ConnectionStatus.Success)
            {
                Console.WriteLine("Connection Successful");

                // Only the "contacts" index is checked here; the article below
                // is sent to a separate "blog" index.
                if (client.IndexExists("contacts").Exists)
                {
                    Console.WriteLine("Index Exists");
                    Program.UpsertArticle(client, new Article("The Last Airbender", "Siddharth"), "blog", "article", 1);
                    Program.UpsertContact(client, new Contacts("Siddharth Mehta", "India"), "contacts", "contacts", 2);
                    Console.WriteLine("Data Indexed Successfully");
                }
                else
                {
                    Console.WriteLine("Index Does Not Exist");
                }
            }
            else
            {
                Console.WriteLine("Connection Failed");
            }
            Console.ReadKey();
        }

        // Document type stored in the "blog" index.
        public class Article
        {
            public string title { get; set; }
            public string artist { get; set; }
            public Article(string Title, string Artist)
            {
                title = Title; artist = Artist;
            }
        }

        // Document type stored in the "contacts" index.
        public class Contacts
        {
            public string name { get; set; }
            public string country { get; set; }
            public Contacts(string Name, string Country)
            {
                name = Name; country = Country;
            }
        }

        // Indexes the article under the given id; indexing with an id that
        // already exists overwrites the stored document.
        public static void UpsertArticle(ElasticClient client, Article article, string index, string type, int id)
        {
            var recordId = client.Index(article, index, type, id).Id;
            // Convert.ToString avoids a NullReferenceException when no id comes back.
            if (!string.IsNullOrEmpty(Convert.ToString(recordId)))
            {
                Console.WriteLine("Transaction Successful !");
            }
            else
            {
                Console.WriteLine("Transaction Failed");
            }
        }

        // Same as UpsertArticle, for contact documents.
        public static void UpsertContact(ElasticClient client, Contacts contact, string index, string type, int id)
        {
            var recordId = client.Index(contact, index, type, id).Id;
            if (!string.IsNullOrEmpty(Convert.ToString(recordId)))
            {
                Console.WriteLine("Transaction Successful !");
            }
            else
            {
                Console.WriteLine("Transaction Failed");
            }
        }
    }
}
</pre>
</div>
<span style="font-family: Verdana, sans-serif;"><b>Elasticsearch with SQL Server</b></span><br />
Posted by Siddharth Mehta on 2014-06-09 (updated 2014-06-12)<br />
<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">Elasticsearch is a very powerful addition to any relational DBMS such as SQL Server, Oracle or DB2, provided it is used wisely. Before we look at how to use elasticsearch with SQL Server, we should ask why to use elasticsearch with SQL Server. The answer to that question is the key.</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">SQL Server holds data either in relational form or in multi-dimensional form (through SSAS). Full Text Search (FTS) in SQL Server provides some out-of-box search features, but when search queries require exhaustive searching over huge datasets and the search definition itself is complex, the performance impact becomes evident. Elasticsearch is primarily a search engine, but with features like facets and the aggregation framework it also helps solve many data analysis problems. For example, all of us have visited sites like Amazon.com, Ebay.com, Flipkart.com etc. Whenever we search for a product, the site builds the dynamic categories, ranges and values on the fly. For such features, a product like elasticsearch can be extremely helpful. One such real project example can be read <a href="http://www.calcey.com/calcey/high-performance-search-via-elasticsearch/" target="_blank">here</a>.</span></div>
<span style="font-family: Verdana, sans-serif;"><br /></span>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://www.calcey.com/wp-content/uploads/2012/08/new-diagram.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://www.calcey.com/wp-content/uploads/2012/08/new-diagram.png" /></a></div>
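<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">To make the idea of facets concrete, here is a minimal sketch in plain Python, with no elasticsearch involved. The product list, the field names and the price ranges are all hypothetical; the point is only that a facet is a count per category or range, computed over the documents that matched the search.</span></div>

```python
from collections import Counter

# Hypothetical product catalogue; field names are illustrative only.
PRODUCTS = [
    {"name": "Laptop A", "brand": "Acme", "price": 950},
    {"name": "Laptop B", "brand": "Acme", "price": 1450},
    {"name": "Laptop C", "brand": "Zenit", "price": 700},
    {"name": "Phone D", "brand": "Zenit", "price": 400},
]


def search_with_facets(products, keyword):
    """Return the matching products plus brand and price-range counts."""
    hits = [p for p in products if keyword.lower() in p["name"].lower()]
    # Term facet: one bucket per distinct brand among the hits.
    brand_facet = Counter(p["brand"] for p in hits)
    # Range facet: two fixed price buckets.
    price_facet = Counter(
        "under 1000" if p["price"] < 1000 else "1000 and above" for p in hits
    )
    return hits, dict(brand_facet), dict(price_facet)


hits, brands, prices = search_with_facets(PRODUCTS, "laptop")
print(brands)
print(prices)
```

<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">A search engine computes such counts as part of the same request that returns the hits, which is what makes the "categories on the left side" experience feasible on large datasets.</span></div>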
<span style="font-family: Verdana, sans-serif;"><br /></span>
<span style="font-family: Verdana, sans-serif;"><b>How to use Elasticsearch with SQL Server?</b></span><br />
<span style="font-family: Verdana, sans-serif;"><b><br /></b></span>
<br />
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><a href="https://github.com/jprante/elasticsearch-river-jdbc" target="_blank">Elasticsearch JDBC River</a> is, to the best of my knowledge as of this writing, the best way to load data from SQL Server into an elasticsearch index. One of the best explanations of setting up the elasticsearch JDBC river with SQL Server can be read <a href="http://nitschinger.at/Elastic-Search-and-SQL-Server-are-sitting-in-a-tree" target="_blank">here</a>.</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">One point to keep in mind: if you set up a river and then restart the elasticsearch server, the river executes its configured query again. This can reload the entire data set into the index. If the IDs are fetched from the source, all existing records simply get updated. But if the IDs are auto-generated by elasticsearch, the reload creates new records, which ultimately leads to duplicate data. So use the river cautiously. You can also delete the river once the data is loaded into the index, if it is a one-time data migration.</span></div>
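<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">The difference between source IDs and auto-generated IDs can be sketched with a plain Python dictionary standing in for the index. This is not the river's code; the function name, the sample rows and the use of random UUIDs as auto-generated IDs are assumptions made for illustration.</span></div>

```python
import uuid


def run_river(index, rows, use_source_ids):
    """Load rows into a dict that stands in for an index (hypothetical helper)."""
    for row in rows:
        # Either reuse the primary key from the source or invent a fresh id.
        doc_id = str(row["id"]) if use_source_ids else uuid.uuid4().hex
        index[doc_id] = row  # same id overwrites, new id adds a document


# Sample rows, as if returned by the river's SQL query.
ROWS = [{"id": 1, "name": "Siddharth Mehta"}, {"id": 2, "name": "Jane Doe"}]

with_source_ids = {}
with_auto_ids = {}
for _ in range(2):  # the second pass simulates a server restart
    run_river(with_source_ids, ROWS, use_source_ids=True)
    run_river(with_auto_ids, ROWS, use_source_ids=False)

print(len(with_source_ids))  # 2 documents: the re-run only overwrote them
print(len(with_auto_ids))    # 4 documents: every row is now duplicated
```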
</div>
<span style="font-family: Verdana, sans-serif;"><b>Elasticsearch Architecture Design - Considerations for using Elasticsearch in custom solutions</b></span><br />
Posted by Siddharth Mehta on 2014-06-05<br />
<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">Elasticsearch has its own use cases, for which one may want to include it in a solution's technology stack. When considering elasticsearch for a solution, there are certain aspects of the product which should be kept in view from a solution design perspective.</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">Below is a list of some of the most important architecture design considerations from my perspective.</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">1) Elasticsearch is a document oriented database system. Document oriented stores are useful where the need is for a scalable system in which queries are driven by the content of the data, and not by the key as is the case in key-value stores.</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">2) Elasticsearch stores data in the form of JSON documents. JSON has a limited set of data types. So if the need is for a very strongly and uniquely typed system, one should consider the data migration or storage strategy wisely.</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">3) Elasticsearch is a database system that exposes REST based APIs for data access as well as server administration. This means that elasticsearch can be seen as a DB server as well as a web server. So when the infrastructure landscape is being designed, one should carefully consider whether to place elasticsearch in the web zone or in the DB zone.</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">4) Elasticsearch does not ship with any authentication module as of this writing. There are some community plugins available though. Keeping this point in view, one would want to keep elasticsearch behind a web application and not expose it directly over the internet.</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">5) Elasticsearch does not ship with any GUI tools or editors for development purposes. Instead, elasticsearch has a concept of plugins, which can be developed using the elasticsearch APIs. A large number of web based plugins for elasticsearch are available on GitHub, and they can be easily installed and used for development and data analysis purposes.</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">6) Elasticsearch front-end programming wrappers are quite popular among developers, as they provide familiar syntax and reduce the need for developers to learn the elasticsearch API. For example, NEST is a .NET wrapper on top of the elasticsearch API. The risk with such wrapper frameworks is that they must continuously update their own API to stay compatible with the elasticsearch API. Elasticsearch has a very aggressive release schedule; typically there are at least 3-4 releases of the product every year, some of which also have breaking changes.</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">7) Like many other NoSQL products, elasticsearch applies default behavior and automated data management unless these are explicitly overridden. For example, if a document is inserted with Id 1, and the same document is then inserted again into the same index and type, it won't raise an exception the way a relational database would. It silently updates the document and increments the version number of the document. As another example, elasticsearch itself decides on which shard, and therefore on which node, a document is placed. Unless specific routing is requested, any document can end up on any machine, node and shard.</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">8) Elasticsearch versions its documents in the manner of a Multi Version Concurrency Control (MVCC) system. This means that a document is never actually updated in place. Whenever an update request is received, elasticsearch writes a new copy of the document with an incremented version number and marks the old copy as deleted. The old copies are physically purged only when the underlying index segments are merged, so this characteristic of the system can result in high storage requirements if there are huge and frequent updates on the system.</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">9) Elasticsearch has a concept of "rivers" for integrating elasticsearch with external systems like CouchDB, Twitter etc. But out-of-box, it does not ship any river for importing data into elasticsearch from SQL Server, Oracle, MySQL, DB2 and other relational databases. There are community plugins like JDBC River which can help with one-time imports of data. But for continuous extraction and loading of data into elasticsearch, a distributed data pipeline built on a system like Apache Kafka or Twitter Storm would be advisable.</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">10) Elasticsearch has some very rigid settings related to data management, which should be taken care of even before creating the system. For example, when an index is created, by default it is configured with 5 primary shards. Once the shards are created, the number of primary shards cannot be changed for the life of the index, short of reindexing the data into a new index. So if you were planning for 50 million records to be managed with 5 shards, and the requirements changed so that you now need to load 150 million, you would still have to manage 150 million records with only 5 shards.</span></div>
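<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">Points 7 and 10 above can be illustrated together with a toy model in plain Python. This is a sketch under stated assumptions, not how elasticsearch is implemented: the ToyIndex class and the shard_for function are hypothetical, and crc32 is only a stand-in for the hash function the real engine uses. What it shows is that re-indexing the same id bumps the version silently, and that a document's shard is derived from its id and the shard count, which is why the shard count has to stay fixed.</span></div>

```python
import zlib


def shard_for(doc_id, number_of_shards):
    """Pick a shard from the document id (crc32 is a stand-in hash)."""
    return zlib.crc32(str(doc_id).encode("utf-8")) % number_of_shards


class ToyIndex:
    """In-memory stand-in for an index with a fixed number of primary shards."""

    def __init__(self, number_of_shards=5):
        self.number_of_shards = number_of_shards
        self.shards = [{} for _ in range(number_of_shards)]

    def index(self, doc_id, source):
        """Create or silently overwrite a document; return its new version."""
        shard = self.shards[shard_for(doc_id, self.number_of_shards)]
        previous = shard.get(doc_id)
        version = previous["_version"] + 1 if previous else 1
        shard[doc_id] = {"_version": version, "_source": source}
        return version


toy = ToyIndex()
first = toy.index(1, {"name": "Siddharth Mehta", "country": "India"})
second = toy.index(1, {"name": "Siddharth Mehta", "country": "India"})
print(first, second)  # versions 1 and 2, and no duplicate-key error

# Ids whose shard would change if the index had 6 shards instead of 5.
moved = [i for i in range(100) if shard_for(i, 5) != shard_for(i, 6)]
print(len(moved), "of 100 ids would be looked up on a different shard")
```

<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">Because lookups recompute the shard from the id, changing the shard count on a live index would send most lookups to a shard that does not hold the document. Moving to a different shard count therefore means reindexing the data.</span></div>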
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">These are some of the initial considerations to keep in view while fitting elasticsearch into your solution. In a future post, I will share another part of this architecture design article, with more points to consider.</span></div>
</div>
<span style="font-family: Verdana, sans-serif;"><b>Elasticsearch Tutorial - Elasticsearch Storage Architecture : Analysis and Inverted Indexes</b></span><br />
Posted by Siddharth Mehta on 2014-06-01<br />
<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">Elasticsearch uses the Apache Lucene engine for almost all of its operations. One of the primary differences between relational databases and NoSQL systems is the way they store data. When it comes to the storage architecture of elasticsearch, two terms are key to the storage mechanism - the analysis process and inverted indexes.</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><b>What is the analysis process in elasticsearch?</b></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><b><br /></b></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">In Part 1, I already explained what a tokenizer and a filter are in elasticsearch. Whenever an index is created, a default mapping and analyzer are attached to it. Depending on the configuration of the analyzer, a tokenizer and filters are configured for the index.</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">When elasticsearch receives a request to index a document, the request is handed to Lucene, which converts the document into a stream of tokens. After the tokens are generated, they are passed through the configured filters. This entire process is called the analysis process, and it is applied to every document that gets indexed.</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">Below is an example of the analysis process. Consider as input a document that consists of a sentence embedded in an HTML tag. When it passes through a set of filters and tokenizers, it gets converted into a set of tokens, which finally get indexed.</span></div>
<span style="font-family: Verdana, sans-serif;"><b><br /></b></span>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjQyikPE866LhJZXQmjqXKjIivTuXqoOLhzLytVRyQpVc5sVdh9F7FDEc5mVHIcImsMkKHyuOnEVZMi15VDKWu6ePEPUoUkWabBlxM8HmCB4jS9COR0GlngvPKmUece7jr31y3OWg/s1600/Analysis.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjQyikPE866LhJZXQmjqXKjIivTuXqoOLhzLytVRyQpVc5sVdh9F7FDEc5mVHIcImsMkKHyuOnEVZMi15VDKWu6ePEPUoUkWabBlxM8HmCB4jS9COR0GlngvPKmUece7jr31y3OWg/s1600/Analysis.png" /></a></div>
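<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">The analysis process can be sketched in a few lines of Python. This is only a toy model and not how Lucene implements it: the function names, the regular expressions and the stop-word list are assumptions made for illustration.</span></div>

```python
import re

# Illustrative stop-word list; this is an assumption, not Lucene's list.
STOP_WORDS = {"a", "an", "the", "is", "of", "and"}

def strip_html(text):
    """Character filter: replace anything that looks like an HTML tag."""
    return re.sub(r"<[^>]+>", " ", text)

def tokenize(text):
    """Tokenizer: split the text into word tokens."""
    return re.findall(r"[A-Za-z0-9]+", text)

def token_filters(tokens):
    """Token filters: lowercase every token, then drop stop words."""
    lowered = (t.lower() for t in tokens)
    return [t for t in lowered if t not in STOP_WORDS]

def analyze(text):
    """Run the full chain: character filter, tokenizer, token filters."""
    return token_filters(tokenize(strip_html(text)))

print(analyze("<p>The Quick brown Fox is FAST</p>"))
# ['quick', 'brown', 'fox', 'fast']
```

<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">Changing any stage of the chain changes the tokens that end up in the index, which is why the choice of analyzer matters.</span></div>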
<span style="font-family: Verdana, sans-serif;"><b><br /></b></span>
<span style="font-family: Verdana, sans-serif;"><b><br /></b></span>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><b>What is the storage data structure of elasticsearch ? <span style="color: #cc0000;">Inverted Indexes</span></b></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><b><span style="color: #cc0000;"><br /></span></b></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">In SQL Server, for example, an index uses a B-tree as its data structure. In Elasticsearch, once the analysis process has converted the data into tokens, those tokens are stored in an internal structure called an inverted index. This structure maps each unique term in an index to the documents that contain it, which allows fast search and text analytics. Attributes such as term count and term position are stored alongside each term. Below is a sample visualization of what an inverted index may look like.</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">After the tokens are mapped, the document is stored on disk. One can choose to store the original input of the document along with the analyzed document. The original input is stored in a system field named "_source". One can even choose not to analyze the input, and store the document without any analysis. The structure of the inverted index depends entirely on the analyzer chosen for indexing.</span></div>
<span style="font-family: Verdana, sans-serif;"><br /></span>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh0Rqt3r3-I6p2YOQKxgXQO2I2DP5880tA7OgvCCzj1vyF4Tm6cR5pZT-6mRjEws9ikRL84oytEoUtQuRXP8sYGq-SmYjmUV11C2h7Npbc0o6-KHH23SwNAEdtjacy-PmyNmvFX8A/s1600/InvertedIndex.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh0Rqt3r3-I6p2YOQKxgXQO2I2DP5880tA7OgvCCzj1vyF4Tm6cR5pZT-6mRjEws9ikRL84oytEoUtQuRXP8sYGq-SmYjmUV11C2h7Npbc0o6-KHH23SwNAEdtjacy-PmyNmvFX8A/s1600/InvertedIndex.png" /></a></div>
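<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">As a rough illustration, an inverted index with term positions can be modelled with a dictionary. This is a minimal sketch, assuming documents that have already been analyzed into tokens; the sample documents and function names are hypothetical, and Lucene's on-disk format is far more compact.</span></div>

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to {doc_id: [positions]} for already-analyzed documents."""
    index = defaultdict(dict)
    for doc_id, tokens in docs.items():
        for position, term in enumerate(tokens):
            index[term].setdefault(doc_id, []).append(position)
    return index

def search(index, term):
    """Return the sorted ids of the documents that contain the term."""
    return sorted(index.get(term, {}))

# Hypothetical documents, shown as the token lists an analyzer might emit.
docs = {
    1: ["quick", "brown", "fox"],
    2: ["lazy", "brown", "dog"],
    3: ["quick", "quick", "dog"],
}
index = build_inverted_index(docs)

print(search(index, "brown"))  # [1, 2]
print(index["quick"])          # {1: [0], 3: [0, 1]}
```

<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">A lookup goes straight from the term to the matching documents, without scanning every document.</span></div>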
<span style="font-family: Verdana, sans-serif;"><br /></span>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><b>Summary: </b>One thing to learn from this is that the key to an efficient storage and retrieval process is the analysis process defined on the index, as per the application needs.</span></div>
</div>
Siddharth Mehtahttp://www.blogger.com/profile/02870225483799389119noreply@blogger.com1tag:blogger.com,1999:blog-11909584.post-82086474026268487782014-05-31T06:35:00.001-07:002014-05-31T06:55:56.655-07:00Elasticsearch Tutorial - Questions - Download Elasticsearch GUI Tools - Part 2<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">Elasticsearch has a very simple installation mechanism. It requires a JVM installed on the host OS; on Windows, executing the elasticsearch.bat file starts it. When integrating a product like Elasticsearch, front-end tools are often required. Some such requirements are mentioned below:</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">1) How to import data in bulk from existing data repositories like Excel files, SQL Server, Oracle, MySQL, DB2, MongoDB, and others ?</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">2) How to visually explore data stored in elasticsearch using GUI tools ?</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">3) How to equip support and monitoring teams with required tools for their day to day operations ?</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">4) How to equip analysts with tools for executing ad-hoc search queries on data stored in elasticsearch ?</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">Elasticsearch supports interoperability through a feature called plugins. Plugins can be installed using a simple plugin command. </span><span style="font-family: Verdana, sans-serif;">Elasticsearch provides a number of plugins, and a large number are supported by the community. A list of such plugins is available <a href="http://www.elasticsearch.org/guide/en/elasticsearch/reference/1.x/modules-plugins.html" target="_blank">here</a>.</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">If you are new to Elasticsearch, and just setting up your development environment, below is the list of some of the plugins that you might particularly find useful to speed up your development process.</span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><b>1) <a href="https://github.com/jettro/elasticsearch-gui" target="_blank">Elasticsearch GUI</a></b> - A web based elasticsearch administration console written in AngularJS.</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><b>2) <a href="https://github.com/mobz/elasticsearch-head" target="_blank">Elastichead</a> </b>- A web based front-end for elasticsearch that lets you browse data in a tabular format, provides an interface to see metadata, and lets you fire ad-hoc queries.</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><b>3) <a href="https://github.com/royrusso/elasticsearch-HQ" target="_blank">Elasticsearch HQ</a> </b>- A web based elasticsearch monitoring and management console for instances and clusters.</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><b>4) <a href="https://github.com/lukas-vlcek/bigdesk" target="_blank">Bigdesk</a> </b>- A web based elasticsearch plugin that lets you monitor a huge list of performance counters using charts and graphs.</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><b>5) <a href="https://github.com/polyfractal/elasticsearch-segmentspy" target="_blank">Elasticsearch segmentspy</a> </b>- A web based elasticsearch plugin that specializes in monitoring segment-related features like merges, additions, deletes etc.</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><b>6) <a href="https://github.com/xyu/elasticsearch-whatson" target="_blank">Elasticsearch whatson</a> </b>- </span><span style="font-family: Verdana, sans-serif;">A web based elasticsearch plugin that specializes in providing comparative analysis of data stored across indices, shards, nodes and cluster.</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><b>7) <a href="https://github.com/dadoonet/fsriver" target="_blank">Elasticsearch FS River</a> </b>- </span><span style="font-family: Verdana, sans-serif;">Elasticsearch plugin to bulk import</span><span style="font-family: Verdana, sans-serif;"> content. Although this plugin has a few open bugs, it is still very useful for a one-time bulk import. With some fixes and workarounds, it can be used to warehouse a huge amount of content into elasticsearch.</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span>
<span style="font-family: Verdana, sans-serif;"><b>8) <a href="http://www.elasticsearch.org/overview/marvel/" target="_blank">Marvel</a></b> - A commercial monitoring and analytics tool from elasticsearch.</span><br />
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><b>9) <a href="https://github.com/jprante/elasticsearch-river-jdbc" target="_blank">Elasticsearch JDBC River</a></b> - Elasticsearch plugin to bulk import data from a variety of systems into elasticsearch. If used wisely, this is the most useful plugin to start pumping data into elasticsearch.</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://github.com/jprante/elasticsearch-river-jdbc/raw/master/src/site/resources/tabular-json-data.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://github.com/jprante/elasticsearch-river-jdbc/raw/master/src/site/resources/tabular-json-data.png" /></a></div>
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
Siddharth Mehtahttp://www.blogger.com/profile/02870225483799389119noreply@blogger.com1tag:blogger.com,1999:blog-11909584.post-14739819851378706852014-05-19T08:57:00.001-07:002014-05-19T08:57:40.577-07:00ElasticSearch Tutorial - Questions - Basics - Part I<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">1) <a href="http://www.elasticsearch.org/" target="_blank">ElasticSearch</a> uses <a href="http://lucene.apache.org/" target="_blank">Apache Lucene</a> as the underlying technology.</span></div>
<br />
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">2) Relational databases map the values of fields in a table to indexes. During a search operation, indexes are used to locate records. Lucene uses inverted indexes that store the values (terms) in a field, which are used to find related records (documents).</span></div>
<span style="font-family: Verdana, sans-serif;">
</span>
<br />
<div style="text-align: justify;">
<span style="color: #cc0000; font-family: Verdana, sans-serif;"><b>Terminologies:</b></span></div>
<span style="font-family: Verdana, sans-serif;">
</span>
<br />
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><b>3) What is an index in ElasticSearch ? </b></span></div>
<span style="font-family: Verdana, sans-serif;">
</span>
<br />
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">An index is similar to a table in relational databases. The difference is that relational databases always store actual values, which is optional in ElasticSearch. An index can store actual values, analyzed values, or both.</span></div>
<span style="font-family: Verdana, sans-serif;">
</span>
<br />
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><b>4) What is a document in ElasticSearch ? </b></span></div>
<span style="font-family: Verdana, sans-serif;">
</span>
<br />
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">A document is similar to a row in relational databases. The difference is that each document in an index can have a different structure (fields), but fields common to several documents should have the same data type.</span></div>
<span style="font-family: Verdana, sans-serif;">
</span>
<br />
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">A field can occur multiple times in a document, which makes it a multi-valued field. Fields can contain other documents too.</span></div>
<span style="font-family: Verdana, sans-serif;">
</span>
<br />
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><b>5) Does ElasticSearch have a schema ?</b></span></div>
<span style="font-family: Verdana, sans-serif;">
</span>
<br />
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">Yes, ElasticSearch can have mappings which can be used to enforce a schema on documents.</span></div>
<span style="font-family: Verdana, sans-serif;">
</span>
<br />
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><b>6) What is a document type in ElasticSearch ?</b></span></div>
<span style="font-family: Verdana, sans-serif;">
</span>
<br />
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">A document type can be seen as the document schema / mapping definition, which holds the mapping of all the fields in the document along with their data types.</span></div>
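<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">To give a feel for what a mapping looks like, here is a small Python sketch. The "tweet" type and its field names are made up for illustration, and the type check is only a toy stand-in: the real engine parses values at index time and can coerce some of them, so treat this as a sketch of the idea and not of the actual behaviour.</span></div>

```python
import json

# Hypothetical mapping for a "tweet" document type; the field names are
# made up for illustration.
mapping = {
    "mappings": {
        "tweet": {
            "properties": {
                "user": {"type": "string"},
                "retweets": {"type": "integer"},
            }
        }
    }
}

# Toy lookup from mapping type names to Python types (an assumption made
# for this sketch only).
PYTHON_TYPES = {"string": str, "integer": int}

def type_mismatches(doc, doc_type="tweet"):
    """Return the mapped fields whose values do not match the mapped type."""
    props = mapping["mappings"][doc_type]["properties"]
    bad = []
    for field, value in doc.items():
        if field in props:
            expected = PYTHON_TYPES[props[field]["type"]]
            if not isinstance(value, expected):
                bad.append(field)
    return sorted(bad)

print(json.dumps(mapping, indent=2))
print(type_mismatches({"user": "sid", "retweets": "many"}))  # ['retweets']
```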
<span style="font-family: Verdana, sans-serif;">
</span>
<br />
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><b>7) What is indexing in ElasticSearch ?</b></span></div>
<span style="font-family: Verdana, sans-serif;">
</span>
<br />
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">The process of storing data in an index is called indexing in ElasticSearch. Data in ElasticSearch can be divided into write-once, read-many segments. Whenever an update is attempted, a new version of the document is written to the index.</span></div>
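<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">The write-once behaviour can be pictured with a toy append-only store. The class below is hypothetical and greatly simplified: it only shows that an update appends a new version of the document instead of changing the old entry in place.</span></div>

```python
class AppendOnlyIndex:
    """Toy model: entries are never modified in place, only appended."""

    def __init__(self):
        self.log = []     # every version ever written, in write order
        self.latest = {}  # doc_id -> position of the live version in self.log

    def index(self, doc_id, body):
        """Write a new version of the document and return its version number."""
        previous = self.latest.get(doc_id)
        version = 1 if previous is None else self.log[previous]["version"] + 1
        self.log.append({"id": doc_id, "version": version, "body": body})
        self.latest[doc_id] = len(self.log) - 1
        return version

    def get(self, doc_id):
        """Return the live (most recent) version of the document."""
        return self.log[self.latest[doc_id]]

store = AppendOnlyIndex()
store.index("1", {"title": "draft"})
store.index("1", {"title": "final"})

# The live version is 2, and both versions are still present in the log.
print(store.get("1")["version"], len(store.log))  # 2 2
```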
<span style="font-family: Verdana, sans-serif;">
</span>
<br />
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><b>8) What is a node in ElasticSearch ?</b></span></div>
<span style="font-family: Verdana, sans-serif;">
</span>
<br />
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">Each instance of ElasticSearch is called a node. Multiple nodes can work in harmony to form an ElasticSearch Cluster.</span></div>
<span style="font-family: Verdana, sans-serif;">
</span>
<br />
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><b>9) What is a shard in ElasticSearch ?</b></span></div>
<span style="font-family: Verdana, sans-serif;">
</span>
<br />
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">Because a single machine has limited resources such as RAM and vCPUs, applications that need to scale out run multiple instances of ElasticSearch on separate machines. Data in an index can be divided into multiple partitions, each of which can be handled by a separate node (instance) of ElasticSearch. Each such partition is called a shard. By default an ElasticSearch index has 5 shards.</span></div>
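<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">A document is assigned to a shard by hashing a routing value (the document id by default) and taking the result modulo the number of primary shards. The sketch below illustrates the idea; crc32 is only a stand-in, since ElasticSearch uses its own hash function, and the function name is hypothetical.</span></div>

```python
import zlib

NUMBER_OF_SHARDS = 5  # the default mentioned above

def shard_for(doc_id, number_of_shards=NUMBER_OF_SHARDS):
    """Pick a shard as hash(routing value) modulo the number of primary shards."""
    # crc32 is a stand-in hash; Elasticsearch uses its own hash function.
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards

for doc_id in ["1", "2", "3", "42"]:
    print(doc_id, "-> shard", shard_for(doc_id))
```

<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">Because the shard number depends on the shard count, changing the number of primary shards would send existing documents to different shards.</span></div>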
<span style="font-family: Verdana, sans-serif;">
</span>
<br />
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><b>10) What is a replica in ElasticSearch ?</b></span></div>
<span style="font-family: Verdana, sans-serif;">
</span>
<br />
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">By default, each shard in ElasticSearch has one additional copy, so the data exists twice: in the primary shard and in its copy. These copies are called replicas. They serve the purpose of high-availability and fault-tolerance.</span></div>
<span style="font-family: Verdana, sans-serif;">
</span>
<br />
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><b>11) What is an Analyzer in ElasticSearch ?</b></span></div>
<span style="font-family: Verdana, sans-serif;">
</span>
<br />
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">While indexing data in ElasticSearch, data is transformed internally by the analyzer defined for the index, and then indexed. An analyzer is built from a tokenizer and filters. The following types of analyzers are available in ElasticSearch 1.10.</span></div>
<span style="font-family: Verdana, sans-serif;">
</span>
<ul style="text-align: left;">
<li style="text-align: justify;"><a href="http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-standard-analyzer.html" style="box-sizing: border-box; font-size: 18px; line-height: 24px; outline: none; text-decoration: none; text-transform: lowercase;"><span style="color: blue; font-family: Verdana, sans-serif;">standard analyzer</span></a></li>
<li style="text-align: justify;"><a href="http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-simple-analyzer.html" style="-webkit-transition: color 0.1s ease-in; box-sizing: border-box; font-size: 18px; line-height: 24px; outline: none; text-decoration: none; text-transform: lowercase; transition: color 0.1s ease-in;"><span style="color: blue; font-family: Verdana, sans-serif;">simple analyzer</span></a></li>
<li style="text-align: justify;"><a href="http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-whitespace-analyzer.html" style="-webkit-transition: color 0.1s ease-in; box-sizing: border-box; font-size: 18px; line-height: 24px; outline: none; text-decoration: none; text-transform: lowercase; transition: color 0.1s ease-in;"><span style="color: blue; font-family: Verdana, sans-serif;">whitespace analyzer</span></a></li>
<li style="text-align: justify;"><a href="http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-stop-analyzer.html" style="-webkit-transition: color 0.1s ease-in; box-sizing: border-box; font-size: 18px; line-height: 24px; outline: none; text-decoration: none; text-transform: lowercase; transition: color 0.1s ease-in;"><span style="color: blue; font-family: Verdana, sans-serif;">stop analyzer</span></a></li>
<li style="text-align: justify;"><a href="http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-keyword-analyzer.html" style="-webkit-transition: color 0.1s ease-in; box-sizing: border-box; font-size: 18px; line-height: 24px; outline: none; text-decoration: none; text-transform: lowercase; transition: color 0.1s ease-in;"><span style="color: blue; font-family: Verdana, sans-serif;">keyword analyzer</span></a></li>
<li style="text-align: justify;"><a href="http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-pattern-analyzer.html" style="-webkit-transition: color 0.1s ease-in; box-sizing: border-box; font-size: 18px; line-height: 24px; outline: none; text-decoration: none; text-transform: lowercase; transition: color 0.1s ease-in;"><span style="color: blue; font-family: Verdana, sans-serif;">pattern analyzer</span></a></li>
<li style="text-align: justify;"><a href="http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-lang-analyzer.html" style="-webkit-transition: color 0.1s ease-in; box-sizing: border-box; font-size: 18px; line-height: 24px; outline: none; text-decoration: none; text-transform: lowercase; transition: color 0.1s ease-in;"><span style="color: blue; font-family: Verdana, sans-serif;">language analyzers</span></a></li>
<li style="text-align: justify;"><a href="http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-snowball-analyzer.html" style="-webkit-transition: color 0.1s ease-in; box-sizing: border-box; font-size: 18px; line-height: 24px; outline: none; text-decoration: none; text-transform: lowercase; transition: color 0.1s ease-in;"><span style="color: blue; font-family: Verdana, sans-serif;">snowball analyzer</span></a></li>
<li style="text-align: justify;"><span style="box-sizing: border-box; color: blue; font-family: Verdana, sans-serif; font-size: 18px; line-height: 24px; outline: none; text-decoration: none; text-transform: lowercase;"><a href="http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-custom-analyzer.html" style="box-sizing: border-box; font-size: 18px; line-height: 24px; outline: none; text-decoration: none; text-transform: lowercase;">custom analyzer</a></span></li>
</ul>
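<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">To make the difference between a few of the analyzers listed above concrete, here is a toy Python imitation of the whitespace, simple and keyword analyzers. These functions only approximate the documented behaviour (for example, the letter matching here is ASCII-only) and are not the actual implementations.</span></div>

```python
import re

def whitespace_analyzer(text):
    """Split on whitespace only; case and punctuation are kept."""
    return text.split()

def simple_analyzer(text):
    """Split on anything that is not a letter, then lowercase (ASCII only here)."""
    return [t.lower() for t in re.findall(r"[A-Za-z]+", text)]

def keyword_analyzer(text):
    """Emit the whole input as a single token."""
    return [text]

text = "Quick-Brown fox, 2 dogs"
print(whitespace_analyzer(text))  # ['Quick-Brown', 'fox,', '2', 'dogs']
print(simple_analyzer(text))      # ['quick', 'brown', 'fox', 'dogs']
print(keyword_analyzer(text))     # ['Quick-Brown fox, 2 dogs']
```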
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><b>12) What is a Tokenizer in ElasticSearch ?</b></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">A tokenizer breaks down the field values of a document into a stream of tokens. Inverted indexes are created and updated using these tokens, and the stream of tokens is stored with the document.</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><b>13) What is a Filter in ElasticSearch ?</b></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">After data is processed by the tokenizer, it is processed by token filters before indexing. </span><span style="font-family: Verdana, sans-serif;">ElasticSearch also uses the term filter for query-time filters in its query DSL, which restrict the documents a search returns. The following query filters are available in ElasticSearch 1.10.</span></div>
<div>
<ul style="text-align: left;">
<li style="text-align: justify;"><a href="http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-and-filter.html" style="-webkit-transition: color 0.1s ease-in; box-sizing: border-box; font-family: Gibson-Regular, Arial; font-size: 18px; line-height: 24px; outline: none; text-decoration: none; text-transform: lowercase; transition: color 0.1s ease-in;"><span style="color: blue;">and filter</span></a></li>
<li style="text-align: justify;"><a href="http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-bool-filter.html" style="-webkit-transition: color 0.1s ease-in; box-sizing: border-box; font-family: Gibson-Regular, Arial; font-size: 18px; line-height: 24px; outline: none; text-decoration: none; text-transform: lowercase; transition: color 0.1s ease-in;"><span style="color: blue;">bool filter</span></a></li>
<li style="text-align: justify;"><a href="http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-exists-filter.html" style="-webkit-transition: color 0.1s ease-in; box-sizing: border-box; font-family: Gibson-Regular, Arial; font-size: 18px; line-height: 24px; outline: none; text-decoration: none; text-transform: lowercase; transition: color 0.1s ease-in;"><span style="color: blue;">exists filter</span></a></li>
<li style="text-align: justify;"><a href="http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-geo-bounding-box-filter.html" style="-webkit-transition: color 0.1s ease-in; box-sizing: border-box; font-family: Gibson-Regular, Arial; font-size: 18px; line-height: 24px; outline: none; text-decoration: none; text-transform: lowercase; transition: color 0.1s ease-in;"><span style="color: blue;">geo bounding box filter</span></a></li>
<li style="text-align: justify;"><a href="http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-geo-distance-filter.html" style="-webkit-transition: color 0.1s ease-in; box-sizing: border-box; font-family: Gibson-Regular, Arial; font-size: 18px; line-height: 24px; outline: none; text-decoration: none; text-transform: lowercase; transition: color 0.1s ease-in;"><span style="color: blue;">geo distance filter</span></a></li>
<li style="text-align: justify;"><a href="http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-geo-distance-range-filter.html" style="-webkit-transition: color 0.1s ease-in; box-sizing: border-box; font-family: Gibson-Regular, Arial; font-size: 18px; line-height: 24px; outline: none; text-decoration: none; text-transform: lowercase; transition: color 0.1s ease-in;"><span style="color: blue;">geo distance range filter</span></a></li>
<li style="text-align: justify;"><a href="http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-geo-polygon-filter.html" style="-webkit-transition: color 0.1s ease-in; box-sizing: border-box; font-family: Gibson-Regular, Arial; font-size: 18px; line-height: 24px; outline: none; text-decoration: none; text-transform: lowercase; transition: color 0.1s ease-in;"><span style="color: blue;">geo polygon filter</span></a></li>
<li style="text-align: justify;"><a href="http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-geo-shape-filter.html" style="-webkit-transition: color 0.1s ease-in; box-sizing: border-box; font-family: Gibson-Regular, Arial; font-size: 18px; line-height: 24px; outline: none; text-decoration: none; text-transform: lowercase; transition: color 0.1s ease-in;"><span style="color: blue;">geoshape filter</span></a></li>
<li style="text-align: justify;"><a href="http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-geohash-cell-filter.html" style="-webkit-transition: color 0.1s ease-in; box-sizing: border-box; font-family: Gibson-Regular, Arial; font-size: 18px; line-height: 24px; outline: none; text-decoration: none; text-transform: lowercase; transition: color 0.1s ease-in;"><span style="color: blue;">geohash cell filter</span></a></li>
<li style="text-align: justify;"><a href="http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-has-child-filter.html" style="-webkit-transition: color 0.1s ease-in; box-sizing: border-box; font-family: Gibson-Regular, Arial; font-size: 18px; line-height: 24px; outline: none; text-decoration: none; text-transform: lowercase; transition: color 0.1s ease-in;"><span style="color: blue;">has child filter</span></a></li>
<li style="text-align: justify;"><a href="http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-has-parent-filter.html" style="-webkit-transition: color 0.1s ease-in; box-sizing: border-box; font-family: Gibson-Regular, Arial; font-size: 18px; line-height: 24px; outline: none; text-decoration: none; text-transform: lowercase; transition: color 0.1s ease-in;"><span style="color: blue;">has parent filter</span></a></li>
<li style="text-align: justify;"><a href="http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-ids-filter.html" style="-webkit-transition: color 0.1s ease-in; box-sizing: border-box; font-family: Gibson-Regular, Arial; font-size: 18px; line-height: 24px; outline: none; text-decoration: none; text-transform: lowercase; transition: color 0.1s ease-in;"><span style="color: blue;">ids filter</span></a></li>
<li style="text-align: justify;"><a href="http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-indices-filter.html" style="-webkit-transition: color 0.1s ease-in; box-sizing: border-box; font-family: Gibson-Regular, Arial; font-size: 18px; line-height: 24px; outline: none; text-decoration: none; text-transform: lowercase; transition: color 0.1s ease-in;"><span style="color: blue;">indices filter</span></a></li>
<li style="text-align: justify;"><a href="http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-limit-filter.html" style="-webkit-transition: color 0.1s ease-in; box-sizing: border-box; font-family: Gibson-Regular, Arial; font-size: 18px; line-height: 24px; outline: none; text-decoration: none; text-transform: lowercase; transition: color 0.1s ease-in;"><span style="color: blue;">limit filter</span></a></li>
<li style="text-align: justify;"><a href="http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-match-all-filter.html" style="-webkit-transition: color 0.1s ease-in; box-sizing: border-box; font-family: Gibson-Regular, Arial; font-size: 18px; line-height: 24px; outline: none; text-decoration: none; text-transform: lowercase; transition: color 0.1s ease-in;"><span style="color: blue;">match all filter</span></a></li>
<li style="text-align: justify;"><a href="http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-missing-filter.html" style="-webkit-transition: color 0.1s ease-in; box-sizing: border-box; font-family: Gibson-Regular, Arial; font-size: 18px; line-height: 24px; outline: none; text-decoration: none; text-transform: lowercase; transition: color 0.1s ease-in;"><span style="color: blue;">missing filter</span></a></li>
<li style="text-align: justify;"><a href="http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-nested-filter.html" style="-webkit-transition: color 0.1s ease-in; box-sizing: border-box; font-family: Gibson-Regular, Arial; font-size: 18px; line-height: 24px; outline: none; text-decoration: none; text-transform: lowercase; transition: color 0.1s ease-in;"><span style="color: blue;">nested filter</span></a></li>
<li style="text-align: justify;"><a href="http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-not-filter.html" style="-webkit-transition: color 0.1s ease-in; box-sizing: border-box; font-family: Gibson-Regular, Arial; font-size: 18px; line-height: 24px; outline: none; text-decoration: none; text-transform: lowercase; transition: color 0.1s ease-in;"><span style="color: blue;">not filter</span></a></li>
<li style="text-align: justify;"><a href="http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-or-filter.html" style="-webkit-transition: color 0.1s ease-in; box-sizing: border-box; font-family: Gibson-Regular, Arial; font-size: 18px; line-height: 24px; outline: none; text-decoration: none; text-transform: lowercase; transition: color 0.1s ease-in;"><span style="color: blue;">or filter</span></a></li>
<li style="text-align: justify;"><a href="http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-prefix-filter.html" style="-webkit-transition: color 0.1s ease-in; box-sizing: border-box; font-family: Gibson-Regular, Arial; font-size: 18px; line-height: 24px; outline: none; text-decoration: none; text-transform: lowercase; transition: color 0.1s ease-in;"><span style="color: blue;">prefix filter</span></a></li>
<li style="text-align: justify;"><a href="http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-query-filter.html" style="-webkit-transition: color 0.1s ease-in; box-sizing: border-box; font-family: Gibson-Regular, Arial; font-size: 18px; line-height: 24px; outline: none; text-decoration: none; text-transform: lowercase; transition: color 0.1s ease-in;"><span style="color: blue;">query filter</span></a></li>
<li style="text-align: justify;"><a href="http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-range-filter.html" style="-webkit-transition: color 0.1s ease-in; box-sizing: border-box; font-family: Gibson-Regular, Arial; font-size: 18px; line-height: 24px; outline: none; text-decoration: none; text-transform: lowercase; transition: color 0.1s ease-in;"><span style="color: blue;">range filter</span></a></li>
<li style="text-align: justify;"><a href="http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-regexp-filter.html" style="-webkit-transition: color 0.1s ease-in; box-sizing: border-box; font-family: Gibson-Regular, Arial; font-size: 18px; line-height: 24px; outline: none; text-decoration: none; text-transform: lowercase; transition: color 0.1s ease-in;"><span style="color: blue;">regexp filter</span></a></li>
<li style="text-align: justify;"><a href="http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-script-filter.html" style="-webkit-transition: color 0.1s ease-in; box-sizing: border-box; font-family: Gibson-Regular, Arial; font-size: 18px; line-height: 24px; outline: none; text-decoration: none; text-transform: lowercase; transition: color 0.1s ease-in;"><span style="color: blue;">script filter</span></a></li>
<li style="text-align: justify;"><a href="http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-term-filter.html" style="-webkit-transition: color 0.1s ease-in; box-sizing: border-box; font-family: Gibson-Regular, Arial; font-size: 18px; line-height: 24px; outline: none; text-decoration: none; text-transform: lowercase; transition: color 0.1s ease-in;"><span style="color: blue;">term filter</span></a></li>
<li style="text-align: justify;"><a href="http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-terms-filter.html" style="-webkit-transition: color 0.1s ease-in; box-sizing: border-box; font-family: Gibson-Regular, Arial; font-size: 18px; line-height: 24px; outline: none; text-decoration: none; text-transform: lowercase; transition: color 0.1s ease-in;"><span style="color: blue;">terms filter</span></a></li>
<li style="text-align: justify;"><a href="http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-type-filter.html" style="-webkit-transition: color 0.1s ease-in; box-sizing: border-box; font-family: Gibson-Regular, Arial; font-size: 18px; line-height: 24px; outline: none; text-decoration: none; text-transform: lowercase; transition: color 0.1s ease-in;"><span style="color: blue;">type filter</span></a></li>
</ul>
</div>
<span style="font-family: Verdana, sans-serif; text-align: justify;"><b>14) What is the query language of ElasticSearch ?</b></span><br />
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">ElasticSearch provides its own JSON-based query language, called Query DSL, which is built on top of the query capabilities of Apache Lucene.</span></div>
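<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">To give a feel for it, a Query DSL request is simply a JSON document. Below is a minimal Python sketch that builds one such request body, combining a full-text <code>match</code> query with a <code>range</code> query inside a <code>bool</code> query. The field names (<code>title</code>, <code>year</code>) and the values are hypothetical and chosen only for illustration; the code only builds and prints the JSON and does not talk to a server.</span></div>

```python
import json


def build_search_body(text, min_year):
    """Build a Query DSL body: documents must match the text AND the year range."""
    return {
        "query": {
            "bool": {
                # Every clause listed under "must" has to match.
                "must": [
                    {"match": {"title": text}},             # hypothetical field "title"
                    {"range": {"year": {"gte": min_year}}},  # hypothetical field "year"
                ]
            }
        }
    }


body = build_search_body("elasticsearch tutorial", 2014)
print(json.dumps(body, indent=2))
```

<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">The printed JSON is what would typically be sent as the body of a search request.</span></div>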
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">In the next part of this ElasticSearch tutorial, we will see how to install ElasticSearch and use its tools and technologies to administer it.</span></div>
</div>
Siddharth Mehtahttp://www.blogger.com/profile/02870225483799389119noreply@blogger.com0tag:blogger.com,1999:blog-11909584.post-41197269539923150432014-05-18T11:44:00.001-07:002014-06-12T07:01:06.912-07:00World Famous Architectures : Facebook, WhatsApp, Amazon, Twitter, YouTube, Google, ESPN, Salesforce, FarmVille and other world famous architectures<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif; font-size: small;"><span style="font-weight: normal;">Experience is the best teacher, and no book or coach can teach better than learning from experience. We regularly hear about different architecture designs and practices from various sources in the professional world around us, and yet most of us have inevitably attended a performance optimization training at least once in the past 2-3 years.</span></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif; font-size: small;"><span style="font-weight: normal;"><br /></span></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">In my opinion, if you really want to learn scalability and performance, just take a look at the top architectures of the world listed below. I bet that if you can follow and implement even two of them to the extent these organizations have, you are set to build a new world-famous architecture.</span><br />
<span style="font-family: Verdana, sans-serif; text-align: left;"><br /></span>
<span style="font-family: Verdana, sans-serif; text-align: left;">1) </span><a href="http://highscalability.com/blog/2014/3/31/how-whatsapp-grew-to-nearly-500-million-users-11000-cores-an.html" style="font-family: Verdana, sans-serif; text-align: left;" target="_blank">WhatsApp Architecture</a></div>
<div style="text-align: left;">
<span style="font-family: Verdana, sans-serif;"><br /></span>
<span style="font-family: Verdana, sans-serif;">2) <a href="http://highscalability.com/flickr-architecture" target="_blank">Flickr Architecture</a></span></div>
<div>
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Verdana, sans-serif;">3) <a href="http://highscalability.com/amazon-architecture" target="_blank">Amazon Architecture</a></span></div>
<div>
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Verdana, sans-serif;">4) <a href="http://highscalability.com/blog/2009/8/5/stack-overflow-architecture.html" target="_blank">Stack Overflow Architecture</a></span></div>
<div>
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Verdana, sans-serif;">5) <a href="http://highscalability.com/google-architecture" target="_blank">Google Architecture</a></span></div>
<div>
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Verdana, sans-serif;">6) <a href="http://highscalability.com/youtube-architecture" target="_blank">YouTube Architecture</a></span></div>
<div>
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Verdana, sans-serif;">7) <a href="http://highscalability.com/blog/2013/4/15/scaling-pinterest-from-0-to-10s-of-billions-of-page-views-a.html" target="_blank">Pinterest Architecture</a></span></div>
<div>
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Verdana, sans-serif;">8) <a href="http://highscalability.com/blog/2013/7/8/the-architecture-twitter-uses-to-deal-with-150m-active-users.html" target="_blank">Twitter Architecture</a></span></div>
<div>
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Verdana, sans-serif;">9) <a href="http://highscalability.com/blog/2011/12/6/instagram-architecture-14-million-users-terabytes-of-photos.html" target="_blank">Instagram Architecture</a></span></div>
<div>
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Verdana, sans-serif;">10) <a href="https://fbcdn-dragon-a.akamaihd.net/hphotos-ak-ash3/851590_229753833859617_1129962605_n.pdf" target="_blank">Facebook Architecture</a></span></div>
<div>
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Verdana, sans-serif;"><span id="goog_1888906966"></span>11) <a href="http://highscalability.com/blog/2011/8/18/paper-the-akamai-network-61000-servers-1000-networks-70-coun.html" target="_blank">Akamai Network Architecture with 100K server<span id="goog_1888906967"></span>s</a></span></div>
<div>
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Verdana, sans-serif;">12) <a href="http://highscalability.com/blog/2011/6/27/tripadvisor-architecture-40m-visitors-200m-dynamic-page-view.html" target="_blank">TripAdvisor Architecture</a></span></div>
<div>
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Verdana, sans-serif;">13) <a href="http://highscalability.com/blog/2012/7/30/prismatic-architecture-using-machine-learning-on-social-netw.html" target="_blank">Prismatic Architecture</a></span></div>
<div>
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Verdana, sans-serif;">14) <a href="http://highscalability.com/blog/2013/9/23/salesforce-architecture-how-they-handle-13-billion-transacti.html" target="_blank">Salesforce Architecture</a></span></div>
<div>
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Verdana, sans-serif;">15) <a href="http://highscalability.com/blog/2013/6/18/scaling-mailbox-from-0-to-one-million-users-in-6-weeks-and-1.html" target="_blank">Mailbox Architecture</a></span></div>
<div>
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Verdana, sans-serif;">16) <a href="http://highscalability.com/blog/2012/7/16/cinchcast-architecture-producing-1500-hours-of-audio-every-d.html" target="_blank">Cinchcast Architecture</a></span></div>
<div>
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Verdana, sans-serif;">17) <a href="http://highscalability.com/blog/2012/6/25/stubhub-architecture-the-surprising-complexity-behind-the-wo.html" target="_blank">Stubhub Architecture</a></span></div>
<div>
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Verdana, sans-serif;">18) <a href="http://highscalability.com/blog/2013/11/4/espns-architecture-at-scale-operating-at-100000-duh-nuh-nuhs.html" target="_blank">ESPN Architecture</a></span></div>
<div>
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Verdana, sans-serif;">19) <a href="http://highscalability.com/blog/2014/2/17/how-the-aolcom-architecture-evolved-to-99999-availability-8.html" target="_blank">AOL Architecture</a></span></div>
<div>
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Verdana, sans-serif;">20) <a href="http://highscalability.com/blog/2011/12/12/netflix-developing-deploying-and-supporting-software-accordi.html" target="_blank">Netflix Architecture</a></span></div>
<div>
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Verdana, sans-serif;">21) <a href="http://highscalability.com/blog/2011/11/29/datasift-architecture-realtime-datamining-at-120000-tweets-p.html" target="_blank">DataSift Architecture</a></span><br />
<ul style="text-align: left;">
<li><span style="font-family: Verdana, sans-serif;"><a href="http://wp.zenkay.net.s3.amazonaws.com/2014/02/datasift_infrastructure.png" target="_blank">DataSift Big Data Architecture Diagram</a></span></li>
<li><span style="font-family: Verdana, sans-serif;"><a href="http://blog.andreamostosi.name/2014/02/how-it-works-datasift/" target="_blank">Big Data Architecture Details</a></span></li>
</ul>
</div>
<div>
<span style="font-family: Verdana, sans-serif;">22) </span><a href="http://highscalability.com/blog/2010/3/16/justintvs-live-video-broadcasting-architecture.html" style="font-family: Verdana, sans-serif;" target="_blank">Justin.tv Architecture</a></div>
<div>
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Verdana, sans-serif;">23) <a href="http://highscalability.com/blog/2010/9/21/playfishs-social-gaming-architecture-50-million-monthly-user.html" target="_blank">Playfish Architecture</a></span></div>
<div>
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Verdana, sans-serif;">24) <a href="http://highscalability.com/blog/2010/2/8/how-farmville-scales-to-harvest-75-million-players-a-month.html" target="_blank">FarmVille Architecture</a></span><br />
<br />
<span style="font-family: Verdana, sans-serif;">25) <a href="http://blog.andreamostosi.name/2013/11/how-it-works-klout/" target="_blank">Klout Architecture</a></span></div>
</div>
Siddharth Mehtahttp://www.blogger.com/profile/02870225483799389119noreply@blogger.com0tag:blogger.com,1999:blog-11909584.post-73271205284947139622014-05-18T00:52:00.000-07:002014-05-18T00:52:37.313-07:00Chef for Microsoft Azure, Amazon, OpenStack, Rackspace, Google Compute Engine, or Linode<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">What is DevOps? Beginners who are not familiar with DevOps can read this <a href="http://www.getchef.com/solutions/devops/" target="_blank">page</a> to get an idea of it.</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">DevOps is a branch of architecture design that is often considered trivial by many architects and development leads. For traditional applications, infrastructure provisioning, capacity management, monitoring, and operations support are generally taken care of by dedicated IT teams bound by pre-agreed SLAs.</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">But when architects are dealing with cloud-scale applications, DevOps is no longer a trivial area, nor is it outside the solution definition. Automating infrastructure management using script-based templates is a standard industry practice, and it is supported by almost all the major cloud vendors. It came as a surprise to me when I found that one of the Microsoft Azure cloud trainers was not aware of what DevOps automation is, and I had to educate the trainer on the subject.</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">Some of the major players in this area are listed below, and in my view <a href="http://docs.opscode.com/chef_overview.html" target="_blank">Chef</a> is leading the way.</span></div>
<div>
<ul style="box-sizing: border-box; color: #333333; font-size: 16px; line-height: 20px;">
<li style="box-sizing: border-box; text-align: justify;"><a href="http://www.vagrantup.com/" style="box-sizing: border-box; color: #005952;"><span style="font-family: Verdana, sans-serif;">Vagrant</span></a></li>
<li style="box-sizing: border-box; text-align: justify;"><a href="http://wiki.opscode.com/display/chef/Home" style="box-sizing: border-box; color: #005952;"><span style="font-family: Verdana, sans-serif;">Chef</span></a></li>
<li style="box-sizing: border-box; text-align: justify;"><a href="http://red-badger.com/blog/2013/06/29/ansible/" style="box-sizing: border-box; color: #005952;"><span style="font-family: Verdana, sans-serif;">Ansible</span></a></li>
<li style="box-sizing: border-box; text-align: justify;"><a href="https://github.com/capistrano/capistrano" style="box-sizing: border-box; color: #005952;"><span style="font-family: Verdana, sans-serif;">Capistrano</span></a></li>
<li style="box-sizing: border-box; text-align: justify;"><a href="https://travis-ci.org/" style="box-sizing: border-box; color: #005952;"><span style="font-family: Verdana, sans-serif;">Travis CI</span></a></li>
<li style="box-sizing: border-box; text-align: justify;"><a href="http://jenkins-ci.org/" style="box-sizing: border-box; color: #005952;"><span style="font-family: Verdana, sans-serif;">Jenkins</span></a></li>
<li style="box-sizing: border-box; text-align: justify;"><a href="http://www.jetbrains.com/teamcity/" style="box-sizing: border-box; color: #005952;"><span style="font-family: Verdana, sans-serif;">TeamCity</span></a></li>
<li style="box-sizing: border-box; text-align: justify;"><a href="http://nodetime.com/" style="box-sizing: border-box; color: #005952;"><span style="font-family: Verdana, sans-serif;">NodeTime</span></a></li>
<li style="box-sizing: border-box; text-align: justify;"><a href="http://newrelic.com/" style="box-sizing: border-box; color: #005952;"><span style="font-family: Verdana, sans-serif;">New Relic</span></a></li>
</ul>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">Vagrant remains a choice for VMware lovers, but Chef is much more sophisticated than Vagrant. A good starting point for beginners is to get an overview of Chef and take some of the free webinars and </span><a href="http://www.getchef.com/chef/" style="font-family: Verdana, sans-serif;" target="_blank">free trainings</a><span style="font-family: Verdana, sans-serif;"> provided by Chef. </span><span style="font-family: Verdana, sans-serif;">Chef in itself is a comprehensive framework with concepts like Knife, Cookbook, Chef-repo, Ohai, etc. A picture is worth a thousand words. Below is the architecture diagram of Chef.</span></div>
<br /><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="http://docs.opscode.com/_images/overview_chef_draft.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="http://docs.opscode.com/_images/overview_chef_draft.png" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-family: Verdana, sans-serif; font-size: small;"><b>Chef Architecture</b></span></td></tr>
</tbody></table>
<span style="font-family: Verdana, sans-serif;"><br /></span>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">Picking a cloud automation vendor is not the end of DevOps. Large businesses often have hybrid infrastructure environments made up of private datacenters, physical and virtual environments, and multi-tenant cloud environments. Companies like <a href="http://www.rightscale.com/products-and-services/products" target="_blank">RightScale</a> offer specialized solutions to deal with such use cases. Below is an interesting architecture diagram of the RightScale solution model.</span></div>
<span style="font-family: Verdana, sans-serif;"><br /></span>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="http://assets.rightscale.com/6116128b0b81a1b0692fa4b9178096b3517b1c58/web/images/illustrations/hybrid-cloud1.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="http://assets.rightscale.com/6116128b0b81a1b0692fa4b9178096b3517b1c58/web/images/illustrations/hybrid-cloud1.png" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-family: Verdana, sans-serif; font-size: small;"><b><br />RightScale MultiCloud Platform</b></span></td></tr>
</tbody></table>
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
</div>
Siddharth Mehtahttp://www.blogger.com/profile/02870225483799389119noreply@blogger.com0tag:blogger.com,1999:blog-11909584.post-44317170460275461552014-05-16T10:30:00.000-07:002014-05-16T10:30:43.192-07:00How to drive a project on NoSQL, Big data, Elasticsearch, MongoDB, Hadoop, and other such technologies<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">I am writing this post after quite a long break from blogging. Once one gets married, gets promoted in the organization at the same time, and is made responsible for more than 20 projects as the Lead Architect for a portfolio, it's not easy to keep up with blogging.</span></div>
<br />
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">These days, I work on projects spanning technologies like SharePoint 2013, .NET, jQuery, SQL Server, SSIS, SSAS, SSRS, PowerPivot, Power View, mobile web apps using Bootstrap and jQuery Mobile, native iOS apps using Xcode, and NoSQL-based technologies like Elasticsearch and MongoDB. Working as a solution architect with a broad range of projects and technologies is like working as a chef in a kitchen. I get to mix and merge various technology combinations to create solution recipes that cater to project requirements. The only difference is that bad recipes are not easily tolerated, as significant cost rides on an architect's decisions.</span></div>
<br />
<span style="font-family: Verdana, sans-serif; text-align: justify;">I have spent my career working with technologies that were predominantly from the Microsoft space. But the world is changing, and so is the focus on technologies. I have been taking a lot of personal interest in studying the NoSQL-based technologies that can tap intelligence from unstructured data as well as big data.</span><br />
<br />
<span style="font-family: Verdana, sans-serif; text-align: justify;">One of the most common traits of developers and architects is the typical punch line "I can't learn by reading, I need hands-on experience of the technology I need to manage". If you are working with a multi-national organization, it's not easy to land a project where you have no experience in the driving technology and, in most cases, neither does the organization. </span><span style="font-family: Verdana, sans-serif; text-align: justify;">When an organization doesn't find or recognize use cases for a particular technology and you try to push or propose it, you will be seen as trying to sell the technology, and it will be seen as a solution in search of a problem. </span><br />
<br />
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">So the big question is, how to bag an entry ticket into the NoSQL world and drive a project using NoSQL technologies ?</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">Professionals seeking to build competency in NoSQL, as well as those intending to drive NoSQL-based projects, can consider the following initiatives:</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">1) <b>Set up a personal lab: </b>Virtualization has made it easy to create a VM. Most of the NoSQL technologies require very modest resources (like 2 GB RAM and a single core) to run the software. This can be a starting playground for practicing the technology.</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">2) </span><b style="font-family: Verdana, sans-serif;">Join the global community: </b><span style="font-family: Verdana, sans-serif;">Platforms like GitHub and Stack Overflow have a lot of community projects and real-life questions. By being an active observer as well as a participant on these platforms, one can mature on the technology very fast and also become globally visible as an active professional in the technology of choice.</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">3) <b>Create a community within your organization: </b>Organizations feel comfortable adopting technologies that can be easily managed by the pool of people available in the organization. If you are one of the few with a grip on the technology, you may place yourself in a niche bracket, but that does not increase the organization's confidence in dealing with the technology. To address this, you should conduct awareness sessions to bring people at various levels up to speed with the technology, and create a community of practice in the organization.</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">4) <b>Pursue professional training: </b>Once you have successfully pursued points 2 and 3, you can confidently ask the organization for a budget to pursue professional training on the subject. Not everyone can afford to pay for training out of their own pocket!</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">5) <b>Develop and publish POCs: </b>Confidence to adopt a technology, and confidence in a professional's ability to manage it, come from the professional's ability to justify the use case for the technology. Identifying use cases and justifying them through POCs is the best way to do so.</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">By following these 5 steps, I believe that one can establish oneself as well as one's organization in a position to make an entry in the NoSQL world. Let me know what you think.</span></div>
</div>
Siddharth Mehtahttp://www.blogger.com/profile/02870225483799389119noreply@blogger.com0tag:blogger.com,1999:blog-11909584.post-19343486914287142372013-02-07T11:10:00.000-08:002013-02-07T11:58:36.464-08:00Building Social Analytics with MS BI<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">Every form of analysis needs data, but one may not always have that data generated and stored in the organization's own repository. Many forms of analysis depend upon third-party data, and platforms like Windows Azure Marketplace are based on the same principle.</span></div>
<div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
</div>
<div>
<div style="text-align: justify;">
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">Social Analytics is widely used to forecast the impact on the business and extract insights to counter the same. The interesting question here is, what data source can be used to derive the sentiments of customers about the respective business? A majority of this data would come from social / professional / collaboration forums. Examples of such sources are Facebook, YouTube, Twitter, LinkedIn, Pinterest, IMDb, blogs, etc. Most would agree that analytics derived from unstructured data created by public interaction on social media can be expected to be much more precise than even a data mining algorithm. But the big challenge here is the amount of data - very, very big data. </span><span style="text-align: justify;"><span style="font-family: Verdana, sans-serif;">On a daily basis, there are 400 million Tweets, 2.7 billion Facebook Likes, and 2 billion YouTube views. Even these figures may be outdated by now.</span></span></div>
</div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">Say an organization is influenced by the SharePoint 2013 enhancements related to social media collaboration, and intends to add sentiment analysis to its client offering. Let's say that Twitter is selected as the starting source of data, and all the public tweets for a particular product would be analyzed and the results stored for future use. </span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">The first challenge is that, according to a study, Twitter generates approximately 1 billion tweets in less than 3 days. So how does one process such a huge amount of unstructured data? Just consider the kind of infrastructure required to handle this processing. To proceed with the case study, let's say that we live in the age of cloud, we have just signed up on AWS, and we have provisioned a large Amazon EMR cluster that uses Hadoop and the HBase NoSQL database.</span></div>
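<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">To put that volume in perspective, here is a small back-of-the-envelope calculation in Python. It uses only the daily figure of 400 million tweets quoted earlier in this post and assumes the tweets arrive at a uniform rate, which real traffic certainly does not, so treat the result as a rough average rather than a peak load.</span></div>

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Daily tweet volume quoted earlier in this post.
tweets_per_day = 400_000_000

# Average ingest rate, assuming a uniform arrival rate (a simplification).
tweets_per_second = tweets_per_day / SECONDS_PER_DAY

# Number of days needed to accumulate one billion tweets at that daily volume.
days_to_one_billion = 1_000_000_000 / tweets_per_day

print(f"Average rate: {tweets_per_second:,.0f} tweets per second")
print(f"Days to reach 1 billion tweets: {days_to_one_billion:.1f}")
```

<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">That works out to roughly 4,600 tweets every second on average, and 2.5 days to reach a billion, which is consistent with the "less than 3 days" figure above.</span></div>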
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">The second challenge in this case is how to get access to the Twitter Firehose - an API that provides streaming access to public tweets. One needs to partner with Twitter and pay millions of dollars to get licensed access to its sea of unfiltered data. You would also need rights to publicly sell this dataset to your end clients. Considering this complexity, most organizations would give up the idea of implementing it for their own use.</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">Sometimes the answer to the problem is not your own technology but a partner's technology. Only three publicly known companies have licensed rights to Twitter's Firehose - <a href="http://topsy.com/" target="_blank">Topsy</a>, <a href="http://datasift.com/" target="_blank">Datasift</a>, and <a href="http://gnip.com/" target="_blank">Gnip</a>. These companies have established partnerships with hundreds and thousands of social media platforms, built web-scale, Google-inspired infrastructure based on Hadoop clustering, and maintain a huge archive of historical social data. On top of that, these providers offer real-time access to the live stream of social media and also provide social analytics using intelligent methods. An interesting case study of how Datasift manages infrastructure for huge processing, storage and analytics can be read <a href="http://www.cloudera.com/content/dam/cloudera/Resources/PDF/Cloudera_DataSift_Case_Study.pdf" target="_blank">here</a>.</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><b>How is MS BI related to this?</b></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><b><br /></b></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">Even if one chooses to sign up with any of these providers and source analyzed data from them, one would have to keep storing the results. These providers have a pay-per-use pricing model that depends upon the selected source. After intelligently extracting analyzed data from different sources through these providers, one would have to warehouse it to avoid paying repeatedly for the same data. Considering the volume of data, even warehousing only the analyzed data from these social media providers would easily create a huge data warehouse.</span></div>
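<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">The "pay once, then read from the warehouse" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: <code>fetch_from_provider</code> is a hypothetical stand-in for a paid provider call (it is not the API of any real provider), the score it returns is a placeholder rather than real data, and an in-memory SQLite database stands in for the actual data warehouse.</span></div>

```python
import sqlite3


def fetch_from_provider(topic, day):
    """Hypothetical stand-in for a paid, pay-per-use provider call."""
    fetch_from_provider.calls += 1  # count how often we would be billed
    return 0.5  # placeholder sentiment score, not real data


fetch_from_provider.calls = 0


def get_sentiment(conn, topic, day):
    """Return the warehoused score if present; otherwise buy it once and store it."""
    row = conn.execute(
        "SELECT score FROM sentiment WHERE topic = ? AND day = ?", (topic, day)
    ).fetchone()
    if row is not None:
        return row[0]  # already paid for, so serve it from the warehouse
    score = fetch_from_provider(topic, day)
    conn.execute(
        "INSERT INTO sentiment (topic, day, score) VALUES (?, ?, ?)",
        (topic, day, score),
    )
    conn.commit()
    return score


# An in-memory SQLite database stands in for the warehouse (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sentiment (topic TEXT, day TEXT, score REAL, "
    "PRIMARY KEY (topic, day))"
)

first = get_sentiment(conn, "product-x", "2013-02-01")
second = get_sentiment(conn, "product-x", "2013-02-01")
print(first, second, fetch_from_provider.calls)
```

<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">The second lookup for the same topic and day is served from the local table, so the paid call is made only once.</span></div>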
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">Microsoft has two flavors of analysis models (Tabular mode SSAS and OLAP mode SSAS) under the BISM umbrella, and a very strong set of end-user collaboration platforms including SharePoint and Excel. Analyzing the warehoused data from social analytics providers with MS BI, and including it in solution offerings, can be a bigger differentiator than implementing complex data mining algorithms or similar methods.</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">I would really like to hear what Microsoft thinks about my idea around social analytics with MS BI. If anyone reading this post is interested in sharing their thoughts about this idea, I would be more than happy to receive them.</span></div>
</div>
</div>
Siddharth Mehtahttp://www.blogger.com/profile/02870225483799389119noreply@blogger.com0tag:blogger.com,1999:blog-11909584.post-20803557265387748722013-02-02T23:53:00.000-08:002013-02-02T23:53:59.909-08:00Amazon Web Services (AWS) pocket reference for Business Intelligence Architects / Architecture Design<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">Every large-scale IT organization is organized into some form of verticals / Strategic Business Units (SBUs), which may be grouped by geography / technology / industry etc. Almost inevitably every such organization has a cloud computing capability, and most cloud-based projects / architectures are designed and developed by this capability. This may work as long as you are working as an architect for your own set of projects that deal only with your technology.</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">I believe that when one intends to grow as an enterprise architect, one needs to collaborate with SMEs across environments / technologies / platforms, and for that one needs a good understanding of a variety of them.</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><b>Why <a href="http://aws.amazon.com/" target="_blank">Amazon Web Services</a> (AWS)</b> - AWS is probably the largest cloud player in providing IaaS. Azure and other such platforms have started providing IaaS recently, but their major strength is PaaS, where they provide the technology to build solutions and manage the infrastructure themselves. If one intends to develop solutions that involve a very broad mix of technologies, then one would have to opt for a very strong IaaS cloud environment rather than a PaaS environment.</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">Below are some of my quick notes on the world of Amazon Web Services, that one might want to keep in consideration while architecting BI solutions on AWS.</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div>
<span style="font-family: Verdana, sans-serif;"><div style="text-align: justify;">
<b>1) AWS has two types of clouds : Public / <a href="http://aws.amazon.com/vpc/" target="_blank">Virtual private cloud</a> (VPC)</b></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
In the public cloud, servers are under AWS control and can be configured by the user. In a VPC, servers are hosted within AWS but are part of the corporate network. IPs are under the control of the corporate network, and security between the corporate network and the servers hosted on AWS is the responsibility of the organization.</div>
<div style="text-align: justify;">
<br /></div>
<div>
<div style="text-align: justify;">
<b>2) Amazon <a href="http://aws.amazon.com/s3/" target="_blank">Simple Storage Service</a> (S3) :</b></div>
<div style="text-align: justify;">
<br /></div>
<div>
<ul style="text-align: left;">
<li style="text-align: justify;">It's an object store, where one can store any type of data in huge amounts, and the data can be accessed using the API provided by Amazon for S3. </li>
<li style="text-align: justify;">It's a highly available service, as it stores copies of data in multiple locations. It can be used as a staging location for migrating data across availability zones when using Elastic Block Store (EBS) volumes.</li>
<li style="text-align: justify;">When data is stored in S3, its content type is stored as object metadata. A client accessing the data can check this metadata to ensure that the data is read accordingly.</li>
<li style="text-align: justify;">A single S3 object can be up to 5 TB in size, although a single PUT upload is limited to 5 GB; larger objects use multipart upload. S3 objects can be accessed over HTTP via REST or SOAP APIs. Third-party tools are available to handle storage management inside S3.</li>
</ul>
</div>
</div>
<div style="text-align: justify;">
<b>3) Amazon <a href="http://aws.amazon.com/ec2/" target="_blank">Elastic Compute Cloud</a> - EC2</b></div>
</span><div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div>
<ul style="text-align: left;">
<li style="text-align: justify;"><span style="font-family: Verdana, sans-serif;">EC2 provides scalable and flexible compute capacity. EC2 instances are launched from an Amazon Machine Image (AMI, also known as a bundle), and EC2 provides the interface to manage these images. Amazon and third-party providers like RightScale, IBM and others provide ready images for use.</span></li>
<li style="text-align: justify;"><span style="font-family: Verdana, sans-serif;">Any software installed on an EC2 instance is lost once the instance is "terminated". Persistent images are also available, which retain software changes when the instance is stopped (but not terminated). These images are based on EBS or S3 instance store.</span></li>
<li style="text-align: justify;"><span style="font-family: Verdana, sans-serif;">If you use a SQL Server 2008 R2 AMI, then the license cost of SQL Server is included in the cost of running the instance. One cannot use one's own purchased licenses to offset the cost of the SQL Server license in an AWS provided SQL Server AMI.</span></li>
<li style="text-align: justify;"><span style="font-family: Verdana, sans-serif;">One can allocate a static IP address to an instance using an AWS "Elastic IP", and after that one can RDP to the instance using the same IP / DNS every time. Without an Elastic IP, the IP address of the instance would change every time the instance is stopped and started. Elastic IPs are chargeable.</span></li>
</ul>
</div>
<div style="text-align: justify;">
<i style="font-family: Verdana, sans-serif;">Billing types for EC2 instance</i></div>
<div>
<ul style="text-align: left;">
<li style="text-align: justify;"><span style="font-family: Verdana, sans-serif;"><i><b>Reserved Instance</b> - This instance type requires reserving the instance for a fixed term. It includes an up-front cost, along with usage charges. This instance is cheaper than an Unreserved Instance.</i></span></li>
</ul>
<ul style="text-align: left;">
<li style="text-align: justify;"><span style="font-family: Verdana, sans-serif;"><i><b>Unreserved Instance</b> - This instance (On-Demand in AWS terminology) is billed on a pay-per-use basis, but is more expensive than a Reserved Instance.</i></span></li>
</ul>
<ul style="text-align: left;">
<li style="text-align: justify;"><i><span style="font-family: Verdana, sans-serif;"><b>Spot Instance</b> - These are a unique type of EC2 instance, which is basically Amazon's way of handling spare capacity. You set a price and the number of instances you need. When the spot price falls below the price set by you, the instances are allocated to your account. The downside is that once the spot price rises above the price set by you, those instances would stop.</span></i></li>
</ul>
</div>
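<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">The spot pricing rule described above can be sketched in a few lines of Python. This is only an illustration of the rule and not AWS's actual allocation logic: the function name and the hourly prices are hypothetical, and a real spot market has more moving parts.</span></div>

```python
def spot_instance_states(spot_prices, max_price):
    """Return 'running' or 'stopped' for each period.

    Assumption for this sketch: capacity is held only while the spot
    price is strictly below the price set by the user.
    """
    return ["running" if price < max_price else "stopped" for price in spot_prices]


# Hypothetical hourly spot prices in USD (not real AWS data).
hourly_spot_prices = [0.030, 0.028, 0.041, 0.034, 0.052]
states = spot_instance_states(hourly_spot_prices, max_price=0.035)
print(states)
```

<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">With a price of 0.035 set by the user, the instances would run in the first, second and fourth hours and be stopped in the other two, which is why spot instances suit interruptible workloads such as batch ETL.</span></div>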
<div>
<ul style="text-align: left;">
<li style="text-align: justify;"><span style="font-family: Verdana, sans-serif;">In AWS, you are not billed for data transfer between AWS components (for example data transfer between S3 and EC2). But any data traffic that goes in and out of the instance over the Internet is billable. </span></li>
<li style="text-align: justify;"><span style="font-family: Verdana, sans-serif;">Various categories of EC2 instances are available, like Micro, Standard, Cluster Compute, High-Memory Cluster, Cluster GPU, High Memory, High CPU, High Storage, High I/O etc. Each category also comes in small, medium and large sizes. A comparison can be seen here: <a href="http://www.ec2instances.info/">http://www.ec2instances.info</a> , <a href="http://aws.amazon.com/ec2/instance-types/">http://aws.amazon.com/ec2/instance-types/</a></span></li>
</ul>
<div style="text-align: justify;">
<b style="font-family: Verdana, sans-serif;">4) Amazon <a href="http://aws.amazon.com/ebs/" target="_blank">Elastic Block Store</a> (EBS)</b></div>
</div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div>
<ul style="text-align: left;">
<li style="text-align: justify;"><span style="font-family: Verdana, sans-serif;">It's the storage system / disk where an EC2 instance stores and persists data. EBS is created, configured and managed outside the EC2 instance and not within it. Even if an EC2 instance has been terminated, data stored on EBS would persist.</span></li>
<li style="text-align: justify;"><span style="font-family: Verdana, sans-serif;">EBS volumes can be 1 GB to 1 TB in size.</span></li>
<li style="text-align: justify;"><span style="font-family: Verdana, sans-serif;">An EBS volume is available only within the region and availability zone in which it was created. It's possible to make it available in a different zone by creating a snapshot of the volume, which is stored in S3, and then creating a new EBS volume from that snapshot in the target zone. An EBS volume cannot be attached directly across regions; the snapshot would first have to be copied to the other region.</span></li>
<li style="text-align: justify;"><span style="font-family: Verdana, sans-serif;">One EC2 instance can have many EBS volumes, but one EBS volume cannot be shared by multiple EC2 instances.</span></li>
</ul>
</div>
</div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><b>5) Amazon <a href="http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-network-security.html" target="_blank">Security Groups</a></b></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div>
<ul style="text-align: left;">
<li style="text-align: justify;"><span style="font-family: Verdana, sans-serif;">It provides a way to restrict access to EC2 instances, by configuring the ports, IPs and servers that can connect to an EC2 instance. It acts as a firewall for an EC2 instance.</span></li>
<li style="text-align: justify;"><span style="font-family: Verdana, sans-serif;">EC2 instances to which the same security group is applied do not become part of a common group / subnet.</span></li>
</ul>
</div>
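<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">To make the firewall behaviour concrete, here is a minimal Python sketch of how inbound rules could be evaluated. It is not the AWS API: the rule tuples, the CIDR ranges and the function name are assumptions made up for illustration, and the only idea taken from the notes above is "allow what a rule matches, deny everything else".</span></div>

```python
import ipaddress

# Hypothetical inbound rules: (protocol, port, allowed source CIDR).
INBOUND_RULES = [
    ("tcp", 3389, "203.0.113.0/24"),  # RDP from an office network
    ("tcp", 1433, "10.0.0.0/16"),     # SQL Server from internal servers
]


def is_allowed(protocol, port, source_ip, rules=INBOUND_RULES):
    """Allow traffic only if some rule matches protocol, port and source IP."""
    address = ipaddress.ip_address(source_ip)
    return any(
        protocol == rule_protocol
        and port == rule_port
        and address in ipaddress.ip_network(rule_cidr)
        for rule_protocol, rule_port, rule_cidr in rules
    )


print(is_allowed("tcp", 3389, "203.0.113.25"))  # source inside the office range
print(is_allowed("tcp", 3389, "198.51.100.7"))  # source outside every rule
```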
<div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><b>6) Amazon <a href="http://aws.amazon.com/cloudwatch/" target="_blank">CloudWatch</a></b></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div>
<ul style="text-align: left;">
<li style="text-align: justify;"><span style="font-family: Verdana, sans-serif;">CloudWatch monitoring is of two types in AWS - Basic CloudWatch and Detailed CloudWatch.</span></li>
<li style="text-align: justify;"><span style="font-family: Verdana, sans-serif;">Basic CloudWatch is available with EC2 instance. It collects different performance metrics related to the EC2 instance.</span></li>
<li style="text-align: justify;"><span style="font-family: Verdana, sans-serif;">Detailed CloudWatch enables a detailed monitoring of EC2 instances, with alerts and notifications.</span></li>
</ul>
</div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><b>7) Amazon <a href="http://aws.amazon.com/elasticloadbalancing/" target="_blank">Elastic Load Balancing</a> (ELB)</b></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div>
<ul style="text-align: left;">
<li style="text-align: justify;"><span style="font-family: Verdana, sans-serif;">Elastic Load Balancing can be used for two major purposes - Load balancing and Fault tolerance.</span></li>
<li style="text-align: justify;"><span style="font-family: Verdana, sans-serif;">As a load balancer it can distribute incoming traffic to different servers in a load balanced fashion.</span></li>
<li style="text-align: justify;"><span style="font-family: Verdana, sans-serif;">For fault tolerance, it can detect a failed / unresponsive / unhealthy EC2 instance and route traffic to other instances as required.</span></li>
</ul>
</div>
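<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">Both roles of a load balancer can be sketched together in Python. The round-robin policy, the instance names and the health flags below are assumptions for illustration only; this is not how ELB is implemented internally, just the general idea of distributing traffic while skipping unhealthy instances.</span></div>

```python
from itertools import cycle


def route_requests(instances, health, request_count):
    """Distribute requests round-robin, skipping instances marked unhealthy."""
    healthy = [name for name in instances if health.get(name, False)]
    if not healthy:
        raise RuntimeError("no healthy instances available")
    rotation = cycle(healthy)
    return [next(rotation) for _ in range(request_count)]


# Hypothetical fleet: web-2 has failed its health check.
instances = ["web-1", "web-2", "web-3"]
health = {"web-1": True, "web-2": False, "web-3": True}
assignments = route_requests(instances, health, 4)
print(assignments)
```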
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><b>8) Amazon <a href="http://aws.amazon.com/rds/" target="_blank">Relational Database Service</a> (RDS)</b></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div>
<ul style="text-align: left;">
<li style="text-align: justify;"><span style="font-family: Verdana, sans-serif;">Amazon RDS provides full featured database services using MySQL, Oracle as well as SQL Server database engine.</span></li>
<li style="text-align: justify;"><span style="font-family: Verdana, sans-serif;">RDS provides fault tolerance / high availability through Multi-AZ Deployments. With this option, one instance of RDS is created in the availability zone selected by the user, and a second instance is created in an alternative availability zone. Both instances are kept up to date in parallel. The second instance is not visible / available until the first instance becomes unavailable, and when it does, the second instance takes over immediately.</span></li>
<li style="text-align: justify;"><span style="font-family: Verdana, sans-serif;">An RDS instance can be configured to create Read Replicas, which are copies of the RDS instance that can be used for reporting purposes.</span></li>
<li style="text-align: justify;"><span style="font-family: Verdana, sans-serif;">RDS instances are backed up by default in AWS and this backup remains available for a limited time. Backups are totally configurable and can be persisted indefinitely too.</span></li>
</ul>
</div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><b>9) Amazon <a href="http://aws.amazon.com/sns/" target="_blank">Simple Notification Service</a> (SNS)</b></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div>
<ul style="text-align: left;">
<li style="text-align: justify;"><span style="font-family: Verdana, sans-serif;">Amazon SNS follows a publish and subscribe model, using which systems or users can generate and/or receive alerts and/or notifications.</span></li>
<li style="text-align: justify;"><span style="font-family: Verdana, sans-serif;">There are three methods in which alerts / notifications are delivered: Email / Http based web service call / A message via Simple Queue Service (SQS).</span></li>
</ul>
</div>
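<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">The publish and subscribe model itself is easy to sketch in plain Python. The class below is an in-memory illustration only, not the SNS API: the class name, the topic name and the two list-based "endpoints" (standing in for an email inbox and a queue) are all hypothetical.</span></div>

```python
from collections import defaultdict


class NotificationBroker:
    """In-memory publish/subscribe sketch: topics fan messages out to handlers."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        """Register a callable that receives every message published to topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        """Deliver message to all subscribers of topic; return the delivery count."""
        handlers = self._subscribers[topic]
        for handler in handlers:
            handler(message)
        return len(handlers)


broker = NotificationBroker()
inbox, queue = [], []  # stand-ins for an email endpoint and a queue endpoint
broker.subscribe("etl-failures", inbox.append)
broker.subscribe("etl-failures", queue.append)
delivered = broker.publish("etl-failures", "Nightly load failed")
print(delivered, inbox, queue)
```

<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">The publisher does not need to know who the subscribers are, which is what makes this model convenient for alerting from BI jobs.</span></div>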
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><b>10) Amazon <a href="http://aws.amazon.com/cloudfront/" target="_blank">CloudFront</a></b></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">It's the Content Delivery Network of AWS that distributes and caches content at the nearest servers based on user request patterns.</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><b>11) Amazon <a href="http://aws.amazon.com/elasticmapreduce/" target="_blank">Elastic MapReduce</a> (EMR)</b></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div>
<ul style="text-align: left;">
<li style="text-align: justify;"><span style="font-family: Verdana, sans-serif;">Amazon EMR provides features to process large amounts of data using Hadoop based processing combined with other AWS products.</span></li>
<li style="text-align: justify;"><span style="font-family: Verdana, sans-serif;">EMR also provides option to run HBase (column oriented, distributed, NoSQL database) on Hadoop clusters which enables real-time data access to Hadoop in cloud.</span></li>
</ul>
</div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><b>12) Amazon <a href="http://aws.amazon.com/iam/" target="_blank">Identity and Access Management</a> (IAM) </b>and <b>Amazon <a href="http://aws.amazon.com/cloudformation/" target="_blank">CloudFormation</a> </b>provide the means to control permissions on AWS resources and to manage AWS resources as a system, respectively. <b>Amazon </b></span><span style="font-family: Verdana, sans-serif;"><b><a href="http://aws.amazon.com/route53/" target="_blank">Route 53</a></b> is a highly available and scalable Domain Name System (DNS) management service that can be used with AWS IAM to manage domains with faster performance.</span></div>
</div>
</div>
Siddharth Mehtahttp://www.blogger.com/profile/02870225483799389119noreply@blogger.com2tag:blogger.com,1999:blog-11909584.post-86807460714752249572013-01-30T06:59:00.001-08:002013-01-30T06:59:58.944-08:00Data layer requirements for applications with cloud hosting and NoSQL databases - Cloudant<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">MVC is one of the most famous and widely adopted application development design patterns. </span><span style="font-family: Verdana, sans-serif;">Typically the data layer of an application starts off with features that request actions related to data manipulation. </span></div>
<ul style="text-align: left;">
<li style="text-align: justify;"><span style="font-family: Verdana, sans-serif;">If an application is public facing, where users can sign up on the app and consume services, there are bright chances that with the success of the application, user data would grow linearly. </span></li>
<li style="text-align: justify;"><span style="font-family: Verdana, sans-serif;">On top of that, if the application is related to social media / networking / large-scale e-commerce, then the app is a future candidate for big data processing requirements. This puts an obligation on the data layer to be more mature, if you want to keep your operational costs (incurred by dedicated DBAs, infrastructure procurement and maintenance, etc) in check. </span></li>
<li style="text-align: justify;"><span style="font-family: Verdana, sans-serif;">As data grows big and moves from structured towards unstructured, database choices tend to shift from generalized relational databases towards specialized NoSQL databases.</span></li>
<li style="text-align: justify;"><span style="font-family: Verdana, sans-serif;">To manage data load, data management methods like replication, sharding, master-slave configuration, partitioning etc are employed. As most of these techniques require flexibly scalable infrastructure, data is mostly hosted on the cloud. This means that the data access layer should be mature enough to work with the cloud and almost manage the infrastructure administration using the cloud provider's API. </span></li>
<li style="text-align: justify;"><span style="font-family: Verdana, sans-serif;">Different cloud providers have their own pros and cons. In today's world, application downtime translates directly into an equivalent loss in business. So you might want your data layer to be flexible enough to access data from some of the leading cloud hosting environments.</span></li>
</ul>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">These specifications are too ambitious for an application to develop just for its own use. But at the same time, to keep frequent revamping of the data layer at bay and to avoid creating duplicate data layers for different data access and management methods, one needs a data layer that can manage the above mentioned challenges.</span></div><div style="text-align: justify;"><span style="font-family: Verdana, sans-serif;"><br /></span></div><div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">One interesting company provides a data layer solution that can meet the above specifications with even more features - <a href="https://cloudant.com/the-data-layer/" target="_blank">Cloudant</a>. It provides a data layer to </span></div>
<ul style="text-align: left;">
<li style="text-align: justify;"><span style="font-family: Verdana, sans-serif;">Manage data in CouchDB NoSQL database, </span></li>
<li style="text-align: justify;"><span style="font-family: Verdana, sans-serif;">Hosted on choice of cloud providers like AWS, Azure, RackSpace etc</span></li>
<li style="text-align: justify;"><span style="font-family: Verdana, sans-serif;">RESTful CouchDB compliant APIs</span></li>
<li style="text-align: justify;"><span style="font-family: Verdana, sans-serif;">JSON based data exchange format</span></li>
<li style="text-align: justify;"><span style="font-family: Verdana, sans-serif;">Lucene based text search and much more.</span></li>
</ul><div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">NoSQL databases, cloud providers of the scale of Azure and AWS, and MPP systems like Hadoop are quite famous. But a data layer that manages data access along with data management on the cloud can be a more interesting option than starting your app from scratch with a fat sized cloud account. Check out the technical whitepaper on Cloudant and this <a href="http://ossonazure.interoperabilitybridges.com/articles/using-the-cloudant-data-layer-for-windows-azure" target="_blank">article</a> for a detailed understanding of this service.</span></div>
</div>
Siddharth Mehtahttp://www.blogger.com/profile/02870225483799389119noreply@blogger.com0tag:blogger.com,1999:blog-11909584.post-75799330144182414272013-01-27T13:51:00.000-08:002013-01-27T13:51:28.516-08:00Database Sharding, SQL Azure Federation, SQL Server 2012 and Sequence<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">Almost every SQL Server professional directly or indirectly knows the newly introduced <a href="http://msdn.microsoft.com/en-us/library/ff878091.aspx" target="_blank">Sequence object in SQL Server 2012</a>. This post is not about explaining how to use it. Just google what the most famous bloggers have to say about the Sequence object, and you would find that the only content in those posts is how to use it. I was hoping that someone would see it from the angle that I see it, but I didn't find anyone. So I thought of sharing my viewpoint on the Sequence object.</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">One can manage generating incremental keys/ids using the IDENTITY property. Is Sequence just another means to flexibly manage automated incremental ID generation ? I would say it's one of the means, but I don't see it as the ultimate purpose of it.</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">Applications that answer complex queries often need huge volumes of data. How easily you can read data depends on how and where you strategically write it. Network bandwidth, computing capacity and memory availability are some of the driving factors in read-write operations. When apps face performance pressure, the first step typically followed is scaling up. After scale-up reaches a limit, scale-out is the next measure. Partitioning, replication and federation are different approaches to scaling out databases. If data is distributed and stored in such a way that a query is likely to fetch all its data from a single server / partition, this strategy might work as long as each database stays within gigabyte-sized limits. If each partition / database is multi-terabyte sized, then even once the data to be returned is identified, returning it would hit a bottleneck: the network bandwidth available to a single server.</span></div>
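<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">A rough back-of-the-envelope calculation shows why a single server's bandwidth becomes the bottleneck. The numbers below (a 2 TB result set, a 1 Gbps network link, 8 shards returning their parts in parallel) are assumptions picked for illustration, and the sketch ignores protocol overhead and disk speed.</span></div>

```python
def transfer_seconds(data_bytes, link_bits_per_second):
    """Time to push data_bytes through one network link, ignoring overhead."""
    return data_bytes * 8 / link_bits_per_second


RESULT_BYTES = 2 * 10**12  # hypothetical 2 TB result set
LINK_BPS = 10**9           # hypothetical 1 Gbps link per server
SHARD_COUNT = 8            # hypothetical number of shards

# One server returns everything over its single link.
single_server = transfer_seconds(RESULT_BYTES, LINK_BPS)

# Each shard returns one eighth of the data over its own link, in parallel.
per_shard = transfer_seconds(RESULT_BYTES / SHARD_COUNT, LINK_BPS)

print(single_server / 3600, "hours from a single server")
print(per_shard / 60, "minutes when spread over shards")
```

<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">Under these assumptions the single server needs 16,000 seconds (about four and a half hours), while eight shards working in parallel need 2,000 seconds each.</span></div>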
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><a href="http://www.unbreakablecloud.com/wordpress/2011/07/25/db-sharding-explained/" target="_blank">Database sharding</a> is partitioning data horizontally across many servers on commodity hardware. Some areas where sharding is used include:</span></div>
<br />
<ul style="text-align: left;">
<li style="text-align: justify;"><span style="font-family: Verdana, sans-serif;">NoSQL databases running on Hadoop and MapReduce based infrastructure with database sharding have been able to solve database scalability challenges on some of the world's biggest and most aggressively growing datasets. </span></li>
<li style="text-align: justify;"><a href="http://blogs.msdn.com/b/cbiyikoglu/archive/2011/07/11/shipping-fast-sharding-to-federations-in-from-pdc-2010-to-2011.aspx" style="font-family: Verdana, sans-serif;" target="_blank">SQL Azure Federation on Windows Azure</a><span style="font-family: Verdana, sans-serif;"> platform is one of the awaited features in SQL Azure, that would allow multi tenant databases with sharding. </span></li>
<li style="text-align: justify;"><span style="font-family: Verdana, sans-serif;">Cloud databases often have to be multi-tenant / federated, due to the limits on database sizes as a result of cloud based storage architecture.</span></li>
</ul>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">The interesting part is, database sharding on relational databases is not available out-of-box, at least in SQL Server. Consider a scenario where the need is to insert a couple of records in order but in different databases. One would need a key/id generation mechanism that is independent of tables, unlike IDENTITY property.</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">Sequence is independent of tables, which makes it much more scalable. From the feature set it seems that Sequence can help to generate an ordered set of keys for storing data across shards in a sorted format. This can be confirmed only with a PoC, but for now I am hopeful that when SQL Azure Federation goes RTM, Sequence would definitely find a place in the T-SQL features for it.</span></div>
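<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">The idea can be illustrated with a small Python sketch. It is not T-SQL and not the SQL Azure Federation API: the Sequence class here is only a loose analogy of a table-independent key generator, and the range boundaries, shard numbers and customer names are hypothetical. The point is that one generator shared by all shards yields globally ordered keys, which range-based routing can then place on the right shard.</span></div>

```python
import itertools
from bisect import bisect_right


class Sequence:
    """Table-independent key generator, loosely analogous to a SEQUENCE object."""

    def __init__(self, start=1, increment=1):
        self._counter = itertools.count(start, increment)

    def next_value(self):
        return next(self._counter)


# Hypothetical range boundaries: keys < 4 -> shard 0, 4..6 -> shard 1, >= 7 -> shard 2.
SHARD_BOUNDARIES = [4, 7]


def shard_for(key):
    """Pick the shard whose key range contains key."""
    return bisect_right(SHARD_BOUNDARIES, key)


order_ids = Sequence()
shards = {0: [], 1: [], 2: []}
for customer in ["A", "B", "C", "D", "E", "F", "G", "H"]:
    key = order_ids.next_value()  # one generator shared by every shard
    shards[shard_for(key)].append((key, customer))

print(shards)
```

<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">An IDENTITY-style counter kept per table would instead restart on every shard and produce colliding keys, which is the limitation described above.</span></div>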
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">If you feel that database sharding for relational OLTP databases is not a popular practice, then you might want to check out this company that just runs its shop on database sharding - <a href="http://www.dbshards.com/" target="_blank">DBShards.com</a>. An interesting whitepaper on database sharding can be read from <a href="http://www.dbshards.com/articles/database-sharding-whitepapers/" target="_blank">here</a>.</span></div>
</div>
Siddharth Mehtahttp://www.blogger.com/profile/02870225483799389119noreply@blogger.com0tag:blogger.com,1999:blog-11909584.post-62178117851929809712013-01-24T04:06:00.002-08:002013-01-24T04:08:14.388-08:0070-463 Exam Guide - Part 2<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><strong>How does dimensional modeling of a data warehouse enable / empower analytics ?</strong></span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana;">1) Data analysis often requires pivoting huge amounts of data. Attributes of a dimension (i.e. columns in a dimension table) ideally contain non-continuous data. Such data is often suitable for slicing / pivoting, and can be used as a scale axis to visualize categorized data on a graph. Even if the data in an attribute is continuous (i.e. it contains lots of distinct values), OLAP technologies like SSAS provide ways to categorize continuous data into discrete categories. </span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana;">2) For example, if we try to plot the population of a city on a graph using the values of the Age attribute on the X or Y axis, then there would inevitably be around 100 bars on the graph, which is almost impossible to analyze. A better approach would be to divide the Age attribute into a few distinct age groups and use those groups on the graph. Each group would be represented by a bar, and upon drilling down into a bar, the actual data can be brought onto the graph.</span></div>
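<div style="text-align: justify;">
<span style="font-family: Verdana;">The age grouping can be sketched in Python as follows. This is a generic illustration of discretization and not how SSAS does it internally; the group boundaries and the sample ages are hypothetical.</span></div>

```python
from collections import Counter

# Hypothetical age groups: (label, inclusive lower bound, inclusive upper bound).
AGE_GROUPS = [("0-17", 0, 17), ("18-39", 18, 39), ("40-64", 40, 64), ("65+", 65, 200)]


def age_group(age):
    """Map a continuous age value to its discrete group label."""
    for label, low, high in AGE_GROUPS:
        if low <= age <= high:
            return label
    raise ValueError(f"age out of range: {age}")


# Hypothetical sample of ages; each group becomes one bar on the graph.
ages = [3, 17, 18, 25, 40, 41, 64, 65, 90]
bars = Counter(age_group(age) for age in ages)
print(dict(bars))
```

<div style="text-align: justify;">
<span style="font-family: Verdana;">Nine distinct ages collapse into four bars here, and drilling down into a bar simply means listing the ages that fell into that group.</span></div>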
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana;"><strong>Why do dimension tables often include columns that cannot be used for pivoting / filtering / analytics ?</strong></span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana;">1) Often dimensions like Geography, Employee, Account, Products etc have attributes describing the dimension member. For example, an Employee dimension can have attributes like Gender, Date of Birth, Marital Status, Number of children etc. Such attributes are good candidates neither for pivoting nor for any kind of roll-up or aggregation.</span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana;">2) However broad or narrow the analytical spectrum is, almost any such system inevitably has reporting on top of it. If the dimensions have only pivotable or aggregatable attributes, the data is sufficient to plot on a graph but not to build a detailed report. Such data can facilitate drill-down, but not drill-through, of a problem. For drill-through, one needs additional attributes on which a report can be based. </span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana;">3) For an Employee report, one might start the analysis with the salary attribute, as it is pivotable. For example, list all employees whose gross income is less than 30k USD. After that, one might want to check which of those employees are women who have more than two kids and are aged over forty, in order to balance the work load on them. If the dimensions do not contain such attributes, such a report requires a cross-database query joining the data warehouse and the transactional OLTP database. This results in poor performance and other reporting issues. Hence such attributes are included in a dimension.</span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana;">Such attributes in a dimension are called member properties.</span></div>
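<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana;">The Employee example above can be sketched as follows. This is a minimal illustration in Python, assuming a dimension that is held as a list of rows; the column names and the helper functions are hypothetical. The point is that the second filter works only because the member properties sit in the dimension itself.</span></div>

```python
from datetime import date


def age_on(birth_date, as_of):
    """Completed years between birth_date and as_of."""
    had_birthday = (as_of.month, as_of.day) >= (birth_date.month, birth_date.day)
    return as_of.year - birth_date.year - (0 if had_birthday else 1)


def low_income(members, limit=30000):
    """First step: analysis on the pivotable salary attribute."""
    return [m for m in members if m["GrossIncome"] < limit]


def drill_through(members, as_of):
    """Second step: narrow the list using member properties only."""
    return [
        m for m in members
        if m["Gender"] == "F"
        and m["NumberOfChildren"] > 2
        and age_on(m["DateOfBirth"], as_of) > 40
    ]
```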
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana;"><strong>What are hierarchies and how do they help in analytics ?</strong></span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana;">1) Attributes within a dimension, as well as across dimensions, can be related or unrelated. For example, date of birth and gender are unrelated attributes, but country and city are related attributes. Related attributes are often pivotable and form hierarchies known as natural hierarchies. Unrelated attributes are less likely to form a hierarchy, and if they do, it is called an unnatural hierarchy.</span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana;">2) Hierarchies provide a drill-down path for data analysis, as the data can be pivoted level by level. An aggregatable attribute on its own can only be rolled up; it cannot be drilled down to another level based on another related / unrelated attribute. For example, salary can be rolled up from the daily level to the highest possible level. But to drill down salary along Year-Month-Date, a hierarchy made up of the Year, Month and Date attributes is required, which slices the rolled-up values of the Salary attribute. Hierarchies formed of two or more related / unrelated attributes can be drilled down, which is very useful for analytics.</span></div>
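<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana;">Here is a rough Python sketch of rolling up and drilling down along a Year-Month-Date hierarchy. It assumes the measure is available as (date, amount) pairs; the function names are made up for this illustration and have nothing to do with how SSAS stores hierarchies.</span></div>

```python
from collections import defaultdict
from datetime import date

LEVELS = ("Year", "Month", "Date")


def hierarchy_path(day):
    """Position of a date in the Year-Month-Date hierarchy."""
    return (day.year, day.month, day.day)


def roll_up(rows, level):
    """Total the measure for every member of the given level."""
    depth = LEVELS.index(level) + 1
    totals = defaultdict(int)
    for day, amount in rows:
        totals[hierarchy_path(day)[:depth]] += amount
    return dict(totals)


def drill_down(rows, member):
    """Totals for the children of one member, e.g. (2012,) gives the months of 2012."""
    depth = len(member) + 1
    totals = defaultdict(int)
    for day, amount in rows:
        path = hierarchy_path(day)
        if path[:len(member)] == member:
            totals[path[:depth]] += amount
    return dict(totals)
```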
<div style="text-align: justify;">
<br />
<span style="font-family: Verdana, sans-serif;"><strong>How to accommodate changing attribute values of a dimension in a dimensional model ?</strong></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"></span></div>
<span style="font-family: Verdana, sans-serif;"><div style="text-align: justify;">
<br /></div>
</span><br />
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">1) Data archival policy is often overlooked or considered a trivial aspect of database modeling. But in a data warehouse it has a very deep impact on the modeling itself. Facts should ideally contain transactional data, so there is no question of updating them. Dimension attributes, however, often have changing values, which might be worth tracking. For example, an organization might not be interested in keeping track of every time an employee had a child, but only in the total number of children as of the latest status. But if the employee is a salesperson, the organization would want to keep track of the field area where the sales executive was posted at any point in time. Without tracking this history, reports would always show that the sales executive has operated in only one area, namely his/her present area of operation, as the attribute value is always updated / overwritten. Such changing attribute values in a dimension are addressed by a design aspect known as Slowly Changing Dimension (SCD).</span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">2) SCDs are of three types, based on how attribute history has to be preserved. </span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">SCD Type 1: Attributes for which values are updated / overwritten, and for which history is not maintained, are known as SCD Type 1.</span></div>
<span style="font-family: Verdana, sans-serif;"><div style="text-align: justify;">
<br /></div>
</span><br />
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">SCD Type 2: To maintain history, at a minimum three new columns are added to the dimension - ValidFrom, ValidTill, and IsCurrent. ValidFrom and ValidTill define the time span during which the value was valid, and IsCurrent helps queries to easily identify the latest value of the attribute(s). When the value of an attribute changes, the ValidTill value of the existing record is updated with the current date, and IsCurrent is marked as "N". A new record with the latest value of the attribute is added to the dimension, having ValidFrom as the current date, ValidTill as NULL and IsCurrent as "Y".</span></div>
<span style="font-family: Verdana, sans-serif;"><div style="text-align: justify;">
<br /></div>
</span><br />
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">SCD Type 3: If one decides to maintain the history of attribute values, and the attribute values change frequently, then the dimension can have rapidly changing values (often known as a rapidly changing dimension). As a result, the dimension would have a huge number of dimension members, which translates into performance issues in dimension processing as well as querying. Type 3 is a method where one can limit the amount of history to be preserved. Only the latest value and the one before that are preserved. Due to this design, the tracking and scoping fields required in Type 2 are not required in Type 3.</span></div>
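<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">The three types described above can be sketched in Python as below. This is a minimal sketch that assumes the dimension is a list of rows held in memory; the ValidFrom, ValidTill and IsCurrent columns come from the description above, whereas the SurrogateKey and BusinessKey columns, the "Previous" column prefix and the function names are my own assumptions. In a real warehouse the same logic would live in the ETL layer.</span></div>

```python
from datetime import date


def apply_type1(row, attribute, new_value):
    """Type 1: overwrite the value in place; no history is kept."""
    row[attribute] = new_value


def apply_type2(dimension, business_key, attribute, new_value, change_date):
    """Type 2: expire the current row and append a new current row."""
    current = next(
        (row for row in dimension
         if row["BusinessKey"] == business_key and row["IsCurrent"] == "Y"),
        None,
    )
    if current is None:
        raise KeyError(business_key)
    if current[attribute] == new_value:
        return current  # value did not change, so no new version is needed
    # Copy before expiring, so the new row inherits ValidTill=None and IsCurrent="Y".
    new_row = dict(current)
    new_row.update({
        # Next surrogate key; a database would use an identity column instead.
        "SurrogateKey": max(row["SurrogateKey"] for row in dimension) + 1,
        attribute: new_value,
        "ValidFrom": change_date,
    })
    current["ValidTill"] = change_date
    current["IsCurrent"] = "N"
    dimension.append(new_row)
    return new_row


def apply_type3(row, attribute, new_value):
    """Type 3: keep only the latest value and the one before it."""
    row["Previous" + attribute] = row[attribute]  # hypothetical column name
    row[attribute] = new_value
```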
<span style="font-family: Verdana, sans-serif;"><div style="text-align: justify;">
<br /></div>
</span><br />
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><strong>How are dimensions (especially slowly changing dimensions) associated with facts in a dimensional model ?</strong></span></div>
<span style="font-family: Verdana, sans-serif;"><div style="text-align: justify;">
<br /></div>
</span><br />
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">1) Any dimension would have a business key, which is one of the candidates for the primary key, for example ProductID in a Product dimension. If a dimension does not have any SCD attributes, then the business key can be used as the primary key of the dimension. Otherwise, whenever a new record is inserted for an SCD attribute, the business key gets duplicated. To manage this, SCD dimensions have another field that acts as the primary key - popularly known as a Surrogate Key. This field would typically have auto-increment identity values.</span></div>
<span style="font-family: Verdana, sans-serif;"><div style="text-align: justify;">
<br /></div>
</span><br />
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">2) Primary keys of a dimension act as foreign keys in fact tables, which enables pivoting data based on dimension attributes for analytics. Business keys in a dimension help in maintaining lineage with the OLTP data sources, from where data is collected and warehoused, typically using ETL methods. One can't correctly slice the data in a fact table with a dimension whose key is not present directly / indirectly in the fact table. A combination of all the foreign keys in a fact table can be used as a composite primary key in many cases.</span></div>
<span style="font-family: Verdana, sans-serif;"><div style="text-align: justify;">
<br /></div>
</span><br />
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">3) If a fact table is not associated with a dimension table, and still one wants to associate a dimension with a fact without adding primary key from dimension into the fact table, one can create bridge fact tables. These bridge tables contain only keys from dimension and fact, which creates an indirect association between dimension and fact tables.</span></div>
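<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">The surrogate key association described in points 1 and 2 above can be sketched as follows. This is an illustrative Python sketch, not a prescribed ETL pattern. It assumes a Type 2 dimension whose validity range includes ValidFrom and excludes ValidTill; the table contents, column names and function names are all hypothetical.</span></div>

```python
from collections import defaultdict
from datetime import date

# Hypothetical Type 2 dimension: one business key, two versions.
DIM_SALES_REP = [
    {"SurrogateKey": 1, "BusinessKey": "E100", "SalesArea": "North",
     "ValidFrom": date(2010, 1, 1), "ValidTill": date(2012, 6, 1)},
    {"SurrogateKey": 2, "BusinessKey": "E100", "SalesArea": "South",
     "ValidFrom": date(2012, 6, 1), "ValidTill": None},
]


def lookup_surrogate_key(dimension, business_key, as_of):
    """Find the dimension row version that was valid on the given date."""
    for row in dimension:
        if row["BusinessKey"] != business_key:
            continue
        started = row["ValidFrom"] <= as_of
        not_ended = row["ValidTill"] is None or as_of < row["ValidTill"]
        if started and not_ended:
            return row["SurrogateKey"]
    raise LookupError("no version of %s valid on %s" % (business_key, as_of))


def load_fact(dimension, source_rows):
    """Replace the business key of each source row with a surrogate key."""
    return [
        {"SalesRepKey": lookup_surrogate_key(dimension, key, day), "Amount": amount}
        for key, day, amount in source_rows
    ]


def amount_by_area(dimension, fact_rows):
    """Slice the fact by a dimension attribute through the surrogate key."""
    area_of = {row["SurrogateKey"]: row["SalesArea"] for row in dimension}
    totals = defaultdict(int)
    for fact in fact_rows:
        totals[area_of[fact["SalesRepKey"]]] += fact["Amount"]
    return dict(totals)
```

<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">Because each fact row points at a specific version of the dimension member, sales made before the transfer stay with the old area even after the sales executive has moved.</span></div>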
<span style="font-family: Verdana, sans-serif;"><div style="text-align: justify;">
<br /></div>
</span><br />
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><strong>What are measures and where are they stored ?</strong></span></div>
<span style="font-family: Verdana, sans-serif;"><div style="text-align: justify;">
<br /></div>
</span><br />
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">1) Attributes of a fact table are known as measures. A fact table should contain calculable data, which measures different attributes of an entity. For example, a Sales fact table would have order value, number of units in the order, tax on order value, profit margin percentage etc.</span></div>
<span style="font-family: Verdana, sans-serif;"><div style="text-align: justify;">
<br /></div>
</span><br />
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">2) Different measures can have different additivity - some can be fully additive like order value, some can be partially additive like tax, and some can be non-additive like profit margin percentage. Based on the nature of additivity, different aggregation functions can be applied to the measures for roll-ups.</span></div>
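<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">To see why additivity matters, here is a small Python sketch. It assumes each fact row carries an order value and a profit amount (the Profit column is my assumption, added so that the percentage can be recomputed). Order value can simply be summed, but the margin percentage has to be recalculated from its additive components at every roll-up level.</span></div>

```python
def total_order_value(rows):
    """Order value is fully additive: a plain sum is a valid roll-up."""
    return sum(row["OrderValue"] for row in rows)


def naive_margin_pct(rows):
    """Averaging row-level percentages: shown only to illustrate the pitfall."""
    pcts = [100 * row["Profit"] / row["OrderValue"] for row in rows]
    return sum(pcts) / len(pcts)


def rolled_up_margin_pct(rows):
    """Recompute the percentage from its additive components."""
    profit = sum(row["Profit"] for row in rows)
    return 100 * profit / total_order_value(rows)
```

<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">For two orders of 100 and 300 with profits of 10 and 90, the row-level margins are 10% and 30%. Averaging them gives 20%, whereas the actual margin on the combined order value of 400 is 25%.</span></div>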
</div>
Siddharth Mehtahttp://www.blogger.com/profile/02870225483799389119noreply@blogger.com1tag:blogger.com,1999:blog-11909584.post-48208720608667135762013-01-09T10:25:00.000-08:002013-01-09T10:25:42.606-08:0070-463 Certification Exam Guide / Notes - Part I<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif; font-size: small;">This blog has been silent for almost the whole of the previous year, with very little activity. I got married, had 7 projects at a time on my back, worked as a Senior Architect, reviewed a book and lots more. I struggled to strike a balance between personal life and professional life, and now I am trying to get back to business.</span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-size: small;"><span style="font-family: Verdana, sans-serif;">As a part of my role, I am preparing for, and mentoring a group to pass, the exam </span><a href="http://www.microsoft.com/learning/en/us/exam.aspx?id=70-463" target="_blank"><span style="font-family: Verdana, sans-serif;">70-463: Implementing a Data Warehouse with Microsoft SQL Server 2012</span></a><span style="font-family: Verdana, sans-serif;">. Through this series of posts I will share my notes for this exam preparation. This is Part I of the notes.</span></span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><strong>Why do we need a Data Warehouse ?</strong></span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"></span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">1) Data stored in a normalized schema in OLTP systems can span hundreds to thousands of tables for an enterprise. In OLTP systems, a portion of these tables often have names that are not self-descriptive, due to the lack of meaningful naming conventions. This makes designing queries for reporting purposes harder.</span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">2) As a normalized schema has a lot of tables, a single piece of related information is split and stored in various tables with referential integrity constraints. This means that reading this data requires joining many tables. For reporting and analysis over a huge dataset spanning thousands to millions of records of historical data, such queries would perform very poorly.</span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">3) OLTP systems tend to archive data from time to time on a scheduled basis. It might be in the form of a hard / soft delete. Lack of historical data can limit the level of analysis that can be performed on the data.</span></div>
<span style="font-family: Verdana, sans-serif;"><div style="text-align: justify;">
<br /></div>
</span><div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">4) OLTP systems tend to store only the most recent version of data, for example employee address, product name, marital status etc. Generally history for such attributes is not preserved, which results in loss of historical data. Lack of history for such attributes can limit the level of analysis that can be performed on the data.</span></div>
<span style="font-family: Verdana, sans-serif;"><div style="text-align: justify;">
<br /></div>
</span><div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">5) OLTP data in an enterprise can be federated across different departments / business units / associate group companies. Each of these would typically have its own set of applications. Inevitably, a set of common master data would be duplicated across the units. Due to duplicate and non-synchronized master data, consolidated reporting and analysis that combines data across the enterprise becomes almost impossible.</span></div>
<span style="font-family: Verdana, sans-serif;"><div style="text-align: justify;">
<br /></div>
</span><div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">6) As each working unit of an enterprise can have its own OLTP system, the same data can be represented in different forms in different systems. For example, whether an employee is permanent can be represented by Yes / No, Y / N, 1 / 0 etc. This makes the data extremely difficult to interpret, which adds to the challenges of centralized reporting over the enterprise data.</span></div>
<span style="font-family: Verdana, sans-serif;"><div style="text-align: justify;">
<br /></div>
</span><div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">7) OLTP systems can contain data in free form, which can lead to degraded data quality. For example, free form address entry can lead to a very poor quality of geographic information due to typographic </span><span style="font-family: Verdana, sans-serif;">mistakes, effectively leading to unusable data and inviting data cleansing exercise.</span></div>
<span style="font-family: Verdana, sans-serif;"><div style="text-align: justify;">
<br /></div>
</span><div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">8) OLTP systems tend to have a very good classification of business entities, but some entities required for analysis are not normalized to the most detailed extent. For example, attributes related to date would be stored in datetime format. Entities like date and time have constituents like day / week / month / quarter / year / fiscal year / second / minute / hour etc. Most reporting and analytics works on a time scale. Reporting data based on any particular constituent of date / time would require extracting this constituent from the date value on a very huge dataset. This could lead to serious performance issues.</span></div>
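<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">Points 6 and 8 above are typically handled while loading the warehouse. Below is a small Python sketch of both ideas: bringing the different flag representations to one form, and extracting the constituents of a date once so that reports do not have to do it on every query. The function names, the Y / N target values, the integer DateKey format and the use of ISO week numbers are my assumptions for illustration.</span></div>

```python
from datetime import date

# The representations mentioned above: Yes / No, Y / N, 1 / 0.
TRUE_VALUES = {"yes", "y", "1"}
FALSE_VALUES = {"no", "n", "0"}


def standardize_flag(raw):
    """Bring the different source representations of a flag to "Y" / "N"."""
    text = str(raw).strip().lower()
    if text in TRUE_VALUES:
        return "Y"
    if text in FALSE_VALUES:
        return "N"
    return None  # unknown value: leave it for a data cleansing step


def date_dimension_row(day):
    """Extract the constituents of a date once, at load time."""
    return {
        "DateKey": day.year * 10000 + day.month * 100 + day.day,
        "Year": day.year,
        "Quarter": (day.month - 1) // 3 + 1,
        "Month": day.month,
        "WeekOfYear": day.isocalendar()[1],  # ISO week number
        "Day": day.day,
    }
```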
<span style="font-family: Verdana, sans-serif;"><div style="text-align: justify;">
<br /></div>
</span><div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><strong>What is a Star schema and when should it be used ?</strong></span></div>
<span style="font-family: Verdana, sans-serif;"><div style="text-align: justify;">
<br /></div>
</span><div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">1) The data model / schema used to create a data warehouse is also known as a dimensional model. In this model, reference / lookup / master tables are called dimensions. Measurable details of data are called measures / facts and the tables that host them are called Fact Tables.</span></div>
<span style="font-family: Verdana, sans-serif;"><div style="text-align: justify;">
<br /></div>
</span><div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">2) Typically, a simplified denormalized schema that covers the most granular section of a business entity can be represented by a star schema. Such a schema would have a single fact table containing measurable / aggregatable values. This fact table would have foreign keys from all the dimension tables, which complete the identity of each record and provide different angles of analysis. </span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">If you look at the AdventureWorksDW2012 schema, FactInternetSales / FactResellerSales is a Fact Table. This table has references to different dimensions like Date, Product, Customer etc. Together these Fact and Dimension tables form a schema known as a Star Schema.</span></div>
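<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">A tiny star schema can be tried out with the sqlite3 module that ships with Python. The tables below are a heavily simplified, hypothetical stand-in and not the actual AdventureWorksDW2012 tables; the sketch only shows the shape of a typical star query, where the fact table is joined to each dimension once and the measure is grouped by dimension attributes.</span></div>

```python
import sqlite3


def build_star(conn):
    """Create one fact table and two dimension tables with sample rows."""
    conn.executescript("""
        CREATE TABLE DimDate (DateKey INTEGER PRIMARY KEY, CalendarYear INTEGER);
        CREATE TABLE DimProduct (ProductKey INTEGER PRIMARY KEY, Category TEXT);
        CREATE TABLE FactSales (
            DateKey INTEGER REFERENCES DimDate (DateKey),
            ProductKey INTEGER REFERENCES DimProduct (ProductKey),
            SalesAmount REAL
        );
        INSERT INTO DimDate VALUES (20120101, 2012), (20130101, 2013);
        INSERT INTO DimProduct VALUES (1, 'Bikes'), (2, 'Accessories');
        INSERT INTO FactSales VALUES
            (20120101, 1, 1000.0), (20120101, 2, 50.0),
            (20130101, 1, 1500.0), (20130101, 1, 500.0);
    """)


def sales_by_year_and_category(conn):
    """One join per dimension, then group the measure by dimension attributes."""
    return conn.execute("""
        SELECT d.CalendarYear, p.Category, SUM(f.SalesAmount)
        FROM FactSales AS f
        JOIN DimDate AS d ON d.DateKey = f.DateKey
        JOIN DimProduct AS p ON p.ProductKey = f.ProductKey
        GROUP BY d.CalendarYear, p.Category
        ORDER BY d.CalendarYear, p.Category
    """).fetchall()
```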
<span style="font-family: Verdana, sans-serif;"><div style="text-align: justify;">
<br /></div>
</span><div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><strong>What is a Snowflake schema and when should it be used ?</strong></span></div>
<span style="font-family: Verdana, sans-serif;"><div style="text-align: justify;">
<br /></div>
</span><div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">1) A typical dimension is a highly denormalized business entity. If it is normalized to 2nd or 3rd normal form, then the star schema is termed a snowflake schema. For example, ProductCategory-ProductSubCategory-Product and Country-State-City are examples of snowflaked Product and Geography dimensions respectively.</span></div>
<span style="font-family: Verdana, sans-serif;"><div style="text-align: justify;">
<br /></div>
</span><div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">2) From a warehousing perspective, any data analysis done directly on data sourced from the warehouse would suffer in performance, because the additional tables in a snowflake schema require more joins. From the perspective of creating data marts, a snowflake schema provides options to source data more selectively into the data marts.</span></div>
<span style="font-family: Verdana, sans-serif;"><div style="text-align: justify;">
<br /></div>
</span><div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">3) Any typical enterprise class dimensional model would inevitably contain some snowflaking. Even if one chooses not to normalize business entities into normalized dimensions, dimensions would still contain some common attributes like date keys, time keys, status keys etc. So a Date dimension would be related to a Patent dimension by the date key for a patenteddate field.</span></div>
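<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">The difference between a snowflaked and a denormalized Product dimension can be sketched in Python as below. The three lookup tables and the product names are hypothetical sample data. Every read of a snowflaked product needs two extra lookups (the equivalent of two extra joins), whereas the flattened row carries everything in one place.</span></div>

```python
# Hypothetical snowflaked Product dimension: three normalized tables.
CATEGORY = {1: "Bikes"}
SUBCATEGORY = {
    10: {"Name": "Mountain Bikes", "CategoryKey": 1},
    11: {"Name": "Road Bikes", "CategoryKey": 1},
}
PRODUCT = {
    100: {"Name": "Mountain-100", "SubCategoryKey": 10},
    101: {"Name": "Road-150", "SubCategoryKey": 11},
}


def flatten_product(product_key):
    """Follow the snowflake outwards and return one denormalized row."""
    product = PRODUCT[product_key]
    subcategory = SUBCATEGORY[product["SubCategoryKey"]]   # first extra lookup
    category = CATEGORY[subcategory["CategoryKey"]]        # second extra lookup
    return {
        "ProductKey": product_key,
        "Product": product["Name"],
        "SubCategory": subcategory["Name"],
        "Category": category,
    }


def star_dimension():
    """Denormalized Product dimension, as it would look in a star schema."""
    return [flatten_product(key) for key in sorted(PRODUCT)]
```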
<span style="font-family: Verdana, sans-serif;"><div style="text-align: justify;">
<br /></div>
</span><div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><strong>What is a Conformed Dimension ?</strong></span></div>
<span style="font-family: Verdana, sans-serif;"><div style="text-align: justify;">
<br /></div>
</span><div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">1) If each discrete business area has its own star schema, these business areas still tend to be connected to each other. So if a star schema is created for each business unit in an enterprise, all these star schemas would have to be connected too.</span></div>
<span style="font-family: Verdana, sans-serif;"><div style="text-align: justify;">
<br /></div>
</span><div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">2) Two schemas can be connected when they have something in common. If a dimension is shared across more than one schema, it is known as a conformed dimension. For example, a Date dimension would be shared by almost all the Fact tables in the schema.</span></div>
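<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">Here is a minimal Python sketch of what a conformed Date dimension makes possible. Two fact tables from different business areas (both made up for this example) use the same date keys, so their measures can be summarized by the same month attribute and placed side by side.</span></div>

```python
from collections import defaultdict

# Hypothetical conformed Date dimension shared by both facts.
DIM_DATE = {20120105: "2012-01", 20120120: "2012-01", 20120203: "2012-02"}

# Two hypothetical fact tables from different business areas: (DateKey, measure).
FACT_SALES = [(20120105, 100), (20120120, 50), (20120203, 70)]
FACT_SUPPORT_CALLS = [(20120105, 3), (20120203, 4)]


def by_month(fact_rows):
    """Summarize one fact by the Month attribute of the shared dimension."""
    totals = defaultdict(int)
    for date_key, value in fact_rows:
        totals[DIM_DATE[date_key]] += value
    return dict(totals)


def drill_across(*facts):
    """Line up the monthly summaries of several facts side by side."""
    summaries = [by_month(fact) for fact in facts]
    months = sorted(set().union(*summaries))
    return [
        (month,) + tuple(summary.get(month, 0) for summary in summaries)
        for month in months
    ]
```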
<span style="font-family: Verdana, sans-serif;"><div style="text-align: justify;">
<br /></div>
</span><div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><strong>What is a Data mart ?</strong></span></div>
<span style="font-family: Verdana, sans-serif;"><div style="text-align: justify;">
<br /></div>
</span><div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">1) Using a data warehouse as the source, technologies like SQL Server Analysis Services create multidimensional data structures known as cubes. A cube contains dimensions and measures that can be used for large scale real-time analysis. These systems are known as Online Analytical Processing (OLAP) applications.</span></div>
<span style="font-family: Verdana, sans-serif;"><div style="text-align: justify;">
<br /></div>
</span><div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">2) Typically a data warehouse can contain data from the entire enterprise along with history. A data mart would source data from a data warehouse and build specialized data structures on the top of the same </span><span style="font-family: Verdana, sans-serif;">for reporting and analytics.</span></div>
<span style="font-family: Verdana, sans-serif;"><div style="text-align: justify;">
<br /></div>
</span><div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;"><strong>What is Dimensionality / Granularity ?</strong></span></div>
<span style="font-family: Verdana, sans-serif;"><div style="text-align: justify;">
<br /></div>
</span><div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">1) If two business entities contain data at the same scale, then they are said to be at the same level of granularity. For example, if sales data is stored weekly in facts and orders data is stored daily, then orders data is said to be at a lower (more detailed) level of granularity than sales.</span></div>
<span style="font-family: Verdana, sans-serif;"><div style="text-align: justify;">
<br /></div>
</span><div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">2) To associate two entities in a dimensional model, they need to be brought to the same level of granularity.</span></div>
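<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">Continuing the sales and orders example above, the sketch below rolls daily orders up to weekly level so that they can be compared with weekly sales. It is only an illustration in Python; identifying a week by its ISO year and week number is my assumption, and the function names are hypothetical.</span></div>

```python
from collections import defaultdict
from datetime import date


def iso_week(day):
    """Identify the week a date belongs to as (ISO year, ISO week number)."""
    year, week, _ = day.isocalendar()
    return (year, week)


def orders_to_weekly(daily_orders):
    """Roll daily order amounts up to the weekly grain."""
    weekly = defaultdict(int)
    for day, amount in daily_orders:
        weekly[iso_week(day)] += amount
    return dict(weekly)


def compare(weekly_sales, daily_orders):
    """Pair weekly sales with orders once both are at the same granularity."""
    weekly_orders = orders_to_weekly(daily_orders)
    return {
        week: (sales, weekly_orders.get(week, 0))
        for week, sales in weekly_sales.items()
    }
```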
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana;">Feel free to share your queries on this post, and I will try to answer them in the best possible way.<span style="font-size: small;"></span></span></div>
</div>
Siddharth Mehtahttp://www.blogger.com/profile/02870225483799389119noreply@blogger.com2tag:blogger.com,1999:blog-11909584.post-15407003098581559692012-10-14T04:18:00.000-07:002012-10-14T04:36:53.720-07:00Using SSRS with Silverlight, HTML5, JavaScript, Flash, Google Charts and other third party products.<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">SSRS, Powerpivot, Powerview, Excel Services, Powerpoint Services and Sharepoint Dashboarding all offer a variety of data visualizations. But most of them end up as a static graphic image in a web browser, which is neither interactive nor appealing enough to end users. These reporting platforms are not packaged with rich interactive graphical capabilities that can blend into a portal or serve high-end UI needs by themselves. Hence many architects take the route of custom development, or seek reusable assets (third party / in-house) that can bring a wow-factor to the user interface. </span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">For example, the charts and graphs generated in SSRS report output are in the form of static images. A user cannot click on a chart/graph to drill down to the next level of a hierarchy. Similarly, if I have two charts/graphs, I cannot configure them so that clicking on one of them makes the other show context-sensitive information related to the selection. Reporting systems often require such graphical interactivity, but the present stack of reporting tools is not yet capable of presenting information with this kind of user interactivity.</span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana;">There are different options for providing reporting with a rich UI, which are listed below:</span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana;">1) Develop UI using Silverlight applications (.xap)</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana;">2) Develop UI using Silverlight + .Net or plain .Net</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana;">3) Develop UI using HTML5 + JavaScript + Adobe Flash based graphics</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana;">4) Develop UI using online charting services like Google Charts</span></div>
<div style="text-align: justify;">
<span style="font-family: Verdana;">5) Develop UI using third party tools.</span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana;">A sample architecture diagram is shown below, where the application layers make calls to SSRS web services. Each of the above mentioned options would fit in one or the other layer and has its own advantages and limitations.</span></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjhr5TbPPq8T9iR6s58ElBKICNIIwcXdaIHB7tAnoRpwd_DM9Ge6gA-yKRIgDRClb61CGUkYzs18W60b1NPJrXeUbDJUUSzvblSuRKXPINYk8lVbT9V3jfKePEZAojKvtKh-bCpKA/s1600/Arch.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="107" nea="true" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjhr5TbPPq8T9iR6s58ElBKICNIIwcXdaIHB7tAnoRpwd_DM9Ge6gA-yKRIgDRClb61CGUkYzs18W60b1NPJrXeUbDJUUSzvblSuRKXPINYk8lVbT9V3jfKePEZAojKvtKh-bCpKA/s320/Arch.png" width="320" /></a></div>
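<br />
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">One simple way for an application layer to pull report data out of SSRS is URL access, where the report path and rendering options are passed in the query string of the report server URL. The Python sketch below only builds such a URL. The rs:Command and rs:Format parameters are part of SSRS URL access, whereas the server name, report path, report parameter and function name are made up for this example. The SOAP web service endpoints are an alternative and are not shown here.</span></div>

```python
from urllib.parse import quote


def report_url(server_url, report_path, output_format="PDF", parameters=None):
    """Build an SSRS URL-access address that renders a report in a given format."""
    query = [
        quote(report_path, safe="/"),          # report path comes first
        "rs:Command=Render",
        "rs:Format=" + quote(output_format),
    ]
    # Report parameters are passed as plain name=value pairs.
    for name, value in (parameters or {}).items():
        query.append(quote(name) + "=" + quote(str(value)))
    return server_url.rstrip("/") + "?" + "&".join(query)
```

<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">Requesting the XML or CSV format instead of an image or PDF gives the UI layer raw data, which it can hand over to a client side charting library.</span></div>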
<br />
<span style="font-family: Verdana, sans-serif;">Listed below are different options for generating interactive charts and graphs that can be blended into your UI along with SSRS reports.</span><br />
<br />
<span style="font-family: Verdana;">1) <a href="http://www.highcharts.com/" target="_blank">Highcharts</a></span><br />
<br />
<span style="font-family: Verdana;">2) <a href="http://www.jscharts.com/" target="_blank">JS Charts</a></span><br />
<br />
<span style="font-family: Verdana;">3) <a href="http://g.raphaeljs.com/" target="_blank">gRaphael JavaScript Library</a></span><br />
<br />
<span style="font-family: Verdana;">4) <a href="http://www.amcharts.com/" target="_blank">amCharts : JavaScript / HTML5 Charts</a></span><br />
<br />
<span style="font-family: Verdana;">5) <a href="http://www.rgraph.net/examples/index.html" target="_blank">RGraph JavaScript Chart and Graphs</a></span><br />
<br />
<span style="font-family: Verdana;">6) <a href="http://www.fusioncharts.com/products/suite/" target="_blank">FusionCharts : JavaScript, HTML5 and Flash based data visualizations</a></span><br />
<br />
<span style="font-family: Verdana;">7) <a href="http://www.jpowered.com/graph_chart_collection/index.htm" target="_blank">JPowered JavaScript graphing library</a></span><br />
<br />
<span style="font-family: Verdana;">8) <a href="http://almende.github.com/chap-links-library/index.html" target="_blank">CHAP Links Library using Google Charts Visualization</a></span><br />
<br />
<span style="font-family: Verdana;">9) <a href="http://www.steema.com/teechart/html5" target="_blank">TeeChart JavaScript and HTML5 charting library</a></span><br />
<br />
<span style="font-family: Verdana;">10) <a href="http://www.omnipotent.net/jquery.sparkline/#s-about" target="_blank">jQuery Sparklines plugin</a></span><br />
<br />
<span style="font-family: Verdana;">11) <a href="http://www.jqplot.com/" target="_blank">jqPlot : jQuery plotting plugin</a></span><br />
<br />
<span style="font-family: Verdana;">12) <a href="http://silverlight.codeplex.com/" target="_blank">Microsoft Silverlight Toolkit</a></span><br />
<br />
<span style="font-family: Verdana;">13) <a href="http://www.infragistics.com/products/ultimate" target="_blank">Infragistics NetAdvantage Ultimate</a></span><br />
<br />
<span style="font-family: Verdana;">14) <a href="http://dojotoolkit.org/documentation/tutorials/1.7/charting/" target="_blank">Dojo Charting</a></span><br />
<br />
<span style="font-family: Verdana;">15) <a href="http://canvasxpress.org/" target="_blank">CanvasXpress - JavaScript library based on HTML5 Canvas Tag</a></span><br />
<br />
<span style="font-family: Verdana;">16) <a href="http://www.humblesoftware.com/flotr2/index" target="_blank">Flotr2- JavaScript library based on HTML5 Canvas Tag</a></span></div>
Siddharth Mehtahttp://www.blogger.com/profile/02870225483799389119noreply@blogger.com2tag:blogger.com,1999:blog-11909584.post-66072862844758617132012-10-06T10:55:00.000-07:002012-10-06T11:18:45.983-07:00Using Microsoft Office Project Server with MS BI ( SSIS, SSAS, and SSRS )<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="text-align: justify;" trbidi="on">
<span style="font-family: Verdana, sans-serif;">Microsoft Office Project Server (MSPS) is one of the richest sources of data in the Microsoft ecosystem. Many departments, especially the CIO's office, have the potential to make extensive use of the data contained in Project Server. Almost every organization runs projects for which it carries out planning, tracking, monitoring, resource assignment and related activities. MS Project Server is a chef's knife for this purpose.</span></div>
<div dir="ltr" style="text-align: justify;" trbidi="on">
<br /></div>
<div dir="ltr" style="text-align: justify;" trbidi="on">
<span style="font-family: Verdana;">From a technical standpoint, the way MSPS stores data is very interesting. Like SharePoint, it stores data internally in SQL Server. But unlike SharePoint, it provides a neat and clean mechanism for using the data it stores, in the form of a database intended for reporting known as the <a href="http://download.microsoft.com/download/7/7/6/7769F459-8D80-4338-A4E0-E06ABC83C1FE/Microsoft%20Project%20Server%202010%20Reporting%20with%20Excel%20Services.pdf" target="_blank">Reporting database</a>, which is maintained by a service known as the Report Data Service. It also has a service called the <a href="http://msdn.microsoft.com/en-us/library/office/aa974558.aspx" target="_blank">Cube Build service</a> (CBS), which can be operated from a web-based console known as Project Web App (PWA). </span></div>
<div dir="ltr" style="text-align: justify;" trbidi="on">
<br />
<span style="font-family: Verdana, sans-serif;">The Reporting database (RDB) is the staging area for generating reports and OLAP cubes. Data in the Reporting database is comprehensive and is updated in near real time. The tables and views are optimized for read-only report generation; for example, the RDB tables are denormalized to provide redundant data and reduce the number of relational tables. Because the RDB is updated in near real time, if you are considering extracting data from it into another data store, consider reading <a href="http://msdn.microsoft.com/en-us/library/office/aa568342(v=office.12).aspx" target="_blank">how data gets to the RDB and Report Data Service</a>. </span><span style="font-family: Verdana;">Schema documentation for the Reporting database as well as the OLAP cubes is available in the <a href="http://www.microsoft.com/downloads/details.aspx?FamilyID=46007f25-b44e-4aa6-80ff-9c0e75835ad9&displaylang=en" target="_blank">Project 2010 Reference: Software Development Kit</a>, in the <em>documentation\schemas</em> subdirectory.</span></div>
<div dir="ltr" style="text-align: justify;" trbidi="on">
<br /></div>
<div dir="ltr" style="text-align: justify;" trbidi="on">
<span style="font-family: Verdana;">The Microsoft Office Project Server 2010 architecture diagram can be seen below:</span></div>
<div dir="ltr" style="text-align: justify;" trbidi="on">
<br /></div>
<div align="center">
<img src="http://i.msdn.microsoft.com/dynimg/IC454516.gif" /></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
</div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">As is apparent from the above diagram, MS Project Server is very well integrated with SharePoint 2010. Hence, using reporting tools like Excel Services and PerformancePoint Services, along with the BI and dashboarding capabilities built into SharePoint, a rich reporting platform can be provided to end users on top of the data contained in Project Server 2010.</span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana;">From an MS BI perspective, </span></div>
<ul style="text-align: left;">
<li><div style="text-align: justify;">
<span style="font-family: Verdana;">SSIS can be used to extract data from the Reporting database and merge it into a corporate warehouse.</span></div>
</li>
<li><div style="text-align: justify;">
<span style="font-family: Verdana;">SSAS can be used to source and enhance the cubes and OLAP database exposed by Project Server.</span></div>
</li>
<li><div style="text-align: justify;">
<span style="font-family: Verdana;">SSRS can be used to generate reports on top of the relational Reporting database and the cubes contained in the OLAP database exposed by Project Server.</span></div>
</li>
</ul>
<div style="text-align: justify;">
<span style="font-family: Verdana;">I seriously wish that SharePoint would expose similar databases for reporting and analysis, as that would make it very easy to report on and analyze the content stored in SharePoint.</span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana;">To understand more about Project Server, you should consider reading about <a href="http://msdn.microsoft.com/en-us/library/office/ee767687.aspx" target="_blank">Project Server Architecture</a> and <a href="http://msdn.microsoft.com/en-us/library/office/ms504195.aspx" target="_blank">Project Server Programmability</a>. Also consider reading more about <a href="http://technet.microsoft.com/en-us/library/ee662106.aspx" target="_blank">how to configure reporting for Project Server 2010</a>.</span></div>
</div>
Siddharth Mehtahttp://www.blogger.com/profile/02870225483799389119noreply@blogger.com0tag:blogger.com,1999:blog-11909584.post-62961787081603307532012-10-02T10:28:00.000-07:002012-10-02T10:31:12.271-07:00HTML5 Browser Compatibility for BI Solutions<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">The number of consumers of BI solutions is growing, and BI solutions are increasingly becoming web-based. Technologies like Silverlight are not supported on platforms like iOS and browsers like Mobile Safari. HTML5, due to its capability to render rich media on mobile devices, is gaining adoption day by day. Below are some references that can be handy when you are working with HTML5, covering topics such as cross-browser compatibility, local storage on devices, feature support across devices, and testing your application's compatibility with HTML5.</span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana;">1) <a href="http://www.w3schools.com/html/html5_intro.asp" target="_blank">Difference between HTML4 and HTML5</a></span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana;">2) <a href="http://diveintohtml5.info/" target="_blank">Download Free HTML5 Ebook</a></span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana;">3) <a href="http://html5test.com/results/mobile.html" target="_blank">How to test cross browser compatibility</a></span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana;">4) <a href="http://www.microsoft.com/learning/en/us/exam.aspx?id=70-480" target="_blank">HTML5 certification exam 70-480: Programming in HTML5 with JavaScript and CSS3</a></span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana;">5) <a href="http://speckyboy.com/2012/03/25/getting-to-grips-with-html5-browser-compatibility/" target="_blank">HTML5 resources</a></span><br />
<br />
<span style="font-family: Verdana;">6) <a href="http://mobile.dzone.com/articles/new-iphone-5-and-ios-6" target="_blank">New iPhone 5 and iOS 6 features for HTML 5 Developers</a></span></div>
<br />
<div style="text-align: center;">
<img src="http://speckycdn.sdm.netdna-cdn.com/wp-content/uploads/2012/03/chart1.png" /></div>
<br />
<div style="text-align: center;">
<img src="http://speckycdn.sdm.netdna-cdn.com/wp-content/uploads/2012/03/chart2.png" /></div>
</div>
Siddharth Mehtahttp://www.blogger.com/profile/02870225483799389119noreply@blogger.com0tag:blogger.com,1999:blog-11909584.post-23569960880110834562012-09-27T11:43:00.000-07:002012-09-27T12:13:39.840-07:00SSRS Report prototypes using Google Charts<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">Conceptualizing the design of any application starts with almost no tools at hand. MS Office products such as PowerPoint and Excel, or similar products, are the tools usually available to design and development teams for building application screen design prototypes. MS Visio is generally used to develop wireframes. In an application- or report-centric design, users prefer to view reports blended into the web application itself. As the need for a more mature prototype grows, an HTML, CSS and JavaScript based prototype starts getting developed.</span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana;">Report prototypes are generally developed as Excel spreadsheets or raw reports exported to Excel. These are completely isolated from the hosting environment, so it is hard to visualize how they would fit into it. <a href="https://developers.google.com/chart/interactive/docs/index" target="_blank">Google Charts</a> provides an excellent <a href="https://developers.google.com/chart/interactive/docs/gallery" target="_blank">variety of visualizations</a>. The best things about these visualizations are:</span></div>
<ul style="text-align: left;">
<li><div style="text-align: justify;">
<span style="font-family: Verdana;">Visualizations are interactive, but they don't use Silverlight. The term "interactive" is used very loosely, but it is a very valuable property. SSRS charts are rendered as static images, which are not interactive. These visualizations, in contrast, offer one form or another of user interaction along with tooltips.</span></div>
</li>
<li><div style="text-align: justify;">
<span style="font-family: Verdana;">These visualizations are generated using HTML5 so they are cross-browser as well as mobile device compatible. Also visualizations are drawn using SVG or VML.</span></div>
</li>
<li><div style="text-align: justify;">
<span style="font-family: Verdana;">Embedding these visualizations and populating them with data takes just a couple of HTML and JavaScript tags. Visualizations are exposed as Google JavaScript libraries. Include those libraries in your page, create objects from the exposed object model, add data in the form of a very simple array, and your report visual is ready.</span></div>
</li>
<li><div style="text-align: justify;">
<span style="font-family: Verdana;">Visualizations such as treemaps, intensity maps and motion charts, which are missing in SSRS, are also available.</span></div>
</li>
<li><div style="text-align: justify;">
<span style="font-family: Verdana;">These visualizations are exposed in the form of classes, and they also have event listeners. This means you can bind user interaction on these visualizations to your server-side code too.</span></div>
</li>
</ul>
<div style="text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhj0Br5X28pooESJXzJmY1T2AN7E8myh5sqYklmA4nL2OFJ0vHM_ZwQsYDoHCbUYmkU4pwXgE5OS0XNPBraYzdUQQegE95HYpQ7Xp-ZoUdLuhmozwEI2doYjVnoIJOKTNF2PwHbkQ/s1600/Charts.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="255" kea="true" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhj0Br5X28pooESJXzJmY1T2AN7E8myh5sqYklmA4nL2OFJ0vHM_ZwQsYDoHCbUYmkU4pwXgE5OS0XNPBraYzdUQQegE95HYpQ7Xp-ZoUdLuhmozwEI2doYjVnoIJOKTNF2PwHbkQ/s320/Charts.png" width="320" /></a></div>
<br />
<div style="text-align: justify;">
<span style="font-family: Verdana;">One might also ask: why not use these visualizations in production environments? My main reasons for resisting that are as follows. Firstly, these are provisioned for free on Google infrastructure, so you can't commit to any SLA regarding performance with confidence to end clients. Secondly, Google publishes a <a href="https://developers.google.com/chart/terms" target="_blank">deprecation policy</a> that supports backward compatibility for 3 years. This means that once any visualization is classified as deprecated, it would disappear from Google Charts after 3 years. This is not acceptable in any serious production environment. By comparison, under Microsoft policies even deprecated or discontinued products like ProClarity / PPS 2007 have a support policy of 10 years. But of course these products are not free.</span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana;">So I suggest that it is one of the best tools for creating great report prototypes that come very close to the actual reports. Below is sample code to create a pie chart.</span></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi6UcZEcalu5rZppHy8Crc8gdcQEU_Rq8cdS9RhvV_WYomeDWFryc3FzKhSfAozosZpcCqjmqLD2m_mLIvZN0I4eiYrsWiGUYp0HnT3fU_eXXQCHpxVeurhbY3LyQfHQHBgyyqMdg/s1600/ChartIntegration.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="151" kea="true" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi6UcZEcalu5rZppHy8Crc8gdcQEU_Rq8cdS9RhvV_WYomeDWFryc3FzKhSfAozosZpcCqjmqLD2m_mLIvZN0I4eiYrsWiGUYp0HnT3fU_eXXQCHpxVeurhbY3LyQfHQHBgyyqMdg/s320/ChartIntegration.png" width="320" /></a></div>
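<div style="text-align: justify;">
<span style="font-family: Verdana;">Since the sample above is a screenshot, here is a minimal text sketch along the same lines. It assumes the current Google Charts loader script (https://www.gstatic.com/charts/loader.js) is included on the page, which is newer than the loader available when this post was written, and that the page has a div with the hypothetical id chart_div. The sample data and the function names buildPieRows and drawPieChart are illustrative assumptions, not taken from the screenshot.</span></div>

```javascript
// Hypothetical sample data: one [label, value] row per pie slice.
const salesByRegion = [
  ['North', 120],
  ['South', 80],
  ['East', 45],
  ['West', 55],
];

// Build the array-of-arrays shape that arrayToDataTable accepts:
// a header row followed by one row per slice.
function buildPieRows(header, rows) {
  return [header, ...rows];
}

// Draw the pie chart into the element with the given id (browser only).
function drawPieChart(elementId) {
  const data = google.visualization.arrayToDataTable(
    buildPieRows(['Region', 'Sales'], salesByRegion)
  );
  const chart = new google.visualization.PieChart(
    document.getElementById(elementId)
  );

  // React to a slice being clicked; a real prototype could call
  // server-side code from here instead of logging.
  google.visualization.events.addListener(chart, 'select', () => {
    const selection = chart.getSelection();
    if (selection.length > 0 && selection[0].row != null) {
      console.log('Selected slice:', data.getValue(selection[0].row, 0));
    }
  });

  chart.draw(data, { title: 'Sales by region (sample data)' });
}

// Only wire up the chart when the Google Charts loader is present,
// so the data-shaping function can also be exercised outside a browser.
if (typeof google !== 'undefined' && google.charts) {
  google.charts.load('current', { packages: ['corechart'] });
  google.charts.setOnLoadCallback(() => drawPieChart('chart_div'));
}
```

<div style="text-align: justify;">
<span style="font-family: Verdana;">In a prototype, the sample array would typically be replaced with a few rows copied from the report's dataset, which keeps the mock-up close to the eventual report.</span></div>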
<br /></div>
Siddharth Mehtahttp://www.blogger.com/profile/02870225483799389119noreply@blogger.com0tag:blogger.com,1999:blog-11909584.post-80141317691754864752012-09-25T11:13:00.000-07:002012-09-25T11:13:55.600-07:00SSRS reports on iPhone, iPad, Android, Windows Mobile for Mobile UI<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">Viewing applications on mobile devices might sound like a small problem statement, but the actual solution requirement is much broader than just resolution adjustment. Some of the major challenges involved in the solution architecture are:</span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana;">1) Single codebase for the model and controller layer, and using the same for creating different view layers for different devices and platforms.</span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana;">2) Whether to create a webapp optimized for devices or whether to create a native application that calls webservices / displays web content.</span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana;">3) HTML5 is supported to different degrees by different browsers, and most of the Microsoft frameworks do not emit HTML5 by default. In fact, features like local storage are not supported in Internet Explorer versions earlier than IE8. HTML is also considerably heavier than JSON as a data exchange format for devices.</span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana;">4) Whichever framework is used, cross-browser compatibility is always an implicit / explicit business mandate.</span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana;">5) Using the same navigation design for differently sized devices like iPhones, iPads, other tablets and desktops would not be appreciated by users from a usability and user experience perspective.</span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana;">6) REST-based services are faster and lighter to use than WCF-based web services. But WCF has very wide support, a rich feature set and tight integration with .NET.</span></div>
<div style="text-align: justify;">
<br /></div>
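<div style="text-align: justify;">
<span style="font-family: Verdana;">Because feature support such as local storage varies by browser (point 3 above), it is usually safer to detect a feature than to assume it. The following is a minimal sketch of such a check; the function name hasLocalStorage and the probe key are hypothetical, and the global object is passed in explicitly so the check can be exercised outside a browser too.</span></div>

```javascript
// Returns true only if the given global object exposes a working
// localStorage. Some browsers expose the object but throw on write
// (for example when storage is disabled), hence the try/catch.
function hasLocalStorage(globalObj) {
  try {
    const storage = globalObj.localStorage;
    if (!storage) {
      return false;
    }
    const probe = '__storage_probe__'; // hypothetical throwaway key
    storage.setItem(probe, '1');
    storage.removeItem(probe);
    return true;
  } catch (err) {
    return false;
  }
}

// In a browser, pass window; elsewhere fall back to an empty object.
const storageAvailable = hasLocalStorage(
  typeof window !== 'undefined' ? window : {}
);
console.log('Local storage available:', storageAvailable);
```

<div style="text-align: justify;">
<span style="font-family: Verdana;">When the check fails, the application can fall back to fetching data from the server on every request instead of caching it on the device.</span></div>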
<div style="text-align: justify;">
<span style="font-family: Verdana;">Below are some pointers that can be kept in mind while designing the solution / technology architecture:</span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana;">1) HTML5 is supported by most modern browsers used on different devices. Creating a web application with HTML5, CSS3 and JavaScript is the most advisable step if you are completely inexperienced in mobile application development.</span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana;">2) Web-based frameworks like <a href="http://jquerymobile.com/" target="_blank">jQueryMobile</a>, <a href="http://www.sencha.com/" target="_blank">Sencha</a>, and <a href="http://dojotoolkit.org/" target="_blank">Dojo</a> can leverage existing web-based SDKs and codebases to build sharper mobile applications in a faster, easier and more efficient manner. These frameworks also have built-in libraries for using REST and JSON for client-server communication. Even if the web application is converted to a native application in the future, HTTP / REST / JSON based communication is supported by platforms like Android, iOS and others.</span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana;">3) To take your web application development framework to the next level, use frameworks like <a href="http://www.appmobi.com/" target="_blank">appMobi</a>, <a href="http://www.appcelerator.com/" target="_blank">Appcelerator</a>, <a href="http://phonegap.com/" target="_blank">PhoneGap</a>, <a href="http://www.applicationcraft.com/" target="_blank">ApplicationCraft</a> and others to build native applications using JavaScript.</span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana;">4) More about how to build an iOS application from scratch can be read <a href="http://designthencode.com/scratch/" target="_blank">here</a>.</span></div>
<div style="text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="http://blogs.vmware.com/vfabric/files/2012/09/javascript-frameworks.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="199" kea="true" src="http://blogs.vmware.com/vfabric/files/2012/09/javascript-frameworks.jpg" width="320" /></a></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana;">I have implemented architectures where we create .NET user controls that make programmatic calls to the SSRS web service for report execution. The HTML output returned by the SSRS report is collected and rendered in the control. These controls are hosted in .NET pages, which are hosted on SharePoint. Finally, when the response leaves the server, the entire web content gets transmitted as HTML. By introducing HTML5 conversion wrappers at different layers, depending upon the design of the solution, not only SSRS reports but any web application can be optimized for mobile devices.</span></div>
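<div style="text-align: justify;">
<span style="font-family: Verdana;">As a lighter-weight illustration of requesting report HTML, the sketch below uses SSRS URL access rather than the web service calls described above, so it is a swapped-in technique and not the architecture I implemented. The server address, report path, parameter and the function name buildReportUrl are all hypothetical; rs:Command and rs:Format are standard URL access parameters.</span></div>

```javascript
// Build an SSRS URL-access address that asks the report server to
// render a report as HTML. Report parameters are appended as plain
// name=value pairs after the rs: directives.
function buildReportUrl(reportServerUrl, reportPath, params) {
  const parts = ['rs:Command=Render', 'rs:Format=HTML4.0'];
  for (const [name, value] of Object.entries(params)) {
    parts.push(encodeURIComponent(name) + '=' + encodeURIComponent(value));
  }
  // encodeURI keeps the slashes in the report path but escapes spaces.
  return reportServerUrl + '?' + encodeURI(reportPath) + '&' + parts.join('&');
}

// Hypothetical server and report, for illustration only.
const url = buildReportUrl(
  'http://reports.example.com/ReportServer',
  '/Sales/Yearly Summary',
  { Year: 2012 }
);
console.log(url);
```

<div style="text-align: justify;">
<span style="font-family: Verdana;">The resulting address could be requested by a server-side wrapper, which would then post-process the returned HTML before sending it to a mobile device.</span></div>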
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana;">There is a lot more to consider, such as performance, navigation design, user experience and local data storage. Feel free to share your thoughts and experiences by commenting on this post.</span></div>
</div>
Siddharth Mehtahttp://www.blogger.com/profile/02870225483799389119noreply@blogger.com3tag:blogger.com,1999:blog-11909584.post-18619176121120051502012-09-23T10:43:00.001-07:002012-09-23T11:34:05.049-07:00Hadoop tools for SSIS, SSRS and SSAS like Integration, Reporting and Analytics<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">Data hosting, processing and reporting are changing dramatically and now span a wider variety of very different platforms than ever. With the emergence of NoSQL and Big Data, systems such as Hadoop host unimaginable volumes of data. Google is soon to hit 1 billion Android device activations. The US and China collectively contribute almost 300 million iOS + Android activations. Sourcing data from systems like Hadoop, mashing it up with relational data sources and provisioning reporting and analytics on the most aggressively growing platforms like Android and iOS is not an easy job, let alone the complexity, cost and skills involved in the process.</span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana;">Recently I have seen quite a few SQL Server and MS BI related blogs writing about how to write code for HBase, Pig and other similar sources. The industry matures in terms of developer productivity and user friendliness much faster than one realizes. <a href="http://www.talend.com/products-big-data/open-studio-bd.php" target="_blank">Talend</a>, an open source provider of tools for managing Big Data, provides a tool called Talend Open Studio for Big Data. It is a GUI-based data integration tool like SSIS. Behind the scenes, this tool generates code for </span><span style="font-family: Verdana, sans-serif;">Hadoop Distributed File System (HDFS), Pig, HBase, Sqoop and Hive. Tools of this kind really take Hadoop and Big Data to an extremely wide user base.</span></div>
<div style="text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="http://www.talend.com/products-big-data/img/talend-open-studio-for-big-data.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" hea="true" height="235" src="http://www.talend.com/products-big-data/img/talend-open-studio-for-big-data.png" width="320" /></a></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana, sans-serif;">Once you have a way to build a highway to a mountain of source data, the immediate need is to make sense of that data. One of the front-runners in data visualization and analytics, <a href="http://www.tableausoftware.com/solutions/hadoop" target="_blank">Tableau</a>, provides ways to create ad-hoc visualizations from extracts of data from Hadoop clusters or live from the Hadoop clusters themselves. Tableau facilitates both creating visualizations from in-memory data and staging extracts of data from Hadoop clusters into relational databases and creating visualizations from those.</span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana;">Other analytics vendors like <a href="http://www.snaplogic.com/what-we-do/solutions/big-data.php" target="_blank">Snaplogic</a> and <a href="http://www.pentahobigdata.com/" target="_blank">Pentaho</a> also provide tools for operating with Hadoop clusters that do not require developers to write code. Microsoft has an integrated platform for integration, reporting and analytics (in-memory / OLAP) and an IDE like SSDT (formerly BIDS). </span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana;">If tools similar to Talend and Tableau are integrated into SSIS, SSAS, SSRS, the DB Engine and SSDT, then Microsoft is one of the best positioned leaders to take Hadoop to a wide audience in its mainstream business. When platforms like Azure Data Market, Data Quality Services, Master Data Management, StreamInsight, SharePoint etc. join hands with tool and technology support integrated with SQL Server, it would be an unmatched way to extract intelligence out of Hadoop. Connectors for Hadoop have been the first baby step in this direction. A lot of maturity in this area is still awaited.</span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<span style="font-family: Verdana;">Until then, look to existing leaders in this area like Cloudera, MapR, Hortonworks, Apache and Greenplum for Hadoop distributions and implementation. For Hadoop tools, software vendors like Talend, Tableau, SnapLogic and Pentaho can provide the required toolset.</span> </div>
</div>
Siddharth Mehtahttp://www.blogger.com/profile/02870225483799389119noreply@blogger.com0