Saturday, May 31, 2014

Elasticsearch Tutorial - Questions - Download Elasticsearch GUI Tools - Part 2

Elasticsearch has a very simple installation mechanism. It requires a JVM installed on the host OS; on Windows, running the elasticsearch.bat file starts it. When considering a product like Elasticsearch for integration, front-end tools are often required. Some such requirements are listed below:

1) How to import data in bulk from existing data repositories like Excel files, SQL Server, Oracle, MySQL, DB2, MongoDB, and others ?

2) How to visually explore data stored in elasticsearch using GUI tools ?

3) How to equip support and monitoring teams with the tools required for their day-to-day operations ?

4) How to equip analysts with tools for executing ad-hoc search queries on data stored in elasticsearch ?

Elasticsearch supports interoperability through a feature called plugins. Plugins can be installed using a simple plugin command. Elasticsearch itself provides a number of plugins, and a huge number of plugins are supported by the community. An exhaustive list of such plugins is available here.

If you are new to Elasticsearch and are just setting up your development environment, below is a list of some plugins that you might find particularly useful to speed up your development process.

1) Elasticsearch GUI - A web based elasticsearch administration console written in AngularJS.

2) Elastichead - A web based front-end for elasticsearch that lets you browse data in a tabular format, provides an interface to see metadata, and lets you fire ad-hoc queries.

3) Elasticsearch HQ - A web based elasticsearch monitoring and management console for instances and clusters.

4) Bigdesk - A web based elasticsearch plugin that lets you monitor a huge list of performance counters using charts and graphs.

5) Elasticsearch segmentspy - A web based elasticsearch plugin that specializes in monitoring segment related features like merges, additions, deletes etc.

6) Elasticsearch whatson - A web based elasticsearch plugin that specializes in providing comparative analysis of data stored across indices, shards, nodes and cluster.

7) Elasticsearch FS River - Elasticsearch plugin to bulk import content. Though this plugin has a few open bugs, it is still very useful for a one time bulk import. With some fixes and workarounds, this plugin can be used to warehouse a huge amount of content into elasticsearch.

8) Marvel - A commercial monitoring and analytics tool from elasticsearch.

9) Elasticsearch JDBC River - Elasticsearch plugin to bulk import data from a variety of systems into elasticsearch. If used wisely, this is the most useful plugin to start pumping data into elasticsearch.
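
For a small one-off import, the idea behind bulk loading can be sketched without a plugin. The sketch below only builds the newline-delimited body that the elasticsearch bulk API accepts (one action line followed by one source line per document); it does not send anything. The index name, type name and sample rows are hypothetical.

```python
import json

def bulk_body(index, doc_type, docs):
    """Build a bulk request body: an action line, then a source line, per document."""
    lines = []
    for doc_id, source in docs.items():
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk format expects every line, including the last, to end with a newline.
    return "\n".join(lines) + "\n"

# Hypothetical rows, e.g. exported from a spreadsheet or a SQL table.
rows = {
    "1": {"name": "Alice", "city": "Pune"},
    "2": {"name": "Bob", "city": "Mumbai"},
}

body = bulk_body("customers", "customer", rows)
print(body)
```

A body like this would be posted to the `_bulk` endpoint of a running node.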


Monday, May 19, 2014

ElasticSearch Tutorial - Questions - Basics - Part I

1) ElasticSearch uses Apache Lucene as the underlying technology.

2) Relational databases map the values of fields in a table to indexes, and during a search operation these indexes are used to locate records. Lucene uses inverted indexes, which store the values (terms) found in a field and map each term to the related records (documents) that contain it.
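
To make the idea of an inverted index concrete, here is a minimal sketch in Python. It is a toy model with hypothetical sample documents and says nothing about how Lucene actually lays out its data; it only shows the term-to-documents mapping.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

# Hypothetical sample documents, keyed by document id.
docs = {
    1: "quick brown fox",
    2: "lazy brown dog",
    3: "quick red fox",
}

inverted = build_inverted_index(docs)
print(inverted["brown"])  # [1, 2]
print(inverted["quick"])  # [1, 3]
```

A search for a term is then a dictionary lookup rather than a scan over every document.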

Terminologies:

3) What is an index in ElasticSearch ? 

An index is loosely comparable to a database in the relational world: it is the top-level container that holds documents. One difference is that relational databases always store the actual values, which is optional in ElasticSearch. An index can store actual values, analyzed values, or both.

4) What is a document in ElasticSearch ? 

A document is similar to a row in relational databases. The difference is that each document in an index can have a different structure (fields), but fields that are common across documents should have the same data type.

A field can hold multiple values in a single document. Fields can contain other documents too.

5) Does ElasticSearch have a schema ?

Yes, ElasticSearch can have mappings, which can be used to enforce a schema on documents.

6) What is a document type in ElasticSearch ?

A document type can be seen as the document schema / mapping definition, which holds the mapping of all the fields in the document along with their data types.
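
As a rough illustration, a mapping for a document type can be pictured as a nested dictionary of field names and types. The type name `tweet` and its fields are hypothetical, and the checking function is only a sketch of the idea that common fields must keep one data type; it is not how ElasticSearch itself validates documents.

```python
# Hypothetical mapping for a document type called "tweet".
# "string", "date" and "integer" are field types used in ElasticSearch 1.x mappings.
mapping = {
    "tweet": {
        "properties": {
            "user": {"type": "string"},
            "posted": {"type": "date"},
            "retweets": {"type": "integer"},
        }
    }
}

# Illustrative type checks only; real type coercion rules are richer than this.
PYTHON_TYPES = {"string": str, "integer": int, "date": str}

def conflicts(doc, doc_type, mapping):
    """Return the names of mapped fields whose values disagree with the mapping."""
    props = mapping[doc_type]["properties"]
    bad = []
    for field, value in doc.items():
        if field in props and not isinstance(value, PYTHON_TYPES[props[field]["type"]]):
            bad.append(field)
    return bad

ok_doc = {"user": "alice", "retweets": 3}
bad_doc = {"user": "bob", "retweets": "many", "location": "Pune"}

print(conflicts(ok_doc, "tweet", mapping))   # []
print(conflicts(bad_doc, "tweet", mapping))  # ['retweets']
```

Note that the unmapped `location` field is not reported: documents are free to carry extra fields, which matches the flexible structure described above.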

7) What is indexing in ElasticSearch ?

The process of storing data in an index is called indexing in ElasticSearch. Data in ElasticSearch is divided into write-once, read-many segments. Whenever an update is attempted, a new version of the document is written to the index.

8) What is a node in ElasticSearch ?

Each instance of ElasticSearch is called a node. Multiple nodes can work in harmony to form an ElasticSearch Cluster.

9) What is a shard in ElasticSearch ?

Because a single machine is limited in resources such as RAM and vCPUs, applications that need to scale out run multiple instances of ElasticSearch on separate machines. Data in an index can be divided into multiple partitions, each of which can be handled by a separate node (instance) of ElasticSearch. Each such partition is called a shard. By default an ElasticSearch index has 5 shards.
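
A document is assigned to a shard by hashing a routing value (by default the document id) and taking the result modulo the number of primary shards. The sketch below illustrates that idea only; `zlib.crc32` is a stand-in hash chosen for the example, not the hash function ElasticSearch uses, and the document ids are hypothetical.

```python
import zlib

def pick_shard(routing_key, number_of_shards=5):
    """Pick a shard as hash(routing_key) modulo the number of primary shards."""
    # crc32 is a stand-in hash for illustration, not the one ElasticSearch uses.
    return zlib.crc32(routing_key.encode("utf-8")) % number_of_shards

for doc_id in ["1", "2", "42", "user-17"]:
    print(doc_id, "-> shard", pick_shard(doc_id))
```

Because the shard number depends on the shard count, the number of primary shards of an index is fixed when the index is created.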

10) What is a replica in ElasticSearch ?

Each shard in ElasticSearch can have one or more copies, called replicas. By default each primary shard has 1 replica, so every piece of data exists in 2 copies. Replicas serve the purpose of high-availability and fault-tolerance.

11) What is an Analyzer in ElasticSearch ?

While indexing data in ElasticSearch, data is transformed internally by the Analyzer defined for the index, and then indexed. An analyzer is built of a tokenizer and filters. Following types of Analyzers are available in ElasticSearch 1.10.

12) What is a Tokenizer in ElasticSearch ?

A Tokenizer breaks down the field values of a document into a stream of tokens. Inverted indexes are created and updated using these tokens, and the stream of tokens is stored with the document.
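
The analysis chain described above, a tokenizer followed by filters, can be sketched in a few lines of Python. This is a toy model: the function names and the stop-word list are hypothetical and are not ElasticSearch's built-in analyzers.

```python
import re

STOP_WORDS = {"the", "a", "an", "and", "of"}  # hypothetical stop-word list

def tokenize(text):
    """Tokenizer: split a field value into a stream of word tokens."""
    return re.findall(r"\w+", text)

def lowercase_filter(tokens):
    """Filter: normalize every token to lower case."""
    return [t.lower() for t in tokens]

def stop_filter(tokens):
    """Filter: drop common words that carry little meaning."""
    return [t for t in tokens if t not in STOP_WORDS]

def analyze(text):
    """Analyzer: one tokenizer followed by a chain of filters."""
    tokens = tokenize(text)
    for token_filter in (lowercase_filter, stop_filter):
        tokens = token_filter(tokens)
    return tokens

print(analyze("The Quick Brown Fox and the Lazy Dog"))
# ['quick', 'brown', 'fox', 'lazy', 'dog']
```

The order of the filters matters here: the stop filter only recognizes "The" because the lowercase filter ran first.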

13) What is a Filter in ElasticSearch ?

After data is processed by the Tokenizer, the resulting tokens are processed by Filters before indexing. Following types of Filters are available in ElasticSearch 1.10.

14) What is the query language of ElasticSearch ?

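ElasticSearch has its own JSON based query language, called Query DSL, whose queries are executed as Apache Lucene queries under the hood. The Lucene query syntax itself can also be used from within the Query DSL.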
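
A Query DSL request is a JSON document. The sketch below builds one in Python and prints it; the field names `title` and `year` are hypothetical, while `bool`, `match` and `range` are query types from the Query DSL.

```python
import json

# Hypothetical field names ("title", "year"); the query types themselves
# ("bool", "match", "range") are part of the Query DSL.
query = {
    "query": {
        "bool": {
            "must": [
                {"match": {"title": "elasticsearch tutorial"}},
                {"range": {"year": {"gte": 2014}}},
            ]
        }
    },
    "size": 10,
}

body = json.dumps(query, indent=2)
print(body)
```

A body like this would be sent to the `_search` endpoint of an index.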

In the next part of the ElasticSearch Tutorial, we will see how to install ElasticSearch, and how to use ElasticSearch tools and technologies to administer it.

Sunday, May 18, 2014

World Famous Architectures : Facebook, WhatsApp, Amazon, Twitter, YouTube, Google, ESPN, Salesforce, FarmVille and other world famous architectures

Experience is the best teacher; no book or coach teaches as well as learning from experience. We often hear about different architecture designs and practices from various sources in the professional world around us, and yet most of us have inevitably attended a performance optimization training at least once in the past 2-3 years.

In my opinion, if you really want to learn scalability and performance, just take a look at the top architectures of the world listed below. I bet that if you can follow and implement even two of them to the extent these organizations have, you are set to build a new world famous architecture.

1) WhatsApp Architecture

Chef for Microsoft Azure, Amazon, OpenStack, Rackspace, Google Compute Engine, or Linode

What is DevOps ? Beginners who are not familiar with DevOps can read this page to get an idea of it.

DevOps is a branch of architecture design that many architects and development leads consider trivial. For traditional applications, infrastructure provisioning, capacity management, monitoring, and operations support are generally taken care of by dedicated IT teams bound by pre-agreed SLAs.

But when architects are dealing with cloud scale applications, devops is no longer a trivial area, nor is it outside of the solution definition. Automating infrastructure management using script based templates is a standard industry practice, and it is supported by almost all the major cloud vendors. It came as a surprise to me when I found that one of the Microsoft Azure cloud trainers was not aware of devops automation, and I had to educate the trainer on it.

Some of the major players in this area are mentioned below, and Chef is leading the way.
Vagrant still stands as a choice for VMware lovers, but Chef is much more sophisticated compared to Vagrant. A good starting point for beginners is to get an overview of Chef, and then pursue some of the free webinars and free trainings provided by Chef. Chef in itself is a comprehensive framework with concepts like Knife, Cookbook, Chef-repo, Ohai etc. A picture is worth a thousand words. Below is the architecture diagram of Chef.

Chef Architecture

Picking a cloud automation vendor is not the end of devops. Huge businesses often have hybrid infrastructure environments formed of private datacenters, physical and virtual environments, and multi-tenant cloud environments. Companies like RightScale offer specialized solutions to deal with such use-cases. Below is an interesting architecture diagram of the RightScale solution model.


RightScale MultiCloud Platform

Friday, May 16, 2014

How to drive a project on NoSQL, Big data, Elasticsearch, MongoDB, Hadoop, and other such technologies

I am authoring this blog after quite a long break from blogging. Once one gets married, gets promoted in the organization at the same time, and is made responsible for more than 20 projects as the Lead Architect for a portfolio, it's not easy to catch up with blogging.

These days, I work on projects spanning technologies like SharePoint 2013, .NET, jQuery, SQL Server, SSIS, SSAS, SSRS, PowerPivot, Power View, mobile web apps using Bootstrap and jQuery Mobile, native iOS apps using Xcode, and NoSQL based technologies like Elasticsearch and MongoDB. Working as a solution architect with a broad range of projects and technologies is like working as a chef in a kitchen. I get to mix and merge various technology combinations to create solution recipes that cater to project requirements. The only difference is that bad recipes are not tolerated easily, as significant cost rides on an architect's decision.

I have spent my career working with technologies that were predominantly from the Microsoft space. But the world is changing, and so is the focus on technologies. I have been taking a lot of personal interest in studying NoSQL based technologies that can tap intelligence from unstructured data as well as big data.

A common trait among developers and architects is the typical punch line "I can't learn by reading, I need hands-on experience of the technology I need to manage". If you work for a multi-national organization, it's not easy to land a project where you have no experience in the driving technology, and in most cases the organization has none either. When an organization does not find or recognize use-cases for a particular technology, any attempt to push or propose that technology is seen as an attempt to sell it: a solution in search of a problem.

So the big question is, how to bag an entry ticket into the NoSQL world and drive a project using NoSQL technologies ?

Professionals seeking to build competency in NoSQL, and intending to drive NoSQL based projects, can consider the following initiatives:

1) Setup a personal lab: Virtualization has made it easy to create a VM. Most of the NoSQL technologies require very modest resources (like 2 GB RAM and a single core) to run the software. This can be a playground to start practicing the technology.

2) Join the global community: Platforms like GitHub and Stack Overflow have a lot of community projects and real-life questions. By being an active observer as well as a participant on these platforms, one can mature on the technology very fast, and also become globally visible as an active professional in the technology of choice.

3) Create a community within your organization: Organizations feel comfortable adopting technologies that can be easily managed by the pool of people available in the organization. If you are one of the few who have a grip on the technology, you may classify yourself in the niche bracket, but that does not increase the organization's confidence to deal in the technology. To address this, you should conduct awareness sessions to bring people at various levels up to speed with the technology, and create a community of practice in the organization.

4) Pursue a professional training: Once you have successfully pursued points 2 and 3, you can confidently ask the organization for a budget to pursue professional training on the subject. Not everyone can afford to pay for training out of one's own pocket !!

5) Develop and publish POCs: An organization's confidence to adopt a technology, and its confidence in a professional's ability to manage that technology, depend on the professional's ability to justify the use-case for it. Identifying use-cases and justifying them through POCs is the best way to do so.

By following these 5 steps, I believe that one can establish oneself as well as one's organization in a position to make an entry in the NoSQL world. Let me know what you think.