Siddharth Mehta's Blog: ElasticSearch Tutorial - Questions - Basics

Monday, May 19, 2014

ElasticSearch Tutorial - Questions - Basics - Part I

1) ElasticSearch uses Apache Lucene as the underlying technology.

2) Relational databases map the values of fields in a table to indexes, and those indexes are used to locate records during a search. Lucene uses inverted indexes, which store the values (terms) found in a field and use them to find the related records (documents).
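
To make the idea of an inverted index concrete, here is a minimal Python sketch. It is only an illustration and not how Lucene is implemented: the function name `build_inverted_index` and the sample documents are made up for this example, and terms are produced by naive lowercasing and whitespace splitting.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the sorted ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


# Hypothetical sample documents, keyed by document id.
docs = {
    1: "Lucene powers search",
    2: "ElasticSearch uses Lucene",
}

inverted = build_inverted_index(docs)
print(inverted["lucene"])  # [1, 2]
print(inverted["search"])  # [1]
```

A lookup goes from a term to the documents that contain it, which is the reverse of reading a document to find its terms.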

Terminology:

3) What is an index in ElasticSearch ?

An index is similar to a table in relational databases. The difference is that a relational database always stores the actual values, whereas storing them is optional in ElasticSearch. An index can hold the actual values, the analyzed values, or both.

4) What is a document in ElasticSearch ?

A document is similar to a row in relational databases. The difference is that each document in an index can have a different structure (fields), but fields common to several documents should have the same data type.

A field can occur multiple times in a document (a multi-valued field), and its values should share one data type. Fields can also contain other documents.
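
The rule that documents may differ in structure while common fields keep the same data type can be sketched in Python as below. This is a toy check on plain dictionaries, not something ElasticSearch exposes; the helper `conflicting_fields` and all sample documents are invented for illustration.

```python
def conflicting_fields(documents):
    """Return the names of fields whose values differ in Python type across documents."""
    seen = {}          # field name -> type name first observed
    conflicts = set()
    for doc in documents:
        for name, value in doc.items():
            kind = type(value).__name__
            if seen.setdefault(name, kind) != kind:
                conflicts.add(name)
    return sorted(conflicts)


# Hypothetical documents with different structures.
book = {"title": "Lucene in Action", "pages": 475, "tags": ["search", "java"]}
profile = {"title": "Author profile", "address": {"city": "Mumbai"}}  # nested object
broken = {"title": "Broken", "pages": "many"}  # "pages" is a string here

print(conflicting_fields([book, profile]))          # []
print(conflicting_fields([book, profile, broken]))  # ['pages']
```

The first two documents share only the `title` field, and it is a string in both, so they can live side by side. The third one clashes because `pages` changes type.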

5) Does ElasticSearch have a schema ?

Yes, ElasticSearch can have mappings, which can be used to enforce a schema on documents.

6) What is a document type in ElasticSearch ?

A document type can be seen as the document schema / mapping definition: it holds the mapping of all the fields in the document along with their data types.
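
As a rough illustration of a mapping, the Python sketch below builds a mapping body as a dictionary and checks a document against it locally. The type name `article`, the field names, and the `violations` helper are hypothetical. The field type names (`string`, `integer`, `date`) are assumed to follow the 1.x naming, and the Python-side type table is a simplification made for this example.

```python
import json

# Hypothetical mapping for a document type called "article".
article_mapping = {
    "article": {
        "properties": {
            "title": {"type": "string"},
            "views": {"type": "integer"},
            "published": {"type": "date"},
        }
    }
}

# Simplified Python equivalents, used only for this local check.
PYTHON_TYPES = {"string": str, "integer": int, "date": str}


def violations(doc, mapping, doc_type):
    """List the mapped fields of doc whose value does not match the mapped type."""
    props = mapping[doc_type]["properties"]
    problems = []
    for field, value in doc.items():
        if field not in props:
            continue  # fields without a mapping are ignored by this sketch
        expected = PYTHON_TYPES[props[field]["type"]]
        if not isinstance(value, expected):
            problems.append(field)
    return problems


print(json.dumps(article_mapping, indent=2))
print(violations({"title": "Basics", "views": "ten"}, article_mapping, "article"))  # ['views']
```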

7) What is indexing in ElasticSearch ?

The process of storing data in an index is called indexing in ElasticSearch. Data in an index is divided into segments that are written once and read many times. Because a segment is never modified in place, an attempted update writes a new version of the document to the index.
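
The write-once behaviour can be pictured with a small append-only store in Python. This is a conceptual sketch only: the class `ToyIndex` and its methods are invented here, and it does not model real segments, merging, or deletes.

```python
class ToyIndex:
    """Append-only store: an update appends a new version instead of editing in place."""

    def __init__(self):
        self.log = []      # entries are never modified: (doc_id, version, source)
        self.latest = {}   # doc_id -> position of the newest entry in self.log

    def index(self, doc_id, source):
        """Write a document and return the version number it received."""
        version = 1
        if doc_id in self.latest:
            version = self.log[self.latest[doc_id]][1] + 1
        self.log.append((doc_id, version, dict(source)))
        self.latest[doc_id] = len(self.log) - 1
        return version

    def get(self, doc_id):
        """Return (version, source) of the newest version of a document."""
        _, version, source = self.log[self.latest[doc_id]]
        return version, source


idx = ToyIndex()
first = idx.index("1", {"title": "draft"})
second = idx.index("1", {"title": "final"})
print(first, second)   # 1 2
print(idx.get("1"))    # (2, {'title': 'final'})
print(len(idx.log))    # 2
```

Both versions remain in the log; only the pointer to the latest one changes.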

8) What is a node in ElasticSearch ?

Each instance of ElasticSearch is called a node. Multiple nodes can work in harmony to form an ElasticSearch Cluster.

9) What is a shard in ElasticSearch ?

Due to resource limitations such as RAM and vCPU, applications that need to scale out have to run multiple instances of ElasticSearch on separate machines. Data in an index can be divided into multiple partitions, each of which can be handled by a separate node (instance) of ElasticSearch. Each such partition is called a shard. By default, an index in ElasticSearch 1.x has 5 shards.
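
How a document ends up in one particular shard can be sketched as a hash of a routing value taken modulo the number of shards. The Python below is a simplified illustration under assumptions: `pick_shard` is a made-up helper, the document ids are invented, and `zlib.crc32` is only a stand-in for whatever hash function ElasticSearch actually uses.

```python
import zlib


def pick_shard(routing_value, number_of_shards=5):
    """Pick a shard number in [0, number_of_shards) for a routing value."""
    # crc32 is a stand-in chosen for this sketch, not the hash ElasticSearch uses.
    return zlib.crc32(routing_value.encode("utf-8")) % number_of_shards


# Hypothetical document ids used as routing values.
for doc_id in ["user-1", "user-2", "user-3"]:
    print(doc_id, "-> shard", pick_shard(doc_id))
```

Because the result depends on the number of shards, the same id would map to a different shard if that number changed after documents were indexed.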

10) What is a replica in ElasticSearch ?

Each shard in ElasticSearch can have one or more copies, called replicas. By default each shard has 1 replica, so its data exists in 2 copies (the primary and the replica). Replicas serve the purpose of high availability and fault tolerance.
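
The Python sketch below shows why replicas help with fault tolerance: copies of the same shard are placed on different nodes, so losing one node does not lose the shard. It is a toy round-robin placement, not ElasticSearch's real allocation logic, and the function `place_shards` and the node names are hypothetical.

```python
def place_shards(number_of_shards, number_of_replicas, nodes):
    """Round-robin placement that keeps the copies of one shard on different nodes."""
    if number_of_replicas + 1 > len(nodes):
        raise ValueError("need at least one node per copy of a shard")
    placement = {node: [] for node in nodes}
    for shard in range(number_of_shards):
        for copy in range(number_of_replicas + 1):
            node = nodes[(shard + copy) % len(nodes)]
            role = "primary" if copy == 0 else "replica"
            placement[node].append((shard, role))
    return placement


# Hypothetical three-node cluster with 5 shards and 1 replica per shard.
placement = place_shards(5, 1, ["node-a", "node-b", "node-c"])
for node, shards in placement.items():
    print(node, shards)

total_copies = sum(len(shards) for shards in placement.values())
print(total_copies)  # 10
```

With 5 shards and 1 replica each, the cluster holds 10 shard copies in total.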

11) What is an Analyzer in ElasticSearch ?

While indexing data in ElasticSearch, the data is transformed internally by the analyzer defined for the index and then indexed. An analyzer is built from a tokenizer and filters. The analyzers built into ElasticSearch 1.x include standard, simple, whitespace, stop, keyword, pattern and snowball, along with language-specific analyzers.

12) What is a Tokenizer in ElasticSearch ?

A tokenizer breaks the field values of a document down into a stream of tokens. Inverted indexes are created and updated using these tokens, and each token stored in the index points back to the documents that contain it.

13) What is a Filter in ElasticSearch ?

After the data has been processed by the tokenizer, the resulting tokens are processed by filters before indexing. The token filters available in ElasticSearch 1.x include standard, lowercase, stop, stemmer, synonym and asciifolding.
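
The tokenizer-then-filters chain described above can be sketched in plain Python. This is a conceptual illustration, not ElasticSearch code: the function names and the stopword list are assumptions made for the example.

```python
def whitespace_tokenizer(text):
    """Split text into tokens on whitespace."""
    return text.split()


def lowercase_filter(tokens):
    """Lowercase every token."""
    return [t.lower() for t in tokens]


def stop_filter(tokens, stopwords=frozenset({"a", "an", "the", "is"})):
    """Drop tokens that appear in the (hypothetical) stopword list."""
    return [t for t in tokens if t not in stopwords]


def analyze(text, tokenizer, filters):
    """Run one tokenizer, then each filter in order, and return the final tokens."""
    tokens = tokenizer(text)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens


terms = analyze("The Quick fox is HERE", whitespace_tokenizer,
                [lowercase_filter, stop_filter])
print(terms)  # ['quick', 'fox', 'here']
```

The order of filters matters: the stop filter only matches "The" because the lowercase filter ran first.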

14) What is the query language of ElasticSearch ?

ElasticSearch queries are written in Query DSL, a JSON-based query language that ElasticSearch provides on top of Apache Lucene's query capabilities.
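
A Query DSL request body is a JSON document, so it can be built as a plain dictionary. The Python sketch below only constructs and prints such a body; it does not send it anywhere. The helper `match_query` and the field name `title` are hypothetical, and the body assumes the `match` query and the `size` parameter.

```python
import json


def match_query(field, text, size=10):
    """Build a search body that runs a match query on one field."""
    return {"query": {"match": {field: text}}, "size": size}


body = match_query("title", "elasticsearch basics")
print(json.dumps(body, indent=2))
```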

In the next part of this ElasticSearch Tutorial, we will see how to install ElasticSearch and how to use its tools and technologies to administer it.
