Elasticsearch is a very powerful value addition to any relational dbms like SQL Server, Oracle, DB2 etc, provided it's used wisely. Before we look at how to use elasticsearch with SQL Server, we should look at "Why to use elasticsearch with SQL Server". This question holds the key to the answer.
SQL Server hold data either in relational form or in multi-dimensional form (through SSAS). Full Text Search (FTS) in SQL Server is capable of providing some out-of-box search feature, but when search queries requires exhaustive searching over huge datasets, and add some complexity in the search definition itself, one can evidently see performance impact there. Elasticsearch is primarily a search engine, but loaded with features like Facets and Aggregation framework, it helps solve many data analysis related problems. For example, everyone of us would have visited sites like Amazon.com, Ebay.com, Flipkart.com etc. Whenever we search for a product, it builds all the dynamic categories, ranges and values on the fly. For such features, a product like elasticsearch can be extremely helpful. One such real project example can be read from here.
How to use Elasticsearch with SQL Server ?
Elasticsearch JDBC River is the best means (to the best of my knowledge as of date) to load data from SQL Server into an elasticsearch index. One of the best explanations on setting up elasticsearch JDBC river with SQL Server, can be read from here.
One point to keep in view is that, if you setup a river and you restart elasticsearch server, the river would execute the query set for the river again. This could result in reloading of the entire data in the index. In case if the IDs are being fetched from the source, all existing records would get updates. But if IDs are autogenerated in elasticsearch, this would result in new records, which would ultimately lead to duplicate data. So use the river cautiously. You can also delete the river once data is loaded into the index, in case its a one time activity for one time data migration.
2 comments:
Even Sphinx works fast and is very easy to configure. It can be explored.
However Elastic Search provides more options compared to Sphinx.
Elastic Search Rivers are now Deprecated.
There is now a JDBC Feeder and JDBC Importer tool which "replace" the rivers.
Not all the same. Feeder isn't working well in Windows yet.
Post a Comment