Enterprise search is a huge market. Fortunately, only a handful of products cater to it; unfortunately, none of them is a one-size-fits-all solution.
Certain categories of features are expected from an enterprise search product, and these determine which requirements a product suits. Some of them are listed below:
1) Crawling
- Web Crawling: An enterprise keeps most of its content on portals in the form of HTML and media documents. A crawler is the basic means of building an index from this content.
- DB Crawling: Data stored in databases often needs to be crawled or imported into the search inventory.
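As a rough illustration of what a web crawler does, here is a minimal sketch in Python. It assumes the portal's pages are already available as an in-memory dictionary (the `PAGES` data, the `PageParser` class and the `crawl` function are all made up for this example), so it shows only the parse, index and follow-links loop. It is not a production crawler.

```python
from collections import defaultdict
from html.parser import HTMLParser


class PageParser(HTMLParser):
    """Collects text and outgoing links from one HTML page."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        self.text_parts.append(data)


def crawl(pages, start):
    """Breadth-first walk over `pages` (url -> html), returning an inverted index."""
    index = defaultdict(set)
    queue, seen = [start], {start}
    while queue:
        url = queue.pop(0)
        parser = PageParser()
        parser.feed(pages.get(url, ""))
        # Index every lowercased word against the page it appeared on.
        for word in " ".join(parser.text_parts).lower().split():
            index[word].add(url)
        # Follow only links we know about and have not visited yet.
        for link in parser.links:
            if link in pages and link not in seen:
                seen.add(link)
                queue.append(link)
    return index


# Hypothetical in-memory "portal" standing in for real HTTP fetches.
PAGES = {
    "/home": '<p>Welcome to the portal</p><a href="/hr">HR</a>',
    "/hr": "<p>Leave policy portal</p>",
}
INDEX = crawl(PAGES, "/home")
```

Looking up `INDEX["portal"]` returns both pages, while `INDEX["policy"]` returns only `/hr`. A real crawler would add HTTP fetching, politeness rules and handling for media documents.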
2) Taxonomy
Taxonomy is the logical organization of content in the enterprise content management system. Some call it the metadata, the structure, or the term store of the index maintained in the system. It is a method of framing structure around the content so that information can be retrieved more effectively and precisely.
For example, a very simple way of implementing a taxonomy is to let content be tagged with a set of keywords defined centrally at the organization level.
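The keyword-tagging idea can be sketched in a few lines of Python. The term store below and the `tag_document` helper are hypothetical; the assumption is that each centrally defined keyword may have one parent term, which is only one of many ways to model a taxonomy.

```python
# Hypothetical central term store: each allowed keyword maps to its parent term.
TERM_STORE = {
    "finance": None,
    "invoices": "finance",
    "hr": None,
    "leave-policy": "hr",
}


def tag_document(doc, keywords):
    """Attach keywords to a document, rejecting any not in the central term store."""
    unknown = [k for k in keywords if k not in TERM_STORE]
    if unknown:
        raise ValueError(f"keywords not in term store: {unknown}")
    tags = set(keywords)
    # Also add ancestor terms so a search on "finance" finds "invoices" documents.
    for keyword in keywords:
        parent = TERM_STORE[keyword]
        while parent is not None:
            tags.add(parent)
            parent = TERM_STORE[parent]
    return {**doc, "tags": sorted(tags)}
```

Calling `tag_document({"title": "Q3 invoices"}, ["invoices"])` yields a document tagged with both `finance` and `invoices`, while a keyword outside the term store is rejected. Keeping the vocabulary central is what keeps tags consistent across the organization.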
3) Specialized OOB Search
- Faceted search (as on Amazon, where a set of categories appears on the left side)
- Dictionary-based search (where you look for a word and its synonyms)
- Auto-suggest (for example, when you type terms into Google and it suggests a few phrases)
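The three features above can be illustrated with a toy Python sketch. The catalog, the synonym dictionary and the function names are invented for this example, and the matching is deliberately naive (whole-word match, prefix match on the full name). Real products implement these features on top of an inverted index.

```python
from collections import Counter

# Hypothetical sample catalog and synonym dictionary.
PRODUCTS = [
    {"name": "laptop sleeve", "category": "accessories"},
    {"name": "notebook computer", "category": "computers"},
    {"name": "laptop stand", "category": "accessories"},
]
SYNONYMS = {"laptop": {"notebook"}, "notebook": {"laptop"}}


def search(term):
    """Dictionary-based search: match the term or any of its synonyms."""
    terms = {term} | SYNONYMS.get(term, set())
    return [p for p in PRODUCTS if terms & set(p["name"].split())]


def facets(results, field):
    """Faceted search: count results per value of `field`."""
    return Counter(p[field] for p in results)


def suggest(prefix):
    """Auto-suggest: product names starting with the typed prefix."""
    return sorted(p["name"] for p in PRODUCTS if p["name"].startswith(prefix))
```

Here `search("laptop")` also returns the "notebook computer" entry through the synonym dictionary, `facets(search("laptop"), "category")` counts two accessories and one computer, and `suggest("lap")` proposes the two laptop products.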
4) Pluggability
- Ability to index an SMTP server
- Ability to index an LDAP server
- Out-of-the-box ability to index other such external systems
But the big question is: where do products like Elasticsearch fit in here?
The commercial products are attractive because they provide the features mentioned above, but they also have downsides and limitations, and this is where Elasticsearch or even Solr steps in.
1) None of these products is cheap. For example, HP Autonomy is reported to have a base price of more than half a million dollars. Not every enterprise has the budget for it.
2) Some products do not make database indexing easy. For example, GSA (Google Search Appliance) does not readily allow complex delta-detection queries when indexing data from databases.
3) Most of these products do not scale horizontally. Apart from appliance solutions, products like Endeca are resource intensive and, because of their scaling architecture, not suited to managing big-data volumes.
4) Custom development to extend the product through its APIs is not as easy as it is with open source products.
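To make the delta-detection point in item 2 concrete, here is a minimal Python sketch using the standard `sqlite3` module. The `articles` table, its `updated_at` column and the `fetch_delta` helper are assumptions for illustration. The idea is simply to re-index only the rows modified since the last indexing run, tracked by a timestamp watermark.

```python
import sqlite3


def fetch_delta(conn, last_run):
    """Return rows changed since `last_run`, i.e. the delta to re-index."""
    rows = conn.execute(
        "SELECT id, title, updated_at FROM articles "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_run,),
    ).fetchall()
    # The newest timestamp seen becomes the watermark for the next run.
    new_watermark = rows[-1][2] if rows else last_run
    return rows, new_watermark


# Hypothetical table and data, held in memory for the example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO articles VALUES (?, ?, ?)",
    [
        (1, "Old policy", "2014-01-01T00:00:00"),
        (2, "New policy", "2014-03-01T00:00:00"),
    ],
)
changed, watermark = fetch_delta(conn, "2014-02-01T00:00:00")
```

Only the row updated after the previous run comes back, and its timestamp becomes the next watermark. Real delta queries are usually more involved (joins, soft deletes, multiple tables), and that is where a product's database connector needs to be flexible.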
Custom search for applications is inevitable. Although the enterprise search platform market may be dominated by these products, products like Elasticsearch and Solr will continue to find their place in powering custom applications that manage big data with specialized search functionality (for example, e-commerce sites such as amazon.com).
The limitation of products like Elasticsearch is that they lack enterprise-scale features: for example, OOB crawlers, the information visualization and reporting layers required for e-discovery and reporting, and anything beyond very limited taxonomy support, which is crucial for an enterprise search platform. But as the product is still very young and evolving, these features can hopefully be expected over the next couple of years.