Entity extraction is the process of figuring out which fields a query should target, as opposed to always hitting all fields. For example, when a user types Apple iPhone, how can we tell that the intent was to run company:Apple AND product:iPhone?
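To make the Apple iPhone example concrete, here is a minimal sketch of the simplest possible approach: a dictionary lookup that maps each query token to a field. The field dictionaries and the `extract_entities` helper are hypothetical names made up for illustration; a real system would more likely build its dictionaries from the product catalog or use a trained model.

```python
# Hypothetical field dictionaries (assumption): in practice these would be
# derived from the product catalog or replaced by a trained NER model.
FIELD_DICTIONARIES = {
    "company": {"apple", "samsung"},
    "product": {"iphone", "galaxy"},
}


def extract_entities(query):
    """Rewrite a free-text query into fielded clauses joined by AND."""
    clauses = []
    for token in query.split():
        for field, terms in FIELD_DICTIONARIES.items():
            if token.lower() in terms:
                # Token is a known value for this field: target the field
                clauses.append(f"{field}:{token}")
                break
        else:
            # No dictionary matched: keep the token as an unfielded term
            clauses.append(token)
    return " AND ".join(clauses)


print(extract_entities("Apple iPhone"))  # company:Apple AND product:iPhone
```

A lookup like this breaks down quickly on ambiguous or multi-word terms, which is where the statistical approaches come in.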
Over the years, natural language processing in the world of search went from an interesting detail to a must-have, especially in areas such as e-commerce. Engineers started incorporating classification, synonym generation, named entity recognition, and much more into their search systems, giving users better search results and, in some cases, leading to more revenue.
Lucene has a lot of options for configuring similarity. By extension, Solr and Elasticsearch have the same options. Similarity forms the base of your relevancy score: how similar is this document (actually, this field in this document) to the query? I say "the base" because, on top of this score, you can apply per-field boosts, function scoring (e.g. boosting more recent documents) and re-ranking (e.g. Learning to Rank).
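The layering described here can be sketched in a few lines. This is a simplified illustration, not Lucene's actual implementation: `bm25_term_score` uses a BM25-style formula with the commonly used defaults `k1=1.2` and `b=0.75`, and `final_score` is a hypothetical helper that assumes per-field scores are summed after boosting and then multiplied by a recency factor. Real engines offer several ways to combine fields and functions.

```python
import math


def bm25_term_score(freq, doc_len, avg_doc_len, doc_count, docs_with_term,
                    k1=1.2, b=0.75):
    """BM25-style score of one term in one field of one document (simplified)."""
    # Rarer terms (fewer documents containing them) get a higher weight
    idf = math.log(1 + (doc_count - docs_with_term + 0.5) / (docs_with_term + 0.5))
    # Term frequency saturates, and longer-than-average fields are penalized
    tf_norm = freq / (freq + k1 * (1 - b + b * doc_len / avg_doc_len))
    return idf * tf_norm


def final_score(field_scores, field_boosts, recency_multiplier=1.0):
    """Apply per-field boosts, then a function score, on top of the base similarity.

    Assumption: boosted field scores are summed; unboosted fields default to 1.0.
    """
    base = sum(field_boosts.get(field, 1.0) * score
               for field, score in field_scores.items())
    return base * recency_multiplier


# Made-up statistics for a single query term matching two fields
title = bm25_term_score(freq=1, doc_len=5, avg_doc_len=8,
                        doc_count=1000, docs_with_term=20)
body = bm25_term_score(freq=3, doc_len=300, avg_doc_len=250,
                       doc_count=1000, docs_with_term=120)
print(final_score({"title": title, "body": body}, {"title": 2.0},
                  recency_multiplier=1.1))
```

Re-ranking, such as Learning to Rank, would then run as a separate pass over the top results produced by this score.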
During the Entity Extraction For Product Searches talk that Radu Gheorghe and I gave at the Activate conference in Montreal last year, we talked about various natural language processing and machine learning algorithms. We showed entity extraction both on top of Solr and using external libraries. In this post, we dig into Learning to Rank with Solr Streaming Expressions.
The search-first approach to problem solving, meaning “open up the log search tool” (Splunk, ELK, Loggly, SumoLogic, Scalyr, etc.), is costly and time-consuming, and it rarely pinpoints the true source of a problem quickly. Log search tools require the user to do the work of transforming text strings into fields that are ready for statistical analysis.
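As a rough illustration of that transformation work, the sketch below turns raw log lines into named fields with a regular expression and then counts errors per service. The log format, the sample lines, and the `parse_line` helper are all made up for this example; real logs rarely share one clean format, which is a large part of the cost.

```python
import re
from collections import Counter

# Hypothetical log format (assumption): "<timestamp> <LEVEL> <service> <message>"
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\S+) (?P<level>[A-Z]+) (?P<service>\S+) (?P<message>.*)$"
)


def parse_line(line):
    """Return the line's fields as a dict, or None if it does not match the format."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


# Made-up sample lines, for illustration only
lines = [
    "2018-03-01T10:00:00Z ERROR checkout Payment gateway timeout",
    "2018-03-01T10:00:01Z INFO search Query served in 12ms",
    "2018-03-01T10:00:02Z ERROR checkout Payment gateway timeout",
]

# Only once the text is split into fields can it be aggregated
events = [event for event in map(parse_line, lines) if event is not None]
errors_per_service = Counter(e["service"] for e in events if e["level"] == "ERROR")
print(errors_per_service)
```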
We’ve been working with Elasticsearch since its inception, either with clients, through Elasticsearch consulting and production support, or by building our own hosted log management solution. For the last four years, we’ve also been sharing our knowledge through Elasticsearch training classes. In 2018, we had remote public training classes on a fixed quarterly schedule, so you could more easily plan your learning time and budget.
If you rely on Elasticsearch for centralized logging, you cannot afford to experience performance issues. Slow queries or, worse, cluster downtime are not an option. Your Elasticsearch cluster needs to be optimized to deliver fast results.