Lucene has a lot of options for configuring similarity. By extension, Solr and Elasticsearch have the same options. Similarity makes the base of your relevancy score: how similar is this document (actually, this field in this document) to the query? I’m saying the base of the score because, on top of this score, you can apply per-field boosts, function scoring (e.g. boost more recent documents) and re-ranking (e.g. Learning to Rank).
During the Entity Extraction For Product Searches talk that Radu Gheorghe and I gave at Activate conference in Montreal last year, we talked about various natural language processing and machine learning algorithms. We showed entity extraction both on top of Solr and using external libraries. In this post we dig into Learning to Rank with Solr Streaming Expressions.
The search-first problem-solving approach—meaning “open up the log search tool” (Splunk, ELK, Loggly, SumoLogic, Scalyr, etc)—is a costly and time-consuming operation during which the true source of a problem is rarely pinpointed in short order. Log search tools require work by the user to transform text strings into fields that are ready for statistical analysis.
We’ve been working with Elasticsearch since its inception, either with clients on consulting for Elasticsearch products and Elasticsearch production support, or by building our own hosted log management solution. For the last 4 years, we’ve also been sharing our knowledge through Elasticsearch training classes. In 2018, we had remote public training classes on a fixed quarterly schedule, so you can more easily plan your learning time and budget.
If you rely on Elasticsearch for centralized logging, you cannot afford to experience performance issues. Slow queries, or worse — cluster downtime, is not an option. Your Elasticsearch cluster needs to be optimized to deliver fast results.
This tutorial walks through using Rancher to deploy Elasticsearch into a Kubernetes cluster. At the end of this article, you will have a fully functional 2-node Elasticsearch cluster, complete with sample data and examples of successful queries.
We’re excited to announce the release of the Field Stats API plugin for Elasticsearch. The Field Stats API used to be present from Elasticsearch 1.6 to 5.6, to provide efficient statistics for fields of each index. For example, the minimum and maximum values of a date field.
In Elasticsearch parlance, a document is serialized JSON data. In a typical ELK setup, when you ship a log or metric, it is typically sent along to Logstash which groks, mutates, and otherwise handles the data, as defined by the Logstash configuration. The resulting JSON is indexed in Elasticsearch.
Elasticsearch is a highly scalable, distributed, open-source RESTful search and analytics engine that offers log analytics, real-time application monitoring, click stream analytics, and more. Elasticsearch stores and retrieves data structures in real time. It has multi-tenant capabilities with an HTTP web interface, presents data in the form of structured JSON documents, makes full-text search accessible via RESTful API, and maintains web clients for languages like PHP, Ruby, .Net, and Java.