What is the difference between logs and events in observability? These two telemetry data types are used for different purposes when it comes to exploring your applications and how your users interact with them. Simply put, logs can be used for troubleshooting and root cause analysis, while events can be used to gain deeper application insights via product analytics.
Let’s review some application telemetry data definitions for context, then dive into the key differences between logs and events and their use cases. Knowing more about these telemetry data types can help you use them more effectively in your observability strategy.
MELT: Defining the four pillars of telemetry data
Depending on who you ask, there are either three or four pillars of observability: Logs-Metrics-Traces or Metrics-Events-Logs-Traces (MELT). The key difference is that MELT treats events as a distinct data type.
When viewed collectively, these four types of telemetry data can help DevOps teams understand what’s happening in their applications and infrastructure, and even make decisions about how to improve their products.
Let’s focus on MELT and define each data type for the purpose of this article.
Metrics are measurements that typically share the following traits:
- A name
- A timestamp
- One or more numeric values
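As a rough illustration, a single metric data point with these traits could be modeled as follows. This is a minimal sketch: the `MetricPoint` class and the `cpu.utilization` example are hypothetical and not tied to any particular metrics library.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class MetricPoint:
    """One metric measurement: a name, a timestamp, and numeric values."""
    name: str
    timestamp: datetime
    values: dict[str, float]  # one or more numeric values, keyed by label


# Hypothetical example: a CPU metric carrying two numeric values.
point = MetricPoint(
    name="cpu.utilization",
    timestamp=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
    values={"percent": 71.5, "cores": 8.0},
)
```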
Event telemetry involves discrete actions that happen at a specific moment in time. Think of events as a history of what’s happened on your system, which can be helpful when analyzing user behaviors and more.
The original data type, logs are lines of text a system produces when certain code gets executed.
Logs can be structured or unstructured. However, using structured logs in a format like JSON or XML makes it easier to query logs for troubleshooting or root cause analysis.
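To see why structure helps, here is a minimal sketch of querying JSON logs by field. The log lines, field names, and the `query_logs` helper are all assumptions made up for illustration.

```python
import json

# Hypothetical structured log lines; the field names are illustrative only.
raw_lines = [
    '{"ts": "2024-01-01T12:00:00Z", "level": "INFO", "service": "checkout", "msg": "order placed"}',
    '{"ts": "2024-01-01T12:00:01Z", "level": "ERROR", "service": "checkout", "msg": "payment declined"}',
    '{"ts": "2024-01-01T12:00:02Z", "level": "ERROR", "service": "search", "msg": "index timeout"}',
]


def query_logs(lines, **filters):
    """Return parsed log records whose fields match every filter."""
    records = (json.loads(line) for line in lines)
    return [r for r in records if all(r.get(k) == v for k, v in filters.items())]


# Field-level filtering replaces brittle text matching on unstructured lines.
checkout_errors = query_logs(raw_lines, level="ERROR", service="checkout")
```

With unstructured text, the same question would usually require a regular expression that breaks whenever the message format changes.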
Also called distributed traces, traces are samples of causal chains of events (or transactions) between different components in a microservices ecosystem.
Events vs. logs - what’s the difference?
Many people confuse events with structured logs. That’s because all events can be represented as structured logs, but not all structured logs are events. Unlike logs, events describe a unit of work, meaning they contain all of the information about what it took for a service to perform a certain job. A single log is rarely a complete event. Logs are usually only portions of events, and a group of logs can compose a single event.
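One way to picture a group of logs composing a single event is to fold the logs that share a request ID into one record per unit of work. The sketch below assumes each log carries a `request_id`, a `step`, and a `duration_ms` field. Those names, and the `build_events` helper, are hypothetical.

```python
from collections import defaultdict

# Hypothetical logs emitted while serving two requests.
logs = [
    {"request_id": "r1", "step": "auth", "duration_ms": 12, "user": "alice"},
    {"request_id": "r1", "step": "db_query", "duration_ms": 40},
    {"request_id": "r2", "step": "auth", "duration_ms": 9, "user": "bob"},
    {"request_id": "r1", "step": "render", "duration_ms": 8},
]


def build_events(log_records):
    """Fold logs that share a request_id into one event per unit of work."""
    events = defaultdict(lambda: {"user": None, "steps": {}, "total_ms": 0})
    for rec in log_records:
        event = events[rec["request_id"]]
        # Keep the user from whichever log line recorded it.
        event["user"] = rec.get("user", event["user"])
        event["steps"][rec["step"]] = rec["duration_ms"]
        event["total_ms"] += rec["duration_ms"]
    return dict(events)


events = build_events(logs)
```

Each resulting event describes the whole job (who made the request, which steps ran, and how long each took), while any individual log line describes only a portion of it.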
Think of events as a subset of logs. Many teams want to analyze them at a finer granularity to understand how users interact with their applications. For example, query stats within events can help you understand things like overall latency, latency per step, segments, which user took a specific action, and more. Each of these query stats can be represented by a field with a number. DevOps teams can aggregate this data to gain a deeper understanding of applications and user behaviors.
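As a small sketch of that kind of aggregation, the snippet below averages a numeric latency field per user. The event shape and the `mean_latency_by_user` helper are assumptions for illustration, not a prescribed schema.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical events, each with a numeric latency field.
events = [
    {"user": "alice", "latency_ms": 120},
    {"user": "alice", "latency_ms": 80},
    {"user": "bob", "latency_ms": 300},
]


def mean_latency_by_user(event_records):
    """Group events by user and average the latency field."""
    grouped = defaultdict(list)
    for event in event_records:
        grouped[event["user"]].append(event["latency_ms"])
    return {user: mean(values) for user, values in grouped.items()}


latency = mean_latency_by_user(events)
```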
Use Cases: when to use event analytics or log analytics?
Now that we’ve defined these two data types, how do their use cases differ? At the simplest level, log analytics is often used for troubleshooting, while event analytics can be leveraged for product and user insights. As one example, query stats can give you information about the latency of each query in your system as it pertains to each user. Examining these types of details can show how happy your customers are with your product’s performance.
Many organizations use a solution like ChaosSearch to perform both search and relational analytics on the same set of data. ChaosSearch enables you to store logs and events without retention limits, by taking advantage of low-cost cloud object storage such as Amazon S3. By retaining telemetry data longer, you can examine trends and monitor usage over time.
Analyzing events may sound intuitive, but in a cloud-native environment it’s hard to do. With event analytics, retention matters. Many observability systems limit data retention windows to 30 days or less, which makes it difficult to see trends, such as how product usage has changed over time. In addition, good product analytics require near real-time access to data. Using SQL joins in ChaosSearch, you can correlate data in near real-time to understand how certain datasets relate to one another.
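To make the idea of correlating datasets with a SQL join concrete, here is a minimal sketch that uses Python’s built-in sqlite3 module as a stand-in. It does not use the ChaosSearch SQL API, and the table and column names are hypothetical.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in SQL engine.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE query_events (user_id TEXT, latency_ms INTEGER);
    CREATE TABLE users (user_id TEXT, plan TEXT);
    INSERT INTO query_events VALUES ('u1', 100), ('u1', 200), ('u2', 900);
    INSERT INTO users VALUES ('u1', 'free'), ('u2', 'enterprise');
""")

# Join event data to account data to compare average latency by plan.
rows = conn.execute("""
    SELECT u.plan, AVG(e.latency_ms) AS avg_latency_ms
    FROM query_events AS e
    JOIN users AS u ON u.user_id = e.user_id
    GROUP BY u.plan
    ORDER BY u.plan
""").fetchall()
conn.close()
```

The join is what turns raw operational events into a product question, such as whether customers on one plan see slower queries than customers on another.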
The overall goal is to allow DevOps teams to elevate operational telemetry to business-level insights. For many software-powered companies, this data can be critical to the user experience. Here are a few examples:
- One gaming company is using telemetry data for game balancing. When players die, the DevOps team can analyze what items they were wearing, as well as what they were doing before and after, to understand which characters are too powerful or too weak.
- A fintech company is using a similar event analytics strategy to understand which events are application- and transaction-related. The DevOps team records events issued by Lambdas to find out the number of transactions per merchant and the number of orders across merchants for customers.
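The fintech example can be sketched roughly as follows. The event fields (`type`, `merchant`, `customer`) and the sample values are invented for illustration and do not reflect the company’s actual schema.

```python
from collections import Counter

# Hypothetical events issued by Lambda functions.
lambda_events = [
    {"type": "transaction", "merchant": "m-1", "customer": "c-1"},
    {"type": "transaction", "merchant": "m-1", "customer": "c-2"},
    {"type": "app_error", "merchant": "m-2", "customer": "c-1"},
    {"type": "transaction", "merchant": "m-2", "customer": "c-1"},
]

# Separate transaction-related events from application-related ones.
transactions = [e for e in lambda_events if e["type"] == "transaction"]

# Count transactions per merchant, and per customer across all merchants.
per_merchant = Counter(e["merchant"] for e in transactions)
per_customer = Counter(e["customer"] for e in transactions)
```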
Leveraging ChaosSearch SQL API for events
Built for scale, ChaosSearch lets you centralize large volumes of logs and events, and analyze them via Elastic or SQL APIs — at a fraction of the cost of an observability solution such as Datadog or Splunk. For teams that don’t want to disrupt their metric and trace observability platforms when they’re working well, ChaosSearch can integrate easily using open APIs for unified observability. These teams can:
- Send logs and events directly to Amazon S3 or Google Cloud Storage (GCS): Send log and event data directly from the source, or ingest it into another observability tool and use S3/GCS as the destination.
- Connect to ChaosSearch: Grant ChaosSearch read-only access to the raw log buckets. From there, teams can create a new bucket for Chaos Index® to make their data fully searchable, or create a few object groups and views.
- Analyze logs and events via Elastic or SQL APIs: Investigate applications in the ChaosSearch console via Kibana, Superset, Elastic or SQL APIs.