Operations | Monitoring | ITSM | DevOps | Cloud

The Ops Agent is now GA and it leverages OpenTelemetry

Running and troubleshooting production services requires deep visibility into your applications and infrastructure. While basic logs and metrics are available out of the box with Google Cloud Compute Engine (GCE), capturing advanced data used to require the installation of both a metrics agent and a logging agent.

An Introduction to OpenTelemetry

The growth of technology has led to more efficient and relevant digital experiences, and customers continue to expect more out of those interactions. That’s true no matter their location and no matter which device they choose to use. Companies that cannot provide these kinds of personalized interactions for their customers find themselves falling behind the competition as technology continues to advance.

How Slack Transformed Their CI With Tracing

Slack experienced meteoric growth between 2017 and 2020—but that level of growth came with growing pains. In his talk at the 2021 o11ycon+hnycon, Frank Chen (LinkedIn), a Slack Senior Staff Engineer, detailed one of Slack’s biggest pain points in that period: flaky tests. A flaky test returns both a passing and failing result despite no changes in the code. At one point, between 2017 and 2020, Slack’s flaky test rate reached as high as 50%.

What's new in Grafana Cloud for July 2021: Traces, live streaming, Kubernetes and Docker integrations, and more

If you’re not already familiar with it, Grafana Cloud is the easiest way to get started observing metrics (Prometheus and Graphite), logs (Grafana Loki), traces (Grafana Tempo), and dashboards. Here are the latest features you should know about!

Distributed Tracing for Kafka Clients with OpenTelemetry and Splunk APM

This blog series is focused on observability into Kafka based applications. In the previous blogs, we discussed the key performance metrics to monitor different Kafka components in "Monitoring Kafka Performance with Splunk" and how to collect performance metrics using OpenTelemetry in "Collecting Kafka Performance Metrics with OpenTelemetry." In this blog, we'll cover how to enable distributed tracing for Kafka clients with OpenTelemetry and Splunk APM.

Instrumenting Java Applications for Tracing with OpenTelemetry and Jaeger

The aim of this article is to demonstrate how you can instrument a Java application using Opentelementry and Jaeger. In this example, we will be instrumenting our Java application using OpenTelemetry and the OpenTelemetry Java client, and the tracing data will be exported and visualized using Jaeger. We will use the Logz.io Jaeger backend as it is compatible with common tracing standards like Zipkin, OpenTelemetry, and OpenTracing.

Real-time distributed tracing for Go and Java Lambda Functions

Serverless applications streamline development by allowing you to focus on writing and deploying code rather than managing and provisioning infrastructure. To help you monitor the performance of your serverless applications, last year we released distributed tracing for AWS Lambda to provide comprehensive visibility across your serverless applications.

Instrumenting Microservices with Istio for Distributed Tracing

Previously, I wrote a Beginner’s Guide to Jaeger + OpenTracing Instrumentation for Go providing guidance on manually instrumenting Go services. This is useful for cases where we want fine-grained tracing of specific functions. However, what if all we want is to trace a service’s inbound and outbound calls with little to no additional code?

Find the Root Cause Faster with Trace View and Trace Navigator

Like a bratty teenager, traditional monitoring answers your questions, but does so in a terse, unhelpful manner: Why is my page slow? Guess it’s the API call. It’s a 504 thing — you wouldn’t understand. Ok, so why is the API call slow? Ask your DB query. Gosh! You need a better conversation with your code — one which gives you contextual clues about your application’s performance.

How to Monitor Application Logs

In the beginning, there was the Log – or to be a bit more precise, there were application logs. At least that's how it was in the early days of application development, when raw log data itself was more often than not the point where troubleshooting began. Now, of course, the starting point for troubleshooting with cloud-based applications is much more likely to be an automatically-generated alert, or an indication on a monitoring dashboard that something isn't quite right.