Where did all my spans go? A guide to diagnosing dropped spans in Jaeger distributed tracing

Nothing is more frustrating than feeling like you’ve finally found the perfect trace only to see that you’re missing critical spans. In fact, a common question for new users and operators of Jaeger, the popular distributed tracing system, is: “Where did all my spans go?” In this post we’ll discuss how to diagnose and correct lost spans in each element of the Jaeger span ingestion pipeline.


Grafana and NGINX are partnering to give the open source community a turnkey experience for visibility

Over the past few years, NGINX users have naturally gravitated toward Grafana, and vice versa. These days, it’s not uncommon to see these two open source tools used together in the wild. And for good reason. F5, which acquired NGINX last year, is prioritizing building visibility across the entire product set, to make it easy for customers to quickly gain the insights that they need. Meanwhile, Grafana has evolved into the primary visualization and analysis tool in the open source market.

Grafana Loki sneak peek: Generate Ad-hoc metrics from your NGINX Logs

Get a sneak preview of a future version of Grafana Loki that enables you to generate ad-hoc metrics from your log data. This video features a Loki-based web analytics dashboard, which uses the access logs of the popular open-source web server NGINX. Every panel on this dashboard uses ad-hoc metrics created with Loki, except, of course, the Log panel. Would this be useful for your use case? Let us know in the comments.

How to Build A Unified Dashboard | Datadog Tips & Tricks

In this video, you’ll learn how to create unified dashboards that give your teams valuable information and performance visualizations from across the Datadog platform. Dashboards allow your teams to see all of that data side-by-side, enabling holistic visibility and breaking down silos between Dev and Ops teams. You’ll also learn how to create a Screenboard that showcases data such as frontend system latency, backend system latency, and Service Level Objectives all in one place.

New Enterprise features in Grafana 7.0: Usage insights and user presence indicator

Dashboard sprawl is a real problem whether you’re using Grafana or any other tool. When growing to thousands of users – and as many dashboards – you’ll eventually want more information about how the tool is being used in your organization. After all, dashboards don’t help anyone if they aren’t being used. Managing large installations is one of the areas where Grafana Enterprise improves Grafana, and our launch of usage insights in 7.0 is a key part of that.


Custom Dashboards for Business and IT Metrics by Instana

Putting metrics and charts on custom dashboards has been an important part of IT and business operations from the earliest days of monitoring. That’s not to say nothing has changed in the past 30-40 years. On the contrary, the concept of creating and maintaining hundreds of dashboards is outdated and we know there are much better ways to implement a dashboarding strategy. The key is to create as few dashboards as needed, with as much automation as possible built in for managing them.


Getting started with the Grafana Cloud Agent, a remote_write-focused Prometheus agent

Hi folks! Éamon here. I’m a recent-ish addition to the Solutions Engineering team and just getting my feet wet on the blogging side so bear with me. :) Back in March, we introduced the Grafana Cloud Agent, a remote_write-focused Prometheus agent. The Grafana Cloud Agent is a subset of Prometheus without any querying or local storage, using the same service discovery, relabeling, WAL, and remote_write code found in Prometheus.


Kibana platform migration: Lessons in large scale cross-team collaboration

When Kibana 4.0 was created back in 2015, it only had three apps: Dashboard, Visualize, and Discover. Fast forward five years, and Kibana now consists of 100+ plugins, millions of lines of code, thousands of dependencies, and dozens of frameworks. The architecture that worked well with three apps has become a bottleneck that hinders Kibana’s stability, scalability, performance, and development velocity.


Why optimizing for MTTR over MTBF is better for business

The classic debate when running a software as a service (SaaS) business is between release frequency and stability and availability. In other words, are you Team MTTR (mean time to recovery) or Team MTBF (mean time between failures)? In this blog post, I argue for MTTR, which encourages you to push more frequently, embrace the instability this may introduce, and invest in training and tooling to deal with the ensuing outages.