
Prometheus native histograms in Grafana Cloud: More precise, easier to use, and better compatibility

Histograms help you monitor and visualize the distribution of values for key metrics, such as response times or request sizes of a service. They’re frequently used to gain insights into data patterns, anomalies, and trends, making them an important tool for observability.

Monitor the full end-user experience: k6 browser checks in Synthetic Monitoring are generally available

We continue to evolve Grafana Cloud Synthetic Monitoring to help you simulate even the most complex transactions and user journeys, and proactively monitor the performance of your web applications and APIs. In line with this effort, we’re excited to share that k6 browser checks in Synthetic Monitoring are now generally available.

How to keep Ingress NGINX Controller metric volumes manageable and still meaningful

The Ingress NGINX Controller is a widely used Kubernetes component for managing HTTP and HTTPS traffic routing. While it provides powerful observability through Prometheus metrics, it’s also notorious for generating an excessively high number of time series. The root cause lies in how the controller labels its metrics—tracking requests across multiple dimensions such as ingress name, host, path, status code, and upstream response times.
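A quick back-of-the-envelope sketch shows why labeling across many dimensions adds up so fast. The number of series a single metric can produce is bounded by the product of the distinct values of each of its labels. If the metric is a classic Prometheus histogram (as a latency metric often is), every label combination also gets one series per bucket boundary (`le`), plus `_sum` and `_count`. The label counts below are hypothetical, and `max_series` is an illustrative helper, not part of any Prometheus or ingress-nginx API.

```python
from math import prod

# Hypothetical distinct-value counts per label; real numbers depend on your cluster.
label_cardinality = {
    "ingress": 50,
    "host": 20,
    "path": 100,
    "status": 10,
}


def max_series(cardinalities: dict[str, int], histogram_buckets: int = 0) -> int:
    """Upper bound on the series count for one metric.

    The bound is the product of per-label value counts (reached only if every
    combination actually occurs). For a classic histogram, pass the number of
    `le` bucket values (including +Inf); each combination then also yields a
    _sum and a _count series.
    """
    combos = prod(cardinalities.values())
    if histogram_buckets:
        return combos * (histogram_buckets + 2)
    return combos


print(max_series(label_cardinality))  # prints 1000000

# Dropping a single high-cardinality label such as "path" shrinks the bound 100x.
without_path = {k: v for k, v in label_cardinality.items() if k != "path"}
print(max_series(without_path))  # prints 10000
```

Because the bound is multiplicative, removing or aggregating away one high-cardinality label usually does far more than trimming a few values from several labels.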

Introducing the Causely data source plugin for Grafana

Endre Sara is a Co-Founder of Causely, where he’s building a causal reasoning platform to continuously assure service reliability and eliminate human troubleshooting. Previously, Endre was VP of Advanced Engineering at Turbonomic and a VP at Goldman Sachs. At Causely, we believe observability tools shouldn’t just collect more data—they should enable you to understand it.

How to get started with frontend observability: A quick Grafana Faro example

Modern cloud-native applications and web browsers are highly complex, making it challenging to gain visibility into their performance. Without an effective way to track and measure frontend performance, it becomes difficult to monitor real user experiences, detect critical issues, assess website health, and ensure optimal functionality. But what if you could see exactly what your users are experiencing in real time?

New in Adaptive Logs: user-facing temporary pauses, exemptions, and per-service recommendations

We launched Adaptive Logs last year to help you optimize your log volumes and costs in Grafana Cloud, and we’ve been hard at work ever since making improvements based on your feedback. Over the past couple of months, we’ve delivered several new features to help reduce toil, apply recommendations with precision, and—what we’re most excited about—confidently optimize your log ingestion while still providing peace of mind to your end users!

How a cooking platform whipped up a new observability plan with Grafana Cloud

As any good cook knows, if you want to create a top-notch dish, you have to use the best ingredients. So when the engineering team for Cookidoo (an online platform and app that features more than 80,000 guided recipes for the Thermomix, an all-in-one small kitchen appliance) realized the observability tool they were using to monitor the platform wasn’t delivering what they needed, they decided to switch to Grafana Cloud and OpenTelemetry.

Grafana Cloud updates: new testing features in Grafana Cloud k6, enhanced troubleshooting in Kubernetes Monitoring, and more

We consistently roll out helpful updates and fun features in Grafana Cloud, our fully managed observability platform powered by the open source Grafana LGTM Stack (Loki for logs, Grafana for visualization, Tempo for traces, and Mimir for metrics). In case you missed them, here’s our monthly round-up of the latest and greatest Grafana Cloud updates.

AWS Lambda, OpenTelemetry, and Grafana Cloud: a guide to serverless observability considerations

In our increasingly serverless world, observability isn’t just a “nice to have”—it’s essential. Serverless functions such as AWS Lambda bring incredible benefits, but they also introduce complexities, especially around monitoring and debugging. In a previous article, I provided a quick, practical guide for sending AWS Lambda traces to Grafana Cloud using OpenTelemetry.

Why you should embrace more incidents (seriously!)

We’re all looking for ways to improve our incident response. We investigate various metrics and methodologies, all in the name of making sure our customers see the reliable and performant systems we’ve sought to build. In fact, all these efforts are leading us, as an industry, to finally realize the power of surprising, anomalous events in our systems. They give us an opportunity to reexamine our expectations and see how our models of the sociotechnical system differ from reality.