The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Introducing Grafana OnCall OSS, on-call management for the open source community

Last November, we announced the launch of Grafana OnCall, an easy-to-use on-call management tool that helps reduce toil through simpler workflows and interfaces tailored for developers. Born out of Grafana Labs' acquisition of Amixr Inc., Grafana OnCall began as a cloud-only solution that became generally available to all Grafana Cloud users, on both paid and free plans, in February.

Grafana Alerting: Explore our latest updates in Grafana 9

Grafana 8 marked a major redesign in the way we do alerting. We created a unified alerting experience that implemented a workflow that operates across all of our products and combined Grafana panel alerts and Prometheus-style alerts into a single pane of glass. We built this as an open source feature first to make sure you could opt in and try it out from day one, regardless of which flavor of Grafana (OSS, Cloud, or Enterprise) works best for you.

Grafana 9.0: Prometheus and Grafana Loki visual query builders, new navigation, improved workflows, heatmap panels, and more!

GrafanaCONline, our annual community event designed for Grafana open source users and dashboarding enthusiasts, also marks the general availability of Grafana’s latest and greatest release. Grafana 9.0 is now available to both open source and Grafana Enterprise users, and is being rolled out to Grafana Cloud users incrementally. (The majority of instances have already been upgraded!) New Grafana Cloud users will immediately get the Grafana 9.0 experience.

GrafanaCONline 2022: A guide to all the big announcements from Grafana Labs

We have liftoff! GrafanaCONline 2022 officially launched today with the opening keynote featuring Grafana Labs CEO and Co-founder Raj Dutt, Chief Grafana Officer and Co-founder Torkel Ödegaard, and Senior Engineering Manager Myrle Krantz. Along with previewing the much-anticipated release of Grafana 9.0, we revealed some exciting news for our open source community. Below is a summary of all the major headlines that mark one small step for Grafana, one giant leap for the Grafana community.

Netdata Agent release v1.35

The latest Netdata Agent release v1.35 introduces massive improvements for the machine learning-powered Anomaly Advisor, Metric Correlations, Kubernetes monitoring, and much more. Anomaly Advisor & on-device Machine Learning: This release launches the flagship Anomaly Advisor for machine learning (ML) assisted troubleshooting. Unsupervised ML models are trained for every metric, at the edge, on your devices, enabling real-time anomaly detection across all your systems and applications.

Recapping SLOconf 2022: SLOs are for everyone!

Did you get to attend the excellent SLOconf last month? With four different tracks and over 60 talks covering everything from defining an SLO to the financial framing of error budgets, you, like us, may have missed a couple of things. In this handy recap, we take you through some of the juiciest sessions and point you to a few you may have overlooked. Luckily, SLOconf 2022 was designed for while-you’re-working participation and all the talks are still available.

OpenObservability Talks Second Year at a Glance

I can’t believe that the OpenObservability Talks podcast is already celebrating its second anniversary. It feels like just yesterday I wrote the summary of the first year, sharing the hectic times of starting a podcast in the midst of the COVID-19 global pandemic. The pandemic has been with us most of this year too, but it didn’t stop us from bringing the latest on best-of-breed open source observability.

Outage in Egypt impacted AWS, GCP and Azure interregional connectivity

On Tuesday, June 7, internet users in numerous countries from East Africa to the Middle East to South Asia experienced an hours-long degradation in service due to an outage at one of the internet’s most critical chokepoints: Egypt. Beginning at approximately 12:25 UTC, multiple submarine cables connecting Europe and Asia experienced outages lasting over four hours. As I show below, the impacts were visible in various types of internet measurement data to the affected countries.

Using High Availability Capabilities to Make Migration of the Monitoring System Simple

A monitoring tool and its backend database: Monitoring platforms such as eG Enterprise collect large numbers of metrics and data points about the applications and infrastructure being monitored. As the complexity of the applications, the number of tiers, and the scale of the infrastructure grow, so does the number of metrics that need to be analyzed. Even in a mid-sized IT infrastructure, there may be hundreds of thousands of metrics collected and analyzed over time.