
How we use metamonitoring Prometheus servers to monitor all other Prometheus servers at Grafana Labs

One of the big questions in monitoring can be summed up as: Who watches the watchers? If you rely on Prometheus for your monitoring, and your monitoring fails, how will you know? The answer is a concept known as metamonitoring. At Grafana Labs, a handful of geographically distributed metamonitoring Prometheus servers monitor all other Prometheus servers and each other cross-cluster, while their alerting chain is secured by a dead-man’s-switch-like mechanism.

Using Telegraf plugins to visualize industrial IoT data with the Grafana Cloud Hosted Prometheus service

One of the biggest challenges with data visualization for complicated software systems is getting quick access to the underlying data and connecting it to some form of cloud-hosted solution. Traditionally, this has required quite a bit of middleware and upfront setup with additional tooling.

You should know about... these useful Prometheus alerting rules

Setting up Prometheus to scrape your targets for metrics is usually just one part of your larger observability strategy. The other piece of the equation is figuring out what you want your metrics to tell you, and when and how often you should hear about it. Thankfully, Prometheus makes it really easy for you to define alerting rules using PromQL, so you know when things are going north, south, or in no direction at all.

Intro to exemplars, which enable Grafana Tempo's distributed tracing at massive scale

Exemplars have recently become a hot topic in observability, and for good reason. Just as Prometheus disrupted the cost structure of storing metrics at scale beginning in 2012 and for real in 2015, and Grafana Loki disrupted the cost structure of storing logs at scale in 2018, exemplars are doing the same for traces. To understand why, let’s look at both the history of observability in the cloud native ecosystem and the optimizations that exemplars enable.

Want to visualize software development insights with Grafana? With our new Jira Enterprise plugin, you can!

A very fun part of my job as a Solutions Engineer at Grafana Labs is getting to learn the ins and outs of a new feature or play with a plugin while it is still in development. So, when I heard murmurs that our latest Enterprise plugin would be an integration with Jira, I felt the forsaken call of the agile sirens luring me back to the days when I worked as a technical writer on a product team.

How we're graduating Grafana Agent experiments into the official Prometheus project

We’ve been experimenting with new ways to use and operate Prometheus over the past year. Every successful Grafana Agent experiment turns into an upstream contribution for the whole Prometheus community to benefit from. In this blog post, I go over the history of the Agent’s successful — and not so successful — experiments.

Grafana 7.5 released: Loki alerting and label browser for logs, next-generation pie chart, and more!

Grafana v7.5 has been released! This is the last stable release before we launch Grafana 8.0 at GrafanaCONline in June. Register for free now, so you won’t miss the great sessions we’re planning around all things Grafana. And if you’re doing something special with Grafana that you’d like to share with the community, the CFP for GrafanaCONline is open until 06:59 UTC on April 10! Now, back to 7.5.

2021: The year of Cortex for IoT?

My Grafana Labs colleague RichiH recently talked about why IoT and time series databases work so well together. It just so happens that we have a highly scalable time series database on hand. Let’s talk about that. My name is Goutham, and I am a maintainer of Cortex. I have been working on it for nearly three of the four and a half years the project has existed. Cortex is built to serve as a scalable, long-term store for Prometheus.

How I fell in love with logs thanks to Grafana Loki

As a Senior Solutions Engineer here at Grafana Labs, I can usually find my way out of technical trouble pretty easily. However, I was recently having some Wi-Fi issues at home and needed to do some troubleshooting. The experience changed my whole opinion of logs, and I wanted to share my story in hopes that it could open some other people’s eyes as well. (I originally posted a version of this story on my personal blog in January.) First, some background info.

What's new in Grafana Cloud for March 2021: improvements to alerting, synthetic monitoring, and more

As the product manager for Grafana Cloud, I am constantly following the progress of all the new features that our engineering teams are working on, from early ideation to release. We’re always excited to share updates with the community, so you can all try them out and let us know what you think. So each month, I’ll be rounding up the latest Grafana Cloud features and improvements on the blog.