AWS and Grafana Labs are working together on cloud native observability

Cloud native observability is at a watershed moment. The explosion of microservices has created previously unseen amounts of monitoring data, limiting the ability of humans and computer systems to extract meaning from data with last-generation tools. Debugging is often a process of detecting correlation, and then turning correlations into causal connections. This is where modern cloud native tooling comes in.

How Grafana is helping the DIFFERENCE Foundation visualize medical data in their fight against a global pandemic

The DIFFERENCE Foundation is a non-profit organization based in the Netherlands that is focused on designing, guiding, and funding highly quantitative research on metabolic dysfunction. The foundation considers metabolic dysfunction, which can trigger obesity, diabetes, and many other diseases, to be the original global pandemic, as it affects almost one quarter of the global population.

The 7 cultural values that drive Grafana Labs

At Grafana Labs, we believe that our culture is one of the differentiators that make us an extraordinary company. In the middle of a pandemic, we’ve been in the very fortunate position to be in a high-growth phase: We’ve expanded the team by 118% this year, and our headcount just tipped over 250 global Grafanistas.

How to create fast queries with Loki's LogQL to filter terabytes of logs in seconds

LogQL, the Loki query language, is heavily inspired by Prometheus PromQL. However, when it comes to filtering logs and finding the needle in the haystack, the query language is very specific to Loki. In this article we’ll give you all the tips to create fast filter queries that can filter terabytes of data in seconds. In Loki there are three types of filters that you can use.
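
As a rough illustration of what such a filter query looks like, the sketch below assembles a LogQL string in Python. It assumes the three filter types are label matchers, line filters, and label filters (the excerpt does not name them). The `build_query` helper, the `app` and `env` labels, and the `duration` field are hypothetical names chosen for this example, not part of any Loki API.

```python
def build_query(labels, line_contains=None, label_filter=None, parser="logfmt"):
    """Assemble a LogQL filter query string.

    Illustrative helper only: it does no escaping, so values must not
    contain double quotes.
    """
    # Label matchers select which log streams are read: {key="value", ...}
    matchers = ", ".join(f'{key}="{value}"' for key, value in sorted(labels.items()))
    query = "{" + matchers + "}"

    # A line filter keeps only lines containing the given text.
    if line_contains is not None:
        query += f' |= "{line_contains}"'

    # A label filter compares a field extracted by a parser stage.
    if label_filter is not None:
        query += f" | {parser} | {label_filter}"

    return query


example = build_query({"app": "api", "env": "prod"}, "error", "duration > 1s")
print(example)
```

Putting the label matcher first and the line filter before the parser stage reflects the usual advice to narrow the data as early as possible, though the best ordering for a given workload is something to measure rather than assume.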

How to find traces in Tempo with Elasticsearch and Grafana

Grafana Tempo, the recently announced distributed tracing backend, relies on integrations with other data sources for trace discovery. Tempo’s job is to store massive amounts of traces, place them in object storage, and retrieve them by ID. Logs and other data sources let users jump directly to traces more quickly and more powerfully than ever before. Previously we investigated discovering traces with Loki and exemplars.

What does the future hold for Site Reliability Engineering?

Site Reliability Engineering, or SRE for short, has become quite the buzzword. I wasn’t there in 2004, when Ben Treynor started it at Google, but I claim bragging rights based on the fact that the very same Ben Treynor interviewed me for an SRE role in 2005. (I also got the job after the interview, in case that wasn’t obvious…) When SREcon EMEA 2019 came along, I thought it was just about time to publicly speculate about the future of our profession.

How we eliminated service outages from 'certificate expired' by setting up alerts with Grafana and Prometheus

Here at Grafana Labs we are lucky to work with many partners around the globe. Through these partnerships, we learn about clever use cases that show how Grafana and Prometheus can be used to great effect for service monitoring and availability. We came across this use case that our partner OpenAdvice came up with for their client base, and we thought it was too good to keep secret!
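
The excerpt does not show the alert itself, so the following is only a minimal sketch of the arithmetic behind a certificate-expiry alert. It assumes the certificate's expiry time is available as a Unix timestamp in seconds (the Prometheus blackbox exporter, for instance, exposes one as `probe_ssl_earliest_cert_expiry`). The function names and the 30-day threshold are hypothetical choices for illustration.

```python
import time

SECONDS_PER_DAY = 86400


def days_until_expiry(not_after_epoch, now=None):
    """Return the number of days left before the certificate expires.

    not_after_epoch: expiry time as Unix seconds (assumed input format).
    now: current time as Unix seconds; defaults to the system clock.
    """
    if now is None:
        now = time.time()
    return (not_after_epoch - now) / SECONDS_PER_DAY


def should_alert(not_after_epoch, threshold_days=30, now=None):
    """Return True when fewer than threshold_days remain (hypothetical threshold)."""
    return days_until_expiry(not_after_epoch, now) < threshold_days
```

Passing `now` explicitly keeps the check deterministic in tests. An alerting rule would apply the same comparison against the current evaluation time.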

How I started contributing to the Grafana open source project

My name is Karine. I’m a Software Engineer working with a team that provides monitoring solutions to our clients. A good part of my daily work is creating dashboards in Grafana. Since I started working with this tool, I have been impressed by its quality and ease of use. I became even more impressed when I discovered it was an open source tool.

Best practices for meta-monitoring the Grafana Cloud Agent

Earlier this year, we introduced the Grafana Cloud Agent, a subset of Prometheus built for hosted metrics that runs lean on memory and uses the same service discovery, relabeling, WAL, and remote_write code found in Prometheus. Because it is trimmed down to only the parts needed for interaction with Cortex, tests of our first release have seen up to a 40% reduction in memory usage compared to an equivalent Prometheus process.

Tracing with the Grafana Cloud Agent and Grafana Tempo

Back in March, we introduced the Grafana Cloud Agent, a subset of Prometheus built for hosted metrics. It uses a lot of the same battle-tested code as Prometheus and can cut memory usage by up to 40 percent. Ever since the launch, we’ve been adding features to the Agent. Now, there’s a clustering mechanism, additional Prometheus exporters, and support for Loki. Our latest feature: Grafana Tempo! It’s an easy-to-operate, high-scale, and cost-effective distributed tracing system.