Grafana Play updates: A redesigned homepage to celebrate our community

Grafana Play is a free, publicly accessible sandbox environment where anyone can explore and learn about Grafana, no setup or sign-in required. It comes preloaded with sample dashboards demonstrating how to connect to data sources, build visualizations, and experiment with Grafana’s advanced features. Hosted on Grafana Cloud, Grafana Play has grown significantly over the years. With thousands of public dashboards, it’s now a go-to destination for Grafana learning and exploration.

A tale of two incident responses: How our AI assistant found the root cause 3.5x faster

About two months ago, an incident at Grafana Labs was kicked off in typical fashion: A series of alerts was triggered, our on-call engineer acknowledged them on Slack, and the rest of the team quickly began hypothesizing about the potential culprit. But the way the incident was resolved was anything but typical. Yes, our internal team followed best practices to resolve the incident as quickly as possible.

What Is a Data Pipeline

In today’s tech world, IT and security technologies are the functional equivalent of Pokémon. To gain the insights you need, you “gotta catch ’em all” by ingesting, correlating, and analyzing as much security data as possible. Data pipelines organize chaotic information flows into structured streams, ensuring that data is reliable, processed, and ready for use.

Agentic AI and the End of Traditional IT (w/ Robb Wilson)

In a wide-ranging conversation, Robb Wilson—CEO and co-founder of OneReach.ai and author of The Age of Invisible Machines—joins Tim and Tom to explore the rise of agentic AI and its seismic implications for IT, organizations, and society. Robb breaks down the concept of agent runtimes, why conversational interfaces matter more than ever, and how adaptive, self-orchestrating systems will reshape work far beyond today’s service models.

Mezmo's AI-powered Site Reliability Engineering (SRE) agent for Root Cause Analysis (RCA)

We are thrilled to announce the availability of Mezmo’s AI-powered Site Reliability Engineering (SRE) agent for Root Cause Analysis (RCA), a transformative leap forward for engineering and operations teams. It is included in your existing subscription at no additional charge. We are paving the way for a new era of observability, moving beyond passive, reactive monitoring to a world of proactive, AI-driven observability.

Urgent Security Alert: New Firewall Vulnerabilities Identified. #patch #shorts

Cisco has identified a new attack variant targeting firewalls, highlighting the need for user education on vulnerabilities. A vulnerability discovered on November 5th affects nearly 50,000 devices, with many still at risk as of September 30th. Organizations struggle to expedite updates due to complex configurations and implementation challenges. Recent research indicates that network edge devices are prime targets for zero-day vulnerabilities, emphasizing the need for improved security measures.

Automating Chaos Engineering with Terraform

Automating chaos engineering with Terraform eliminates manual setup across environments by enabling you to version control your entire chaos infrastructure, from service discovery to security governance policies. The Harness Terraform provider supports end-to-end automation including Kubernetes infrastructure setup, custom image registries, Git-based ChaosHub management, and granular security controls that ensure safe experiment execution in production.

Pitch Deck Services Are the Secret Ops Tool Nobody Talks About

Operations is about systems that work quietly. Funding is about stories that get loud. Most teams treat those things as opposites. But in reality, they run on the same principle: clarity. A company without clarity doesn't scale. It just spins. That's why the smartest teams have started treating their pitch decks not as vanity projects, but as operational tools. The same way they treat dashboards or quarterly reviews.