
Instrument zero-code observability for LLMs and agents on Kubernetes

Building AI services with large language models and agentic frameworks often means running complex microservices on Kubernetes. Observability is vital, but instrumenting every pod in a distributed system can quickly become a maintenance nightmare. OpenLIT Operator solves this problem by automatically injecting OpenTelemetry instrumentation into your AI workloads—no code changes or image rebuilds required.
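
To make the contrast concrete, here is a rough sketch of the kind of hand-written tracing that automatic injection removes from each service. It is a minimal example using the OpenTelemetry Python SDK; the service name, span attributes, and the call_llm placeholder are illustrative assumptions, not OpenLIT's actual instrumentation.

```python
# Hand-written OpenTelemetry tracing for a single LLM call: the boilerplate
# that zero-code injection is meant to make unnecessary.
# Assumes opentelemetry-sdk and opentelemetry-exporter-otlp are installed;
# call_llm is a placeholder for whatever model client the service uses.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter())  # ships spans to the local OTLP endpoint
)
tracer = trace.get_tracer("chat-service")


def call_llm(prompt: str) -> str:
    """Placeholder for the real model client call."""
    return "example response"


def handle_request(prompt: str) -> str:
    # Every request handler needs a span like this when instrumenting by hand.
    with tracer.start_as_current_span("llm.chat") as span:
        span.set_attribute("gen_ai.request.model", "example-model")  # illustrative attribute
        span.set_attribute("app.prompt.length", len(prompt))
        response = call_llm(prompt)
        span.set_attribute("app.response.length", len(response))
        return response
```

With operator-based injection, equivalent spans are emitted without touching the application code or the container image, which is the point of the zero-code approach described in the post.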

What Engineers Want from AI in Observability... According to the 2026 Observability Survey Report

The results show strong interest in AI for forecasting, root cause analysis, onboarding, and generating dashboards, alerts, and queries. But when it comes to autonomous action, practitioners are more cautious — and 95% say AI needs to show its work to earn trust.

Real-Time Data: The Engine of Efficient, Sustainable Data Centers

Imagine knowing every detail of your data center as it happens. Real-time data makes this possible. You can monitor systems, track performance, and adjust resources on the fly. This proactive approach leads to smoother operations and reduced downtime. By constantly having up-to-date information, you can maintain peak efficiency in your facility. Such insights allow you to optimize cooling and power use, which are crucial to keeping costs down.

AI in observability in 2026: Huge potential, lingering concerns

The role of AI in observability is evolving rapidly, but the data from our fourth annual Observability Survey makes one thing abundantly clear: the potential is real, and so are the reservations. Practitioners overwhelmingly see value in using AI to help surface anomalies, forecast and spot trends, assist with root cause analysis, and get new users up to speed quicker.

Open standards in 2026: The backbone of modern observability

Open source software and open standards are now an essential part of how organizations maintain their systems. That's not to say they haven't always been important, but the fourth annual Observability Survey, brought to you by Grafana Labs, shows just how deeply the shift to open has taken hold, with 77% of respondents saying open source and open standards are important to their observability strategy.

Engineers Want AI in Observability - With One Catch: 4th Annual Observability Survey by Grafana Labs

Actually useful AI is welcome in observability. AI for the sake of AI is not. In this overview of Grafana Labs’ 4th annual Observability Survey, Marc Chipouras shares what 1,300+ respondents from 76 countries told us about the current state of observability, and what comes next. This year’s survey explores four major themes. The results show strong interest in AI for forecasting, root cause analysis, onboarding, and generating dashboards, alerts, and queries. But when it comes to autonomous action, practitioners are more cautious: 95% say AI needs to show its work to earn trust.

Bridge the DevSec divide: Using Grafana Cloud and Miggo for runtime protection

Note: This blog post is co-authored by Daniel Shechter, CEO and co-founder of Miggo Security. Modern runtime security is critical for understanding complex systems and for detecting and protecting against attacks, especially in rapidly evolving cloud native architectures. For many security teams, however, achieving deep visibility into runtime risks remains a moving target.

Quickly go from exploration to action with new one-click integrations in Grafana Drilldown

The Grafana Drilldown apps give you a queryless, point-and-click way to explore your metrics, logs, traces, and profiles. But finding an insight is only half the job; you still need to act on it. Previously, that meant leaving Drilldown, manually copying queries, and navigating through Grafana's dashboards, Alerting, and "Explore" interfaces to pick up where you left off.

From signals to savings: Optimizing cloud costs with Grafana Assistant and MCP servers

In today's cloud-native environments, managing resource waste and optimizing costs can feel like a constant battle. Operators, along with their fearless FinOps teams, spend countless hours hunting down unused resources, deciphering complex telemetry data, and manually implementing code or configuration changes to try to reduce cloud costs. But what if you could automate the entire process, from identifying waste to implementing the fix, all based on actual production telemetry?
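
As a very rough illustration of the manual side of that process, the sketch below queries a Prometheus endpoint for pods with low average CPU use over the past week, the sort of waste-hunting the post describes automating. The endpoint URL, the PromQL expression, and the 5% threshold are all illustrative assumptions, not how Grafana Assistant or the MCP servers actually work.

```python
# A rough sketch of manually hunting for underused pods from production telemetry.
# Assumes a reachable Prometheus HTTP API; the query and threshold are illustrative.
import requests

PROM_URL = "http://prometheus.example.com/api/v1/query"  # hypothetical endpoint

# Average CPU cores used per series over the last 7 days (subquery over a 5m rate).
QUERY = 'avg_over_time(rate(container_cpu_usage_seconds_total[5m])[7d:1h])'

resp = requests.get(PROM_URL, params={"query": QUERY}, timeout=30)
resp.raise_for_status()

for series in resp.json()["data"]["result"]:
    pod = series["metric"].get("pod", "<unknown>")
    cpu_cores = float(series["value"][1])
    if cpu_cores < 0.05:  # under ~5% of one core: a candidate for rightsizing
        print(f"{pod}: avg {cpu_cores:.3f} cores over 7d")
```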