The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Getting Started with Grafana Cloud's AI Assistant for Observability

The pace of software delivery in 2025 is unprecedented: cloud-native apps, microservices, and AI-generated code are shipping in days, not months. But one challenge never changes: ensuring reliability and visibility when systems fail. In this video, we explore how the new Grafana AI Assistant brings true, context-aware observability to your stack. Watch as we deploy an open-source Python service with Kafka, Postgres, Kubernetes, and Prometheus, then use the AI assistant to instantly generate dashboards and alerts and to reduce unneeded telemetry volume.

Real-Time Status Monitoring for 50+ EdTech Tools K12 IT Teams Actually Use

K12 IT departments face a unique challenge: keeping dozens of educational technology platforms running smoothly while teachers conduct lessons and students complete assignments. A single service outage can disrupt hundreds of classrooms simultaneously. That's why implementing a K12 service status dashboard has become essential for school technology teams managing complex digital learning environments.

Introducing the Coralogix Transactions processor

Coralogix Transactions are a trace segmentation strategy, unique to the Coralogix platform. They allow users to analyze the performance, over time, of a collection of related spans, across billions of traces. Coralogix has introduced a transactions processor into the OpenTelemetry contrib image, enabling users to activate this unique feature using nothing more than OpenTelemetry configuration.

What Is an MCP Server?

If you’ve been following AI development lately, you’ve probably heard whispers about “MCP servers” floating around developer circles. The protocol has been around a little while now, and I have finally gotten round to using it. Boy, do we need to talk about it. MCP (Model Context Protocol) is Anthropic’s open standard that lets AI assistants connect directly to your tools and data sources, not just static documentation or code snippets.

Inside the Coralogix AI Center: Solving AI's Silent Failure Crisis

Observability has always answered one core question: Is it running? But in the era of LLMs, autonomous agents, and AI-powered workflows, that’s no longer enough. We need to ask a harder, scarier question: Is it right? And right now, most teams can’t answer that. Let’s fix it. In our last post, “The AI Monitoring Crisis No One’s Talking About,” we outlined why prompt injection, hallucinations, and context drift create invisible failures.

The IT story behind 911 emergency services

At 2:37 a.m. on a cold Oregon night, a fire alarm blared at a rural station. Seconds later, the call came in: a structure fire on the outskirts of Rogue Valley. But what if that alarm never reached the station? This isn’t a hypothetical. For the IT team at Emergency Communications of Southern Oregon (ECSO 911), it’s the kind of emergency scenario they prepare for every day.

How to Monitor NVIDIA GPU Metrics with Cribl Edge & Stream (Complete Tutorial)

If you’re running AI, ML, or data-intensive workloads on GPUs, monitoring their performance is critical. Overheating, under-utilization, or memory bottlenecks can cost you thousands in cloud bills and potential downtime. This guide walks you through collecting real-time GPU telemetry using nvidia-smi, sending it to Cribl Edge, routing it through Cribl Stream, and using Cribl Search to analyze the data—step by step.
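
As a rough illustration of the first step only (collecting telemetry with nvidia-smi), the sketch below queries a handful of GPU fields and turns each GPU's CSV row into a dictionary that could be emitted as one JSON event. The field selection and the function names `parse_gpu_csv` and `collect_gpu_metrics` are assumptions made for this example, not taken from the tutorial. How the events reach Cribl Edge is not shown, since the summary above does not specify it.

```python
import csv
import io
import subprocess

# Fields requested through nvidia-smi's --query-gpu option.
# This particular selection is an assumption for the example.
FIELDS = [
    "index",
    "name",
    "temperature.gpu",
    "utilization.gpu",
    "memory.used",
    "memory.total",
]


def to_number(value):
    """Convert a CSV cell to int or float; return None for values like '[N/A]'."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return None


def parse_gpu_csv(text):
    """Parse nvidia-smi CSV output (noheader, nounits) into one dict per GPU."""
    rows = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not record:
            continue  # skip blank lines
        row = dict(zip(FIELDS, (cell.strip() for cell in record)))
        for key in FIELDS:
            if key != "name":
                row[key] = to_number(row[key])
        rows.append(row)
    return rows


def collect_gpu_metrics():
    """Run nvidia-smi once and return the parsed metrics (needs an NVIDIA driver)."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(FIELDS),
        "--format=csv,noheader,nounits",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_gpu_csv(result.stdout)
```

On a host with the NVIDIA driver installed, calling `collect_gpu_metrics()` and printing each dictionary with `json.dumps` would produce one JSON line per GPU, a shape that most log and metrics pipelines can ingest.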

How ELSER Transforms One Keyword into Better Search Results

In this session, we’ll show you how Elastic's ELSER takes a single token like _“Terminator”_ and expands it into semantically related terms such as _software, alien, computer technology,_ and _Connor_ (for John Connor). This makes search results more relevant, even when the exact keyword isn’t used.
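
To make the idea of token expansion concrete, here is a toy sketch of sparse-vector matching. It does not call ELSER or Elasticsearch. The expansion weights, the document weights, and the `score` function are all invented for illustration; a real model learns these weights. The point is only that a document can score above zero for the query even when it never contains the word "terminator".

```python
# Hypothetical expansion of the single query token "terminator".
# These weights are made up for illustration; a real model learns them.
EXPANDED_QUERY = {
    "terminator": 2.0,
    "software": 0.9,
    "computer": 0.7,
    "alien": 0.6,
    "connor": 1.1,
}

# Hypothetical sparse representations of two documents (token -> weight).
DOCS = {
    "exact_match": {"terminator": 1.5, "movie": 0.8},
    "no_keyword": {"connor": 1.2, "computer": 0.5, "future": 0.4},
}


def score(query_weights, doc_weights):
    """Dot product over the tokens that the query and the document share."""
    return sum(
        weight * doc_weights[token]
        for token, weight in query_weights.items()
        if token in doc_weights
    )


def rank(query_weights, docs):
    """Return document ids ordered from highest to lowest score."""
    return sorted(docs, key=lambda doc_id: score(query_weights, docs[doc_id]), reverse=True)
```

With a plain keyword query such as `{"terminator": 2.0}`, the `no_keyword` document scores zero. With `EXPANDED_QUERY`, it picks up a positive score through the shared tokens "connor" and "computer", which is the effect the session describes.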