Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Optimizing the OpenTelemetry Python SDK for LLM Workloads

Agentic workloads thrive with precision tooling. Just like developers, they need the rich context, high cardinality, and fast feedback loops that allow them to ask exploratory open-ended questions of their code. But instrumentation is costly, and from the dawn of software, developers have tried to do the most possible with the least amount of resources.

Putting FinOps theory into practice with SquaredUp

The public cloud has revolutionized IT by making infrastructure on-demand, scalable, and self-service. However, this convenience comes at a price. In the cloud, engineers can instantly spin up resources and spend company money with the click of a button or a line of code, bypassing traditional procurement and finance approval processes.

How to manage synthetic monitoring checks as code with Terraform and Grafana Cloud

As teams scale, managing synthetic monitoring checks manually in the UI becomes difficult and error-prone. When you're dealing with dozens of checks across multiple environments, teams experience inconsistent configurations, lack of version control, and difficulty tracking changes.

Kubernetes Monitoring Helm chart v4: Biggest update ever!

The Kubernetes Monitoring Helm chart is the easiest way to send metrics, logs, traces, and profiles from your Kubernetes clusters to Grafana Cloud (or a self-hosted Grafana stack). And version 4.0 is the biggest update the chart has ever received. Representing nearly six months of planning and development, it's designed to solve real pain points that users have hit as their monitoring setups have grown.

A faster way to pinpoint performance bottlenecks: Using Profiles Drilldown with Grafana Cloud Knowledge Graph

When you identify CPU or memory spikes in your services, it’s critical to understand why they’re happening. But switching between tools or crafting complex queries can slow you down when trying to pinpoint a root cause. This is why we’re excited to share that Profiles Drilldown, an application that lets you easily explore profiling data through an intuitive, point-and-click interface (no queries required), is now integrated with Grafana Cloud Knowledge Graph.

From Stack Trace to Probable Cause: AI Root Cause Analysis Is Here

You know the drill. An error fires, you get the stack trace, and then you spend the next 45 minutes tracing it backward through four services, two config files, and a deploy that happened three hours ago. You eventually find the root cause, but the path to get there was manual, slow, and entirely dependent on how well you already knew the codebase. We built AI-powered root cause analysis (RCA) for that kind of slog.

How to Monitor a Shopify Store with Playwright and Checkly

This is a guest post by Vince Graics, Staff QA Engineer at World of Books. If you're running a Shopify storefront and want reliable synthetic monitoring, you'll hit a wall. Shopify's bot detection doesn't care that your headless browser is friendly; it sees datacenter IPs and acts accordingly. Cart API calls get hit with 429 rate limits, Cloudflare challenge pages pop up mid-check, and you're left wondering whether the bug is in your code or in the platform fighting you.

JSON Jiu Jitsu: Has JSON Parsing Got You in a Chokehold?

From malformed fields to endlessly nested objects, JSON logs can feel like they’re trying to submit your SIEM. In this technical session, we’ll demonstrate how to turn that chokehold into a clean takedown using Graylog’s parsing, normalization, and enrichment capabilities. You’ll learn how to: Whether you’re a SOC analyst tired of regex wrestling or an admin looking to streamline onboarding, you’ll leave with practical techniques to make messy JSON your sparring partner—not your opponent.