Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Why High-Cardinality Metrics Break Everything

High-cardinality metrics are one of those ideas that sound obviously right - until you try to use them in production. In theory, they promise precision. Instead of averages and rollups, you get specificity: per-request, per-user-ID, per-container, per-feature insights. The kind of detail we all immediately want when something is on fire. And then things start breaking. Not immediately. Not loudly. But quietly.
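
To see why that kind of specificity gets expensive, here is a rough sketch of how label combinations multiply. The label names and counts are hypothetical, picked only for illustration, and the result is a worst-case upper bound that assumes every combination of label values actually occurs.

```python
from math import prod

# Hypothetical number of distinct values per label on a single metric.
# These figures are illustrative assumptions, not measurements.
label_cardinalities = {
    "endpoint": 50,
    "status_code": 5,
    "container": 200,
    "user_id": 10_000,
}


def worst_case_series(cardinalities):
    """Upper bound on distinct time series: the product of the
    number of distinct values each label can take."""
    return prod(cardinalities.values())


# Same metric with the per-user label dropped.
without_user = {k: v for k, v in label_cardinalities.items() if k != "user_id"}

print(worst_case_series(without_user))         # 50000
print(worst_case_series(label_cardinalities))  # 500000000
```

In this sketch, adding a single per-user label turns a bounded 50,000 series into a potential 500 million, which is the kind of growth that tends to go unnoticed until storage and query costs climb.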

AI-generated media: What's the point?

If you have even a minor social media presence, you've probably been unfortunate enough to come upon the wonderfully disturbing world of AI slop content. We're talking wrestling matches featuring controversial mustached historical figures and Formula One-style races featuring Stephen Hawking in his wheelchair (if you have no idea what I'm talking about, I genuinely envy you).

Episode 4 - 2025 AI Retrospective and What's Next for 2026

In this special holiday episode of The Intelligent Enterprise, host Tom Stoneman takes a step back from the day-to-day pace of enterprise life to look at where AI has been in 2025 and where it might be heading next. To do it, he sits down with his colleague VS Joshi, Global Head of Product Marketing at Digitate, for a year-end retrospective and a 2026 outlook.

Shorten your 'inner loop' as a new hire and get past imposter syndrome with Grafana Assistant

Let's talk about being new. Four months ago, I joined Grafana Labs as a senior solutions engineer. It wasn’t just a new company, it was a new industry. I came from the visual workspace provider Miro, where I was comfortable doing discovery and talking about visual collaboration and innovation. But stepping into observability? I was in the deep end. And let me tell you, the imposter syndrome was real. Everyone around me was fluent in this language of metrics, logs, and traces.

Zero code tracing: Kubernetes observability with Logz.io and eBPF

Distributed tracing is a core tool for operating modern microservices platforms. For SREs and DevOps teams, it is often the fastest way to understand latency issues, service dependencies, and unexpected failure modes. But achieving comprehensive tracing coverage is resource-intensive and time-consuming. It usually requires application changes, language-specific instrumentation, agent lifecycle management, and ongoing coordination with development teams.

Normalize any logs for Cloud SIEM with Datadog's OCSF processor

Security teams need visibility across every system they defend, including cloud platforms, SaaS applications, security controls, identity providers, and custom services. But those systems all produce logs in different formats, with inconsistent field names and structures. That lack of standardization makes it harder to correlate events, write reusable detections, and investigate incidents quickly.

5 Reasons Why Website Design is Now an Operational Concern

There was a time when website design lived entirely in the marketing department: it was all about how your brand looked, how long visitors stayed, and how credible you seemed. A beautiful site meant trust, and a bad one meant lost sales. Simple as that. But that version of "web design" doesn't exist anymore. With the rise of JavaScript-heavy frameworks, cloud infrastructure, and performance-driven SEO, design has become an operational concern.

GitHub Outage Tracker: 5 Real-Time Monitoring Methods

When GitHub goes down, everything stops. Your developers can't push code. CI/CD pipelines hang indefinitely. Pull requests pile up. Deployments freeze. And if you're like most engineering teams, you find out about it when your Slack channel explodes with "Is GitHub down for everyone?" The average GitHub outage could cost teams 2-4 hours of productivity per developer. For a 50-person engineering org, that's 100-200 hours of lost work, assuming you catch the outage immediately. Most teams don't.