The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Surface and remediate runtime posture issues with Workload Protection Findings

Threat detection and runtime posture monitoring are related but different jobs. Security teams already rely on Datadog Workload Protection to detect threats in real time across hosts and containers. But the actions that lead to those detections (file manipulation, process execution, network calls, or kernel activity) can be indicative of compromise or simply of risky behavior—like running compilers in production containers.

Alert Noise Isn't an Accident - It's a Design Decision

In a previous post, The Incident Checklist: Reducing Cognitive Load When It Matters Most, we explored how incidents stop being purely technical problems and become human ones. These are moments where decision-making under pressure and cognitive load matter more than perfect root cause analysis. When systems don’t support people clearly in those moments, teams compensate. They add process. They add people. They add noise. Alerting is one of the most visible places where this shows up.

The Grok-to-AI Evolution: Why Modern SREs Are Moving Beyond Manual Parsing

Grok structures logs. Context engineering connects systems. AI explains behavior. For years, Grok patterns have been the workhorse of the SRE world. Built on regular expressions, Grok helps teams extract structure from unstructured logs. As we explored in "Do You Grok It?", Grok is the key to turning messy log lines into usable fields. It's why our Grok Pattern Reference remains one of our most-visited resources — SREs are hungry for structure.
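
To make the idea of extracting structure from an unstructured log line concrete, here is a minimal sketch using Python's standard `re` module with named groups. It is not Grok itself: Grok patterns use a `%{SYNTAX:SEMANTIC}` notation layered on top of regular expressions, and this example only imitates that idea. The log line format and the field names (`client_ip`, `method`, `path`, `status`, `duration_s`) are assumptions invented for illustration.

```python
import re

# Hypothetical access-log line, invented for illustration.
LOG_LINE = "203.0.113.7 GET /api/orders 502 0.421"

# Named groups play the role that field names play in a Grok pattern.
LINE_PATTERN = re.compile(
    r"(?P<client_ip>\d{1,3}(?:\.\d{1,3}){3}) "
    r"(?P<method>[A-Z]+) "
    r"(?P<path>\S+) "
    r"(?P<status>\d{3}) "
    r"(?P<duration_s>\d+\.\d+)"
)


def parse_line(line):
    """Return a dict of typed fields, or None if the line does not match."""
    match = LINE_PATTERN.fullmatch(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Convert numeric fields so they can be filtered and aggregated.
    fields["status"] = int(fields["status"])
    fields["duration_s"] = float(fields["duration_s"])
    return fields


fields = parse_line(LOG_LINE)
print(fields)
```

Returning `None` for lines that do not match keeps parse failures visible to the caller instead of silently producing partial records.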

ISO 27K Without the Bloat: An Open Source Approach

ISO 27001 certification is often framed as an enterprise-only exercise: long timelines, expensive tooling, consultants everywhere, and a lot of compliance work that exists mainly to survive an audit. As a ~40-person, engineering-driven SaaS company, we needed the same level of trust and rigor as much larger organizations, but we weren’t willing to accept shelfware, parallel compliance infrastructure, or controls that only exist on paper. We also didn’t stop at ISO 27001.

Observability trends for 2026 (Part 2): GenAI and OpenTelemetry reshape the landscape

Over the course of my 20 years as a developer, SRE, and now observability product leader, software has typically progressed at a good pace. But now, the emergence of two transformative technologies is fundamentally reshaping enterprise observability: generative AI (GenAI) and OpenTelemetry (OTel). We surveyed over 500 IT decision-makers for a new report: The Landscape of Observability in 2026: Balancing Cost and Innovation.

How to Enhance Service Management for Small Firms

Small firms juggle many tasks at once. They serve clients while managing budgets and staff. Most owners spend their days putting out fires instead of building better systems. Poor service management drains resources fast. Client requests get lost in email threads. Team members use different tools for the same tasks. Bills slip through the cracks. These problems cost money that small businesses can't afford to lose.