
Logs told me something broke. Traffic showed me what.

Here’s a problem I run into constantly: something breaks in production, I can see the 500 errors in my logs, but I can’t reproduce it locally. The trace shows me the dependency graph but not the actual request that failed. This is especially painful in microservices. I was looking at a CNCF example the other day (a simple demo app, like 4 pods) and it already had so many cross-service dependencies that understanding what broke required looking at the whole system at once.

IBM Think 2026 Infrastructure Insights for IT Leaders

IBM Think 2026 made one thing clear: infrastructure leaders are being asked to support more AI, more automation, and faster decision-making without adding unnecessary complexity or risk. Held earlier this month in Boston, IBM Think 2026 focused heavily on enterprise AI, hybrid cloud, automation, governance, and operational transformation.

Agent governance starts with the service catalog you already run

Last month, an AI agent running inside Cursor wiped PocketOS's entire production database, including its backups, in roughly nine seconds. The agent found an API token in an unrelated file, originally created for managing custom domains, and used that token to execute the deletion. The backups sat inside the same blast radius as the database the agent was operating against. Nine months earlier, a Replit AI agent had done the same thing to a SaaStr database during a designated code freeze.

Atlassian Transforms Product Development with AI

What used to take months now takes weeks, and it’s changing what it means to build great products. At Atlassian, product managers and designers are using Rovo and Jira Product Discovery to move faster at every stage of the development lifecycle. From running deep research across all their tools and documents to capturing ideas, surfacing insights, and prioritizing what to build next, AI is transforming how product decisions get made.

DataPrime at ingest (DPXL): See the impact of any routing decision

TCO policies have always been one of the most impactful cost levers in Coralogix. Route business-critical data to High, push monitoring data to Medium, archive compliance logs to Low. With the addition of DataPrime expressions (DPXL) – a subset of the DataPrime query language designed for inline filtering at ingest – that routing became even more precise, matching on any field in the event payload, not just application, subsystem, and severity.

Federated Search | From Silos to Insight | Azure Blob Schema Discovery with Splunk's Crawler

This walkthrough shows how Splunk's Cloud can discover schema and partition keys for Microsoft Azure Blob Storage datasets and create searchable Splunk-managed tables. Once the data is mapped, analysts can use Splunk Federated Search to query Azure Blob data where it lives, bringing cloud-resident logs into security, observability, and operational workflows without re-ingesting the data.

The Observability Journey: Getty Images and Cribl

I recently sat down with Simon Overbey and Lovepreet Singh, the engineering manager and systems engineer (respectively) at Getty Images, to talk about their experiences implementing Cribl. After getting a rundown of the pre-Cribl environment (described above), I asked to jump straight to the end: the net benefits. If the "before" was a terrifying tidal wave of cost and complexity, what did the "after" look like?

What's new in Calico: Spring 2026 Release

Kubernetes has come a long way since its debut in 2014. It’s gone from running a couple of containerized microservices to orchestrating fleets of production workloads spanning everything from AI agents to full-scale VMs running in pods. As Kubernetes adoption grows and its use cases stretch to cover more ground, managing its increasingly complex networking and security landscape demands operational maturity and a platform that supports it.