Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on DevOps, CI/CD, Automation and related technologies.

Trace without traces

A customer emailed on a Tuesday: checkout hung for ten seconds. I opened our tracing tool, punched in the time window, and got nothing. The trace was sampled out. We keep 1% of traces, like most shops with real traffic do. The one request that actually mattered was in the 99% we threw away. I spent twenty minutes admiring our observability stack before admitting it couldn’t answer a first-grader’s question: what happened to this person? Here’s what I know now.

To learn and improve, we cannot be afraid to fail

“Deployment stress doesn’t just come from high-profile public outages. It often starts much earlier, when a fear of failure seeps into team culture.” (Rob Richardson, Software Craftsman) Rob certainly knows the stress and embarrassment of public deployment failures. “But overall,” he reflects, “I’ve had more stress in my career from internal failures.

Cloud repatriation strategies: From public dependency to hybrid flexibility

The phrase "cloud first" dominated IT strategy for the better part of a decade. It was gospel, practically unchallengeable, and for a lot of organizations, it was the right call. But something shifted between 2024 and 2026, and it shifted fast. Bills stopped being defensible. Vendor pricing imploded. Sovereignty stopped being a compliance checkbox and became a procurement requirement.

OpenAI API cost calculator: estimate your GPT spend before it estimates you

This OpenAI API cost calculator (also an AI inference calculator for o3/o4-mini thinking tokens) estimates your monthly OpenAI API bill from three inputs: model, request volume, and average tokens per request. Toggle between standard, batch, and cached pricing and get your number in seconds. It also shows what the same workload costs on Claude and Gemini. For the full per-model rate card, see CloudZero's OpenAI API pricing guide.
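The arithmetic behind an estimate like this is simple enough to sketch. The snippet below is a rough illustration, not the calculator's actual implementation: the model names and per-million-token rates are made-up placeholders (not real OpenAI prices), the split of "average tokens per request" into input and output tokens is an assumption, and `discount` is a hypothetical stand-in for the batch or cached pricing toggle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rate:
    """Hypothetical price in USD per one million tokens."""
    input_per_mtok: float
    output_per_mtok: float


# Placeholder rates for illustration only. These are NOT real OpenAI prices;
# substitute the figures from the current rate card.
EXAMPLE_RATES = {
    "example-small": Rate(input_per_mtok=1.00, output_per_mtok=4.00),
    "example-large": Rate(input_per_mtok=10.00, output_per_mtok=40.00),
}


def estimate_monthly_cost(
    model: str,
    requests_per_month: int,
    input_tokens_per_request: int,
    output_tokens_per_request: int,
    discount: float = 0.0,
    rates: dict = EXAMPLE_RATES,
) -> float:
    """Estimate a monthly bill in USD from model, volume, and tokens per request.

    `discount` is a fraction in [0, 1) applied to the whole bill, an assumed
    simplification of batch or cached pricing.
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    rate = rates[model]
    input_cost = requests_per_month * input_tokens_per_request * rate.input_per_mtok / 1_000_000
    output_cost = requests_per_month * output_tokens_per_request * rate.output_per_mtok / 1_000_000
    return (input_cost + output_cost) * (1.0 - discount)


if __name__ == "__main__":
    # 100,000 requests a month, 1,000 input and 500 output tokens each.
    cost = estimate_monthly_cost("example-small", 100_000, 1_000, 500)
    print(f"{cost:.2f}")  # prints 300.00 with the placeholder rates above
```

Real pricing often discounts only part of the bill (cached input tokens, for example), so a single flat `discount` is a deliberate simplification here.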

Shipped: The Fastly spend that was hiding in plain sight

CDN and edge spend is easy to lose track of. Fastly bills on its own, off to the side of your cloud invoice – real money, often significant, sitting where none of your cost tooling reaches. So it stays its own island: a lump sum with no easy way to tie it back to the teams, products, and customers driving the traffic.

AI Summary Agent in Turbo360

Handed over an Azure integration environment you've never seen before? Turbo360's AI Resource Summary agent gives any support operator or engineer an instant plain-English overview of what a resource is, how it behaves, and what to watch out for - without needing to ask the developers. Great for: IT operations teams, MSP NOCs, cloud support engineers, and anyone responsible for running integration workloads they didn't build.

Prepare for the EU AI Act with Harness AI Security

Harness AI Security provides a unified control plane for AI discovery, risk visibility, and runtime protection, helping organizations operationalize key requirements of the EU AI Act. Instead of relying on manual audits or fragmented tooling, teams get continuous insight into how AI systems are built, exposed, and used, along with the evidence needed to demonstrate compliance.

You Can't Detect What You Never Collect: Telemetry Coverage in the Agentic SOC

Every detection rule, every threat hunt, every AI agent you deploy rests on one silent assumption: that the data describing an attack actually reached your tools. When it doesn’t, nothing above it can save you, and no one gets an alert that the data was missing. Security teams invest heavily in the sharp end of the stack: detection content, threat intelligence, response playbooks, and increasingly, AI agents to triage and investigate at machine speed.

Why compliance audits keep slowing your engineering team down

If you've shipped software in fintech, healthcare, or government, you probably know the specific dread of an upcoming compliance audit. Not because the software isn't secure, but because proving it is requires reconstructing a paper trail for decisions that were made in Jira tickets, Slack threads, and pull request comments over the last six months. The software is fine. The documentation of the software is the problem.