Latest Blogs

AWS CloudFront Outage (Feb 2026): Timeline, Cascade, and Lessons

Feb 11, 2026 By Nuno Tomas In isDown

At approximately 9:15 PM UTC on February 10, 2026, Amazon CloudFront began returning NXDOMAIN responses for DNS queries against specific distributions. In practical terms: DNS was telling users that services behind those distributions simply didn't exist. The root cause was a DNS resolution failure within CloudFront's infrastructure that quickly spread to eight interconnected AWS services.

Silent Failures: Why AI Code Breaks in Production

Feb 11, 2026 By Ken Ahrens In Speedscale

You ship a small “safe” change on Friday. The diff is tiny, the tests are green, and the AI assistant was confident. An hour after deploy, your on-call channel lights up. A downstream service is rejecting responses that look fine in code review. Now you’re rolling back and rewriting a fix that should have been obvious if you had real traffic in the loop. This isn’t a hypothetical.

A new perspective on dashboard sprawl

Feb 11, 2026 By Dave Clarke In Squared Up

Dashboards are supposed to answer questions, not create more of them. But investigations don't stop at a single view. The moment you want to understand one specific thing in detail, such as a failing VM, a degraded service, or a slow pipeline, dashboards start to break down. You end up either building yet another dashboard or searching through many different ones. SquaredUp's Perspectives changes this.

A Notification List Is Not a Team

Feb 11, 2026 By James Barnes In StatusCake

In the previous post, we looked at how alert noise is rarely accidental. It’s usually the result of sensible decisions layered over time, until responsibility becomes diffuse and response slows. One of the most persistent assumptions behind this pattern is simple. If enough people are notified, someone will take responsibility. After more than fourteen years of working with engineering teams of every size and shape, we’ve seen this assumption fail repeatedly.

Happy Birthday to Us: Honeycomb 10 Year Manifesto, Part 1

Feb 11, 2026 By Charity Majors In Honeycomb

Christine and I started Honeycomb in 2016, which means it’s been ten years. Christine, a developer, and I, an operations engineer, were both profoundly unhappy with the state of the art in monitoring and logging tools. The tools we had used at Facebook didn’t spray our signals around to a bunch of siloed-off pillars. They consolidated as much context as possible so we could properly explore it, the way every other non-software engineering team already takes for granted.

88% of Organizations Face Growing IT Complexity. Here's How Leaders Are Responding

Feb 11, 2026 By Renuka Suresh In HEAL Software

For CIOs, IT leaders, platform engineering managers, and SRE/DevOps teams running multi-tool monitoring stacks who need faster incident clarity.

Agent vs Assistant: The key distinction between Olly and the competition

Feb 11, 2026 By Chris Cooney In Coralogix

The market is saturated with agents and assistants, making it difficult to tell them apart. However, the difference between these two approaches is significant. They offer radically distinct levels of impact, reflecting major differences in both their technical complexity and the quality of their inferences. Let’s figure out the distinction.

OpenTelemetry in Production: Design for Order, High Signal, Low Noise, and Survival

Feb 11, 2026 By Sematext In Sematext

A lot of the talk around OpenTelemetry is about instrumentation, especially auto-instrumentation, and about OTel being vendor-neutral, open, and a de facto standard. But how you use the final output of OTel is what makes the business difference. In other words, how do you use it to make your life as an SRE/DevOps/biz person easier? How do you have to set things up to truly solve production issues faster?

Why Monitoring Matters for Modern Hosting Platforms

Feb 11, 2026 By Connor James In AppSignal

With all the recent discussion in the dev community about changes made at Heroku, we wanted to use this moment to talk about PaaS (Platform as a Service) providers and how AppSignal can be a vital tool for getting the most out of your app's hosts, from optimal performance to lower usage bills.

Operations | Monitoring | ITSM | DevOps | Cloud

Top Kubernetes interview questions of 2026: A beginners guide
