Operations | Monitoring | ITSM | DevOps | Cloud

Burnout Doesn't Ask Permission: Recognizing, Recovering, and Rebuilding w/ Stephen Townsend

Burnout doesn't announce itself. For Stephen Townsend, SRE team lead and host of the Slight Reliability podcast, it crept in over months of mounting pressure on a massive transformation program, and announced itself overnight with an inability to sleep. In this episode, Stephen shares his personal burnout story with rare honesty: the physical symptoms he dismissed, the org structure that left him without autonomy, and the full year it took to recover.

Hot Takes: What the AI Hype Gets Wrong About Software Engineering Excellence | Harness Blog

Ahead of the DevOps Modernization Summit, Matthew Skelton, CEO & CTO of Conflux, shares his takes on output-driven AI, how DORA metrics aren’t enough, and why governance and compliance must be built into the platform. Matthew Skelton is the CEO & CTO of Conflux and a featured speaker at this year’s DevOps Modernization Summit. Ahead of our annual summit, Matthew has shared his hot takes on AI, DORA, and the key to successful automation.

How to Build AI-Native Security Resilience (And Finally Get Developers And Security On The Same Team) | Harness Blog

Developers and security professionals have struggled to get on the same page for what seems like forever, and AI is only widening that divide, according to results from our State of AI-Native Application Security 2025 research report.

Public Sector Observability: Service Experience and Reliability Are Now Mission-Critical

Reliable digital services aren’t optional for public sector agencies. They’re essential to mission success. Across the U.S. public sector, service experience and reliability have moved from operational concerns to mission requirements. At the federal level, Executive Order 14058 makes improving service delivery and customer experience a federal priority, measured by real outcomes for the public. And for state and local governments, the bar is set by the private sector.

The post-mortem problem

Post-mortems are one of the most consistently underperforming rituals in software engineering. Most teams do them. Most teams know theirs aren't working. And most teams reach for the same diagnosis: the templates are too long, nobody has time, and nobody reads them anyway. These aren't wrong observations. But they're symptoms, not causes. The actual problem is that somewhere along the way, the post-mortem stopped being a piece of communication and became a compliance artifact.

We Turned Our WireShark Wizard Into a Markdown File

Rocky AI, Checkly’s AI agent, is now Generally Available. We developed Rocky AI over the last ~6 to 8 months. This is an aeon in AI-years. During this period, we learned a ton. About AI, but mostly about how to fit it into an existing SaaS product as more than just another chat widget. This is my ramble…

Introducing Rocky AI to General Availability

After months of being available in Beta for our app users, Rocky AI is now generally available to all users on all plans. Rocky AI is Checkly’s AI agent that works around the clock to make sure your application’s reliability is optimal. In this first release, Rocky AI ships with the ability to run continual analysis on test and check failures, giving your teams AI-powered root cause analysis, impact analysis, and more.

Beyond "Reactive" Accessibility: Meeting the 2026 ADA Title II Mandate in Higher Ed

For decades, digital accessibility in state-funded higher education has largely been a "reactive" game. If a student with a visual impairment reported an issue with a tuition portal, the university would scramble to provide an accommodation. As long as the institution could show "meaningful progress" toward compliance, it was generally shielded from significant legal repercussions. That era is officially ending. The U.S.