The New Rootly Ringtones: How Research-based On-Call Sounds

We set out to create a ringtone that wasn’t just loud, but sounded like a modern pager: something that wakes you up without triggering a full-blown adrenaline spike. In this video, go behind the scenes with sound engineer Gorjão as he crafts what a research-based on-call sound sounds like.

Metrics That Matter: Measuring Developer Productivity in the AI Era

In this episode, Ryan McDonald is joined by Mark Quigley, Head of Platform Engineering at Ninety.io, for a conversation that cuts through the noise around developer productivity metrics and AI. Mark dives deep into how teams can measure what matters—without falling into the trap of turning every measure into a target. He shares how tools like Developer NPS, DORA metrics, and balanced scorecards can help teams optimize for both output and well-being—but only when framed with the right intent.

Are AI and Platforms Making SRE Obsolete? With Kaspar von Grünberg, Humanitec's CEO

Last year, over 89% of companies claimed to have adopted platform engineering. And, in the past month, LLMs have been disrupting how we think about software development. In this context, Kaspar asks whether the role of Site Reliability Engineers as we know it is becoming obsolete. Kaspar argues that while SREs aren’t going anywhere, their responsibilities are evolving fast. We talk about.

How Motive achieves 99.99% reliability with Rootly

In the high-stakes world of fleet management, reliability isn’t a nice-to-have—it’s a necessity. That’s why Motive has invested heavily in tools and processes to ensure its systems run smoothly for over 150,000 customers and more than a million vehicles. At the center of its ability to deliver 99.99% uptime at scale is Rootly.

Scientific Incident Management with Dan Slimmon

Dan Slimmon is an incident management veteran who has worked at Etsy and HashiCorp and now leads consulting and training on pragmatic, non-bureaucratic incident response. In this episode, Dan shares his philosophy on "scientific incident response," the importance of hypothesis-driven troubleshooting, and why incidents should be seen as normal in complex systems.

How AI broke serverless and what to do about it with Vercel's Mariano Fernández Cocirio

Mariano, Staff Product Manager at Vercel, explains why serverless architectures are hitting unexpected limits—they’re too fast. The industry has spent millions optimizing serverless for speed, but AI workloads are changing the game. In the AI realm, slower execution often leads to better results. The challenge? Paying for all that idle compute time while waiting for AI responses.

I Want My Shoes Fast! Observability, SRE Burnout, and OTel with Dynatrace's Adriana Villela

In this episode, we sit down with Adriana Villela, Principal DevRel at Dynatrace and OpenTelemetry contributor, to break down how observability impacts reliability. We dive into what contributes to SRE burnout and how managers can create psychologically safer spaces for responders. Adriana also shares her perspective on AI as an observability buddy for navigating incidents.

AI in Production with GitHub's Sean Goedecke

In this episode, we sit down with Sean Goedecke, Staff Software Engineer at GitHub, to discuss where LLMs fit into real-world development. Sean shares how he’s using LLMs and where he’s drawing the line for AI assistance in the codebases he manages (though, as he says, this might all change by next summer). Sean also weighs in on how LLMs could assist SREs during outages, especially when you’re only half-awake at 3 a.m. after a rather inconvenient page.

The Domino Effect of Outages with Nuno Tomás, Founder of isDown.app

Humans of Reliability: Keeping systems up and the lights on isn’t just about technology; it’s about the people behind it. In this episode, we’re thrilled to chat with Nuno Tomás, founder of isDown.app, a vendor outage monitoring tool transforming how teams handle third-party incidents. Nuno shares his journey from software engineer to entrepreneur, the pivotal 4 a.m. moment that inspired isDown, and the challenges of balancing startup life with family. We dive into the complexities of incident communication, how to tackle alert fatigue, and why transparency is key to building trust in SaaS.