
Latest Videos

An Accidental Shutdown - War Room Story from Ex-Roblox's SRE

Former Roblox Sr. Engineering Manager Denys Pashutynski shares a classic reliability horror story from 20 years ago in Ukraine - when one misplaced command shut down the entire corporate LDAP controller. From The Incidentally Reliable podcast - real stories from the trenches of site reliability engineering. Made by SREs for SREs and hosted by Zenduty. Zenduty is a revolutionary incident management platform that gives you greater control and automation over the incident management lifecycle.

Ex-Roblox SRE's take on SRE vs. DevOps

Former Roblox Sr. Engineering Manager Denys Pashutynski clarifies the fundamental difference between SRE and DevOps roles: SREs handle the customer-facing production edge while DevOps focuses on background automation. From The Incidentally Reliable podcast.

Balancing Technical Debt in Fast-Growing Teams

Sometimes messy code is better than perfect code. Hear from Ramiro Berrelleza on why over-cleaning technical debt can paralyze your startup's growth, and when it's okay to move fast and fix later. From The Incidentally Reliable podcast.

Ex-Google SRE's 3-Minute On-Call Response

Ever wondered about the most intense on-call requirements? Ex-Google SRE Niall Murphy reveals the Google traffic team's strict 3-minute SLA and $2,500/second stakes in the ads system. From The Incidentally Reliable podcast.

The biggest mistake by Devtool founders

Key advice from Ramiro Berrelleza (CEO and founder of Okteto): Don't get attached to your solution - get attached to the problem you're solving! Watch how this mindset helped build a successful Kubernetes developer experience tool. Exclusively on The Incidentally Reliable podcast.

The Hard Truth About the Observability Landscape

Why are ex-FAANG engineers building observability companies? When millions depend on reliable software, a simple reboot isn't enough anymore. Piyush Verma discusses modern software reliability. Exclusively on The Incidentally Reliable podcast.

Think Fast: When SREs saved the customer experience

How quick decision-making saved the customer experience. Featuring Piyush Verma (CTO of Last9). Exclusively on The Incidentally Reliable podcast.

Turn Chaos into Clarity with Zenduty | AI-Powered Incident Management Tool

Every minute of downtime costs your business customers, revenue, and trust. Can you afford to let incidents spiral out of control? With Zenduty, you don't have to. Our AI-powered incident management platform empowers your team to minimize MTTR and resolve incidents faster, reduce alert fatigue and stay focused, and scale your incident response processes with ease. Turn chaos into clarity and keep your systems running smoothly.

GoDaddy's Journey to Hosting Reliability - Incidentally Reliable Podcast with Amit Rindhe

What does it take to keep over 82 million domains running seamlessly? How do you plan for disasters while maintaining the highest standards of reliability? In this episode of Incidentally Reliable, we sit down with Amit Rindhe, Head of Engineering at GoDaddy, to uncover the secrets behind building resilient systems, scaling global operations, and ensuring uptime for millions of users. Amit takes us through his journey, from pioneering SRE practices at Adobe and AWS to leading one of the world's most trusted hosting platforms.