Security vs. ops: the two sides of reliability

Security and ops work together to keep your systems reliable, so why do we treat them so differently? Reliability starts when you proactively take charge of your infrastructure and application risks. Transcript: When we talk about reliability in the software space and the digital operations space, you really end up falling into these two different mindsets.

Reliability means smooth on-call and a strong team

True reliability is when your engineers have confidence in their systems and their teams. Full transcript: Reliability to me means my on-call shift is gonna be smooth because everybody is making the attempts to be smart about the type of code that we're writing. And we're regularly testing to make sure that our system has redundancy and can withstand latency spikes, it can withstand resource spikes.

AI Reliability Insights: How to Build a Gremlin MCP Server

Gremlin’s Reliability Intelligence helps teams uncover the causes behind failure modes so they can improve reliability without sacrificing velocity. The new Gremlin MCP Server, part of Reliability Intelligence, gives you new ways to explore your data, with access to insights and recommendations that help you improve reliability and run your systems better using Gremlin. In this webinar, Gremlin CTO Sam Rossoff shows you how to integrate your favorite LLM and use plain language to query data, uncover insights, create dynamic dashboards, and more.

How to make Netflix reliable: Address low-hanging fruit

Reliability doesn’t have to be fancy and dramatic. Kolton and his team dramatically improved Netflix reliability by focusing on low-hanging fruit. Full transcript: My first holiday peak at Netflix, where my VP of engineering came to me and he said, "Kolton, what do you think the chance we make it through the holiday peak without an outage is?" I thought about it for a minute and I said, "50/50."