Manhattan, NY, USA
Apr 2, 2020 | By Rich Burroughs
Alex Hidalgo is a Site Reliability Engineer at Squarespace, and he’s currently writing a book called Implementing Service Level Objectives for O’Reilly Media. The first three chapters of the book are available now through O’Reilly’s early access program. I had a chance to read those chapters and ask Alex some questions about service level objectives and reliability. Thanks, Alex, for sharing your knowledge.
Mar 17, 2020 | By Dylan Nielsen
Incidents come up quickly, and critical tasks pile up both in the moment and after an incident is resolved. It can be challenging to keep up with what was done by whom during an incident and what tasks still need to be completed. In an effort to continue simplifying your incident response process, today we are happy to announce an overhaul of ticketing and task tracking on FireHydrant, along with a major overhaul of our JIRA integration.
Mar 9, 2020 | By Mandy Mak
Bugs will happen from time to time. As our systems grow in complexity, new functionalities mean new risks. What makes or breaks a team is not only how it handles incidents, but also how it learns from them. This is where incident postmortems come into the picture.
Jan 31, 2020 | By Anna Kelley
Outages are inevitable. It is how we respond that can make or break our company. In this post, we will talk about how Service Catalogs can impact your incident response process and make it more effective. When a company has just a handful of services, it can be relatively easy to figure out who to call when something breaks. But when companies are at the stage of having dozens of services to manage, figuring out who to page or reach out to can be a challenge.
Jan 15, 2020 | By Anna Kelley
As Engineering teams start spending more time and effort on incident response, they usually focus on improving the process within their own team. We think there are additional benefits that can come from a holistic approach to improving incident response across your organization. In this post, we will explore how you can enable Engineering and Customer Success teams to work more effectively when an incident occurs.