Statuspage

Incident response: how to keep tech problems from becoming people problems

When one of your IT services is on fire there’s no time to waste. Especially if that fire is blocking your users from getting stuff done. Rapid resolution tends to eclipse all else during an incident, often causing your team to ignore or forget pieces of the incident response process – like keeping people in the loop.

Stay code-connected with 12 new DevOps features

Our mission is to unleash the potential of all teams by harnessing the power of collaboration tools and practices. This is particularly true for teams practicing DevOps, which is all about unlocking collaboration between development, IT operations, and business teams. However, this increased collaboration can come at a cost to developers.

5 tips for incident management when you're suddenly remote

A lot of teams are asking us about how to do incident management when you’re suddenly remote. We understand. Going remote can be scary, and few things are scarier than having a service outage you aren’t prepared for. Nobody wants to be in a situation where an important service is going down and the engineer who can help isn’t answering on Slack. And if your company isn’t used to working remotely, it can be harder than ever to be on the same page during an incident.

Why you need a status page

There are as many ways to trigger an incident as there are new code deployments across the globe. With the emergence of cloud-reliant businesses, uptime accountability has shifted from on-premises server teams to the service providers themselves. SLAs, SLOs, and websites dedicated to downtime have suddenly come to life in the internet age, and having a status page is now an industry standard.

Introducing a brand-new look for Statuspage

Here at Statuspage, we take pride in helping you communicate proactively to customers during a service outage. We also believe that during stressful moments, the tools you rely on should be simple, intuitive, and easy to use. Using simplicity as our guide, we’ve updated the design of the Statuspage management portal. We kept things organized much as before, keeping the focus on what’s most important – helping you create and update incident communications.

Introducing the incident communication template generator

When things go wrong, your users need to know – but it’s not always easy to determine what to say or how to say it. If you’re responsible for getting the word out to hundreds or thousands of users, it can feel like a heavy weight on your shoulders. The task at hand is urgent, yet must be handled delicately. As someone who’s handled incident communication on Statuspage’s status page – the mother of all status pages – I know how difficult these moments are.

When it comes to system metrics, skip vanity and promote transparency

At Hosted Graphite, our users rely on us for a heavy-duty component of their business: monitoring their stack. This is a responsibility we take very seriously and we realize how critical it is for a user to know right away whether the problem detected is related to their own systems or to our system. That’s why we choose to publish our internal system metrics to our public status page.