Alerting

5 tips for incident management when you're suddenly remote

A lot of teams are asking us about how to do incident management when you’re suddenly remote. We understand. Going remote can be scary, and few things are scarier than having a service outage you aren’t prepared for. Nobody wants to be in a situation where an important service is going down and the engineer who can help isn’t answering on Slack. And if your company isn’t used to working remotely, it can be harder than ever to be on the same page during an incident.

Keeping the Internet "Always On": The Pressure of COVID-19 on Incident Response Teams

Social distancing measures, like remote working, school closures, and “shelter in place” orders, have driven us onto the Internet more than ever before, creating unprecedented demand for a range of digital services from companies, many of which weren’t set up for this type of pressure. As a digital operations company, we help teams ensure their websites and apps are running perfectly and partner with over 12,000 organizations around the world, from start-ups to 58 of the Fortune 100.

How to Evolve Your ITOM Strategy for Modern IT through 2024

Major changes are redefining how IT operations monitoring is done, and impacting tooling, processes and skills. But how exactly can IT Ops leaders ensure continuous service assurance of their critical digital services now and in the future? What’s the key to having the required visibility and control over these modern and complex IT environments that are increasingly hybrid, distributed, dynamic and modular?

Efficient task management for remote/work-from-home teams

As COVID-19 continues to impact communities globally with health care professionals working tirelessly to prepare for emergencies and prevent the further spread of the pandemic, technology companies are also doing their part. Twitter, Google, and Amazon have issued directives instructing employees to work from home as the companies themselves move to pull out of tech events while hosting their own events virtually.

Remote Work: Splunk + Zoom

As everyone is taking proactive measures to stay healthy, organizations are increasingly having their employees work from home. At Splunk, we are focused on bringing data to every question, decision and action — and remote work for us equals Zoom for online meetings and workspaces. As our customers use Splunk for real-time data processing and analytics, they use our Splunk Mobile App (Android, iOS) when they need to take their dashboards on the go.

Our Top 5 On-Call Practices

On-call: you may see it as a necessary evil. When responding to incidents quickly can make or break your reputation, designating people across the team to be ready to react at all hours of the day is a necessity, but often creates immense stress while eating into personal lives. It isn’t a surprise that many engineers have horror stories about the difficulty of carrying a pager around the clock. But does on-call have to be so dreadful? We think not.

6 Steps to a More Effective Postmortem

Detailed and specific description of impact? Check. In-depth root cause analysis? Check. Clearly defined and easy-to-follow resolution? Check. Postmortems present an incredible learning opportunity, despite the inherent cost of time and effort. They ensure that an incident is documented, that all contributing factors are understood, and that effective preventative actions have been put in place to reduce the likelihood or impact of recurrence.

Incident management for remote/WFH teams

As the world tries to battle COVID-19, most of our customers here at Zenduty have started implementing social distancing measures within their companies by asking all their employees, including the NOC, SRE, ITOps, Support, and software engineering teams, to work remotely or from home. While that may appear to be a drastic change in your day-to-day operations, it need not disrupt your reliability and support operations.

PagerDuty Is for People: Supporting Our Community During COVID-19

Yesterday, we released our earnings during an unprecedented time for society and the market. One of the things I noticed was the collective empathy we experienced as we talked to different teams and companies in preparation, and in our analyst callbacks, where, to a person, everyone kicked off their call by wishing each other good health and safety. It reminded me that when we are all in this together, not only are great things possible, but it also feels less daunting and more manageable.