The Ins and Outs of Postmortem Documentation

No matter how you design your architecture or what technologies you implement, critical incidents will happen. When things go wrong, it is easy to get carried away and forget about the bigger picture. But your work isn’t done after you fix the immediate problem; now is the time to take a look at how the incident actually happened so that you can learn from it.

Top 11 Incident Response Influencers to Follow in 2019

The incident response industry is anything but static, and it is often said that the key to staying ahead is staying informed. But that’s easier said than done. Faced with the increasing sophistication of cyber attacks and the growing complexity of IT architectures, we often drown in our daily slew of tickets and alerts, with no time left to spare.

How Does Google Handle Critical Incidents?

There are some very good sources out there on how to manage a critical incident, and one of them is the chapter on incident management in Google's book, “Site Reliability Engineering”. In this chapter, the folks at Google present their approach to a well-designed critical incident management process.

How to Take Business Continuity Tests To The Next Level

The importance of effective business continuity planning (BCP) cannot be overstated. Being able to avoid and mitigate the risks and damages associated with a disruption to operations is critical to the health of any business. The two main pillars upon which a robust BCP program rests are, of course, the plan and the testing program.

How to Raise the Bar on the ITIL's Recommendations for Critical Incident Management

According to the ITIL, the framework of best practices for delivering IT services, there is a recommended process flow for handling major incidents. Clearly, the IT community would be well served to follow the ITIL’s systematic and professional approach, whose benefits, according to CIO Magazine...

7 Key Enablers for Effectively Managing Critical Incidents

When a critical incident hits, the implications for the business could not be more profound. Whether it’s a productivity system that powers the efficiency of thousands of employees, or an online service that serves millions of customers and drives the company’s revenues - no organization can afford anything less than an immediate and effective resolution.

How to Collaborate Effectively with External Service Providers

Imagine the following scenario: A critical incident hits. You have a few different teams on it throughout the company. Managers and other stakeholders need to be updated every thirty minutes. But the biggest issue is that you need to engage with external service providers. You need your communication provider and hosting service to check on their side and report back. You want a consultant in the picture, and an external help desk service might be involved as well.