When an incident strikes, an organization’s reputation, revenue, and customer trust are at stake. Assembling an effective incident response team is critical to minimizing the incident’s impact. But what exactly is an incident response team? Who should be part of the team, and what are their responsibilities? Successful incident response requires a team with a diverse set of problem-solving and communication skills.
With Halloween behind us and the holiday shopping season fast approaching, engineering and product teams know what that means: code freezes! At xMatters, code freezes are part of our product release process in anticipation of the busiest, and most important, time of the year for many of our customers. But code freezes are just one piece of the puzzle in how we ensure our customers have the most reliable experience. There is much more to how we design our product releases.
As infrastructure stacks grow increasingly complex and involve an ever-growing number of services, system failures are becoming more common. Systems fail for a variety of reasons: software bugs, misconfigurations or interactions between services that cause unexpected behavior, network outages, and, on rare occasions, natural events that render data centers inoperative.