A change to a single line of code sent the prices of thousands of products on Amazon to a penny. Taking care of customers and focusing on engineering best practices allowed a company to survive and thrive after a "make or break" event.
Once upon a time, our hosted DB provider had a terrible security incident, causing us to take down the entire product for 24 hours. This is a story about the aftermath: the downtime, the incident response, how we got back up, and how we communicated with customers.
First, a tale of a silent-but-deadly data push on a path to take Google Ads fully offline within 90 minutes, with no diagnosis in sight. Second, a zombie-haunted pipeline that kept developers awake late into the night. ...each concluded with how we slayed the beast!
SolarWinds® Network Performance Monitor (NPM), created by network engineers for network engineers, is a complete monitoring solution designed to provide you with the tools you need to work smarter, improve visibility, and prevent downtime. See why SolarWinds is a worldwide leader in network monitoring.
As organizations increasingly face an unprecedented volume of IT-related incidents, Nexthink focuses on detecting and correcting issues at their source. The result? Less IT disruption for your employees, lower IT support costs and mitigated risk associated with enterprise-wide application breakdowns.
A new advance in incident alert management for MSPs: OnPage enabled the entire alerting process to become an integral part of Datto's Autotask service desk. MSP teams can now create workflows that automatically send alerts to the person on call, based on customizable incident and ticket criteria.
With Request Life Cycle, administrators can predefine a set of statuses each ticket goes through, as well as specify conditions and actions for each status change. This ensures clarity and consistency in how each ticket is processed.
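To make the idea of predefined statuses with per-transition conditions and actions concrete, here is a minimal sketch in Python. It is not the product's API or configuration format: the names `Ticket`, `Transition`, `LifeCycle`, and the example statuses are all hypothetical, chosen only to illustrate the pattern.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Ticket:
    # Hypothetical ticket model; a real service desk tracks far more fields.
    id: int
    status: str = "Open"
    assignee: Optional[str] = None
    log: List[str] = field(default_factory=list)


@dataclass
class Transition:
    source: str
    target: str
    condition: Callable[[Ticket], bool]  # must hold before the change
    action: Callable[[Ticket], None]     # runs after the change


class LifeCycle:
    """A predefined set of statuses plus the allowed moves between them."""

    def __init__(self, statuses: List[str], transitions: List[Transition]):
        self.statuses = set(statuses)
        self.transitions: Dict[Tuple[str, str], Transition] = {}
        for t in transitions:
            # Reject transitions that mention a status outside the set.
            if t.source not in self.statuses or t.target not in self.statuses:
                raise ValueError(f"unknown status in {t.source} -> {t.target}")
            self.transitions[(t.source, t.target)] = t

    def move(self, ticket: Ticket, target: str) -> None:
        t = self.transitions.get((ticket.status, target))
        if t is None:
            raise ValueError(f"{ticket.status} -> {target} is not allowed")
        if not t.condition(ticket):
            raise ValueError(f"condition for {ticket.status} -> {target} not met")
        ticket.status = target
        t.action(ticket)


def log_action(message: str) -> Callable[[Ticket], None]:
    # Assumed action for illustration: append a note to the ticket's log.
    def action(ticket: Ticket) -> None:
        ticket.log.append(message)
    return action


def build_lifecycle() -> LifeCycle:
    return LifeCycle(
        statuses=["Open", "In Progress", "Resolved"],
        transitions=[
            Transition("Open", "In Progress",
                       condition=lambda tk: tk.assignee is not None,
                       action=log_action("work started")),
            Transition("In Progress", "Resolved",
                       condition=lambda tk: True,
                       action=log_action("resolved")),
        ],
    )


if __name__ == "__main__":
    lifecycle = build_lifecycle()
    ticket = Ticket(id=1, assignee="alice")
    lifecycle.move(ticket, "In Progress")
    lifecycle.move(ticket, "Resolved")
    print(ticket.status, ticket.log)
```

Keying the transitions by `(source, target)` means any status change that was not predefined is rejected outright, which is where the consistency in ticket handling comes from in this sketch.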