The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.
We’re excited to announce a new set of updates and enhancements to the PagerDuty platform. The product team has been hard at work on updates spanning Event Intelligence, Runbook Automation, and Applications with Monitoring Tools, as well as PagerDuty and PagerDuty Community Events.
The holiday season is here, and global retailers are prepared for the biggest retail event of the year. The decrease in new COVID-19 cases, coupled with a rise in vaccination rates, provides a glimmer of hope for shoppers looking to spend on friends and family. Holiday spending is expected to break previous records this year, growing up to 10.5 percent over 2020.
Although every company can benefit from SREs, some need SREs more than others.
There are roughly five stages of an incident:
1. Assess impact
2. Inform customers (status page)
3. Identify the issue
4. Mitigate the issue
5. Resolve the incident
Then there is follow-up and further work. It is also important to note that (2) should be ongoing as you progress: update the status page at reasonable intervals, e.g. every 15-20 minutes unless you specify otherwise.
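The stages and the update cadence described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Stage` enum, the `status_update_due` helper, and the 20-minute default (the upper end of the 15-20 minute window) are hypothetical names and values chosen for this example, not part of any tool mentioned here.

```python
from datetime import datetime, timedelta
from enum import IntEnum
from typing import Optional


class Stage(IntEnum):
    """The five incident stages, in order (names are illustrative)."""
    ASSESS_IMPACT = 1
    INFORM_CUSTOMERS = 2
    IDENTIFY_ISSUE = 3
    MITIGATE_ISSUE = 4
    RESOLVE_INCIDENT = 5


# Assumption: use the upper end of the 15-20 minute window as the default.
UPDATE_INTERVAL = timedelta(minutes=20)


def status_update_due(
    stage: Stage,
    last_update: Optional[datetime],
    now: datetime,
    interval: timedelta = UPDATE_INTERVAL,
) -> bool:
    """Return True when the status page should get a fresh update."""
    if stage < Stage.INFORM_CUSTOMERS:
        # Impact is still being assessed; nothing to tell customers yet.
        return False
    if last_update is None:
        # Customers have not been informed at all, so an update is due now.
        return True
    # From stage 2 onward, informing customers is an ongoing duty.
    return now - last_update >= interval
```

A responder or a bot could call `status_update_due` on a timer and post a reminder whenever it returns `True`. If you promised customers a different cadence, pass it as `interval`.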
This blog post defines SRE by explaining SLOs and error budgets, highlighting the innovation vs. reliability tradeoff.
Our December update brings a ‘Who is on duty’ board displaying current team members on duty with contact information. In addition, we have simplified the manual sending of Signls and improved the integration with Azure Sentinel. As always, you can find all the details in this article.