The latest news and information on incident management, on-call, incident response, and related technologies.
The effects of climate change mean that black swan weather events increasingly affect our working lives. From wildfires and hurricanes to the ever-present threat of earthquakes, 2021 has seen its share of crises. This raises serious questions for companies about the safety of their workforces. As a global company, PagerDuty has employees across the world. When a disaster strikes, everyone needs the training, resources, and tools to act.
For retailers, uptime is money and issues can cost thousands of dollars per minute. With infrastructure comprising complex services such as payment gateways, inventory, and mobile applications, maturing digital operations is vital for ensuring services are always on and customers get the best experience.
Net at Work is a German IT company with over 100 employees that provides its customers with solutions and tools for digital communication and collaboration. Its product NoSpamProxy offers reliable protection against spam and ransomware, legally compliant email encryption, and more. Net at Work's customers use it as a SaaS solution, which is monitored with PRTG Network Monitor, the agentless network monitoring software from Paessler AG.
At VMware, we routinely use modern development and site reliability engineering (SRE) practices. Those of us on the VMware Tanzu Observability product marketing team also get regular exposure to the SRE teams that implement these practices with the observability technology we create.
Effective healthcare communication requires proper software and processes to ensure that the right person receives timely messages. Unfortunately, Divisions of Family Practice (DoFP), a large community-based network of physicians located in British Columbia, Canada, relied on a third-party answering service to connect long-term care facilities (LTCFs) with on-call providers.
In 2016, Google released the definitive book on Site Reliability Engineering (SRE), a practice that originated at the company to solve a monumental problem: how to keep Google services running with high reliability. Over the years, SRE has been widely adopted by dev teams across the globe and has become a popular role at startups and enterprises alike. Here is a look at how search interest in SRE has trended over the years.