Three communications best practices for incident handlers

Well-managed communications are critical when handling IT and security incidents. If updates are not communicated accurately and on time, misunderstandings, misalignment, and costly errors follow, and resolution takes longer. And if highly sensitive information reaches people who should not be privy to it, the risk of legal ramifications is high, as is the potential damage.

Why Self-hosting Might not be a Good Choice for your Status Page

We all remember when Facebook, WhatsApp, and Instagram went down for a whole day in April of last year. While it was terrible for the company, it is an educational moment for the rest of us. Facebook’s status page is self-hosted, which puts it at risk of the exact issue it is designed to tell you about.

Can Endpoint Protection Keep up With Modern Threats?

Endpoint protection is a security approach that focuses on monitoring and securing endpoints, such as desktops, mobile devices, laptops, and tablets. It involves deploying security solutions on endpoints to monitor and protect these devices against cyber threats. The goal is to establish protection regardless of the endpoint’s location, inside or outside the network.

Major IT Outage 2021 Recap

In 2021 we saw that no one is immune from major IT outages, not even giants like Google, Facebook, and Amazon Web Services (AWS). The following is a recap of some of the major IT outages with widespread impact in 2021. The historic AWS outage occurred on December 7, 2021 and lasted roughly six and a half hours. The breadth of Amazon and its reach caused not only their warehouse and delivery operations to stop.

Slack outage

Slack, a popular enterprise communications platform, faced a 5-hour system outage yesterday, February 22, 2022, from 9:25 AM to 2:24 PM EST. Affected Slack services included messaging, search, link previews, apps/integrations/APIs, posts/files, workspace/org administration, login/SSO, notifications, connections, and calls. AlertOps was NOT affected by this outage.

Cloud Incident Management Guide

Companies looking to grow in the digital age can advance that goal by adopting the cloud. Pursued with the right intent and implementation strategy, cloud adoption acts as a force multiplier, helping businesses grow and innovate at an accelerated pace. Organizations that adopt a cloud-first strategy must also safeguard themselves against critical, service-disrupting incidents.

Cut Out the Noise: Issue Grouping and Alerting Best Practices

We’re drowning in emails and Slack notifications. As our eyes glaze over, we start bulk-archiving everything into folders we will most likely never open again, and we miss critical bugs, crashes, or slowdowns, sometimes noticing them weeks too late. Learn from Dustin Bailey, Solutions Engineer at Sentry, and Phillip Jones, Ecosystem Product Manager, as they share issue grouping and alerting best practices to help you cut out the noise and start taking action on issues faster.