Alerting

Having On-call Nightmares? Runbooks Can Help You Wake Up.

You aren't sure how long you've been here, but the view outside the window sure is soothing. Before you can fully take in your surroundings, a siren rips you back into the conscious world. Slowly, you begin to piece together that you exist, and you are on call. The ringing, much louder now, pierces through your skull as you begin to open your bleary eyes. You turn over your pillow, grab your phone, and click through the PagerDuty notification.

Using Remote Actions to Create ServiceNow Incidents

Recently, we have received many requests for Enterprise Alert not only to alert on critical situations but also to take a proactive approach: initiating, recording, and tracking those situations in ITSM tools such as ServiceNow and BMC Remedy. This post focuses on what happens when critical systems fail and tickets are not created in ServiceNow because of a break in the workflow.

Three Ways MSPs Can Benefit From Dynamic Thresholds

People around the world depend on Managed Service Providers (MSPs) to keep their businesses running like clockwork, even as their IT infrastructure evolves. Efficient workflows lead to higher profits, but efficiency is hard to maintain across a mix of on-premises infrastructure, public and private clouds, and other complex customer environments. The shift to remote work in 2020 due to the COVID-19 pandemic has only made this more challenging for MSPs.

Key Learnings from the Facebook Status Page

Yesterday, April 8th, 2021, at around 22:00 UTC, Facebook experienced a major outage in which Facebook, Messenger, WhatsApp Web, and Instagram were down for up to 3 hours. The outage was reported on Facebook's status page, which was a good example of how to communicate an incident.

Learning from Facebook: Keep Your Status Page Separate from Your Infrastructure

Yesterday, April 8th, 2021, at around 22:00 UTC, Facebook experienced a major outage in which Facebook, Messenger, WhatsApp Web, and Instagram were all down and unavailable. The final update, marking the incident as resolved, was posted 3 hours later. Although the status page doesn't state the incident's duration, we can assume it affected at least some users for that long.