Alerting

How to Raise the Bar on the ITIL's Recommendations for Critical Incident Management

ITIL, the framework of best practices for delivering IT services, recommends a process flow for handling major incidents. The IT community would be well served to follow ITIL’s systematic and professional approach, whose benefits, according to CIO Magazine, ...

Incident Response with AWS Systems Manager

The typical DevOps on-call engineer responds to alerts, triages them by service impact, troubleshoots high-priority incidents, and takes action to remediate issues. Automation tools like AWS Systems Manager can reduce much of the repetitive work and let engineers focus on the most important tasks.

Can You Trust Machine Learning In IT Operations?

Chronically understaffed and constantly stressed IT Ops and NOC teams are overwhelmed by today’s IT noise. Artificial Intelligence (AI) and Machine Learning (ML) can help these teams because both are exceptionally good at processing enormous volumes of complex data in real time, or near real time, and surfacing actionable insights.