Coffee Break Webinar Series: "Intelligent Observability - Blamefree Retrospectives"
A selection of questions and answers from our recent webinar on leveraging AIOps to run sustainable, blameless retrospectives.
Murphy’s Law states that anything that can go wrong, will go wrong. The challenge for most businesses is putting the right method of communication in place for when the inevitable happens. The only way to handle this is to expect the worst and then prepare for it. A key question when choosing any alerting solution is whether my team can be notified properly when a major outage happens.
This post covers some of the highlights that we have released in the last 6 months.
How close can CSPs come to realizing the zero-touch network vision, and what are the best next steps for getting there? To discuss this question, Anodot brought together a panel of experts, including Kim Larsen, CTIO of T-Mobile Netherlands; Ira Cohen, co-founder of Anodot and the company’s chief data scientist; Fernando Elizalde, analyst at GSMA Intelligence; and moderator Justin Springham.
Another day, another drama! This one, though, is very much of my own making. I have been wanting to try my hand at a bit of chaos engineering for some time now, but C&Js just hasn’t been ready. Sarah’s been up for it too, though, at Animapanions. And now that our CIO, Charlie, has seen MTTR drop across every single technology team, thanks to the rollout of Moogsoft and the new incident management system (kudos to James), it’s pilot day.
Running any application in production assumes that reliable monitoring is in place, and serverless applications are no exception. As modern cloud applications become more distributed and complex, monitoring availability, performance, and cost becomes increasingly difficult. Unfortunately, cloud providers offer little of this out of the box.
Having a Status Page is like having a dog. A dog alerts you to an incident: a sudden noise, an approaching neighbor, a squirrel… A dog sounds the alarm on an intruder. A dog even alerts you to maintenance by barking at every handyman, garbage truck, and gardener within sight. And just as a dog fetches the same stick over and over, a status page fetches your users’ attention again and again, especially during a live incident, as they refresh the browser and wait for the status to change.
Monitoring solutions are a vital component in managing an application’s environment. From the systems layer all the way up to the end user’s connection to the app, you want to find out how the platform is performing. Indicators like CPU, memory, the number of connections, and overall health help teams make informed decisions for guaranteeing uptime. Teams monitor metrics (short-term information) and logs (long-term information) mainly from a reactive perspective.