Black Swans and Grey Rhinos - Observations on Coronavirus and IT Ops During Crisis

As the Coronavirus crisis unfolds and all of us struggle to understand its implications and to adapt, many thoughts come to mind on many different levels – personal, business related, philosophical. This event is definitely a game changer, in the near future for sure – and many say in the long run as well.

Darwin Was Right: Change Will Separate the Strong from the Weak

“It is not the strongest or the most intelligent who will survive, but those who can best manage change,” said Charles Darwin over 150 years ago – and probably every IT Ops engineer out there today would agree with him. According to Gartner (and probably your experience as well), over 80% of service disruptions these days are caused by changes in infrastructure and software.

Modernizing and Consolidating Your Monitoring Without Losing It...

The current days of remote work and “IT Ops from home” may or may not be here to stay, but they definitely reinforce the need for consolidating and modernizing our monitoring. The challenges that multiple siloed tools create for understanding the big picture are only exacerbated by having just one screen to look at when monitoring our IT from our kitchen table.

Why I don't hate ITIL (aka ITIL in a DevOps World)

When I read Greg Ferro’s infamous “Why I hate ITIL so much” blog back in 2015, I have to admit that I agreed with much (albeit not all) of what he said. Maybe it’s the issues that I have with authority in general, or maybe it’s my many years of working within the constraints of ITIL and ITSM in operating systems and services – but I truly believed (and still do) that well-educated, experience- and consensus-based pragmatism is what actually gets things done.

"TRIBAL KNOWLEDGE" (noun): That thing you should have done, if only someone had told you.

As a former NOC engineer, I clearly remember my onboarding, and especially the deep-rooted fear I felt every time I encountered an alert that was new to me – particularly during a night shift. My only consolation was that I was never alone during training, so there was always someone I could ask that very awkward question: “I’m new here, what do we do with this…?”

"Homegrown" May Be Good for Tomatoes, Not So Much for IT Ops

In the past, many organizations grew and managed their own data centers. Some still do. And many are still developing their own automated incident management (aka Autonomous Operations) tools. But as IT grows and becomes ever more complex and fast-moving, the reality of what it means to do so kicks in, and organizations are re-evaluating their strategies.

AIOps: What's in a name?

Since the term ‘AIOps’ came into use in the monitoring sector a couple of years ago, there has been much confusion about what it means. We hear from users asking if they need it – a difficult question given that the answer depends on how you define it. Since there isn’t a broadly accepted definition, a range of vendors now market their products as AIOps offerings, even though these products cross subsectors and may not be directly competitive.

Don't Get Left Behind: Augmenting Decisions in DevOps With AIOps

DevOps is fast, glamorous and agile. It is key to keeping modern, fast-moving IT environments up and running. And it is no stranger to automation: DevOps has been relying on automation for many years now to ensure the rapid delivery of applications in this ever-changing landscape. Yet even the most agile and advanced DevOps teams cannot escape the growing complexity, scale and pace of the modern IT stack.

Embracing Chaos With BigPanda's Root Cause Analysis Features

The ever-growing complexity, scale and pace of IT environments put a huge burden on IT Ops, NOC, and DevOps teams, who are tasked with keeping these environments up and running. One of the biggest challenges is Root Cause Analysis (RCA). When something breaks, these teams need to determine what broke it, and they need to do it fast.