Latest Posts

Why Predict When You Can Prevent?

In the evolving world of IT monitoring, I see that many organizations want to move to predictive monitoring. They usually allocate a significant budget to this type of effort, which ultimately fails to yield the desired return on investment. Often, organizations are already collecting and processing large amounts of IT telemetry and using multiple solutions to monitor their systems. So, predictive monitoring seems like the logical next step.

The Business Case for Observability with Context

My team was not happy with me. I had just convened a meeting of my direct reports — and the managers that reported to them — to deliver the news personally. “No more new tools,” I told them. “We have everything we need to do our job. Our environment just doesn’t change that fast. So stop bringing me requests for new tools. The answer is no.” It was unquestionably the right call at the time, but today, it would be laughable.

When an IT Incident Occurs at Your Company, What TV Show Does It Most Resemble?

If you’re like us, you’ve been binge-watching a lot of shows over the past few months. That got us thinking: does your company's problem resolution process ever feel like an episode of one of your favorite shows? We ran a short poll to see how IT teams would relate their incident resolution processes to popular TV shows. Here are the results.

Observability with Context: Telemetry, Time, Tracing, and Topology

What changed? That’s the question ops personnel have been asking for decades whenever something goes wrong in the production IT environment. Everything was working before, so the reasoning goes, and now it’s not. We have an incident. And to figure out what caused the incident, and hence to have any idea how to fix it, we must know what changed. There’s just one problem with this approach. What if everything is subject to change, all the time?

Designing a flexible non-SQL query language without reinventing the wheel

There are tons of query languages. Yet another query language was invented: the StackState Query Language, or STQL for short. Perhaps this raises some questions, such as: Why did we not choose to implement SQL? Did we reinvent the wheel? How did we balance the complexity of the language against the time to implement it? What's the learning curve of this new language? Let me share with you our novel approach.

Automated Root Cause Analysis & Anomaly Detection in Concert

Every day, IT operators try to prevent outages of business-critical applications. When prevention is not possible, they strive to reduce the mean time to repair (MTTR) as much as possible. Improving resolution time can be quite a challenge. But IT operators don't stand alone in this challenge. They can use smart solutions that support Automated Root Cause Analysis and Anomaly Detection.

Observability Redefined: 3 steps to improve your IT infrastructure

Are you already applying IT observability to keep up with rapid changes in your IT operations landscape? Currently, the fast adoption of new infrastructures, including hybrid clouds, containers, and microservices, challenges the market. As organizations move towards these highly dynamic architectures, the requirements for traditional IT monitoring change dramatically. More data keeps on coming, and having the time and skill to keep up with this ongoing stream seems to get harder and harder.

StackState Open-source

Open-source software started around the millennium and is now one of the cornerstones of modern software development. Open-source projects make their source code available to anyone so that engineers across the world can inspect the code to find bugs or make changes to suit their needs. Today, there are more than 180,000 open-source projects available, according to Wikipedia. We at StackState are big believers in open-source software.

Introducing: Observability for Cloud & Containers

Are you currently dealing with complex and fast-changing Cloud & Container environments? If your answer to that question is yes, then you are probably looking for an easy solution that gives you complete control and helps you make sense of these fast-changing, complex IT environments. In the dynamic world of microservices and containers, traditional monitoring solutions are no longer sufficient to provide the visibility needed to maintain healthy and happy environments.