Latest Posts

StackState 5.0 UI: Gain a Rapid Understanding and Speed Up Discovery

Do you ever feel as if your brain is about to explode because you are trying to fit so much into “working” memory? It can happen on a Friday afternoon, after a busy work week. Or on a Monday, when you look at your calendar and try to figure out how to fit in all those meetings and still get real work done.

Part 4: Causal Observability - Level 3

It’s not surprising that most failures are caused by a change somewhere in a system, such as a new code deployment, configuration change, auto-scaling activity or auto-healing event. As you investigate the root cause of an incident, the best place to start is to find what changed. To understand what change caused a problem and what effects propagated across your stack, you need to be able to see how the relationships between stack components have changed over time.
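As a rough illustration of "finding what changed", the sketch below compares two snapshots of component relationships taken at different times. The snapshot format (a set of source/target pairs), the component names and the `diff_relationships` helper are all assumptions made for this example; they do not describe how StackState stores or compares topology.

```python
def diff_relationships(before, after):
    """Compare two topology snapshots.

    Each snapshot is assumed to be a set of (source, target) pairs,
    one pair per dependency between two components.
    """
    return {
        "added": sorted(after - before),      # relationships that appeared
        "removed": sorted(before - after),    # relationships that disappeared
    }


# Hypothetical snapshots taken before and after an incident started.
before = {("frontend", "api"), ("api", "db-primary")}
after = {("frontend", "api"), ("api", "db-replica")}

changes = diff_relationships(before, after)
print(changes)
```

In this made-up case the diff shows that the `api` component stopped talking to `db-primary` and started talking to `db-replica`, which is the kind of change an investigation would look at first.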

AIOps for Real: Characteristics of a Platform That Add Value and Drive Change

When you’re investing in automation solutions, tangible results need to follow quickly. Getting a return on investment (ROI) out of an automation project after two years would have been OK in the not-so-distant past, but it is no longer acceptable. With the current speed of change, where new technologies come and go and existing ones evolve at lightning speed, IT teams require much faster time to value on automation investments.

Part 2: Monitoring - Level 1

The first level of the Observability Maturity Model, Monitoring, is not new to IT. But as reliable IT system operation becomes more and more critical, the importance of monitoring continues to increase. A monitor tracks a specific parameter of an individual component in the system to make sure it stays within an acceptable range; if the value moves out of the range, the monitor triggers an action, such as an alert, state change or warning.
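The monitor described above can be sketched in a few lines. This is a minimal illustration, not any particular product's implementation: the `ThresholdMonitor` class, the component name `web-01`, the parameter `cpu_percent` and the range 0 to 90 are all hypothetical, and the "action" is reduced to returning an alert message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ThresholdMonitor:
    """Tracks one parameter of one component against an acceptable range."""
    component: str
    parameter: str
    low: float
    high: float

    def check(self, value: float) -> Optional[str]:
        """Return an alert message if value is outside [low, high], else None."""
        if self.low <= value <= self.high:
            return None
        return (
            f"ALERT {self.component}: {self.parameter}={value} "
            f"outside [{self.low}, {self.high}]"
        )


# Hypothetical monitor: CPU usage on one host should stay at or below 90%.
monitor = ThresholdMonitor("web-01", "cpu_percent", 0.0, 90.0)

print(monitor.check(42.0))   # within range, no alert
print(monitor.check(97.5))   # out of range, alert message
```

A real monitor would typically route the alert to a notification channel or change the component's health state instead of returning a string.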

Changes are Observability's Biggest Blind Spot

Classically, observability lives in layers of information on a dashboard. It uses the fundamental trio of data (metrics, logs and traces) from each layer of the environment to assess the health of an IT infrastructure. However, a time component is also critical, because it makes the stack observable at any point in time. Gathering reliable data and insights into your IT infrastructure remains the primary role of observability tools and services.

Real World Insights - My Take on the Observability Maturity Model

A prelude to our upcoming six-part Observability Maturity Model Fundamentals blog series, by Lodewijk Bogaards. At StackState, we have spent eight years in the monitoring and observability spaces. During this time, we have spoken with countless DevOps engineers, architects, SREs, heads of IT operations and CTOs, and we have heard the same struggles over and over.

Anomaly Detection and AIOps - Your On-Call Assistant for Intelligent Alerting and Root Cause Analysis

In this blog, we examine how anomaly detection helps by setting up healthy alerts and providing efficient root cause analysis. Anomaly detection, part of AIOps, guides your attention to the places and times where remarkable things occurred. It reduces information overload, thereby speeding up RCA investigation.

Site Reliability Engineering, Site Reliability Engineers and SRE Practices: State of Adoption

Site reliability engineering (SRE) is what you get when you treat operations as if it’s a software problem. The mission of an SRE practice is to protect, provide for and progress the software and systems offered and managed by an organization with an ever-watchful eye on their availability, latency, performance and capacity.