
The latest news and information on observability for complex systems and related technologies.

Leveling up your observability practice - Part 2

Lessons from the front lines: Challenges in your observability maturity journey. In our previous blog post, we explored the observability maturity spectrum and found that while only 7% of organizations consider themselves experts, the largest group (43%) is actively working to improve its practices. We also saw how mature organizations achieve better outcomes, from faster root cause analysis to fewer user-reported incidents.

Adding AI to Observability 2.0 for Dynamic Observability

The original premise of observability was to ensure system health, identify issues, and resolve them efficiently. As I recently outlined, the legacy approach (now sometimes called Observability 1.0) relied heavily on metrics and tracing because logs were seen as too noisy or too difficult to work with. But as most forward thinkers now recognize, logs are exactly the telemetry type we need most.

Unlocking Peak Performance with Kentik's Azure Network Observability Tools

In today’s multi-cloud landscape, maintaining smooth and reliable connectivity requires complete visibility into cloud networks. With Kentik, network and cloud engineers gain the tools to monitor, visualize, and optimize Azure traffic flows, from ExpressRoute circuits to application performance, ensuring efficient and proactive operations.

Early Observability in Platform Engineering: Challenges and Solutions

Since the emergence of the cloud, the DevOps movement, and the rise of microservices, developers have become increasingly responsible for operating their software. “You build it, you run it” (YBYR) and “You build it, you operate it” (YBYO) have become common mantras in the software engineering industry. However, these mantras are often misunderstood: developers should remain focused on building software.

An easier way to manage your observability collectors | Grafana

Managing observability collectors at scale is often overwhelming, but it doesn’t have to be. Grafana Fleet Management offers a better way to monitor, configure, and control your collectors—all from a centralized platform. With remote configuration and detailed health insights, you can quickly resolve issues, save time, and reduce manual effort.

Emergency Observability with Coroot

If you’re an experienced engineer, you likely have comprehensive observability and monitoring set up for your production systems, so when issues arise, you can resolve them quickly. Yet many systems, especially smaller and simpler ones, run with only rudimentary observability or none at all. When such an application goes down or starts to perform poorly, pinpointing and resolving the issue can be very hard.

Leveling up your observability practice - Part 1

Lessons from the front lines: Moving to observability maturity. What separates the observability experts from the novices? It's a question that's been on my mind lately, especially after diving into our recent 2024 State of Observability Survey of over 500 practitioners. In my past roles as a DevOps engineer and a site reliability engineer (SRE), I've seen firsthand how a mature observability practice can be the difference between sleepless nights and smooth sailing.

Easily control observability collectors at scale with Fleet Management in Grafana Cloud

Managing observability workloads can quickly overwhelm even the most experienced admin. Maybe you’re dealing with multiple departments, each needing its own collector configurations and pipelines. Every time you have to run a test or roll out a change, the process is cumbersome and introduces risk. Or perhaps you’re responsible for tracking hundreds of collectors across different environments and regions. In a scenario like this, troubleshooting individual issues feels nearly impossible.