
The latest news and information on observability for complex systems and related technologies.

Against Incident Severities and in Favor of Incident Types

About a year ago, Honeycomb kicked off an internal experiment to structure how we do incident response. We looked at the common severity-based approach (typically a SEV scale) but decided to adopt one based on incident types, which we hoped would work better as quick, shared definitions across multiple departments. This post is a short report on our experience doing it.

Observability as a superpower

With every job I’ve had, I’ve come across a new observability tool that I can’t live without. It’s also something that’s a superpower for us at incident.io: we often detect bugs faster than our customers can report them to us. A couple of jobs ago, that tool was Prometheus. In my previous job, it was the fact that we retained all of our logs for 30 days, and had them available to search using the Elastic stack (back then, the ELK stack: Elasticsearch, Logstash, and Kibana).

Network Observability: Mastering Infrastructure Data for Smarter IT

If you want to know exactly what’s on your network and how it’s all connected in real time, then network observability is the answer. Network observability pulls data from sources across your network infrastructure to model a detailed view of your systems and how they interact. This lets you understand exactly what’s happening on your network at any given moment so you can optimize performance.

Booking.com's Observability Overhaul: Unified Metrics, Logs, and User Insights | Grafana & OTel

Join Murugesan and Ahmadali from Booking.com's Observability Team as they dive into the journey of modernizing observability. Discover how they transformed fragmented systems into a centralized, scalable platform using OpenTelemetry and Grafana solutions. They share insights on their three-year strategy, the importance of unified metrics and logs, and overcoming challenges, from technology transitions to fostering teamwork.

Relational Fields: Query Even More Relationships in Your Traces

Earlier this year, we introduced relational fields. Relational fields enable you to query spans based on their relationship to one another within a trace, rather than only in isolation. We’ve now expanded this feature and introduced four new prefixes: child., none., any2., and any3. Previously, you could use root., parent., and any. to query on the root span of your target span’s trace, the parent span of your target span, and any other span in the same trace as your target span.
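To make the three original prefixes concrete, here is a minimal in-memory sketch of how root., parent., and any. could be resolved relative to a target span. It is an illustration only, not Honeycomb's query engine or API: the Span class, the resolve and query helpers, and the example trace are all hypothetical, and the four new prefixes are left out because the excerpt does not describe their semantics.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Span:
    # Hypothetical span model: an id, an optional parent id, and arbitrary fields.
    span_id: str
    parent_id: Optional[str]
    fields: dict


def resolve(trace: list, target: Span, key: str) -> list:
    """Return the candidate values for `key` as seen from `target`."""
    by_id = {s.span_id: s for s in trace}
    if key.startswith("root."):
        name = key[len("root."):]
        # The root span is the one without a parent.
        candidates = [s for s in trace if s.parent_id is None]
    elif key.startswith("parent."):
        name = key[len("parent."):]
        parent = by_id.get(target.parent_id)
        candidates = [parent] if parent is not None else []
    elif key.startswith("any."):
        name = key[len("any."):]
        # Any other span in the same trace as the target span.
        candidates = [s for s in trace if s is not target]
    else:
        # No prefix: the field belongs to the target span itself.
        name = key
        candidates = [target]
    return [s.fields[name] for s in candidates if name in s.fields]


def query(trace: list, conditions: dict) -> list:
    """Return ids of spans for which every condition matches."""
    return [
        s.span_id
        for s in trace
        if all(want in resolve(trace, s, key) for key, want in conditions.items())
    ]


# Assumed example trace: a -> (b -> c), a -> d
trace = [
    Span("a", None, {"name": "GET /checkout", "error": False}),
    Span("b", "a", {"name": "charge_card", "error": True}),
    Span("c", "b", {"name": "db.query"}),
    Span("d", "a", {"name": "render"}),
]

# Spans whose parent is the charge_card span.
print(query(trace, {"parent.name": "charge_card"}))  # ['c']

# Spans in a checkout trace where some other span recorded an error.
print(query(trace, {"root.name": "GET /checkout", "any.error": True}))  # ['a', 'c', 'd']
```

In this sketch, span b is excluded from the second result because any. only looks at the other spans in the trace, and none of them recorded an error.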

The Ultimate Observability Experience at SolarWinds Day

SolarWinds Day has consistently been one of the most enlightening events of the IT year, offering rich insights into technology, cybersecurity, artificial intelligence (AI), and more. This quarter's event, SolarWinds Day: Observability Anywhere. Precision Everywhere, tackled the complexities of IT infrastructure observability. I was delighted to host the panel discussion; here’s my overview of the key talking points.

10 Best Zabbix Alternatives for Infrastructure Monitoring in 2024

Infrastructure monitoring has evolved into a critical component of modern distributed systems, driving organizations to explore robust Zabbix alternatives. While Zabbix has served as a cornerstone of traditional monitoring, today's microservices and cloud-native architectures demand different approaches. The landscape of Zabbix alternatives has matured considerably, offering specialized solutions for various monitoring scenarios.

Advanced Open edX Monitoring with AppSignal for Python

In the first part of this series, we explored how AppSignal can significantly enhance the robustness of Open edX platforms. We saw the challenges that Open edX faces as it scales and how AppSignal's features — including real-time performance monitoring and automated error tracking — provide essential tools for DevOps teams. Our walkthrough covered the initial setup and integration of AppSignal with Open edX, highlighting the immediate benefits of this powerful observability framework.

Introducing the Logz.io AI Agent, Accelerating the Future of Observability

Logz.io introduces its AI Agent in Beta, using GenAI to revolutionize observability. The AI Agent simplifies monitoring with automated data analysis and root cause detection, accelerating issue resolution by 3-5x for beta users—marking a critical step toward fully autonomous observability.