The latest News and Information on Service Reliability Engineering and related technologies.

Ubuntu Crash Logs: Find, Fix, and Prevent System Failures

If your system keeps crashing and you have no clue why, Ubuntu’s crash logs might have the answers. Whether you’re running a production server or just trying to keep your personal setup stable, these logs tell you exactly what went wrong. Instead of sifting through endless system logs, Ubuntu gives you focused crash reports—kind of like a security camera that only records when something breaks. Let’s break down where to find these logs and how to make sense of them.

Observability Pipeline: An Easy-to-Follow Guide for Engineers

You've got systems spitting out more logs, metrics, and traces than you can handle. Your monitoring costs are through the roof. And somehow, when something breaks at 3 AM, you still can't find the exact data you need. Sound familiar? Welcome to the observability pipeline conversation—no jargon, no fluff.

Zero Code Instrumentation: The Missing Link in Observability

Have you ever struggled with systems that fail to tell you what went wrong? The kind where you’re digging through logs at 2 AM while alerts keep piling up. In DevOps, clear visibility into your applications isn’t a luxury—it’s essential. This is where instrumentation without code changes can help. It simplifies observability, reducing the manual effort needed to track down issues. If you haven’t explored it yet, you might be making troubleshooting harder than it needs to be.

How Motive achieves 99.99% reliability with Rootly

In the high-stakes world of fleet management, reliability isn’t a nice-to-have—it’s a necessity. That’s why Motive has invested heavily in tools and processes to ensure its systems run smoothly for over 150,000 customers and more than a million vehicles. At the center of its ability to deliver 99.99% uptime at scale is Rootly.

Are AI and Platforms Making SRE Obsolete? With Kaspar von Grünberg, Humanitec's CEO

Last year, over 89% of companies claimed to have adopted platform engineering. And, in the past month, LLMs have been disrupting how we think about software development. In this context, Kaspar asks whether the role of Site Reliability Engineers as we know it is becoming obsolete. He argues that while SREs aren’t going anywhere, their responsibilities are evolving, and fast. We talk about.

Your Observability Questions, Answered

Monitoring used to be simple—set up some dashboards, configure alerts, and call it a day. But with microservices and cloud-native systems, things aren’t so straightforward anymore. Keeping track of everything can feel like an endless game of whack-a-mole. That’s where observability comes in. If you’re just getting started or looking to refine your approach, this guide answers the most common (and important) questions.