
Pastries with SREs: Enriched logs and filled donuts

In this episode of Pastries and SREs, we take a sweet dive into one of the most exciting evolutions in observability: enriched logs, also known as wide events. Gone are the days of toggling between tools and stitching together logs, metrics, and traces. Enriched logs consolidate the context, providing everything you need to understand and resolve issues in a single log entry.

Free cloud credits: Why your architecture gets lazy and bloated

This is the uncomfortable truth about cloud credits: Short-term savings mask crippling long-term costs. Taken from our recent webinar, Civo CCO Simon Hansford and Canopy Founder James Marks expose the primary concerns of the credit model. Credits act as a dangerous incentive for architectural laziness. When cost isn't a factor, you stop designing for efficiency, leading to bloated, inefficient infrastructure and the inevitable bill shock.

How to Monitor Java Applications on Windows with SolarWinds Observability | APM Setup Guide

This video provides a step-by-step walkthrough for configuring monitoring for Java applications running on Windows using SolarWinds Observability. The demonstration covers the complete process, from adding a new service to instrumenting the application with the Java APM library and verifying connectivity. This guide is designed for developers, DevOps engineers, and system administrators who need to instrument Java applications on Windows for performance monitoring, distributed tracing, and full-stack observability.

Understanding Kafka with Speedscale

In this video, we're breaking down the complex world of Apache Kafka and showing you how to gain deep visibility into your event streaming architecture using Speedscale. Kafka is the backbone of modern, cloud-native systems, but understanding what's happening in production (which topics are receiving traffic, where messages are going, and how services are interacting) can be a real challenge. We'll cover how Speedscale makes Kafka visualization and debugging simple.

How Much Did OpenAI's 30,000 CPU Core Optimization Save Them?

I admit I was a little skeptical going into KubeCon 2025. The last time I went, in 2022, it felt tactical. I heard lots of conversations around small solutions to small problems. Practical knowledge-sharing is of course beneficial, but I’m most inspired by the big picture — ideally, a picture bigger than you can see anywhere outside of your mind. I’m heartened to say that KubeCon 2025 was exactly that.

5 Reasons to Switch to the Calico Ingress Gateway (and How to Migrate Smoothly)

The Ingress NGINX Controller is approaching retirement, which has pushed many teams to evaluate their long-term ingress strategy. The familiar Ingress resource has served well, but it comes with clear limits: annotations that differ by vendor, limited extensibility, and few options for separating operator and developer responsibilities. The Gateway API addresses these challenges with a more expressive, standardized, and portable model for service networking.

Introducing Bits AI SRE, your AI on-call teammate

Bits AI SRE is your AI on-call teammate, built to autonomously investigate alerts and coordinate incident response. Integrated with Datadog, Slack, GitHub, Confluence, and more, Bits analyzes telemetry, reads documentation, and reviews recent deployments to determine the root cause of alerts—often before you’ve even opened your laptop. In fact, if you're using Datadog On-Call, you can view Bits’s findings right from your phone—so you’re always one step ahead, no matter where you are.