
From Zero to Open Source Contributor

Never contributed to open source and feeling intimidated? Same. Before joining Datadog, Alessandro had zero open source experience. Now he's a regular contributor to Apache Iceberg. Here's exactly how he got started. Step 1: Join the Slack community and answer user questions. Step 2: Look for "good first issue" tags in the repo. Step 3: Remember that opening bug reports and doing code reviews count as contributions too.

Building a Code Review system that uses prod data to predict bugs

This post takes a closer look at how Sentry’s AI Code Review actually works. As part of Seer, Sentry’s AI debugger, it uses Sentry context to accurately predict bugs. It runs automatically or on-demand, pointing out issues and suggesting fixes before you ship. We know AI tools can be noisy, so this system focuses on finding real bugs in your actual changes—not spamming you with false positives and unhelpful style tips.

Closing the Year: What 2025 Taught Us About Resilience

By Doreen Jacobi, DERDACK / SIGNL4

It is that time of the year again. Time to reflect and look back at 2025. And I find myself thinking less about platforms and features – and more about the people behind them. The engineers who pick up the phone at 2 a.m. The operators who make judgment calls with incomplete information. The responders who keep systems running when everything feels urgent. If this year taught us anything, it’s this: technology can detect the problem, but people solve it.

Logging Best Practices (Grafana OpenTelemetry Community Call)

We’re back with a new Grafana OpenTelemetry Community Call episode, and this time we’re diving into logging with OpenTelemetry and Grafana Loki! Even better, we’re joined by two fantastic guests: Jack Berg, OTel logging expert, and Ed Welch, Loki guru. Getting both of them in one conversation makes for an amazing deep-dive into all things logging. Logs come in every shape and size, from simple CLI output to massive distributed systems generating petabytes of structured data. In this episode, we dig into all of it.
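As a rough illustration of the "structured" end of that spectrum, here is a minimal stdlib-only Python sketch (not the OpenTelemetry SDK or Loki API) that emits one JSON object per log line, the shape that log pipelines can parse and index, instead of free-form text:

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line --
    structured output, as opposed to free-form text logs."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        }
        # Merge any per-event fields attached via extra={"attrs": {...}}
        payload.update(getattr(record, "attrs", None) or {})
        return json.dumps(payload)
```

Wire it up with `handler = logging.StreamHandler(); handler.setFormatter(JsonFormatter())`, then attach per-event fields with `logger.info("user login", extra={"attrs": {"user_id": 42}})` (the `attrs` key is an invented convention for this sketch, not a standard).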

Tail sampling vs. head sampling in distributed tracing

In this video, Grafana Labs’ Robin Gustafsson (k6 CEO and VP of Product) and Sean Porter (Distinguished Engineer) discuss the differences between head sampling and tail sampling in distributed tracing. They explore why head sampling often amounts to sampling randomly and hoping for the best, while tail sampling — the approach used by Adaptive Traces in Grafana Cloud — lets you intelligently capture the traces that actually matter to you.
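The distinction can be sketched in a few lines of Python (a hypothetical illustration, not Grafana's Adaptive Traces implementation): head sampling must commit before the trace unfolds, while tail sampling gets to inspect the finished trace.

```python
import zlib


def head_sample(trace_id: str, rate: float = 0.10) -> bool:
    # Head sampling: decide up front, knowing nothing about the trace.
    # Hash the trace ID so every service makes the same keep/drop call;
    # this keeps ~10% of traces uniformly, interesting or not.
    return (zlib.crc32(trace_id.encode()) % 10_000) < rate * 10_000


def tail_sample(trace: dict, slow_ms: float = 500.0) -> bool:
    # Tail sampling: decide after the trace completes, so the decision
    # can key on outcomes -- here, errors and slow requests are kept.
    return trace["error"] or trace["duration_ms"] > slow_ms
```

The trade-off the video gets at shows up directly: `head_sample` drops failing traces as readily as healthy ones, while `tail_sample` keeps exactly the interesting ones at the cost of buffering whole traces before deciding.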

Valkey JSON module now available on Aiven for Valkey

The Valkey JSON module adds native JSON data type support to Valkey, letting users efficiently store, query, and modify complex, nested JSON structures directly. Native handling of nested data models removes earlier workarounds, such as serializing entire documents as strings or flattening data into hashes.
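As a rough sketch of what that looks like in practice (command names follow the RedisJSON-compatible API; exact syntax and reply formats may vary by module version, and a running server is assumed), nested fields become directly addressable via JSONPath:

```shell
# Store a nested document natively -- no string serialization needed
valkey-cli JSON.SET user:42 $ '{"name":"Ada","orders":[{"id":1,"total":9.5}]}'

# Read just one nested field, addressed by JSONPath
valkey-cli JSON.GET user:42 '$.orders[0].total'

# Update a nested value in place, without rewriting the whole document
valkey-cli JSON.NUMINCRBY user:42 '$.orders[0].total' 0.5
```

With the old string-serialization workaround, that last increment would mean fetching the whole document, deserializing, mutating, and writing it all back.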

How LinkedIn modernized its massive traffic stack with HAProxy

Connecting nearly a billion professionals is no small feat. It requires an infrastructure that puts the user experience above everything else. At LinkedIn, this principle created a massive engineering challenge: delivering a fast, consistent experience across various use cases, from the social feed to real-time messaging and enterprise tools.

Application Monitoring 101: Queue Time Can Alert Before a Breakdown

Monitoring practices tend to emphasize application response time, but queue time is often an earlier and equally important warning sign. If it rises, you’ll quickly see downstream effects: tail latency, timeouts, and error spikes. Watching this metric gives you a head start on app issues before they become user problems. In this post, we’ll cover what queue time is, how it goes off track, and practical steps to bring it back down.
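That head start can be sketched in a few lines of Python (a hypothetical illustration; the 100 ms threshold is an invented example, not a universal target). Queue time is the wait between a request being enqueued and a worker starting on it, and alerting on its tail catches trouble before averages move:

```python
from statistics import quantiles


def queue_time_ms(enqueued_at: float, started_at: float) -> float:
    # Queue time: the wait between enqueue and a worker picking the
    # request up, in milliseconds, from epoch-second timestamps.
    return (started_at - enqueued_at) * 1000.0


def p95_alert(samples_ms: list[float], threshold_ms: float = 100.0) -> bool:
    # Alert on the 95th percentile rather than the mean: queue time
    # degrades in the tail first, well before the average shifts.
    p95 = quantiles(samples_ms, n=20)[-1]
    return p95 > threshold_ms
```

A single slow outlier in twenty otherwise-fast requests is enough to trip the p95 check, which is exactly the early-warning behavior the post describes.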

Scaling Kubernetes GitOps with Fleet: Experiment Results and Lessons Learnt

Fleet, Rancher’s built-in GitOps engine, is designed to scale to thousands of clusters. But how far can it scale in a real-world scenario? Earlier this year, we wrote about the Fleet benchmark tool, and along the way we made some very instructive discoveries, especially about resource consumption and its impact on deployment performance.