
Tail sampling vs. head sampling in distributed tracing

In this video, Grafana Labs' Robin Gustafsson (CEO for k6 + VP, Product) and Sean Porter (Distinguished Engineer) discuss the differences between head sampling and tail sampling in distributed tracing. They explore why head sampling often amounts to sampling randomly and hoping for the best, while tail sampling, the approach used by Adaptive Traces in Grafana Cloud, lets you decide which traces to keep after seeing how they turned out, so you capture the ones that actually matter to you.
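To make the contrast concrete, here is a minimal Python sketch of the two decision points. It is an illustration only, not how Adaptive Traces or any Grafana component is implemented: the `Trace` class, the function names, the 5% rate, and the 1000 ms "slow" threshold are all assumptions chosen for the example.

```python
import random
from dataclasses import dataclass


@dataclass
class Trace:
    # Hypothetical, simplified trace record used only for this sketch.
    trace_id: int
    duration_ms: float
    has_error: bool


def head_sample(trace_id: int, rate: float, rng: random.Random) -> bool:
    """Head sampling: decide when the trace starts, before its
    duration or error status is known."""
    return rng.random() < rate


def tail_sample(trace: Trace, slow_ms: float, baseline_rate: float,
                rng: random.Random) -> bool:
    """Tail sampling: decide after the trace has completed, so the
    outcome (errors, latency) can drive the decision."""
    if trace.has_error or trace.duration_ms >= slow_ms:
        return True  # always keep traces that look interesting
    return rng.random() < baseline_rate  # keep a small share of the rest


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Synthetic workload: every 50th trace is slow, every 97th has an error.
traces = [
    Trace(i, duration_ms=2000.0 if i % 50 == 0 else 20.0,
          has_error=(i % 97 == 0))
    for i in range(1000)
]
interesting = [t for t in traces if t.has_error or t.duration_ms >= 1000.0]

head_kept = [t for t in traces if head_sample(t.trace_id, 0.05, rng)]
tail_kept = [t for t in traces if tail_sample(t, 1000.0, 0.05, rng)]

head_hits = sum(1 for t in interesting if t in head_kept)
tail_hits = sum(1 for t in interesting if t in tail_kept)
print(f"interesting traces: {len(interesting)}")
print(f"kept by head sampling: {head_hits}")
print(f"kept by tail sampling: {tail_hits}")
```

In this sketch the head sampler keeps an interesting trace only by chance, while the tail sampler keeps every slow or failed trace by construction. The trade-off is that a tail sampler has to buffer complete traces before it can decide.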

Valkey JSON module now available on Aiven for Valkey

The Valkey JSON module implements native JSON data type support within Valkey, allowing users to efficiently store, query, and modify complex, nested JSON data structures directly. This overcomes previous architectural complexities, such as needing to serialize entire documents as strings or flatten data into hashes, by providing native handling for nested data models.

How LinkedIn modernized its massive traffic stack with HAProxy

Connecting nearly a billion professionals is no small feat. It requires an infrastructure that puts the user experience above everything else. At LinkedIn, this principle created a massive engineering challenge: delivering a fast, consistent experience across various use cases, from the social feed to real-time messaging and enterprise tools.

Application Monitoring 101: Queue Time Can Alert Before a Breakdown

Monitoring practices tend to emphasize application response time, but queue time is often an earlier and equally important warning sign. When it rises, downstream effects follow quickly: tail latency, timeouts, and error spikes. Watching this metric can therefore give you a head start on tackling app issues before they become user problems. In this post, we’ll discuss queue time, how things can go off track, and practical steps to turn it around.
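As a rough illustration of the idea, the Python sketch below treats queue time as the gap between when a request was accepted and when the application started working on it, and raises a flag when the 95th percentile over a recent window crosses a threshold. The `QueueTimeMonitor` class, the window size, and the 100 ms threshold are assumptions made for this example, not taken from the post or from any particular monitoring product.

```python
from collections import deque
from statistics import quantiles


def queue_time_ms(enqueued_at_ms: float, started_at_ms: float) -> float:
    """Time a request spent waiting before the app began processing it."""
    # Clamp at zero in case the two clocks disagree slightly.
    return max(0.0, started_at_ms - enqueued_at_ms)


class QueueTimeMonitor:
    """Hypothetical helper: tracks recent queue times and flags a high p95."""

    def __init__(self, window: int = 100, p95_threshold_ms: float = 100.0):
        self.samples = deque(maxlen=window)  # keeps only the newest samples
        self.p95_threshold_ms = p95_threshold_ms

    def record(self, enqueued_at_ms: float, started_at_ms: float) -> None:
        self.samples.append(queue_time_ms(enqueued_at_ms, started_at_ms))

    def p95(self) -> float:
        if not self.samples:
            return 0.0
        if len(self.samples) < 2:
            return self.samples[0]
        # n=20 yields 19 cut points; the last one is the 95th percentile.
        return quantiles(self.samples, n=20)[-1]

    def should_alert(self) -> bool:
        return self.p95() > self.p95_threshold_ms


monitor = QueueTimeMonitor(window=100, p95_threshold_ms=100.0)

# Healthy period: requests wait about 5 ms before being picked up.
for i in range(100):
    monitor.record(enqueued_at_ms=i * 10.0, started_at_ms=i * 10.0 + 5.0)
healthy_alert = monitor.should_alert()

# Saturation: workers are busy, so requests now wait about 500 ms.
for i in range(100):
    monitor.record(enqueued_at_ms=i * 10.0, started_at_ms=i * 10.0 + 500.0)
saturated_alert = monitor.should_alert()

print(healthy_alert, saturated_alert)
```

Using a percentile over a sliding window rather than an average is a deliberate choice in this sketch: a few requests stuck in the queue show up in the tail well before they move the mean.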

Scaling Kubernetes GitOps with Fleet: Experiment Results and Lessons Learnt

Fleet, Rancher’s built-in GitOps engine, is designed to scale up to thousands of clusters. But how far can it scale in a real-world scenario? Earlier this year, we wrote about the Fleet benchmark tool, and we made a few instructive discoveries, especially concerning resource consumption and its impact on deployment performance.

Elastic at AWS re:Invent: Concluding a year of partnership in agentic AI innovation

Highlights of another laudable year of customer-centric collaboration. The integration of Elastic’s capabilities, including vector databases and context engineering, with AWS services helps customers build intelligent, scalable, and secure applications faster and with greater flexibility. Our ongoing collaboration has resulted in another year of notable innovation with AWS. This blog highlights our continued collaboration with AWS throughout 2025 to help you capitalize on the power of AI.

Gartner I&O and Cloud Strategies Conference 2025: From Observability to Outcome-Driven Operations

This year’s Gartner IT Infrastructure, Operations and Cloud Strategies Conference made one thing abundantly clear: the industry is moving beyond reactive monitoring and isolated dashboards toward autonomous, outcome-driven IT operations. While AI and agentic automation dominated keynotes and vendor messaging, conversations on the show floor reflected a more grounded reality.

Confessions of a software engineer who enjoyed being paged at 5am

It’s 5:14am, and I wake up to the squawking geese sound of my PagerDuty alert (anyone else have this sound? No?). I’m four months into working for my new team as a junior software engineer, and this is my first time being paged in the middle of the night. Most software engineers probably dread this moment, but I kind of love it. Agile ceremonies and Jira tickets suddenly don’t matter, and you’re fully focussed on stopping a customer-impacting fire.

Intelligent Agents vs. Intelligent Attackers: The New Threat Detection Paradigm

Most security stacks only move when told to. They wait for known IOCs, hunt for pre-defined suspicious strings, and trigger automation only after a condition lights up. By then, attackers have already pivoted. Agentic AI rewrites the rules. Instead of signature-based detection, it monitors behavioral baselines and identity signals, watching for violations of expectations formed from observed context.