The latest News and Information on Service Reliability Engineering and related technologies.

Which Observability Tool Helps with Visibility Without Overspend

Dec 5, 2025 By Anjali Udasi In Last9

If you’re trying to control observability spend without cutting visibility, the platforms that usually offer the best cost balance at enterprise scale are Last9, Grafana Cloud, Elastic, and Chronosphere — depending on the shape of your telemetry and the level of operational ownership you want.

Read Post

Bits AI SRE, our first AI agent, now generally available! #datadog

Dec 4, 2025 By Datadog In Datadog

Bits AI SRE, our first AI agent, is now generally available. Across industries, customers of all sizes are already seeing faster resolution, stronger reliability, and a better on-call experience for their teams.

View Video

OTel Updates: Unroll Processor Now in Collector Contrib

Dec 4, 2025 By Anjali Udasi In Last9

Some log sources bundle multiple events into a single record before shipping them. This is common with VPC flow logs, CloudWatch exports, and certain Windows endpoint collectors. While this batching approach is efficient for transport, it creates challenges when you need to filter, search, or correlate individual events. When a log record contains an array of 47 events, your analytics tool sees one entry instead of 47 distinct records.
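To make the problem concrete, here is a minimal Python sketch of what "unrolling" means. It is not the Collector processor's implementation or configuration; the record shape (`attributes`, `body`) and the `unroll` helper are assumptions chosen only for illustration.

```python
from copy import deepcopy


def unroll(record):
    """Split a record whose body is a list into one record per element.

    Hypothetical helper: records whose body is not a list pass through unchanged.
    """
    body = record.get("body")
    if not isinstance(body, list):
        return [record]

    # Copy the shared fields (everything except the batched body) once per event.
    shared = {key: value for key, value in record.items() if key != "body"}
    unrolled = []
    for event in body:
        child = deepcopy(shared)
        child["body"] = event
        unrolled.append(child)
    return unrolled


# One batched record carrying 47 events (illustrative data, not a real log format).
batched = {
    "attributes": {"source": "vpc-flow-logs"},
    "body": [{"event_id": i} for i in range(47)],
}

records = unroll(batched)
print(len(records))  # 47
```

Each resulting record keeps the parent's attributes, so filtering or correlating on a single event no longer requires parsing the whole batch.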

Read Post

Cost Optimization Is Now Part of the SRE Playbook

Dec 4, 2025 By Itiel Shwartz In Komodor

In the era of cloud-native architectures, Site Reliability Engineering (SRE) has matured from a discipline focused purely on uptime to a sophisticated practice of efficient reliability. The key driver for this evolution is an undeniable truth: cloud spend has become intrinsically linked to system stability.

Read Post

Gemini 3 breaks OpenAI's long-standing lead in SRE tasks

Dec 4, 2025 By Rootly In Rootly

A major shift just hit SRE-focused AI. Gemini 3 Pro edged out OpenAI’s models and outperformed them across every single SRE task we tested. In this Rootly AI Labs episode, Sylvain Kalache and Laurence Liang break it down.

View Video

The hidden costs of immature incident management #sre #devops

Dec 3, 2025 By Rootly In Rootly

Learn more: https://rootly.com/blog/the-hidden-costs-of-immature-incident-management

View Video

Datadog Bits AI SRE: Your new teammate for on-call shifts

Dec 2, 2025 By Datadog In Datadog

Bits AI SRE is an always-on SRE agent built to handle complex troubleshooting and late-night alerts. Developed against thousands of real-world incidents and powered by Datadog’s platform, Bits AI SRE analyzes your entire stack, tests hypotheses, and identifies root causes in minutes. Resolve faster, get back to sleep sooner, and give your on-call team the confidence and capacity they need.

View Video

9 Monitoring Tools That Deliver AI-Native Anomaly Detection

Dec 1, 2025 By Anjali Udasi In Last9

The observability market has moved beyond manual threshold-setting. Modern platforms use statistical algorithms, machine learning, and causal AI to detect anomalies automatically. Some work immediately after deployment. Others train on your data for better accuracy. Each approach has technical trade-offs worth understanding. This guide compares how nine monitoring solutions handle automated anomaly detection and root cause analysis.

Read Post

Instrument Jenkins With OpenTelemetry

Nov 27, 2025 By Anjali Udasi In Last9

You can instrument Jenkins with OpenTelemetry using the official plugin and an OpenTelemetry Collector, then send the data to a backend like Last9 to understand where pipeline latency and failures actually originate. Jenkins provides job status and console logs, but it doesn't show how time is distributed across stages, agents, plugins, and external systems. OpenTelemetry fills that gap by emitting traces, metrics, and logs in a standard format that any OTLP-compatible backend can process.

Read Post

Pastries with SREs: Holding onto extra observability data and desserts

Nov 26, 2025 By Elastic In Elastic

In this episode of Pastries with SREs, we dig into why you should keep all of your observability data, even if you don’t need it quite yet. With enriched logs and flexible, cost-effective storage, you can stop worrying about what you might need later and start answering questions with confidence, no matter when they arise.

View Video