
Proactively troubleshoot with synthetic testing and distributed tracing

As your application grows in complexity, identifying the root cause of issues becomes increasingly difficult. Many monitoring strategies make this even harder by siloing frontend and backend data. To effectively troubleshoot problems that spread across your app, you need visibility not just into each part of your stack, but also into how these parts interact.

A look back at DASH 2025

DASH 2025 brought the Datadog community together like never before. During our biggest event yet, thousands of attendees gathered at the North Javits Center in New York City for two and a half days of content, learning, and community, where they deepened their knowledge and connected with peers. Here's a quick look back at some of the highlights from this year's DASH.

Kubernetes Monitoring backend 2.2: better cluster observability through new alert and recording rules

We’re excited to announce that version 2.2.0 of the backend for our Kubernetes Monitoring solution in Grafana Cloud is now available. The app’s backend is supported by kubernetes-mixin, an open source Prometheus Monitoring Mixin, and this latest version features significant improvements to alert rules and recording rules that will enhance your cluster observability and monitoring experience. There’s a lot to tell you about, so let’s dive in.

IT Task Automation: Best Practices and Use Cases for IT Management with Pandora FMS

IT teams must handle a large number of tasks on a daily basis. Many of these tasks, while essential, are repetitive: resetting passwords, rebooting servers, monitoring logs for errors, applying patches… When performed manually, they can overwhelm technical staff and compromise operational efficiency. IT automation has emerged as the answer to this challenge. It involves using scripts and specialized tools to automatically execute these and other tasks that previously required human intervention.
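As a rough illustration of the kind of script this refers to, here is a minimal Python sketch of one of the tasks listed above, monitoring logs for errors. It is not taken from Pandora FMS; the `ERROR_PATTERN` keywords, the `count_errors` helper, and the sample log lines are all hypothetical and chosen only for the example.

```python
import re
from collections import Counter

# Assumption: lines tagged ERROR or CRITICAL are the ones worth flagging.
ERROR_PATTERN = re.compile(r"\b(ERROR|CRITICAL)\b")


def count_errors(lines):
    """Return a Counter of severity keywords found in the given log lines."""
    counts = Counter()
    for line in lines:
        match = ERROR_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Made-up sample data standing in for a real log file.
sample_log = [
    "2025-01-01 10:00:00 INFO service started",
    "2025-01-01 10:00:05 ERROR database connection refused",
    "2025-01-01 10:00:09 CRITICAL disk usage above threshold",
    "2025-01-01 10:00:12 ERROR database connection refused",
]

summary = count_errors(sample_log)
print(dict(summary))
```

In practice a script like this would read from a real log file on a schedule and feed its counts into an alert or ticket rather than printing them.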

ScienceLogic Named a Visionary in the 2025 Gartner Magic Quadrant for Observability Platforms

It’s official: ScienceLogic has entered the observability arena. Named a Visionary in the 2025 Gartner Magic Quadrant for Observability Platforms, we believe we’re helping define where observability is heading, not just where it’s been. This marks our first inclusion in this Magic Quadrant and, in our opinion, validates our mission to redefine intelligent, actionable observability in the era of AI and automation.

Uptrace v2.0: The Future of Observability is Here

The Uptrace team is thrilled to announce the release of v2.0—our biggest update yet! This release represents a complete reimagining of how observability data should be stored, queried, and managed. With multi-project support, revolutionary JSON-based storage, powerful data transformations, and a host of developer-friendly features, Uptrace v2.0 is designed to scale with your growing infrastructure needs.

Observability as Code: Why You Should Use OaC

In the fast-moving world of CI/CD pipelines, microservice architectures, and container orchestration, software changes rapidly. What exists in a codebase today might be gone next week. At this scale and speed, it’s impossible for development teams to manually track every line of code and every new piece of functionality.

The Fast Path to More Useful Telemetry

Over and over, we’ve seen that teams who invest in adding rich, relevant context to their telemetry end up debugging faster and collaborating more effectively during incidents. Getting meaningful context added can feel like a big cross-team project, but some of the highest-leverage improvements don’t require app code changes or coordination across services.

Generating end-to-end tests with AI and Playwright MCP

When I started using Playwright, there was a single command that blew me away. I immediately became (and still am) a huge Playwright Codegen fanboy. Playwright's codegen command opens up a browser window, and whatever you do in this window will be recorded. Navigating URLs, clicking links, and filling out form elements—the Playwright inspector records all your actions and generates a Playwright test for you. Magic!

Optimize LangChain Performance with Trace Analytics

You’ve instrumented your LangChain app, and traces are now flowing into Last9. Now the issues are visible: API costs are crossing $200/day, average response times exceed 3 seconds, and performance degrades under 100 concurrent users. A single tool call adds over 2 seconds. Bloated context windows are pushing up token usage, wasting $50/day. Here’s how to use trace data to identify and fix these inefficiencies, systematically and at scale.