The latest News and Information on Application Performance Monitoring and related technologies.

VDI Monitoring: How to Ensure High-Performance Virtual Desktop Infrastructure

Remote and hybrid work turned virtual desktops from a niche IT choice into a core way employees get their jobs done. When a desktop lives in the data center or the cloud, every logon, click, and screen refresh depends on infrastructure the user never sees. That shift is why VDI monitoring matters: it protects the end-user experience when the desktop is no longer local. The challenge is that a single slow session can have dozens of causes—across compute, storage, network, and the broker layer.

The Journey to Achieving Hyperscale Availability with AI-Driven Prediction

At hyperscale, a regional cloud outage is not merely a technical disruption—for Samsung Account, which serves 2.1 billion users across three global regions, it is an immediate global service crisis. Fragmented, region-siloed monitoring creates blind spots that make early detection nearly impossible, leaving SRE teams perpetually reactive rather than predictive. The path to proactive reliability requires both a philosophical shift and a foundational change in how observability data is collected, unified, and reasoned over.

The AI Engineering Playbook: How to Evaluate & Iterate at Every Phase of Development

AI coding tools are accelerating development velocity, creating a release challenge most teams aren’t equipped for. Without controlled rollout, higher change velocity makes it harder to know which specific release drove the results you’re seeing in production. And when teams use AI to build AI (LLM apps and AI agents), complexity multiplies. Traditional observability can’t ensure AI agent quality, performance, and cost-efficiency at production scale.

From Legacy to AI-Ops: Securing and Scaling Systems for 20M Device Requests with Datadog

Modernizing a legacy system serving 20 million devices without users noticing is like replacing a jet engine mid-flight. In this session, YoungJin Jung and Donggen Hong from LG U+ share their 18-month journey transforming a Telco-scale API Gateway from a rigid, proprietary solution into a high-performance, open-source architecture on AWS, and the operational challenges they solved along the way.

Ship Reliable AI Faster: How to Operate AI Agents with Control and Confidence

Replace "AI shipped on hope" with an operating model that holds up once real users depend on it. AI quality is multi-dimensional, covering accuracy, tone, safety, and faithfulness to user data, and can't be debugged from outputs alone. Without visibility into what their AI actually did in production, teams miss regressions, reverse-engineer chains by hand, and watch a single bad answer erode trust built over hundreds of right ones.

What is AIOps? Benefits, Use Cases, and How It Transforms IT Operations

Decades ago, IT operations was relatively simple: a few components, such as clients, servers, and networks, running in static environments. IT teams relied on manual analysis to manage these systems. Over time, however, IT operations has evolved significantly, driving the adoption of AIOps technologies.

Full Stack Observability vs Monitoring: Key Differences

Traditional monitoring tracks system health by collecting data such as metrics and logs. This data is checked to see whether a system is behaving as expected, and alerts are raised if errors or anomalous values are found. This works well in stable, predictable environments, but modern IT systems are far more complex and dynamic. In distributed architectures like microservices and cloud-native platforms, predefined alerts usually aren’t enough to explain why a failure is happening.

How Coding Agents are Changing the Traditional Software Development Lifecycle

AI coding assistants are rapidly evolving from passive copilots into active, agentic collaborators capable of planning, executing, and iterating on complex software tasks. This shift has huge ramifications for the software development lifecycle (SDLC), developer productivity, and even the structure of engineering teams.

Fireside Chat with Datadog CPO Yanbing Li and Vercel CPO Tom Occhino

The way we build, ship, and run software is being reshaped by AI. In this fireside chat, Yanbing Li (CPO, Datadog) and Tom Occhino (CPO, Vercel) will discuss their perspectives on the impact AI is having across the industry and what it means for teams navigating this shift today.