
Ep 18: AI has a memory problem, just like you do

In this episode of Masters of Data, we dive into how AI learns, examining both how we teach it and what it derives from human performance, as well as why context plays a crucial role in AI interactions. We break down five key components of AI training and talk about why we should view AI as a tool under human control rather than an autonomous entity. We explore the challenge of maintaining context in AI—much like our own memory struggles—and discuss methods, such as retrieval-augmented generation, that can help AI retain context more effectively.

Datadog GPU Monitoring: Optimize and troubleshoot AI infrastructure

With Datadog GPU Monitoring, engineering and ML teams can monitor GPU fleet health across cloud, on-prem, and GPU-as-a-Service platforms like CoreWeave and Lambda Labs. Real-time insights into allocation, utilization, and failure patterns make it easy to spot bottlenecks, eliminate idle GPU spend, and resolve provisioning gaps. By tying usage metrics directly to cost and surfacing hardware and networking issues impacting performance, Datadog helps teams make fast, cost-efficient decisions to keep AI workloads running reliably at scale.

Unlocking Full Application Visibility with LogicMonitor

In today’s digital landscape, application performance isn’t just about monitoring several key apps and “keeping the lights on”; it’s about understanding the full breadth of your interconnected business services and ensuring you’re delivering seamless, reliable experiences to customers and teams alike. But as applications grow increasingly distributed across cloud, on-prem, and hybrid environments, monitoring them holistically can become a serious challenge.

Tame multi-cluster chaos. A Platform Engineer's guide to distributed Kubewarden Policies with Fleet

For platform engineers managing multiple Kubernetes clusters, maintaining policy consistency is a constant struggle. Manually applying security rules across a growing fleet of clusters is inefficient and error-prone. This approach creates significant risks: as your environment scales, the operational burden becomes unsustainable, and each out-of-sync policy represents a potential security gap that increases the cluster’s attack surface.

Understanding Microsoft's Latest Zero-Day Vulnerability #patch

Microsoft has identified a zero-day vulnerability in the Windows kernel, rated 7.0 on the CVSS scale. Organizations often misprioritize vulnerabilities, focusing on those with high severity scores instead of those being actively exploited, even though attackers chain many lower-severity vulnerabilities together. A risk-based approach to vulnerability management is crucial. Current Windows OS versions are affected, and organizations should consider Extended Security Updates to mitigate risks as more zero-days are anticipated.

Agentic AI: Ushering in the Next Era of Intelligent IT

IDC predicts agentic AI will command over 26% of global IT spend, hitting $1.3 trillion in 2029. How do IT Ops teams prepare for the reality of agentic systems being embedded across workflows, interfaces, and enterprise platforms? We went straight to the source—IT Ops leaders—to learn how they’re tackling agentic AI.

How to Monitor .NET Applications on Linux with SolarWinds Observability | Step-by-Step Setup

This video provides a step-by-step walkthrough for configuring monitoring for .NET applications running on Linux using SolarWinds Observability. The demonstration covers the full setup process, from adding a new service to verifying the APM library connection. This guide is intended for developers, system administrators, and DevOps engineers who need to quickly and reliably instrument .NET applications on Linux for performance monitoring and observability.

How to Achieve Deep Network Visibility with SolarWinds Observability SaaS

Looking for a faster way to discover every device on your network? This video walks through how SolarWinds Observability automatically scans and classifies network gear, including routers, switches, access points, firewalls, and SD-WAN devices, in seconds. This is the easiest way to get full network visibility without scripts, config files, or manual inventory work.