
Better together: Cribl and Microsoft Fabric just got radically simpler

In September, I wrote about how Cribl and Microsoft Fabric Real-Time Intelligence provide a powerful combination, unlocking new analytics capabilities for security and IT teams. I also said there was more to come… Today, Cribl is thrilled to announce a new Cribl Destination for Microsoft Fabric Real-Time Intelligence, marking another big step forward in our collaboration with Microsoft to make it much easier for Cribl customers to use Fabric.

How to Monitor RabbitMQ

A queue quietly fills up overnight. Memory hits the configured watermark and RabbitMQ blocks all publishers. Your entire message pipeline freezes, and you discover the problem when users start complaining. This scenario repeats across thousands of production systems because teams don't monitor RabbitMQ properly. The broker exposes comprehensive metrics, but most engineers don't know which ones predict failures or how to track them.
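As a rough illustration of tracking the metrics that predict the overnight freeze described above, here is a minimal Python sketch that polls the RabbitMQ management HTTP API. It assumes the management plugin is enabled and uses its `/api/nodes` and `/api/queues` endpoints. The 80% memory warning ratio, the 10,000-message backlog threshold, and the helper names (`fetch`, `check_nodes`, `check_queues`, `run_checks`) are illustrative assumptions, not RabbitMQ defaults.

```python
import base64
import json
import urllib.request

# Illustrative thresholds (assumptions, not RabbitMQ defaults); tune per workload.
MEM_WARN_RATIO = 0.8       # warn once a node uses 80% of its memory limit
QUEUE_DEPTH_WARN = 10_000  # warn when a queue holds this many ready messages


def fetch(base_url: str, path: str, user: str, password: str):
    """GET a JSON resource from the RabbitMQ management HTTP API."""
    req = urllib.request.Request(base_url.rstrip("/") + path)
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def check_nodes(nodes: list) -> list:
    """Flag nodes with an active resource alarm or memory close to the limit."""
    problems = []
    for node in nodes:
        name = node.get("name", "?")
        if node.get("mem_alarm"):
            problems.append(f"{name}: memory alarm active, publishers are blocked")
        elif node.get("mem_limit") and node.get("mem_used", 0) >= MEM_WARN_RATIO * node["mem_limit"]:
            problems.append(f"{name}: memory at {node['mem_used'] / node['mem_limit']:.0%} of limit")
        if node.get("disk_free_alarm"):
            problems.append(f"{name}: disk free alarm active, publishers are blocked")
    return problems


def check_queues(queues: list) -> list:
    """Flag queues with a large backlog or with messages but no consumers."""
    problems = []
    for q in queues:
        label = f"{q.get('name', '?')} (vhost {q.get('vhost', '/')})"
        ready = q.get("messages_ready", 0)
        if ready >= QUEUE_DEPTH_WARN:
            problems.append(f"{label}: {ready} messages ready")
        if ready > 0 and q.get("consumers", 0) == 0:
            problems.append(f"{label}: {ready} messages but no consumers")
    return problems


def run_checks(base_url: str, user: str, password: str) -> list:
    """Poll nodes and queues once and return all findings."""
    nodes = fetch(base_url, "/api/nodes", user, password)
    queues = fetch(base_url, "/api/queues", user, password)
    return check_nodes(nodes) + check_queues(queues)
```

For example, `run_checks("http://localhost:15672", "guest", "guest")` against a local broker would return one line per finding (the default `guest` account can only connect from localhost). Running it on a schedule and alerting on a non-empty result catches the filling queue before the memory alarm blocks publishers.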

Ep 18: AI has a memory problem, just like you do

In this episode of Masters of Data, we dive into how AI learns, examining both how we teach it and what it derives from human performance, as well as why context plays a crucial role in AI interactions. We break down five key components of AI training and talk about why we should view AI as a tool under human control rather than an autonomous entity. We explore the challenge of maintaining context in AI—much like our own memory struggles—and discuss methods, such as retrieval-augmented generation, that can help AI retain context more effectively.

Datadog GPU Monitoring: Optimize and troubleshoot AI infrastructure

With Datadog GPU Monitoring, engineering and ML teams can monitor GPU fleet health across cloud, on-prem, and GPU-as-a-Service platforms like CoreWeave and Lambda Labs. Real-time insights into allocation, utilization, and failure patterns make it easy to spot bottlenecks, eliminate idle GPU spend, and resolve provisioning gaps. By tying usage metrics directly to cost and surfacing hardware and networking issues impacting performance, Datadog helps teams make fast, cost-efficient decisions to keep AI workloads running reliably at scale.

Unlocking Full Application Visibility with LogicMonitor

In today’s digital landscape, application performance isn’t just about monitoring several key apps and “keeping the lights on”; it’s about understanding the full breadth of your interconnected business services and ensuring you’re delivering seamless, reliable experiences to customers and teams alike. But as applications become increasingly distributed across cloud, on-prem, and hybrid environments, monitoring them holistically can become a serious challenge.

Tame multi-cluster chaos: A Platform Engineer's guide to distributed Kubewarden Policies with Fleet

For platform engineers managing multiple Kubernetes clusters, maintaining policy consistency is a constant struggle. Manually applying security rules across a growing fleet of clusters is inefficient and error-prone, and it creates significant risks: as your environment scales, the operational burden becomes unsustainable, and each out-of-sync policy represents a potential security gap that widens the attack surface.
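To make "out-of-sync policy" concrete, here is a minimal Python sketch of a drift check. It assumes each cluster's policies have already been exported as plain dictionaries keyed by policy name; the export step, the policy names, and the simplified spec fields (`module`, `mode`) are hypothetical. The goal of distributing policies from Git with Fleet is that a report like this stays empty because every cluster receives the same definitions.

```python
import hashlib
import json


def spec_digest(spec: dict) -> str:
    """Stable fingerprint of a policy spec; key order does not affect it."""
    return hashlib.sha256(json.dumps(spec, sort_keys=True).encode()).hexdigest()


def find_drift(desired: dict, clusters: dict) -> dict:
    """Compare each cluster's policies against the desired (Git-defined) set.

    desired:  policy name -> spec
    clusters: cluster name -> {policy name -> spec}
    Returns cluster name -> list of findings; clusters that match are omitted.
    """
    want = {name: spec_digest(spec) for name, spec in desired.items()}
    report = {}
    for cluster, policies in sorted(clusters.items()):
        have = {name: spec_digest(spec) for name, spec in policies.items()}
        findings = []
        for name in sorted(want):
            if name not in have:
                findings.append(f"missing {name}")
            elif have[name] != want[name]:
                findings.append(f"drifted {name}")
        # Policies present on the cluster but absent from the desired set.
        for name in sorted(set(have) - set(want)):
            findings.append(f"unexpected {name}")
        if findings:
            report[cluster] = findings
    return report
```

Hashing a canonical JSON form keeps the comparison cheap and insensitive to field ordering; a real check would first strip server-populated fields such as status before hashing.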

Understanding Microsoft's Latest Zero-Day Vulnerability

Microsoft has identified a zero-day vulnerability in the Windows kernel, rated 7.0 on the CVSS scale. Organizations often misprioritize vulnerabilities by focusing on high severity scores instead of on whether a flaw is actively exploited, and attackers frequently chain lower-severity vulnerabilities together. A risk-based approach to vulnerability management is crucial. Current Windows OS versions are affected, and organizations should consider Extended Security Updates to mitigate risk, as more zero-days are anticipated.
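The difference between score-based and risk-based prioritization can be shown with a small Python sketch. It assumes each vulnerability record carries a CVSS score and an "actively exploited" flag (for example, sourced from a known-exploited catalog such as CISA's KEV list); the `Vuln` type and the placeholder IDs are hypothetical, and real risk models weigh more factors than these two.

```python
from dataclasses import dataclass


@dataclass
class Vuln:
    vuln_id: str              # placeholder identifier, not a real CVE
    cvss: float               # base severity score, 0.0 to 10.0
    actively_exploited: bool  # assumed input, e.g. from a known-exploited catalog


def prioritize_by_score(vulns: list) -> list:
    """Naive ordering: highest CVSS first, ignoring exploitation."""
    return sorted(vulns, key=lambda v: -v.cvss)


def prioritize_by_risk(vulns: list) -> list:
    """Risk-based ordering: actively exploited flaws first, then by CVSS."""
    return sorted(vulns, key=lambda v: (not v.actively_exploited, -v.cvss))
```

With these rules, an actively exploited 7.0 kernel flaw like the one above outranks an unexploited 9.8, whereas a score-only ordering would patch the 9.8 first.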

Agentic AI: Ushering in the Next Era of Intelligent IT

IDC predicts agentic AI will command over 26% of global IT spend, hitting $1.3 trillion in 2029. How do IT Ops teams prepare for the reality of agentic systems being embedded across workflows, interfaces, and enterprise platforms? We went straight to the source—IT Ops leaders—to learn how they’re tackling agentic AI.