
Fix an error in Copilot without leaving your IDE

Production errors are every developer's nightmare. You're enjoying your coffee when suddenly alerts start firing: users are experiencing crashes, and you need to find and fix the issue fast. In this video, we'll walk you through how to use AI to diagnose and fix critical errors in an application using Rollbar's MCP (Model Context Protocol) server.

Reliability lessons from the 2025 AWS DynamoDB outage

On October 19th and 20th, 2025, the AWS region US-EAST-1 suffered a massive outage. What started as a 3-hour Amazon DynamoDB outage caused by a DNS issue led to an Amazon EC2 outage that lasted an additional 12 hours before normal service was restored. Over the course of the outage, there were over 17 million outage reports as companies like Snapchat, Roblox, Amazon, Reddit, Venmo, and more were impacted.

200 EPISODE ANNIVERSARY SPECIAL! PEDRO BADOS RETURNS!

It’s a milestone moment! Our 200th episode of The DEX Show (and the first ever late one — sorry about that!). To mark the occasion, Nexthink Founder and CEO Pedro Bados returns to reflect on Nexthink’s incredible journey and discuss the company’s next era — from the recent investment by Vista Equity Partners to the accelerating fusion of DEX and AI. Pedro shares his perspective on how AI is reshaping the workplace, Nexthink’s vision for “an IT agent for every employee,” and why he’s optimistic about the future of technology and innovation. A landmark conversation to celebrate our big birthday.

AI Agents Observability with OpenTelemetry and the VictoriaMetrics Stack

AI agents are increasingly popular and are often deployed as part of production systems. However, this rapid adoption brings unique observability challenges that require flexible solutions. On the one hand, AI agents are fundamentally just like any other software service, producing the same classic observability signals we’re familiar with: metrics, logs, and traces.

Reimagining Network and Security Operations: How AI and Automation Are Transforming the Modern NOC & SOC

In today’s hyper-connected, always-on enterprise landscape, every second of downtime, every unnoticed anomaly, and every delayed response can have cascading business implications. Traditional Network and Security Operations Centers (NOCs and SOCs), built on manual triage and siloed data, were never designed for this pace or scale.

Streamline Incident Management with the New Netdata-ServiceNow Integration

When a critical alert fires at 2 AM, the last thing your on-call engineer should be doing is manual administrative work. Yet, for many teams, that’s exactly what happens. You see the alert in your monitoring tool, then you have to switch contexts, open a new browser tab, log into your ITSM platform, and manually create an incident—all while your systems are failing.

When AI Thinks and Humans Act: The Future of Operational Resilience

Artificial Intelligence has become the sharpest tool in the digital arsenal – detecting anomalies, predicting failures, and uncovering risks before they unfold. Yet even the smartest system can’t roll up its sleeves and fix what’s broken. AI can see the problem. But only people can solve it. That’s the critical gap in today’s automation revolution: turning AI’s insight into human action.

Show Me the AI: Rethinking How AI Fits Into Network Operations

Over the last couple of years, nearly every network and infrastructure observability platform has added the word “AI” to its messaging. Some have introduced helpful capabilities. Others have simply added a chatbot on top of the same dashboards that have existed for a decade. In many ways, the term has started to lose meaning. But inside network operations, the conversation hasn’t disappeared. It has simply become more blunt.

What Is BigQuery? A Guide To How It Works And Costs

Data has exploded, and so have the challenges that come with it. Every click, transaction, and sensor ping generates mountains of data that traditional databases can’t handle. That’s why more than 94% of organizations now rely on cloud platforms, according to CloudZero’s 2025 cloud report. The goal isn’t just to store data, but to make sense of it fast. And this is exactly where tools such as Google BigQuery step in.