How Datadog Feature Flags is resilient to cloud provider failures

As major incidents like AWS’s October 2025 outage illustrate, modern systems are immensely interconnected. A failure in one can lead to a cascade of downstream problems. In this case, issues with DNS resolution for DynamoDB led to widespread disruptions with other AWS services and, subsequently, thousands of applications and services that rely on that infrastructure.

What is AWS Fargate for Amazon ECS?

As cloud applications moved from VMs to containers and then to microservices, the amount of background work needed to keep everything running grew just as quickly. You gain speed and flexibility, but you also end up managing clusters, scaling rules, and capacity choices that don’t really add to the product you’re building. AWS Fargate steps in right there. It lets you run your ECS tasks without looking after any servers at all.

OTel Updates: Complex Attributes Now Supported Across All Signals

OpenTelemetry now supports maps, heterogeneous arrays, and byte arrays across all signals. Here’s where these new types shine — and where simple primitives still fit naturally. If you’ve been working with OpenTelemetry for a while, you’re likely familiar with the straightforward key-value approach to attributes. It’s simple, fast, and works well with how most telemetry backends store, index, and query data.

The database professional of the future: headlines from Redgate's Keynote at PASS Data Community Summit 2025

Redgate took the main stage earlier today to open PASS Data Community Summit with our keynote, where we shared our vision for the future of the database development experience – one driven by speed, safety, and the intelligent use of AI. As data estates grow in scale and complexity, and as organizations push to deliver software faster than ever, the role of the database is undergoing significant change.

Navigating External Outages: How Selector Cuts Through the Cloudflare Noise

Yesterday’s widespread Cloudflare outage reminds us how crucial external dependencies are to the stability of our own applications. When a key edge provider like Cloudflare goes down, the impact can look to your internal monitoring systems like a catastrophic internal failure, triggering a massive storm of alerts and sending engineering teams into frantic, misdirected debugging sessions.

From Data Lake to Lakehouse. Why Cribl is Preparing for the Agentic #ai Era #telemetry

Customers asked for a simpler way to store and access telemetry data, and Cribl delivered. First came Cribl Lake: cost-effective data storage, flexible access, and identity-based authorization instead of infrastructure-based access rules. It offered a simple way to retain data at rest and run slow, inexpensive analytics when needed. But the story did not end there.

Why Gaining Control of Your #telemetry Data Is a Game Changer

Disconnected pipelines. Unknown data sources. Costs that do not add up. Many teams struggle to answer a simple question: what data do we have, and where is it going? In this clip, a Cribl customer explains how bringing all telemetry data together changed everything. With Cribl, their team can finally see what they collect, where it flows, and what it costs. That clarity unlocked smarter reduction, better routing decisions, and major optimization across security and observability workflows.

AI wrote the code, but can you trust it? #aicoding #integration #cursor #devops #speedscale

Using AI coding tools like Cursor is fast, but it leaves a massive question: Is the new code going to break production? We solve this by combining Cursor with Proxymock! I take a live traffic snapshot of my running app, feed it back to the AI, and instantly run realistic integration tests locally. It's the only way to get true confidence before you push.