Operations | Monitoring | ITSM | DevOps | Cloud

Detect and map third-party outages with Datadog External Provider Status

Modern applications depend on dozens of external cloud platforms, APIs, and SaaS services to function. But when those providers experience issues, engineers often spend valuable time asking a basic question: Is the problem with us or with them? Provider-maintained status pages are often slow to update, leaving teams waiting for confirmation while incidents escalate. This delay wastes valuable time, prolongs investigations, and risks customer trust.

Optimize HPC jobs and cluster utilization with Datadog

High-performance computing (HPC) environments support some of the most critical workloads in the world—from asset pricing models in financial institutions to molecular simulations in drug discovery. These workloads often span hundreds of thousands of cores, depend on specialized infrastructure such as GPUs, and run for extended periods. As a result, performance and efficiency are critical.

Introducing Updog.ai: Real-time provider status from Datadog

When external SaaS providers or cloud services degrade or go down, engineers often find themselves wondering if the issue they're encountering is local or more widespread. The answers they find are usually slow to surface, limited in detail, or entirely dependent on the provider's updates. Vendor-controlled status pages and third-party aggregators don’t provide the timely, independent visibility that's necessary to quickly and accurately identify the root cause of slowdowns.

What Is an Email Blacklist?

An email blacklist is a database that lists IP addresses or domains suspected of sending spam or malicious emails. Mail servers use these lists to decide whether to deliver or reject incoming messages. Understanding how blacklists work is essential for keeping your messages deliverable and your domain reputation intact.

AI-Powered Translation Tools: A Hidden Asset for Scaling DevOps Globally

DevOps or development (Dev) and IT operations (Ops) teams are no longer confined to single geographic locations or language groups. With over 80% of organizations now practicing DevOps (a figure projected to reach 94% in the near future), the challenge of scaling operations globally has never been more critical. Yet, one persistent bottleneck continues to slow down even the most sophisticated DevOps workflows: language barriers.

AWS Outage: How do you prepare for the failure of your own safety net?

When AWS’s massive outage struck, it didn’t just take down cloud services, apps, and enterprise platforms. It also knocked out many of the monitoring systems organizations depend on for real-time answers. Observability companies, including Datadog, New Relic, Checkly, Dynatrace, SpeedCurve, and Splunk Observability, lost visibility or functionality precisely when organizations needed them most.

Unreal Engine crash reporting now available on gaming consoles with trace-connected logs

With the first major release of the Sentry Unreal SDK (now on v1.2.0, and you can also explore in our interactive sandbox), we’ve made some important improvements to support cross-platform Unreal developers when it comes to platform coverage, debugging with user feedback, and performance monitoring improvements. Here’s what’s new.

Making logs work smarter: Evolving your observability strategy

When you start building an observability stack, it’s natural to reach for logs first. They’re familiar, easy to generate, and often already part of a developer’s workflow. And sending logs to a centralized system feels like a quick win, too. Simply add a log shipper, and voila, your application is observable.