
Amazon Isn't Eating Its Own DNS Dog Food

On October 19-20, 2025, Amazon Web Services (AWS) experienced a significant outage (AWS status) affecting its US-EAST-1 region in northern Virginia. The root cause was DNS resolution failures for DynamoDB’s API endpoints, which cascaded across AWS’s interconnected services, disrupting major platforms including Snapchat, McDonald’s, Disney+, Roblox, Coinbase, Reddit, and Amazon’s own services.

Build Vs. Buy? Why Creating Your Own Cost Management Platform Is Futile

The siren song of building a custom, internal cloud cost management platform is enticing. Many brilliant engineering teams are convinced they can come up with a bespoke solution that perfectly fits their needs. They look at their company’s unique infrastructure and decide they can DIY cost management without having to rely on an external vendor. Believe me, I get the temptation.

4 Everyday IT Headaches You Can Eliminate with Enterprise IT Automation

Every IT operator anywhere on the team ladder dreads this feeling: another day, another flood of service desk tickets. Like cockroaches, they come in waves and they’re repetitive. Worse still, they distract your teams from higher-value work. Ironically, given how much disruption they cause, most of these tickets are neither complex incidents nor novel challenges. They’re the same everyday IT headaches your enterprise has been dealing with for years.

The Hidden Risk of DNS - Lessons from the AWS Outage & Why You Need DNS Spy Monitoring NOW

On October 20, 2025, much of the internet came to a halt. Apps wouldn’t load. Payments failed. Cloud dashboards went dark. From Fortnite to Alexa, Snapchat, and countless business platforms, users across the world were suddenly offline — all because DNS broke inside Amazon Web Services’ (AWS) US-East-1 region.

SOC vs. the Clock: The New Cybersecurity Frontlines

Cloud attacks now account for over half of all threats, and most businesses still aren’t ready. In this conversation, Scott from N-able and Zac from First Technology Group unpack the latest SOC threat intelligence, the rise of AI in cyber defence, and why layered security is more critical than ever. If you manage IT, security, or risk, this is your insider’s view into what’s coming and how to prepare.

Building Intelligent Search: A Tutorial on Aiven for OpenSearch and Vertex AI

Aiven for OpenSearch is a fully-managed service that provides an ideal way to run OpenSearch on Google Cloud. It is designed for companies looking to operate search applications without taking on the burden and complexity of self-managing the infrastructure in the cloud. Running on Google Cloud, the service is built upon core infrastructure like Google Compute Engine, Google Cloud Storage, and Private Service Connect.

Detect and map third-party outages with Datadog External Provider Status

Modern applications depend on dozens of external cloud platforms, APIs, and SaaS services to function. But when those providers experience issues, engineers often spend valuable time asking a basic question: Is the problem with us or with them? Provider-maintained status pages are often slow to update, leaving teams waiting for confirmation while incidents escalate. This delay prolongs investigations and risks customer trust.