The latest news and information on cloud monitoring, security, and related technologies.

Kubernetes For AI: The CTO's Guide

Kubernetes began as a tool to help teams keep thousands of microservices running without falling apart. It gave them a way to schedule workloads, recover from failures, and scale services without constant firefighting. Now, AI has brought back the same chaos, only magnified. Training jobs sprawl across GPUs. Inference traffic spikes without warning. Pipelines stretch across clusters, clouds, and compliance boundaries. Left unchecked, this can break both your workloads and your cloud budget.

Service disruption on October 20, 2025

When the internet goes down, our primary job is to help everyone get back up, as fast as possible. Of the almost half a million incidents we've helped our customers solve, there are some which stand out for both their scale and impact. One of these happened on Monday, October 20, when AWS had a widely covered major outage in their us-east-1 region, from 07:11 to 10:53 UTC. We’re hosted in multiple regions of Google Cloud and so the majority of our product was unaffected by the outage.

The Best Cloud Storage Deals of Black Friday 2025

Looking for the best cloud storage deals? You’re in the right place, and since Black Friday is just around the corner, now is the perfect time. This time of year, companies offer their biggest deals on everything from tech gadgets and beauty to video games and much more. For cloud storage, we’ve got you covered with the best deals of the year, so you can store, back up, sync, and share your files with friends, family, or colleagues.

Introducing Updog.ai: Real-time provider status from Datadog

When external SaaS providers or cloud services degrade or go down, engineers often find themselves wondering if the issue they're encountering is local or more widespread. The answers they find are usually slow to surface, limited in detail, or entirely dependent on the provider's updates. Vendor-controlled status pages and third-party aggregators don’t provide the timely, independent visibility that's necessary to quickly and accurately identify the root cause of slowdowns.

SOC vs. the Clock: The New Cybersecurity Frontlines

Cloud attacks now account for over half of all threats, and most businesses still aren’t ready. In this conversation, Scott from N-able and Zac from First Technology Group unpack the latest SOC threat intelligence, the rise of AI in cyber defence, and why layered security is more critical than ever. If you manage IT, security, or risk, this is your insider’s view into what’s coming and how to prepare.

The Hidden Risk of DNS - Lessons from the AWS Outage & Why You Need DNS Spy Monitoring NOW

On October 20, 2025, much of the internet came to a halt. Apps wouldn’t load. Payments failed. Cloud dashboards went dark. From Fortnite to Alexa, Snapchat, and countless business platforms, users across the world were suddenly offline — all because DNS broke inside Amazon Web Services’ (AWS) US-East-1 region.

Build Vs. Buy? Why Creating Your Own Cost Management Platform Is Futile

The siren song of building a custom, internal cloud cost management platform is enticing. Many brilliant engineering teams are convinced they can come up with a bespoke solution that perfectly fits their needs. They look at their company’s unique infrastructure and decide they can DIY cost management without having to rely on an external vendor. Believe me, I get the temptation.

Amazon Isn't Eating Its Own DNS Dog Food

On October 19-20, 2025, Amazon Web Services (AWS) experienced a significant outage affecting its US-EAST-1 region in northern Virginia. The root cause was DNS resolution failures for DynamoDB’s API endpoints, which cascaded across AWS’s interconnected services, disrupting major platforms including Snapchat, McDonald’s, Disney+, Roblox, Coinbase, Reddit, and Amazon’s own services.

Sustainable Cloud Computing in the UK: Challenges, Opportunities, and the Future

The tech industry's environmental impact is a growing concern, but can collaboration and innovation drive sustainability? At Civo Navigate London 2025, Regent Lee, Dinesh Majrekar, Liam McTague, and Simon Morris explored the challenges and opportunities of reducing emissions in the tech industry.

AWS Outage: How do you prepare for the failure of your own safety net?

When AWS’s massive outage struck, it didn’t just take down cloud services, apps, and enterprise platforms. It also knocked out many of the monitoring systems organizations depend on for real-time answers. Observability companies, including Datadog, New Relic, Checkly, Dynatrace, SpeedCurve, and Splunk Observability, lost visibility or functionality precisely when organizations needed them most.