
Commit Without the Lock-In: The Multi-Cloud Savings Playbook

Cloud commitments don’t fail because the math is wrong. They fail when teams chase flexibility instead of designing for it. In this short post, I'll walk you through a multi-cloud strategy (AWS and Azure) that uses Savings Plans, Reservations, and Marketplace Reserved Instances without getting locked in.
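Most of "the math" comes down to a break-even check. The following is a minimal sketch under a simplified model, assuming a flat hourly commitment priced at a fixed discount off on-demand and billed for every hour of the term whether or not you use it. The function names and the $1.00/hour, 30% figures are hypothetical illustrations, not AWS or Azure APIs or real prices.

```python
def break_even_utilization(discount: float) -> float:
    """Minimum share of committed hours you must actually use for a
    commitment at `discount` (0..1 off on-demand) to beat pay-as-you-go."""
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return 1 - discount


def net_savings(on_demand_hourly: float, discount: float,
                utilization: float, hours: int = 8760) -> float:
    """Savings versus on-demand over `hours` (default: one year).

    The commitment is paid for every hour at the discounted rate, while
    the on-demand alternative is paid only for the hours actually used.
    A negative result means the commitment lost money.
    """
    committed_cost = on_demand_hourly * (1 - discount) * hours
    on_demand_cost = on_demand_hourly * utilization * hours
    return on_demand_cost - committed_cost


# Hypothetical: $1.00/hour on-demand, 30% discount.
print(break_even_utilization(0.3))  # need to use at least 70% of hours
print(net_savings(1.0, 0.3, 0.9))   # positive at 90% utilization
print(net_savings(1.0, 0.3, 0.5))   # negative at 50% utilization
```

The takeaway is that a deeper discount lowers the utilization bar, which is why designing for flexibility (so committed capacity keeps getting used as workloads move) matters more than chasing the biggest discount.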

How to build a multi-region Azure platform in 13 hours using Agentic InfraOps

A mid-sized e-commerce company needed a production-ready, multi-region web platform deployed on Azure, and they needed it fast. Their infra team was at capacity, and the deadline was two weeks. Using Agentic InfraOps, I handed the job to AI agents: from requirements gathering to Bicep deployment and validation. The result: fully documented, compliant infrastructure built in 13 hours, with 92% alignment to the Azure Well-Architected Framework and 85% time savings.

Claude Models Just Landed in Azure: I Opened a Terminal and Tested

Microsoft just added Anthropic’s Claude models to Microsoft Foundry. Instead of reading the press release, I ran a mini-benchmark to see how Claude Opus, Sonnet, and Haiku actually perform with real Python tasks and stdin/stdout workflows. The results will surprise you. Opus was the most complete, Haiku the fastest (and chattiest), and instruction-following was the weak point across the board. If you’re thinking about using Claude in production, read this first.
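A stdin/stdout task is easy to score mechanically, and a strict comparison is also what surfaces instruction-following problems such as chatty output. Here is a minimal sketch of such a check, assuming each model-generated answer is a standalone Python script; the harness and the two sample programs are hypothetical and not the benchmark used in the post.

```python
import subprocess
import sys


def run_stdin_stdout_case(program: str, stdin_text: str, expected_stdout: str,
                          timeout: float = 10.0) -> bool:
    """Run `program` (Python source) with `stdin_text` on stdin.

    Returns True only if the script exits cleanly and its stdout matches
    `expected_stdout` exactly, ignoring trailing whitespace. Any extra
    chatter (banners, explanations) makes the case fail.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", program],
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False  # treat hangs as failures
    if result.returncode != 0:
        return False
    return result.stdout.rstrip() == expected_stdout.rstrip()


# Hypothetical model outputs for "read integers from stdin, print their sum".
terse = "import sys\nprint(sum(int(x) for x in sys.stdin.read().split()))"
chatty = ("import sys\nprint('Here is the sum:')\n"
          "print(sum(int(x) for x in sys.stdin.read().split()))")

print(run_stdin_stdout_case(terse, "1 2 3\n", "6\n"))
print(run_stdin_stdout_case(chatty, "1 2 3\n", "6\n"))
```

The terse script passes and the chatty one fails even though both compute the right number, which is the kind of gap a strict stdout contract exposes.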

I Benchmarked EBS vs. S3 Express So You Don't Have To - Here's What I Found

I benchmarked EBS and S3 Express One Zone across EC2 and EKS setups to settle a question we kept hearing from customers: “What’s the best temp storage for ML pipelines?” Here’s the short answer, the full story, and why it matters to your ML pipelines.

Are You Missing the Easiest Azure Discount in Your Stack?

If you’re using Microsoft Defender for Cloud, you’re probably overpaying. There’s a commitment-based pricing model that can save you up to 22% annually. But Azure won’t recommend it, and third-party tools ignore it. This blog breaks down how Defender Commit Units (DCUs) work, why they’re a blind spot, and what you need to do about it.

Simplifying GPU Workloads On-Prem? Here's What Actually Worked for Me

Let’s be honest. A lot of AWS customers are still running on-prem GPU servers. Sometimes it’s for internal model training jobs, sometimes it’s cost-sensitive work that doesn’t need cloud-scale reliability. The pattern is common, especially in R&D-heavy environments. The usual go-to is a virtualization platform, but that adds complexity and licensing overhead most teams would love to ditch. So I went looking for something cleaner.

I Tested MIG in Real-Life Azure - Did It Feel Like a Stuffed Cubicle?

I carved one Azure H100 into virtual “cubicles” using MIG (Multi-Instance GPU), compared it to an A100, ran Triton inference workloads, and captured both latency and cost. The verdict: the H100 with MIG delivers better latency and consistency, while the A100 is more cost-effective at scale. Which one wins depends on your workload.
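"Better latency and consistency" versus "more cost-effective at scale" are two different metrics, and it helps to compute them the same way for both GPUs. The following is a minimal sketch, assuming per-request latencies have already been collected (for example from Triton client timings) and assuming a GPU's hourly price is split evenly across its MIG slices. The prices, throughput, and slice counts in the example are hypothetical, not the post's measurements.

```python
import statistics


def latency_summary(latencies_ms: list[float]) -> dict[str, float]:
    """p50, p99, and their ratio; a ratio closer to 1 means steadier latency."""
    q = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    p50, p99 = q[49], q[98]  # 50th and 99th percentile cut points
    return {"p50": p50, "p99": p99, "p99_over_p50": p99 / p50}


def cost_per_million(hourly_price: float, requests_per_second: float,
                     slices: int = 1) -> float:
    """Cost per 1M requests served by one instance (a MIG slice or a whole GPU).

    Assumes the GPU's hourly price is divided evenly across `slices` and that
    `requests_per_second` is the sustained throughput of a single slice.
    """
    price_per_slice_hour = hourly_price / slices
    requests_per_hour = requests_per_second * 3600
    return price_per_slice_hour / requests_per_hour * 1_000_000


# Hypothetical inputs: a $7.00/hour GPU split into 7 slices, each serving 100 req/s.
print(latency_summary([float(x) for x in range(1, 101)]))
print(cost_per_million(7.0, 100.0, slices=7))
```

Running both functions on each configuration gives a like-for-like view: the p99/p50 ratio captures consistency, and cost per million requests captures efficiency at scale.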

What Happened When I Put the Promise of MCP to the Test of Real Life?

Everyone is talking about Model Context Protocol (MCP). The promise: let models talk directly to systems instead of building endless glue code. I wanted to see how it could solve several existing challenges, both internally and for our customers. So I ran two real-world projects. Both moved beyond theory; they’re running in production today.
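The "no more glue code" promise rests on a small, uniform contract: an MCP server advertises tools, and a client invokes them over JSON-RPC 2.0 using the `tools/list` and `tools/call` methods. The following stdlib-only toy is a sketch of that request/response shape, not a real MCP server. It leaves out transport, the initialization handshake, and capability negotiation, which the official MCP SDKs handle. The `get_ticket_status` tool and its output are hypothetical.

```python
import json
from typing import Any, Callable

# Hypothetical tool registry: name -> (description, JSON Schema for inputs, handler).
TOOLS: dict[str, tuple[str, dict[str, Any], Callable[..., str]]] = {
    "get_ticket_status": (
        "Look up a ticket's status (hypothetical example tool)",
        {"type": "object",
         "properties": {"ticket_id": {"type": "string"}},
         "required": ["ticket_id"]},
        lambda ticket_id: f"Ticket {ticket_id}: open",
    ),
}


def _error(req_id: Any, code: int, message: str) -> str:
    return json.dumps({"jsonrpc": "2.0", "id": req_id,
                       "error": {"code": code, "message": message}})


def handle(message: str) -> str:
    """Answer one JSON-RPC 2.0 request for `tools/list` or `tools/call`."""
    req = json.loads(message)
    method, req_id = req.get("method"), req.get("id")
    if method == "tools/list":
        result = {"tools": [
            {"name": name, "description": desc, "inputSchema": schema}
            for name, (desc, schema, _) in TOOLS.items()
        ]}
    elif method == "tools/call":
        params = req.get("params", {})
        entry = TOOLS.get(params.get("name"))
        if entry is None:
            return _error(req_id, -32602, "Unknown tool")  # JSON-RPC "Invalid params"
        text = entry[2](**params.get("arguments", {}))
        result = {"content": [{"type": "text", "text": text}], "isError": False}
    else:
        return _error(req_id, -32601, "Method not found")  # JSON-RPC standard code
    return json.dumps({"jsonrpc": "2.0", "id": req_id, "result": result})


print(handle(json.dumps({"jsonrpc": "2.0", "id": 1, "method": "tools/call",
                         "params": {"name": "get_ticket_status",
                                    "arguments": {"ticket_id": "T-1"}}})))
```

Because every tool is described by a name, a description, and an input schema, a model can discover and call a new system without bespoke integration code. That uniformity is the point of MCP, whatever the backing system is.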

I Saved 40% on Azure VMs Just by Using This One Setting

TL;DR: If you’re still buying Azure reservations that match your exact VM size, you’re missing out. There’s an option called Size Flexibility that lets a single reservation apply to multiple VM sizes in the same family. In some cases, reserving the cheapest VM size in the group still fully covers the more expensive ones, saving you up to 40%.
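Size flexibility works on normalized units. Each size in a group carries a ratio, and a reservation covers quantity × ratio of the reserved size, so several small reservations can cover a larger VM. Below is a minimal sketch of that coverage math. The group and its ratios (`small`/`medium`/`large` = 1/2/4) are hypothetical placeholders; real values come from Azure's published size flexibility ratio tables. The sketch models coverage only, not pricing.

```python
# Hypothetical size-flexibility ratios for one VM group. Look up the real
# ratios for your series in Azure's instance size flexibility tables.
RATIOS = {"small": 1, "medium": 2, "large": 4}


def reservation_coverage(reserved_size: str, reserved_qty: int,
                         running: dict[str, int]) -> dict[str, float]:
    """Compare reserved vs. used normalized units for one hour.

    `running` maps VM size -> number of VMs running in the same group.
    Coverage is capped at 100%; leftover reserved units are reported as unused.
    """
    reserved_units = RATIOS[reserved_size] * reserved_qty
    used_units = sum(RATIOS[size] * count for size, count in running.items())
    covered = min(reserved_units, used_units)
    return {
        "reserved_units": reserved_units,
        "used_units": used_units,
        "coverage": covered / used_units if used_units else 0.0,
        "unused_units": reserved_units - covered,
    }


# Eight reservations of the smallest size vs. one large and two medium VMs.
print(reservation_coverage("small", 8, {"large": 1, "medium": 2}))
```

In this example, 8 × 1 reserved units exactly match 1 × 4 + 2 × 2 used units, so reservations bought at the smallest size fully cover the larger running VMs.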