I Tested MIG in Real-Life Azure - Did It Feel Like a Stuffed Cubicle?
I carved one Azure H100 into virtual “cubicles” using MIG (Multi-Instance GPU), compared it to an A100, ran Triton inference workloads, and measured both latency and cost. The verdict: the H100 with MIG delivers lower, more consistent latency, while the A100 is more cost-effective at scale. Which one wins depends on your workload.
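To make "latency and cost" concrete, here is a minimal sketch of how the two can be put side by side. It is an illustrative helper, not the exact script behind these tests: the function names `summarize_latency` and `cost_per_million` are hypothetical, and any prices or throughput figures you feed in are your own assumptions.

```python
import statistics


def summarize_latency(samples_ms):
    """Summarize per-request latencies (in ms): median, p99, and spread."""
    ordered = sorted(samples_ms)
    # quantiles(n=100) returns 99 cut points; index 98 is the 99th percentile.
    cuts = statistics.quantiles(ordered, n=100, method="inclusive")
    return {
        "p50_ms": statistics.median(ordered),
        "p99_ms": cuts[98],
        # A lower standard deviation means more consistent response times.
        "stdev_ms": statistics.pstdev(ordered),
    }


def cost_per_million(hourly_price_usd, requests_per_second):
    """USD to serve one million requests at a sustained throughput."""
    requests_per_hour = requests_per_second * 3600
    return hourly_price_usd / requests_per_hour * 1_000_000
```

Running both functions for each GPU configuration gives one row per setup (p50, p99, spread, cost per million requests), which is the shape of comparison the verdict above rests on.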