If you’ve ever tried to make sense of your AWS bill, you know how fast things get messy. Different accounts, hundreds of services, random tags, and suddenly, no one can say for sure who’s spending what or why the total looks so high. It’s not that teams don’t care about costs — it’s that AWS billing data isn’t always easy to interpret. Finance wants accountability. Engineering wants visibility. And somewhere between the two, ownership disappears.
If there’s one thing we’ve learned at CloudZero, it’s that success in FinOps isn’t just about having the right tools. It’s about knowing how to use them, and understanding the “why” behind every number, dimension, and dashboard.
Outages like Cloudflare’s can cost your business a significant amount of money. This week’s global Cloudflare outage is a wake-up call for business resilience. You can stay resilient against such outages by performing resilience testing regularly and keeping your application and infrastructure configurations up to date.
Multi-agent AI systems are trending in software development right now. These systems consist of individual agents that collaborate to achieve a shared goal, organized much like real-world teams and departments. Each agent is assigned a task that contributes to the final output.
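As a rough illustration of that idea, here is a minimal Python sketch in which each agent owns exactly one task and hands its result to the next. The `Agent` class, the `run_pipeline` helper, and the three stand-in tasks are all hypothetical names made up for this example. A real system would call a model or tool inside each task and would likely use an agent framework rather than plain functions.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Agent:
    """A hypothetical agent: a name plus the single task it is responsible for."""
    name: str
    task: Callable[[str], str]


def run_pipeline(agents: List[Agent], goal: str) -> str:
    """Pass the work product from one agent to the next, like a handoff between teams."""
    work = goal
    for agent in agents:
        work = agent.task(work)
    return work


# Stand-in tasks (assumptions for illustration); a real agent would call a model or tool here.
def research(goal: str) -> str:
    return f"notes on '{goal}'"


def draft(notes: str) -> str:
    return f"draft based on {notes}"


def review(text: str) -> str:
    return f"approved: {text}"


team = [
    Agent("researcher", research),
    Agent("writer", draft),
    Agent("reviewer", review),
]
result = run_pipeline(team, "quarterly report")
print(result)
```

The sequential handoff is only one possible arrangement. Agents could also run in parallel or be coordinated by a supervising agent, depending on how the work is divided.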
Sean Heuer and Ari Stowe break down “agent washing,” governance, and what it really means for AI to take action instead of just chatting. In this clip from Agents of IT, they share practical ways to spot the difference between chatbots, scripted automations, and true agentic systems that can plan, reason, and execute autonomously. Watch the full episode to hear their perspective on.
This blog is based on a presentation by Guillaume Moigneu at the Symfony 2024 conference. Machine learning and AI are no longer limited to Python and Node.js. PHP developers can now run AI models directly in their applications using modern tools and libraries. This guide shows you how to implement machine learning inference in PHP using ONNX and Transformers.
In this episode, Julien Simon, a longtime voice in the open-source ML world, reminds us that even in the era of GenAI, reliability fundamentals haven’t changed. Julien breaks down why calling “the same model” from different providers can produce wildly different results, how deployment choices introduce hidden variability, and why reliability teams need to think of LLM systems as distributed systems.
Coding with AI tools like Cursor is fast, but it leaves a massive question: is the new code going to break production? I solve this by combining Cursor with Proxymock! I take a live traffic snapshot of my running app, feed it back to the AI, and instantly run realistic integration tests locally. It's how I get real confidence before I push. Watch the full video below!