Operations | Monitoring | ITSM | DevOps | Cloud

Smarter AI Cost Optimization With Guardrails That Scale

AI adoption is reshaping how organizations innovate. It’s also driving cloud costs higher. CloudZero’s State Of AI Costs In 2025 report finds that for mature FinOps and engineering leaders, visibility into AI costs is a critical first step, but it’s not enough. To enable fast, responsible AI and machine learning innovation at scale, teams need pragmatic, flexible guardrails, not rigid budgets or knee-jerk shutdowns that slow progress or push teams into shadow ML.

Agentic AI Becomes Essential: Why Adoption Is Accelerating and What Comes Next

The cautious optimism business leaders held towards AI agents has evolved into more widespread enthusiasm. In our last survey from April 2025, just over half (51%) of companies had deployed AI agents in their organization. Six months later, 75% of companies are deploying more than one agent, according to PagerDuty’s latest research.

Sentry AI code review, now in beta: break production less

This could’ve been prevented. This should have been prevented. This too. We all hate getting tagged in PRs. The time it takes, the blame when you inevitably miss something, and the constant “I wouldn’t have written it that way” feeling are hard to shake. LLMs promised this would get easier. Promised they would do it for us. But as we’ve seen, we’re not there yet. This, though, is what Sentry does for a living. We catch bugs… in prod.

Key APM Metrics You Must Track

Application Performance Monitoring (APM) helps you understand how your software runs in production. When you track the right metrics, you see how requests move through your system, where slowdowns happen, and how resources are being used. With this knowledge, you can spot issues early and keep your applications reliable for your users. In this blog, we discuss the key APM metrics to monitor, grouped into categories, and why each one matters for performance and user experience.
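To make the categories concrete, here is a minimal sketch of how a few core APM metrics (throughput, error rate, average and p95 latency) can be derived from raw request records. The record shape, field names, and sample values are illustrative, not taken from any particular APM product:

```python
def apm_summary(requests):
    """Compute core APM metrics from a list of request records.

    Each record is a dict with 'duration_ms' (float) and 'status' (int).
    Throughput is simply the request count over the sampled window.
    """
    durations = sorted(r["duration_ms"] for r in requests)
    n = len(requests)
    errors = sum(1 for r in requests if r["status"] >= 500)
    # Nearest-rank p95: the latency 95% of requests complete within
    p95 = durations[min(int(0.95 * n), n - 1)]
    return {
        "request_count": n,
        "error_rate": errors / n,              # fraction of 5xx responses
        "avg_latency_ms": sum(durations) / n,
        "p95_latency_ms": p95,
    }

sample = [
    {"duration_ms": 120.0, "status": 200},
    {"duration_ms": 95.0, "status": 200},
    {"duration_ms": 430.0, "status": 500},
    {"duration_ms": 88.0, "status": 200},
]
summary = apm_summary(sample)
```

The point of tracking p95 (or p99) alongside the average is that tail latency reveals the slow requests an average hides.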

AI Hype vs. IT Reality: What to Expect from an IT Automation Platform That Actually Delivers

Artificial intelligence is everywhere: in keynotes, press releases, product announcements, and quarterly roadmaps. From help desk chatbots to predictive analytics, AI is being positioned as the silver bullet for almost every IT challenge. But under the surface, most IT leaders are asking a much more practical question: What’s actually real?

Memory stall: the agony before OOM

When we set a memory limit for a container, the expectation is simple: if the app leaks memory, the OOM killer steps in, the container dies, Kubernetes restarts it, done. But reality is messier. As a container gets close to its memory limit, allocations don’t just fail instantly. They get slower. The kernel tries to reclaim memory inside the cgroup, and that takes time. Instead of being killed right away, your app just crawls.
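One way to see this "crawl" before the OOM killer fires is cgroup v2's pressure stall information (PSI), exposed per cgroup in a `memory.pressure` file: the `some` line tracks time when at least one task stalled on memory, the `full` line time when all runnable tasks did. A small sketch of parsing that format follows; the sample string mimics the kernel's PSI output, and the numbers are invented:

```python
def parse_psi(text):
    """Parse cgroup v2 PSI output (e.g. the contents of memory.pressure).

    Returns {"some": {...}, "full": {...}}, with the avg10/avg60/avg300
    windows as float percentages and 'total' (microseconds stalled) as int.
    """
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        metrics = {}
        for field in fields:
            key, value = field.split("=")
            metrics[key] = int(value) if key == "total" else float(value)
        result[kind] = metrics
    return result

sample = (
    "some avg10=1.52 avg60=0.40 avg300=0.08 total=123456\n"
    "full avg10=0.30 avg60=0.05 avg300=0.00 total=7890\n"
)
pressure = parse_psi(sample)
# A rising "full" avg10 means every runnable task in the cgroup is
# stalled on reclaim: the app is crawling, even though it hasn't been killed.
```

Alerting on `full` pressure gives you a signal well before the container is actually OOM-killed.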

Goodbye Email-to-Text: Why Modern Mobile Alerting with SIGNL4 Is the Smarter Choice

Over the past year, major U.S. mobile carriers have shut down their free email-to-SMS and email-to-text services – once common ways to send a text message directly from an email account. AT&T terminated its SMS gateway service in mid-2025, Verizon discontinued its SMS gateway domain in late 2024, and T-Mobile retired its gateway domain in December 2024.

Cortex is now available in the Devin Marketplace, keeping your AI within the guardrails of your org-wide best practices

We are thrilled to announce that the Cortex Model Context Protocol (MCP) is now available in the Devin marketplace. This integration connects the world’s first AI software engineer with the real-time context of your entire engineering ecosystem, as managed and measured by Cortex. The rise of AI software engineers like Devin fundamentally changes how organizations tackle their biggest technical challenges.

Building Real-Time Data Pipelines with Kafka, Telegraf, and InfluxDB 3

When milliseconds matter and data never stops flowing, you need a pipeline that can handle high-velocity streaming data with reliability and scale. The modern streaming stack of Kafka, Telegraf, and InfluxDB 3 Core delivers exactly that. To give you a concrete example, this blog works with a fictitious use case: “Papa Giuseppe’s Pizzeria.” Every oven, prep station, and order in this pizza restaurant generates data. Our workflow looks like this.
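Telegraf hands points to InfluxDB in its line protocol format (`measurement,tags fields timestamp`). As a sketch of what a single oven reading from the pizzeria example might look like on the wire, here is a minimal formatter; the measurement, tag, and field names are made up for illustration:

```python
def to_line_protocol(measurement, tags, fields, ts_ns):
    """Format one data point as InfluxDB line protocol:
    measurement,tag1=v1,... field1=v1,... nanosecond_timestamp
    """
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_parts = []
    for k, v in sorted(fields.items()):
        if isinstance(v, bool):                # check bool before int
            field_parts.append(f"{k}={'true' if v else 'false'}")
        elif isinstance(v, int):
            field_parts.append(f"{k}={v}i")    # integer fields get an 'i' suffix
        else:
            field_parts.append(f"{k}={v}")     # floats are written plain
    return f"{measurement},{tag_str} {','.join(field_parts)} {ts_ns}"

point = to_line_protocol(
    "oven_temp",
    {"location": "kitchen", "oven": "deck1"},
    {"temperature_c": 410.5, "door_open": False},
    1700000000000000000,
)
# → "oven_temp,location=kitchen,oven=deck1 door_open=false,temperature_c=410.5 1700000000000000000"
```

Tags are indexed and used for filtering (which oven, which station); fields carry the actual measured values.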