AI ROI is an allocation problem

AI spend is going parabolic, and the labels on the bill (OpenAI, Anthropic, Gemini) are about all a CXO gets to work with. The hard part of tying that spend to outcomes is structural. A major portion of AI spend isn’t COGS. It’s the spend on coding agents producing the software, the spend on building marketing content, the spend on custom sales tooling, the spend on Intercom agents and Sybill analysis.

A deep dive into AWS data perimeter misconfigurations

In AWS environments, a data perimeter is a set of preventative controls that help ensure that your trusted cloud identities (principals or AWS services acting on your behalf) are accessing trusted resources from authorized networks. You can apply these controls at various levels of your infrastructure, such as per resource or across all resources in your AWS account.
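
As a rough illustration of one such per-resource control, the sketch below builds an S3 bucket policy that denies access to any principal outside a given AWS Organization. It is a minimal example, not a complete data perimeter: the bucket name `example-bucket`, the organization ID `o-exampleorgid`, and the helper name `identity_perimeter_bucket_policy` are hypothetical placeholders, and a real deployment would likely need additional exceptions and companion network and resource controls.

```python
import json


def identity_perimeter_bucket_policy(bucket_name: str, org_id: str) -> dict:
    """Build a bucket policy that denies principals outside the organization.

    Both arguments are illustrative placeholders, not values from the article.
    """
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "EnforceIdentityPerimeter",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and the objects inside it.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                "Condition": {
                    # Deny when the caller's organization is not the trusted one.
                    "StringNotEqualsIfExists": {"aws:PrincipalOrgID": org_id},
                    # Do not deny AWS services acting on your behalf.
                    "BoolIfExists": {"aws:PrincipalIsAWSService": "false"},
                },
            }
        ],
    }


policy = identity_perimeter_bucket_policy("example-bucket", "o-exampleorgid")
print(json.dumps(policy, indent=2))
```

Applying the same condition in a resource policy covers a single resource, whereas an organization-level policy could apply comparable conditions across every account it is attached to.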

How we cut Spark compute costs by 44% with agentic AI and Datadog Jobs Monitoring

Spark jobs only get more expensive and harder to debug as they scale. It’s a problem we’ve run into ourselves. Our Referential Data Platform team builds and maintains the knowledge graph that maps relationships between customers’ observability entities. ServiceQueryEdge is at the center of that graph, mapping service entities to their associated metric and log queries.

Migrate to Azure Managed Redis with Datadog and Eden

Azure Managed Redis is a Microsoft first-party, fully managed in-memory data store, replacing Azure Cache for Redis tiers. It includes Redis Enterprise features such as RediSearch for vector search and full-text search, in addition to RedisJSON, RedisTimeSeries, and Active Geo-Replication. As Azure Cache for Redis reaches end of life, more teams are planning migrations to Azure Managed Redis in search of better performance, lower cost, and modern capabilities for AI and real-time workloads.

Scaling Your App

Every application starts the same way: One server. One database. One optimistic engineer saying: “We’ll scale later.” And honestly? That’s usually the right call. Premature scaling is how perfectly normal applications end up with: But eventually, growth happens. Traffic increases. Queries slow down. Deployments get riskier. Your infrastructure starts making unfamiliar noises. This is where scaling enters the picture. Not scaling for conference talks.

What a Forrester TEI study on Edwin AI actually tells IT leaders, and how to use it

This blog helps IT leaders use the Forrester Consulting TEI study as a practical framework for evaluating Edwin AI in their own environments. A Total Economic Impact study is useful for one critical reason: it takes a broad technology claim and turns it into a financial and operational framework. That matters in AI for IT operations because the market is crowded with claims. Every platform says it reduces noise. Every platform says it improves efficiency. Every platform says it helps teams move faster.

How Canonical Support solves hard Linux performance bugs, even in 12-year-old code

Some support cases are straightforward. Others lead deep into legacy code, where a single logic bug can quietly turn a routine command into a major performance problem. This series looks at how Canonical Support and Sustaining Engineering work together to investigate, patch, and upstream difficult issues that standard troubleshooting alone cannot solve.

Built by ServiceNow, Extended by iOPEX: The Outcome-driven Co-Delivery Model for Agentic Transformation

ServiceNow's internal IT operations now resolve more than 90% of employee requests through autonomous agents. The platform that demonstrated this in Las Vegas earlier this month is the same platform sitting in your environment right now. So why isn't your operation running the same way? This is the question every CIO should have walked out of Knowledge 2026 with.