Operations | Monitoring | ITSM | DevOps | Cloud

When agents orchestrate agents, who's watching?

You used to monitor services. Then you started monitoring AI calls inside services. Now your AI agent is spinning up other AI agents to complete tasks. Your old monitoring instincts need to evolve. This isn't hypothetical. Agentic architectures are already in production. Coding agents are calling search agents; orchestrators are spawning specialized sub-agents for retrieval, planning, and execution. Teams are shipping these systems faster than they're figuring out how to watch them.

Observability Focus: Why It Became the Default Language of Modern IT Operations

Digital services run on fragile highways of microservices, containers, and event streams. Outages no longer hide inside a single server rack; they ripple across regions and ruin brand trust in minutes. Because uninterrupted insight now decides whether a launch soars or stalls, engineers treat observability as the vocabulary for every architectural choice, deployment ritual, and post-incident review. Similar discipline emerges in studios that refine professional end-to-end game dev workflows, where frame drops and lag spikes receive the same diagnostic rigor expected of banking APIs.

AWS Outage History: What Engineering Teams Should Learn

If you've been running production workloads on AWS for more than a year, you've felt it: the 3 am PagerDuty alert, the scramble to check the AWS console, the frantic Slack thread asking, "Is this us or is this AWS?" And then, minutes or hours later, the AWS Service Health Dashboard finally acknowledges what your users have been experiencing all along. It happens because AWS is the backbone of modern infrastructure.

DataPrime at Ingest: Fine-Grained TCO Routing with DPXL

The real economic decision for observability happens at ingest, before storage, billing, and retention choices are locked-in. Until now, the logic governing that decision could only see three broad fields: application, subsystem, and severity. That just changed. TCO routing now matches on any field in the event payload, including nested keys, custom fields, and event body content, using DPXL, the DataPrime Expression Language.

The New Economics of Enterprise AI: Why Small Models Win Where It Matters

For years, progress in AI was equated with scale. Larger models, broader parameter counts, and increasingly complex cloud architectures were treated as signals of advancement. In enterprise operations, however, scale alone does not determine success. Economics does. As AI becomes embedded in operational workflows, organizations are discovering that model size is less important than cost stability under continuous load. AI-driven operations do not run in bursts. They run constantly.

Join operator and Query Agent for smarter log analysis

Sumo Logic’s log analytics capabilities have always provided the greatest insights to help you secure, monitor and troubleshoot your environment. Now, with our Query Agent, as part of Dojo AI, creating optimized log searches with natural language is even easier. Query Agent works with a wide variety of operators, including the join operator, for parsing, aggregation, data transformation, filtering, advanced analysis and lookup.

How to Use Time Series Autoregression (With Examples)

Time series autoregression is a powerful statistical technique that uses past values of a variable to predict its future values. This approach is particularly valuable for forecasting applications where historical patterns can inform future trends. In this hands-on tutorial, you’ll learn how to implement autoregressive (AR) models using Python and see how InfluxDB can enhance your time series analysis workflow.

Why Alert Fatigue Is Killing Your MTTR

Every minute counts when production systems go down. Yet the average enterprise NOC team receives over 1,000 alerts per day, according to a 2025 study by OpsRamp. Of those, fewer than 5% require human intervention. The rest? They are noise — redundant, low-priority, or symptomatic signals that bury the genuine incidents demanding immediate attention.

Why Enterprise AI Demands More Than Just Automation

Based on insights from The Intelligent Enterprise podcast, “The Evolution from Automation to Autonomy” Every couple of weeks, The Intelligent Enterprise podcast steps away from the day-to-day noise of enterprise life to explore big ideas from a fresh perspective. In one recent episode, the focus turned to a question many organizations are still grappling with: What does it really take to build an AI-powered enterprise that works with people, not against them?