The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Captur: Observability-First Mobile ML Inference for Better Customer Confidence

Captur builds a mobile SDK that brings real-time image recognition and actionable feedback directly into customers’ apps, running complex machine learning models entirely on device without cloud inference. This architecture delivers privacy and performance, but also creates unique challenges when it comes to observability and debugging, especially as crashes can originate from the host app rather than the SDK itself.

Taming the Broker Network: Achieving Reliable Apache ActiveMQ Operations

Broker networks grow from success but often become fragile webs. A global retailer's journey from Apache ActiveMQ chaos to reliable operations shows how unified visibility, automation, and governed self-service transform messaging from liability to strategic asset.

A New Scale Tier for Amazon Timestream for InfluxDB

InfluxDB 3 on Amazon Timestream for InfluxDB now scales to 15-node clusters, unlocking higher ingestion, greater query concurrency, and real-time performance at scale. In this video, PM Pete Barnett breaks down what this means for high-resolution, high-velocity workloads, and how you can scale from Core to Enterprise with zero downtime or data migration.

Cost Optimization in Action: How We Cut Amazon SQS Costs by 87%

JC, the Director of Software Engineering, Cloud at LogicMonitor, shares how Cost Optimization enabled his team to shift to Cost-Intelligent Observability and tackle an unexpected and growing cloud bill. As engineers, we live and breathe performance. We obsess over latency, reliability, and uptime, the hallmarks of a healthy system. But there’s another metric that’s becoming just as critical: cost.

Monitor your application and network load balancer logs

Load balancers are the primary entry points to distributed applications. By strategically directing the flow of incoming web traffic to specific endpoints, load balancers help optimize throughput and ensure the horizontal scalability of applications. In modern systems, load balancers often do more than their name suggests: Beyond basic load distribution, they analyze requests and route traffic based on a wide range of variables, such as client identity.

Event Intelligence for Agentic IT Operations

Modern IT teams are experimenting with AI agents. But individual agents working in isolation are not enough. To truly achieve Agentic IT Operations, organisations need a platform that coordinates, governs, and contextualises AI-driven actions across the entire IT landscape. That’s where Interlink Software comes in.

Instrumenting Rust TLS with eBPF

Coroot is an open source observability tool that uses eBPF to collect telemetry directly from applications and infrastructure. One of the things it does is capture L7 traffic from TLS connections without any code changes, by hooking into TLS libraries and syscalls. Works great for OpenSSL. Works for Go. Then rustls enters the picture and everything stops being obvious. With OpenSSL, everything is nicely wrapped. From eBPF’s point of view this is perfect: everything happens inside one call.

Shifting Metrics Right

In the shift-left era, where it feels like we’re pushing everything as far toward the start of the SDLC as we can, it may seem counterintuitive to shift anything right. That is, however, exactly what I suggest when it comes to generating metrics. How far to the right of the SDLC you go is a more nuanced question that depends on many factors, including which metrics you’re talking about.

The Hidden Crisis in Modern IT: Interpretation Risk

Technology leaders spent the past decade investing heavily in visibility. They expanded monitoring footprints, adopted cloud-native observability tools, integrated analytics dashboards, and layered on automation intended to streamline detection. Every addition promised deeper insight. Every initiative aimed to bring clarity to increasingly complex environments. Yet operations feel more chaotic, not less. Outages move faster. Incidents cross more boundaries. Signals appear without context.