The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Lifting Equipment Operations: Safety Monitoring and IoT-Enabled Maintenance

A tower crane lifts 10 tons of steel 50 meters up. A gantry crane in a shipyard moves containers weighing 40 tons. A winch pulls a vehicle onto a flatbed. These operations have one thing in common: failure is not an option. Lifting equipment operates in some of the most demanding environments on earth. Construction sites, shipyards, mines, and warehouses all depend on it. When a crane fails or a sling breaks, the results can be catastrophic. Here is how technology improves safety and uptime.

DevOps with Kubernetes: How to Reduce Cluster Toil and Complexity

Has Kubernetes made your DevOps team faster, or just busier? Most teams adopt it for speed and portability, and they get both. What arrives with it is a quieter cost: the operational weight of running the cluster day to day. That weight shows up in the manual work the platform was supposed to eliminate. A resource limit set incorrectly can waste infrastructure for months.

Unified Observability: Moving IT Teams from Reactive to Predictive

What does it take to stop an outage before it starts? In many cases, the warning signs are already there, scattered across different monitoring tools, which makes it difficult to see the full picture before issues escalate. When an incident occurs, engineers often spend valuable time piecing together metrics, logs, traces, and alerts to determine the root cause. Every minute spent investigating extends the outage and increases its business impact.

Observability for LLM Apps and Agents: OpenLIT SDK + VictoriaMetrics observability stack

Many “LLM observability with OpenTelemetry” tutorials stop at a single chat.completions span. That works for a demo, but it leaves gaps once an agent fans out into 30 tool calls, two vector-DB queries, three handoffs, and a 90-second tail latency you need to attribute. This post wires the OpenLIT SDK (50+ instrumentations, OTel GenAI semantic conventions, one line of code) into the full VictoriaMetrics observability stack and shows query examples that turn agent telemetry into decisions.

Icinga Web 2.14, Security Releases, and Module Updates

We are shipping a new batch of Icinga Web ecosystem releases today. Icinga Web 2.14 is the headline, bringing the baseline for two-factor authentication support, configurable password policies, a configurable Content Security Policy, and a round of developer tooling improvements that have been in the works for a while. Icinga Certificate Monitoring 1.4, Icinga Reporting 1.1, and Icinga PDF Export 0.13 join it with PHP 8.5 support across the board and a set of focused improvements for each module.

June 2026 Early Warning Signals

June 2026 saw major outages across ecommerce, AI, developer tools, and business applications. StatusGator’s Early Warning Signals surfaced many of these incidents before providers updated their official status pages. Of the 1,067 incidents detected by StatusGator in June, only 191 (17.9%) were eventually acknowledged by providers.

Introducing Relationships for Service Monitors

Understanding a service outage is easier when you can see what it’s connected to. That’s why we’re introducing Relationships for Service Monitors, one of the features most requested by the hundreds of enterprise IT teams that use StatusGator. You can now explore related services directly from the Service Details page by opening the Relationships dropdown.

Autoscaling Checkly Private Location Agents in Kubernetes with KEDA

Monitoring load is not always steady. A team might add a new batch of checks or run several ad hoc tests during a rollout. When that happens, your Private Location agents need to pick up more work at once. If there aren’t enough agents available during a burst, checks start piling up in the queue, which can delay or disrupt check execution. But solving this by running a high number of agents around the clock has the opposite problem: most of that capacity sits idle until the next busy period.
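
The trade-off described above can be illustrated with a minimal sketch of queue-based scaling. It is not Checkly's or KEDA's actual implementation: the function name, the `checks_per_agent` target, and the `min_agents`/`max_agents` bounds are all hypothetical, chosen only to show how an agent count can follow queue depth instead of staying fixed at peak capacity.

```python
import math


def desired_agents(queued_checks: int, checks_per_agent: int,
                   min_agents: int = 1, max_agents: int = 10) -> int:
    """Return how many agents to run for the current queue depth.

    Hypothetical helper: divides the queue by a per-agent target,
    rounds up, and clamps the result between a floor and a ceiling.
    """
    if checks_per_agent <= 0:
        raise ValueError("checks_per_agent must be positive")
    # Round up so a partially filled "slot" still gets an agent.
    needed = math.ceil(queued_checks / checks_per_agent)
    # Clamp: never below the floor (idle periods), never above the ceiling (bursts).
    return max(min_agents, min(max_agents, needed))


if __name__ == "__main__":
    for queue_depth in (0, 23, 500):
        print(queue_depth, desired_agents(queue_depth, checks_per_agent=5))
```

With these assumed numbers, an empty queue keeps only the floor of one agent running, while a burst scales up until the ceiling is reached, so capacity is added during busy periods rather than held idle around the clock.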

ITSM Maturity Playbook Live, Episode 2 | The CMDB is Your Map

Join this 5-part series designed to help IT teams move from reactive, fragmented processes to a more structured, connected way of working. Each session focuses on a core area, from incident resolution and CMDB visibility to employee experience, service catalog design, and change governance, giving you practical frameworks you can apply right away. You’ll walk away with: Faster, more consistent incident resolution.