
The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Ten Minute Troubleshooting: Meet (and Monitor) Users Where They Are

What do you do if your monitoring, APM, and synthetic tools tell you an application is up, but the users say it’s not? A good first question is where your monitoring tools are located relative to both the users and the application itself. In this episode, Mursi helps Leon identify his “red light, green light” issue and adjust his monitoring to better reflect what REAL users experience.

Secure by Design: IT Modernization for Government

As government agencies modernize IT infrastructure, many are shifting to hybrid and multicloud environments. But this evolution brings heightened exposure to cyber threats. For the public sector, where data protection is tied to national security and public trust, compliance is more than a box to check—it’s the front line of defense. FedRAMP (Federal Risk and Authorization Management Program) provides a standardized framework for securing cloud services used by U.S. agencies.

Resilience with Zero Data Loss in High-Volume Telemetry Pipelines with OpenTelemetry and Bindplane

One Bindplane customer faced exactly this problem when processing enormous log files stored in S3. Our engineering team tackled it head-on, enhancing the S3 event receiver with offset tracking and validating the result with chaos testing.

Goodput vs Throughput: The Differences and How They Affect Your Network

Two key metrics that often come up in discussions about network performance are throughput and goodput. While the terms may seem similar, they highlight different aspects of your network’s efficiency: throughput counts all data transferred, including retransmissions and protocol overhead, while goodput counts only the useful application data successfully delivered. Misunderstanding them can lead to poor decisions about how you manage your network and your business’s resources.

PostgreSQL Performance: Faster Queries and Better Throughput

A PostgreSQL setup that performed well with 10,000 users starts to show strain at 100,000. Queries that once returned in under 50ms now take over 2 seconds. The connection pool regularly hits its limit during peak usage, leading to timeouts and degraded performance. This post focuses on practical ways to reduce query latency by 50–80% and increase throughput in high-concurrency environments.

Leaning into AI, ML, and observability to manage your ever-growing infrastructure

The complexity and scale of modern infrastructure require an equally intelligent set of observability tools to monitor it effectively. Remember when scaling meant ordering new servers and racking them in a data center? Remember when cloud providers first offered access to seemingly infinite virtual machines at the click of a button? Remember when Kubernetes made it trivial for infrastructure to automatically scale itself based on demand?

New Feature - Vulnerable System Drivers Monitoring

Vulnerable system drivers continue to be a vector exploited by attackers to compromise systems. In eG Enterprise version 7.5, we added a number of periodic security checks to help administrators proactively identify weaknesses, including vulnerable system drivers monitoring. This new capability is supported on Windows, when using a VM agent for inside view monitoring and/or when monitoring an Azure Virtual Desktop session host.

Coralogix SLO Center & SLO Alerts are now available

Coralogix has released a new flagship service management product, the SLO Center. The SLO Center allows customers to define service level objectives (SLOs) for their teams, and SLOs can span multiple services or metric streams. Powered by the Coralogix Streama engine, the SLO Center unlocks full-coverage SLOs for every team, regardless of data volume and with very high cardinality limits.

Coralogix becomes first observability vendor to earn ISO/IEC 42001:2023 certification for responsible AI

We’re proud to announce that Coralogix is now officially ISO/IEC 42001:2023 certified, becoming the first observability vendor to achieve this globally recognized standard for responsible AI management. ISO/IEC 42001:2023 is the world’s first international standard for Artificial Intelligence Management Systems (AIMS). It provides a comprehensive framework for how organizations should govern AI, focusing on transparency, ethical use, accountability, and regulatory compliance.

The Outage You Can't Afford: Why CMI/CME Providers Need Autonomous Operations Now

Imagine if degrading network performance—not just bad code—disrupted your live stream during a high-profile event. Customers start flooding support lines. Social media lights up. Your NOC teams scramble to identify the root cause amid fragmented systems. The outage impacts not only your broadcast, but also subscriber logins, ad delivery, and mobile apps. Advertisers want refunds. Executives ask, “Why didn’t we see this coming?”