
The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Take control of your OpenTelemetry Collectors with Otel Remote Management

Managing OpenTelemetry (OTel) collectors across diverse, cloud-native environments is key to streamlining monitoring and gathering valuable insights. However, managing them effectively, especially across multiple servers, has been a manual and time-consuming process. That changes today. Sumo Logic's Otel Remote Management is designed to simplify OpenTelemetry Collector management from a single, unified user interface.

Session Replay for Mobile is now Generally Available: See What Your Users See

Session Replay for Mobile is now generally available. I could bombard you with hyperbolic statements about why Session Replay is worth using, but instead, A…I… wrote you a haiku: Screen freeze, devs all sigh. Replay uncovers the crime: Forgot.addListener.

Grafana SLO: Easily predict the likelihood that you'll hit your target

Service-level objectives (SLOs) can be a great way to ensure you're hitting your goals, but many software teams struggle to set realistic targets when they first set up the service-level indicators (SLIs) that underpin those efforts. Sometimes management decrees that all services will operate at "three 9s" (99.9%) of availability; other times engineers pick a number out of thin air.
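To make a target like "three 9s" concrete, it helps to translate it into an error budget: the amount of downtime the target actually permits. The minimal sketch below assumes a 30-day rolling window and a simple time-based availability SLI; the function name `error_budget_minutes` is hypothetical and is not how Grafana SLO computes its predictions.

```python
# Minimal sketch: convert an availability target into allowed downtime.
# Assumes a time-based SLI and a fixed window (30 days by default).
def error_budget_minutes(target: float, window_days: int = 30) -> float:
    """Return the allowed downtime, in minutes, for an availability target."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must be a fraction between 0 and 1")
    window_minutes = window_days * 24 * 60  # total minutes in the window
    return window_minutes * (1.0 - target)  # the fraction allowed to fail


# Compare a few common targets over a 30-day window.
for target in (0.99, 0.999, 0.9999):
    print(f"{target:.2%} over 30 days -> {error_budget_minutes(target):.1f} min of downtime")
```

Over 30 days, a 99.9% target leaves roughly 43.2 minutes of downtime, which is a useful sanity check before committing to a number picked "out of thin air."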

What is Adaptive Telemetry, and how can it reduce MTTR, noise, and cost?

As your applications scale, so too does the flood of logs, metrics, profiles, and traces—along with the costs to store and manage them. Collecting everything might feel like the safest bet, but it often leaves you buried in noise and struggling to find the signals that matter, all while costs spiral out of control.

Why pharmaceutical manufacturing can't afford IT failures: A real-world CDMO case study

The margin for error is extremely low in manufacturing, where accuracy, efficiency, and time to delivery are indispensable. These aspects become even more crucial in pharmaceutical manufacturing, a sector that is always in high demand, especially since the COVID-19 pandemic. Much of that responsibility now falls on contract development and manufacturing organizations (CDMOs) that partner with pharmaceutical and biotech companies.

The SRE Report 2025: Highlighting Critical Trends in Site Reliability Engineering

Catchpoint's annual report reveals the rise of operational toil, the growing importance of user experience as a reliability metric, and the challenges of balancing speed and stability in a rapidly developing AI-driven landscape.

Metric Watch: A real-time view of the past, present, and future of metrics

Enterprise operations teams monitor a range of metrics tied to the stability, performance, and availability of business, application, and IT infrastructure. These could be business KPIs such as footfall, checkout time, and sales at flagship stores; performance metrics such as the response time of business-critical applications; or the queue length and enqueue rate of backbone message queues.

WhiteScreen.VIP: The Perfect Companion for Monitor Testing and Maintenance

Struggling to spot a dead pixel or uneven brightness on your monitor? You're not alone! It's a common issue, and it can be a troublesome one that reduces productivity and creativity. There's a very effective remedy: a clean white backdrop.

Accelerate root cause analysis with Watchdog and Faulty Kubernetes Deployment

Understanding and managing the impact of Kubernetes changes is one of the biggest challenges for modern DevOps teams. Every modification to a manifest, whether it’s adjusting memory limits, tweaking CPU allocations, or updating container images, has the potential to destabilize services or degrade performance.