The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Proactively monitor Kerberos-authenticated web apps and APIs with Datadog Synthetics

When employee authentication fails or becomes unreliable, users can lose access to the critical systems they need. Authentication enables access to internal tools like HR applications, finance portals, and internal dashboards, so even short outages can interrupt day-to-day work, while persistent issues increase the risk of broader operational disruption.

Single-tenant vs. multi-tenant architecture with Grafana Cloud: A guide to choosing the right approach

Grafana Cloud’s flexibility is one of its greatest strengths, but the breadth of choices can sometimes be overwhelming. We see this a lot when it comes to selecting the right architectural approach, with organizations unsure of how many stacks they need to host their environment. Grafana Cloud provides robust features for managing tenancy, enabling organizations to effectively handle diverse teams and projects.

Bridging the Network Cost Gap: Why Operators Need Real-Time, Traffic-Based Cost Intelligence

Jezzibell Gilmore’s latest blog dives into the critical challenge network operators face: bridging the gap between massive traffic growth and understanding its actual cost. Learn why real-time, traffic-based cost intelligence is no longer optional for maintaining margins and driving revenue in today’s complex network landscape.

Automated BSoD (Blue Screen of Death) Monitoring and Troubleshooting

Yes, BSoDs are still cropping up in high-impact ways in 2025, from flawed Windows updates (especially 24H2 patches) to driver rollouts and heavily threaded server environments. It remains essential for IT admins to track event reports, test updates in staging, enable rollback strategies, and be prepared with recovery mechanisms.

Monitor and optimize your systems with Uptrace

Uptrace is your single source of truth for monitoring, understanding, and optimizing complex distributed systems. Proven in production for over five years and trusted by more than a thousand installations worldwide, it lets you see your system like never before. What makes the difference is that Uptrace is pure OpenTelemetry, built natively from day one. This isn't a translation layer—it's a direct connection that eliminates friction and ensures zero vendor lock-in. Your homepage serves as your command center, providing complete visibility across your stack at a glance.

Observability Day San Francisco: The Future of AI and Observability Is Bright

AI and observability are no longer separate conversations—they’re deeply intertwined. Across keynotes, panels, and demos, speakers at Honeycomb's Observability Day San Francisco unpacked what that means for engineering teams today: faster insights, smarter tools, and new challenges to solve.

OpenTelemetry Observability: An In-Depth Look at Features and Best Practices

OpenTelemetry (OTel) is a unified framework of APIs, SDKs, and tools for collecting, processing, and exporting telemetry data (logs, metrics, and traces) across applications and infrastructure. OTel is especially useful in today’s cloud-native world, where applications run on microservices, Kubernetes, and distributed systems.

Database Monitoring Challenges Every DevOps Engineer Should Know

Databases form the critical foundation of modern applications, and maintaining their performance and reliability is essential for operational efficiency and user satisfaction. Effective database monitoring, however, presents numerous challenges. Modern systems produce extensive metrics, operate across diverse environments, and must scale in line with growing workloads, all while ensuring compliance and security.

LLM app Observability: OpenTelemetry as a standard

LLM observability is broken. There are too many new libraries floating around, and they don't accurately follow the OpenTelemetry conventions. OTel isn’t perfect for LLMs yet, but extending a proven standard beats inventing another one. Why not use the same standard (OTel) that works so well for the rest of the apps, and just build on top of it? This is what I was ranting about with Pranav Raj S, co-founder at Chatwoot, and we thought there must be other folks facing similar issues.

Internal SLAs for Third-Party Vendors: Complete Guide

Managing third-party vendors effectively requires clear expectations and measurable standards. Internal SLAs for third-party vendors provide the framework to track vendor performance, ensure compliance, and maintain service quality across your entire vendor ecosystem. This guide covers everything you need to establish and manage vendor SLAs that protect your business interests while fostering productive vendor relationships.