
The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

How to Prioritize Critical Resources with Grafana SLO-driven IRM | ObservabilityCON on the Road 2024

New to Service Level Objectives (SLOs) and Service Level Indicators (SLIs)? Or curious how Grafana makes it easy to prioritize critical resources with SLO-driven Incident Response Management? In this recording, Marc and Mimi walk through a demo of Grafana SLO. See for yourself how Grafana SLO keeps your engineers in one location to ease collaboration and workflow automation during an incident response.

Explore, Beyla, Asserts, Loki 3.0, AI/ML: ObservabilityCON on the Road Keynote 2024 | Grafana

In this talk, RichiH (Office of the CTO) discusses the latest updates on our announcements from our flagship ObservabilityCON event in London 2023, including Explore Metrics, Explore Logs, Beyla, Asserts, Loki 3.0. Plus, learn how we're leveraging AI/ML to reduce a little bit of that toil in your observability practice. This talk includes a demo of Explore Logs and Asserts.

Internal vs External APIs - What is the Difference?

APIs are an important part of modern software development, allowing communication between different systems and services. However, not all APIs are the same. Internal APIs and external APIs have different purposes and characteristics that affect their management and security needs. In this article, we will look at the main differences between internal and external APIs, focusing on their definitions, purposes, advantages, and disadvantages.

The Cost Crisis in Metrics Tooling

In my February 2024 piece, "The Cost Crisis in Observability Tooling," I explained why the cost of tools built atop the three pillars of metrics, logs, and traces (observability 1.0 tooling) is not only soaring at a rate many times higher than your traffic increases, but has also become radically disconnected from the value those tools can deliver. Too often, as costs go up, the value you derive from these tools declines.

Centralized Logging with Open Source Tools - OpenTelemetry and SigNoz

Modern software systems emit millions of log lines per minute. Cloud computing and containerization have made distributed systems easy to build, and such systems emit logs from many sources. While developers have always used logs to debug stand-alone applications, centralized logging addresses the challenges of modern distributed software systems.

Kubernetes Monitoring - What to Monitor, Tools and Best Practices

Kubernetes has emerged as the de facto container orchestration platform for deploying and managing containerized workloads, thanks to its robust capabilities. However, its complex architecture and dynamic nature make it challenging to monitor both the deployed workloads and the platform itself. Kubernetes monitoring is crucial for maintaining the health, performance, and reliability of containerized applications.

Scaling Runtime Diagnosis System w/ Grafana Pyroscope | Roblox at ObservabilityCON on the Road 2024

In this video, Xiaofeng and Jialin from Roblox describe their journey building a robust runtime diagnostic system with Pyroscope. Because the platform serves over 70 million daily active users and 4.4 million contributing creators, reliability and efficiency are paramount. They discuss the challenges of debugging production issues and the manual, inefficient methods they previously used. Through thorough investigation and collaboration with Grafana Labs, they developed an on-demand profiling workflow that enables engineers to identify and address performance bottlenecks effectively.