
The latest news and information on monitoring for websites, applications, APIs, infrastructure, and other technologies.

Key APM Metrics You Must Track

Application Performance Monitoring (APM) helps you understand how your software runs in production. When you track the right metrics, you see how requests move through your system, where slowdowns happen, and how resources are used. With this knowledge, you can spot issues early and keep your applications reliable for your users. In this post, we discuss the key APM metrics to monitor, grouped into categories, and why each one matters for performance and user experience.

Memory stall: the agony before OOM

When we set a memory limit for a container, the expectation is simple: if the app leaks memory, the OOM killer steps in, the container dies, Kubernetes restarts it, done. But reality is messier. As a container gets close to its memory limit, allocations don’t just fail instantly. They get slower. The kernel tries to reclaim memory inside the cgroup, and that takes time. Instead of being killed right away, your app just crawls.
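
One way to observe this slowdown before the OOM killer fires is the kernel's pressure stall information (PSI), which cgroup v2 exposes per cgroup in a `memory.pressure` file. The sketch below is only an illustration, not something the post prescribes: it parses text in that file's format and flags a stall. The function names, the 10% threshold, and the sample values are assumptions made for the example.

```python
from typing import Dict

# Sample text in the PSI format used by cgroup v2 "memory.pressure".
# The numbers are invented for illustration only.
SAMPLE = (
    "some avg10=12.50 avg60=4.20 avg300=1.00 total=123456\n"
    "full avg10=10.25 avg60=3.10 avg300=0.80 total=98765\n"
)


def parse_pressure(text: str) -> Dict[str, Dict[str, float]]:
    """Parse PSI lines into {"some": {...}, "full": {...}}."""
    result: Dict[str, Dict[str, float]] = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        # Each field looks like "avg10=12.50"; split on the first "=".
        result[kind] = {
            key: float(value)
            for key, value in (field.split("=", 1) for field in fields)
        }
    return result


def is_stalling(pressure: Dict[str, Dict[str, float]],
                threshold_pct: float = 10.0) -> bool:
    """Return True when all tasks were stalled on memory for at least
    threshold_pct percent of the last 10 seconds (hypothetical threshold)."""
    return pressure.get("full", {}).get("avg10", 0.0) >= threshold_pct


if __name__ == "__main__":
    pressure = parse_pressure(SAMPLE)
    print("full avg10:", pressure["full"]["avg10"])
    print("stalling:", is_stalling(pressure))
```

Inside a container on a cgroup v2 host, the same parser could be fed the contents of the cgroup's `memory.pressure` file (commonly visible at `/sys/fs/cgroup/memory.pressure`, though the path depends on how the runtime mounts the cgroup filesystem). A rising `full avg10` while the container is still alive is the "crawling" state described above.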

Building Real-Time Data Pipelines with Kafka, Telegraf, and InfluxDB 3

When milliseconds matter and data never stops flowing, you need a pipeline that can handle high-velocity streaming data with reliability and scale. The modern streaming stack of Kafka, Telegraf, and InfluxDB 3 Core delivers exactly that. To give you a concrete example, this post works with a fictitious use case: “Papa Giuseppe’s Pizzeria.” Every oven, prep station, and order in this pizza restaurant generates data. Our workflow looks like this.

Beyond Automation: The Rise of Agentic Networks

Agentic AI is the next evolution in network management, moving beyond simple automation to intelligent systems that can reason, plan, and act autonomously. Justin Ryburn, Kentik Field CTO, highlights how this shift automates expertise, enables proactive problem-solving, and empowers human engineers for strategic innovation.

10 Best Practices for Proactive Database Performance Monitoring to Prevent Downtime

Databases are the core of modern applications, whether it is an e-commerce platform, a banking system, or a social media app. Slow database performance or unexpected downtime can cause serious problems, from lost revenue to poor customer experience. Proactive database performance monitoring helps teams identify issues before they escalate. Unlike reactive monitoring, which only addresses problems after they occur, proactive monitoring ensures your database remains fast, stable, and reliable.

Node.js Event Loop: Why Monitoring Matters

Node.js has become a cornerstone of modern application development because of its non-blocking, asynchronous architecture. According to the Stack Overflow Developer Survey, Node.js remains among the most widely used technologies for web applications, powering millions of services globally. While this event-driven model provides scalability and efficiency, it also introduces challenges.

InfluxDB 3 Enterprise: Deploy Your Way, Scale on Demand

InfluxDB 3 Enterprise is engineered for performance and designed for flexibility, delivering high-scale, production-ready time series data management with operational simplicity. InfluxDB 3 Enterprise is built on a cloud-native, diskless architecture that removes the limits of traditional storage. It’s easy to deploy, scales effortlessly, and eliminates the complexity of managing clusters so you can deploy your way and meet the unique demands of your environment.

Automate Your Infrastructure Analysis with Scheduled AI Reports

The least exciting part of an operations or SRE role is often the manual, repetitive task of generating reports. It’s the Monday morning scramble to summarize weekly infrastructure health for the team, or the end-of-quarter push to build a capacity planning document. This is boilerplate work that pulls you away from critical engineering tasks. We believe that if a process is repeatable, it should be automated. That’s why we’re introducing Scheduled AI Investigations and Insights.