
The Roblox Outage

Just before Halloween 2021, Roblox engineers experienced a horror story: a service outage that also took down critical monitoring systems. It seemed like the issue was a hardware problem, but it wasn’t. Users were frustrated, and the clock was ticking. After three full days of downtime, service was finally restored on Halloween day. While the incident itself was an IT nightmare, Roblox’s detailed technical post-mortem several months later was an excellent way to bounce back.

Monitoring your Network SNMP devices using Hosted Graphite

When you design an architecture to monitor your digital assets, whether software applications or hardware devices, you need different strategies depending on the monitoring target. The factors to consider include the method of retrieving monitoring data, the frequency of data collection, and how you want to surface the metrics and insights you find to stakeholders. In this article, we mainly discuss how to monitor your network SNMP devices using Hosted Graphite.

Welcome to InfluxDB IOx: InfluxData's New Storage Engine

Two years ago I announced that InfluxData was working on a new core for InfluxDB, a project we named InfluxDB IOx. InfluxDB IOx is a cloud-native, real-time, columnar database optimized for time series data, built in Rust on top of Apache Arrow and DataFusion. Today I’m excited to announce that we have deployed our next-generation storage engine, built on InfluxDB IOx, in our InfluxDB Cloud platform.

Managing the hidden costs of cloud networking - Part 2

In the first post of this series, I detailed ways companies considering cloud adoption can achieve quick wins in performance and cost savings. While these benefits of the cloud certainly remain true in theory, realizing them in practice can become increasingly difficult as applications and their networks grow more complex.

New Honeycomb Features Raise the Bar for What Observability Should Do for You

As long as humans have written software, we’ve needed to understand why our expectations (the logic we thought we wrote) don’t match reality (the logic being executed). To that end, we developed techniques to help measure reality—logging text strings, or capturing aggregated metrics—and persevered, seeking out newer and fancier logging or monitoring solutions over the intervening decades.

Optimize Java Application Performance by Monitoring JVM Metrics

Although Java has been around for 27 years, it remains one of the preferred platforms for enterprise applications. Its functionality and programming flexibility have grown alongside technological advances, keeping the language useful for more than 25 years. Notable examples of this progression include new garbage collection algorithms and memory management systems.