The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Improving the Developer Experience by Monitoring Third-Party Outages

The role of third-party SaaS and cloud services in the modern software development stack needs no explanation. Primarily due to the ease of setting them up and hooking them together, they make the software development lifecycle (SDLC) much easier than it was 10 years ago. No more managing the overhead of installing, configuring, maintaining, backing up, and scaling source code repos, virtual machines, and CI/CD systems. Some services, e.g. payment gateways, don't have any in-house options at all.

Kafka Performance Crisis: How We Scaled OpenTelemetry Log Ingestion by 150%

When your telemetry pipeline starts falling behind, the countdown to production impact has already begun. One Bindplane customer operating a large-scale log ingestion pipeline built on the OpenTelemetry Collector and Kafka hit that breaking point. Instead of keeping pace with incoming data, their pipeline was ingesting just 12,000 events per second (EPS) per partition/collector—and this Kafka topic had 16 partitions. In aggregate, that was roughly 192K EPS.
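The aggregate figure follows directly from the per-partition rate and the partition count. The short sketch below reproduces that arithmetic in Python; the helper names are hypothetical, and the second function assumes the headline's "150%" means an increase over the baseline, which the teaser itself does not confirm.

```python
# Figures quoted in the teaser above.
EPS_PER_PARTITION = 12_000  # events per second per partition/collector
PARTITIONS = 16             # partitions in the Kafka topic


def aggregate_eps(eps_per_partition: int, partitions: int) -> int:
    """Total throughput when every partition sustains the same rate."""
    return eps_per_partition * partitions


def scaled_eps(baseline_eps: int, percent_increase: int) -> int:
    """Throughput after a percentage increase over the baseline.

    Assumption: "scaled by 150%" is read as +150% on top of the baseline.
    """
    return baseline_eps * (100 + percent_increase) // 100


baseline = aggregate_eps(EPS_PER_PARTITION, PARTITIONS)
print(f"baseline aggregate: {baseline:,} EPS")
print(f"after +150% (assumed reading): {scaled_eps(baseline, 150):,} EPS")
```

Under that reading, the per-partition rate is the number to watch: adding partitions raises the aggregate only if each collector can still sustain its share.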

Part Two - Event Intelligence vs. AIOps: Key Differences, When to Use Each and Why

The IT environments of large enterprises have become so complex that operational teams have turned to two solution categories in particular to improve visibility, speed up incident response, increase automation, and enable more effective decision-making.

Pioneering DEX Agents and Benchmarks

At Nexthink, our focus is Digital Employee Experience (DEX). It’s all we do, and it’s what we aim to be the very best at. Today, we have a unique opportunity to deliver the world’s most advanced DEX models and agents, fine-tuned and trained specifically on real DEX use cases from our thousands of users. This matters because, in our vision, most IT operations will eventually be fully automated by AI and technology.

Major Opportunities and Technologies in Business HVAC Operation

Commercial HVAC systems are the backbone of comfort, energy efficiency, and indoor air quality in buildings. Office buildings, manufacturing plants, and many other facilities depend on these systems to maintain suitable environmental conditions. Yet commercial HVAC operations have their challenges as well, and a new wave of technologies is enabling operators to meet them.

How to Build a Strategic Roadmap for Site Reliability Engineering Implementation

Getting your site reliability engineering solutions in place can seriously boost how your systems perform. But implementing site reliability engineering (SRE) isn't a simple flip of a switch; it's a process. If you want to keep your systems running smoothly, with minimal downtime and top-notch performance, you need a solid, strategic plan. This roadmap should guide you step-by-step, from setting clear goals to constantly improving your processes.

What is SNMP (Simple Network Management Protocol)?

The Simple Network Management Protocol (SNMP) sure does pack a punch for something with “simple” in its name: it is the lifeblood of network monitoring and device communications. Network admins rely heavily on SNMP because nearly every technology manufacturer supports the protocol. In turn, it enables them to collect information, configure devices, and receive alerts about network performance and issues.

What is the User Lifecycle & How Can IT Teams Manage It?

It’s Monday morning, and a new hire is walking into the office for their first day. Before they can dive into the work, they need access to email, project management tools, cloud storage, and a dozen other SaaS apps their role depends on. IT has already been hard at work behind the scenes, provisioning accounts, assigning permissions, and making sure everything is ready the moment they sign in.

The Service Discovery Problem Every Developer Knows (But Pretends Doesn't Exist)

Launch Week Day 1: Introducing Discover Services

Picture this: It's 2 AM, alerts are firing, and you're staring at a dashboard trying to figure out which service is causing the cascade of failures. Your service map is a six-month-old Miro board, and you have no idea what's actually talking to what in production right now. If you've been there, you're not alone. In fast-moving teams, new services get deployed faster than you can track them.