The latest news and information on monitoring for websites, applications, APIs, infrastructure, and other technologies.

Top 6 Reasons Why You Need a Status Page Aggregator

Your business depends on the reliability of the third-party services you use. Monitoring the status pages of these services is the best way to keep track of their outages and maintenance windows. Although some status pages let you subscribe to alerts, there is no standard way of doing this. Service providers can change their status page providers, disable subscriptions, or offer different notification options.
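To make the lack of a standard concrete, here is a minimal sketch of what an aggregator has to do: map each provider's own status wording onto one shared vocabulary, then decide what deserves an alert. The label mapping, the severity ranking, and the service names are illustrative assumptions, not the format of any real status page provider.

```python
from dataclasses import dataclass
from typing import List

# Assumed mapping from provider-specific labels to a shared vocabulary.
# Real status pages use many more variants than these.
NORMALIZED = {
    "operational": "ok",
    "degraded_performance": "degraded",
    "partial_outage": "degraded",
    "major_outage": "down",
    "under_maintenance": "maintenance",
}

# Assumed severity ranking used to pick the worst current state.
SEVERITY = {"ok": 0, "maintenance": 1, "unknown": 2, "degraded": 3, "down": 4}


@dataclass(frozen=True)
class ServiceStatus:
    service: str
    state: str


def normalize(service: str, raw_label: str) -> ServiceStatus:
    """Translate a provider's label into the shared vocabulary."""
    key = raw_label.strip().lower().replace(" ", "_")
    return ServiceStatus(service, NORMALIZED.get(key, "unknown"))


def worst(statuses: List[ServiceStatus]) -> ServiceStatus:
    """Return the status with the highest severity."""
    return max(statuses, key=lambda s: SEVERITY[s.state])


def needs_alert(statuses: List[ServiceStatus]) -> List[ServiceStatus]:
    """Everything that is not fully operational is worth a notification."""
    return [s for s in statuses if s.state != "ok"]


if __name__ == "__main__":
    # Hypothetical services and labels, as they might appear on three pages.
    current = [
        normalize("payments-api", "Operational"),
        normalize("email-provider", "Partial Outage"),
        normalize("cdn", "under_maintenance"),
    ]
    print(worst(current))
    print(needs_alert(current))
```

Labels the mapping does not recognize fall back to "unknown" rather than "ok", so a provider that changes its wording shows up as something to look at instead of silently passing.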

Why Intelligent Traffic Steering is Critical for Performance and Cost Optimization

In today’s world of globally distributed applications, user experience is everything. Whether your platform runs across multiple cloud providers or uses a Multi CDN with numerous points of presence (PoPs), efficiently routing user traffic can make or break performance. That's where intelligent traffic steering becomes not just a nice-to-have, but a must-have.
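As a rough illustration of trading performance against cost, the sketch below picks the cheapest healthy endpoint that meets a latency budget and falls back to the fastest healthy one when none does. The endpoint names, latency figures, prices, and the selection policy itself are assumptions made for the example, not a description of any particular steering product.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Endpoint:
    name: str
    latency_ms: float   # measured latency for the requesting user's region
    cost_per_gb: float  # delivery cost, in an arbitrary currency
    healthy: bool = True


def steer(endpoints: List[Endpoint], latency_budget_ms: float) -> Endpoint:
    """Pick an endpoint for a request (hypothetical policy).

    1. Ignore unhealthy endpoints.
    2. Among those within the latency budget, choose the cheapest
       (ties broken by lower latency).
    3. If none meets the budget, choose the fastest healthy endpoint.
    """
    candidates = [e for e in endpoints if e.healthy]
    if not candidates:
        raise ValueError("no healthy endpoints available")

    within_budget = [e for e in candidates if e.latency_ms <= latency_budget_ms]
    if within_budget:
        return min(within_budget, key=lambda e: (e.cost_per_gb, e.latency_ms))
    return min(candidates, key=lambda e: e.latency_ms)


if __name__ == "__main__":
    pool = [
        Endpoint("cdn-a", latency_ms=40, cost_per_gb=0.08),
        Endpoint("cdn-b", latency_ms=55, cost_per_gb=0.05),
        Endpoint("cdn-c", latency_ms=20, cost_per_gb=0.03, healthy=False),
    ]
    print(steer(pool, latency_budget_ms=60).name)
    print(steer(pool, latency_budget_ms=30).name)
```

A real steering layer would also weigh capacity, commit levels, and fresh measurements per region; the point here is only that the decision is made per request from current data rather than from a static routing table.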

The Rise of Shadow AI & the Tech Debt Tsunami

Recently, Logz.io co-founder and CTO Asaf Yigal teamed up with DevOps legend John Willis for an engaging webinar exploring the exciting—and occasionally intimidating—world of Shadow AI and the “tech debt tsunami” on the horizon. This lively session dove into how generative AI (GenAI) is reshaping software development, DevOps practices, and infrastructure management, along with some friendly advice on how organizations can navigate these changes without getting swept away.

How SNMP traps help prevent network failures: A use case analysis

You're likely well aware of how damaging network downtime can be to an enterprise's revenue, reputation, and overall operational efficiency. But what if you could spot potential issues before they turn into major problems? That's how Simple Network Management Protocol (SNMP) traps help enterprises stay ahead of failures and keep networks running smoothly. SNMP traps are an essential tool for network observability in enterprises looking to maximize uptime, optimize costs, and enhance resilience.

Optimizing Kubernetes node resources: How to avoid exhaustion and improve performance

Kubernetes automates the deployment and management of containerized applications relatively efficiently, but resource exhaustion at the node level remains a critical issue. When a node runs low on resources such as CPU, memory, or storage, its workloads may suffer from failures, degraded performance, or eviction.
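A small sketch of the kind of check this implies: compare each node's usage with what it can offer and flag the resources that are close to exhaustion. The thresholds, field names, and sample numbers are assumptions chosen for illustration; they are not Kubernetes defaults, and in practice the figures would come from your metrics pipeline.

```python
from dataclasses import dataclass
from typing import Dict, List

# Assumed warning thresholds (fraction of allocatable capacity in use).
THRESHOLDS: Dict[str, float] = {"cpu": 0.90, "memory": 0.85, "storage": 0.80}


@dataclass
class NodeUsage:
    name: str
    cpu_used: float          # cores
    cpu_allocatable: float   # cores
    mem_used: float          # GiB
    mem_allocatable: float   # GiB
    disk_used: float         # GiB
    disk_capacity: float     # GiB


def utilization(node: NodeUsage) -> Dict[str, float]:
    """Return the used fraction of each resource on the node."""
    return {
        "cpu": node.cpu_used / node.cpu_allocatable,
        "memory": node.mem_used / node.mem_allocatable,
        "storage": node.disk_used / node.disk_capacity,
    }


def pressured_resources(node: NodeUsage,
                        thresholds: Dict[str, float] = THRESHOLDS) -> List[str]:
    """List the resources whose utilization meets or exceeds its threshold."""
    usage = utilization(node)
    return [res for res, used in usage.items() if used >= thresholds[res]]


if __name__ == "__main__":
    # Hypothetical node with high CPU and disk usage but spare memory.
    node = NodeUsage("node-a", 3.8, 4.0, 10.0, 16.0, 90.0, 100.0)
    print(node.name, pressured_resources(node))
```

Flagging a node before it hits the limit leaves time to rebalance or scale out, rather than finding out through evicted pods.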

Optimizing SQL (and DataFrames) in DataFusion: Part 1

Sometimes query optimizers are seen as a sort of black magic, “the most challenging problem in computer science,” according to Father Pavlo, or some behind-the-scenes player. However, query optimizers are no more complicated in theory or practice than other parts of a database system, as we will argue in a series of posts (Part 1 and Part 2). After reading these blogs, we hope people will use DataFusion to ...

Troubleshoot microservice-based apps faster with Splunk Observability Cloud

When something goes wrong with your microservice-based apps, Splunk Observability Cloud offers a unified Observability platform to make debugging processes easier and faster. By using features like the Service Map to identify the cause of the error and Related Logs in Log Observer to pinpoint its location, you can get back up and running quickly, limiting the impact to your bottom line and keeping your customers happy.

Introduction to Private Locations in Splunk Synthetic Monitoring

In this tutorial, we’ll demonstrate how to create and use private locations in Splunk Synthetic Monitoring to test internal or pre-production applications within a Kubernetes environment. You'll learn exactly what private locations and private runners are, common use cases, and step-by-step instructions on how to deploy a private runner using Helm. Finally, you'll see how to set up a simple browser test to run synthetics against a service available only within a Kubernetes cluster.

DX Operational Observability: Troubleshoot WebHook Notification Channels with WebHook Data Collector

The power of AIOps and Observability relies on the ability to ingest, normalize, and correlate the large volumes and huge variety of data available to IT operations teams. With its support for both Broadcom and third-party data, DX Operational Observability (DX O2) gives these teams unmatched observability and insights. With so much data coming to DX O2, monitoring operators need to be notified when important events occur. Without notifications, important alerts may be overlooked.

From surface-level to strategic: Benefits of network traffic analysis

Enterprises are experiencing shifts in workforce dynamics amid the emergence of new technologies while also tackling the growing prevalence of cyberthreats. To adapt to these changes, they are increasingly turning to cloud technologies, which are scalable and flexible.