
The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Breaking Through the Threshold: Leveling up ITSI Adaptive Thresholding with Splunk AI

Adaptive thresholding is a key capability in Splunk IT Service Intelligence (ITSI) that enables customers to dynamically monitor the status of their key performance indicators (KPIs) and derive meaningful service insights and alerts.

The Fatal Unconnectedness of Incumbents from Customers: The Tale of a Race Against the Clock

This tale is based on an actual event that happened to one of our Cribl Search customers. It highlights a massive gap between the urgent needs of modern businesses and the outdated, draconian terms dictated by traditional SIEM vendors. While the events are real, a touch of dramatization was added for the fun of it. Why not?

Deleting Fields from Logs: Why Less is Often More

Logs serve as an invaluable resource for monitoring system health, debugging issues, and maintaining security. But as our applications grow more complex, the volume of logs they generate is increasing exponentially. While logs are crucial, not all log data is equally valuable. With the surge in volume, the cost of storing and analyzing logs is skyrocketing, straining both performance and budgets. The need for effective log management is more urgent than ever.

Grafana Incident auto-summary: AI in Grafana Cloud

Check out a fun demo of Grafana Incident auto-summary, which uses generative AI to suggest a helpful synopsis that captures key details from your incident timeline with a single click. Grafana Incident auto-summary marks the first feature enabled by the new OpenAI integration in Grafana Incident. Simply bring your own OpenAI API key to get started in Grafana Cloud.

Item Detail Page Updates

We’ve been listening to all the great feedback we’ve received on the new item detail page, and we’re pushing changes to make investigating and understanding Rollbar items easier, quicker, and more efficient. The most visible change is that the context graphs have been moved to a single full-width view on the desktop, so you can immediately see when occurrences happened and spot patterns in behavior that give insight into causes.

What makes a good open source community?

Whenever you use open source software, you benefit from the community that surrounds it — whether it’s a bug fix, better documentation, a helpful tutorial or something else. We at Grafana Labs benefit from the open source community, too: from your participation, and the many OSS components we use in the development of Grafana itself. But what makes an open source community successful, exactly? And how do you build and nurture one?

Incident Review: What Comes Up Must First Go Down

On July 25th, 2023, we experienced a total Honeycomb outage. It impacted all user-facing components from 1:40 p.m. UTC to 2:48 p.m. UTC, during which no data could be processed or accessed. This outage is the most severe we’ve had since we began serving paying customers. In this review, we will cover the incident itself, and then we’ll zoom back out for an analysis of multiple contributing elements, our response, and the aftermath.

Can Companies Really Self-Host at Scale?

Self-hosting is effective for many companies. But when is it time to let go and try the easier way? There’s no such thing as a free lunch, or in this case, free software. It’s a myth. Paul Vixie, vice president of security at Amazon Web Services and a longtime contributor to the Domain Name System (DNS), gave a compelling presentation at Open Source Summit Europe 2022 about this topic.