Operations | Monitoring | ITSM | DevOps | Cloud

Mean Time to Repair (MTTR): Definition, Tips and Challenges

The availability and reliability of any IT service ultimately govern end-user experience and service performance, both of which have significant business impact. These two concepts — availability and reliability — are particularly relevant in the era of cloud computing, where software drives business operations, but that software is often managed and delivered as a service by third-party vendors.

Ubuntu Desktop: charting a course for the future

It has been a little while since we shared our vision for Ubuntu Desktop, and explained how our current roadmap fits into our long term strategic thinking. Recently, we embarked on an internal exercise to consolidate and bring structure to our values and goals for how we plan to evolve the desktop experience over the next few years. This post is designed to share the output of those discussions and give insight into the direction we’re going.

Feature flags for stress-free continuous deployment

Feature flags (also known as feature toggles or switches) are conditional statements in code that determine whether a feature or functionality is visible and accessible to users of an application or service. They offer programmers a powerful tool for managing feature releases. Their capabilities are indispensable in software development, where agility and continuous, automated delivery are paramount.

A complete guide to metrics cost management in Grafana Cloud

The macro economy can put a lot of pressure on organizations to reduce costs, typically with the central SRE and platform engineering teams coming under scrutiny. One common workaround we’ve seen countless teams make is compromising their observability by ingesting fewer metrics in the name of cost savings. But for centralized SRE/observability teams, the response to macro conditions should not be monitor less, but rather monitor smarter.

How Pepperdata Does What Nobody Else Does

Here at Pepperdata, we’ve been on a number of sales calls lately where there’s a sense of incredulity on the other side of the video screen. How does Pepperdata extract as much as 50 percent in cost savings from some of the most sophisticated clusters in the world, the ones that had already been optimized for peak performance by the most dedicated and talented IT teams? It almost seems too good to be true. It’s not.

When Two Worlds Collide: AI and Observability Pipelines

In today's data-driven world, ensuring the stability and efficiency of software applications is not just a need but a requirement. Enter observability. But as with any evolving technology, there's always room for growth. That growth, as it stands today, is the convergence of artificial intelligence (AI) with observability pipelines. In this blog, we'll explore the idea behind this merge and its potential.

25 Best Status Page Examples Showcasing Top Communication Practices

Status pages are a transparent and effective way to inform users of any downtime or incidents disrupting the company’s service. Without a status page, users are left in the dark, and support tickets pile up, affecting your relationship with them and their trust. That’s why having a status page is essential for a business in 2023.

A Release Strategy for Continuous Innovation

At Cribl, we take pride in doing things differently. Our Customers First mentality is at the heart of everything we do as an organization–from free education and sandboxes, community programs, and platforms, to streamlining legal reviews on contracts. We strive to solve problems from first principles – understanding root causes to build optimal experiences vs. piecemeal solutions together. We aim to be a partner—working with you to address your challenges holistically.