The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Site Reliability Engineering, Site Reliability Engineers and SRE Practices: State of Adoption

Site reliability engineering (SRE) is what you get when you treat operations as if it’s a software problem. The mission of an SRE practice is to protect, provide for and progress the software and systems offered and managed by an organization with an ever-watchful eye on their availability, latency, performance and capacity.

Welcome to the Experience-Driven NOC

At Broadcom Software, we strive to build the most scalable operational software in the market. We work to ensure that our network monitoring software can track how constant network changes affect user experiences. As a global provider of networking equipment, we understand that there will always be changes happening on today’s enterprise networks, especially the internet. That’s why we build and refine our monitoring software to align with constant change.

How to Generate Client Referrals as an MSP

I speak to a lot of MSPs every day, and I’m always asked for advice on meeting new prospects. Finding new clients is one of the biggest challenges they all face. “Have you tried asking your existing clients for referrals?” is my stock answer. And for good reason—prospects referred to you by an existing client are four times more likely to buy than any other opportunities. Here are a few of my tips on how to generate client referrals from existing clients.

Building Auvik Into Your MSP's SOP (Video)

Standard operating procedures—more commonly known as SOPs—are written, step-by-step instructions that describe how to perform a routine activity. While you can create an SOP for anything, an MSP SOP that outlines technical procedures is a well-known path to increasing efficiency in your business. Whether you’ve got existing MSP SOPs you’re interested in updating, or you’re looking for some basic steps to build brand-new SOPs around, you’ve come to the right place.

What's the Perfect IT Support Staff Ratio?

On a fairly regular basis, users will post to Reddit or Spiceworks or another IT forum to ask about the best support staff ratio of techs to users, and what other companies are finding sustainable. The question usually comes from an overworked tech who’s drowning in tickets and trying to understand what’s considered normal. The answers can get interesting. On Spiceworks, many people responded with details about the environments they support, and the range was notable.

Authors' Cut: Actionable SLOs Based on What Matters Most

SLOs—or Service Level Objectives—can be pretty powerful. They provide a safety net that helps teams identify and fix issues before they reach unacceptable levels and degrade the user experience. But SLOs can also be intimidating. Here’s how a lot of teams feel about them: We know we want SLOs, we’re not sure how to really use them, and we don’t know how to debug SLO-based alerts. Don’t worry, we’ve got your answer—observability!
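To make the "safety net" idea concrete, here is a minimal sketch of an SLO check in Python. It is an illustration only and is not taken from the article: the function name `slo_status`, the event counts, and the 99% target are all assumptions. It compares the measured success ratio against the target and reports how much of the allowed failure budget is left.

```python
def slo_status(good_events: int, total_events: int, target: float) -> dict:
    """Compare a measured success ratio against an SLO target.

    All names and numbers here are hypothetical; `target` is a fraction
    such as 0.99 for a "99% of requests succeed" objective.
    """
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    if not 0 < target < 1:
        raise ValueError("target must be strictly between 0 and 1")

    sli = good_events / total_events            # measured success ratio
    allowed_bad = (1 - target) * total_events   # failures the SLO tolerates
    actual_bad = total_events - good_events     # failures observed

    return {
        "sli": sli,
        "met": sli >= target,
        # 1.0 = budget untouched, 0.0 = fully spent, negative = overspent
        "budget_remaining": 1 - actual_bad / allowed_bad,
    }


if __name__ == "__main__":
    # 995 good requests out of 1000 against a 99% target:
    # the objective is met and roughly half the budget remains.
    print(slo_status(good_events=995, total_events=1000, target=0.99))
```

Alerting on a shrinking `budget_remaining`, rather than on individual failures, is one common way to catch a problem before it reaches an unacceptable level.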

Telegraf Tips from InfluxDB University Experts

Telegraf is a very powerful open source plugin-based agent that gathers data from stacks, sensors, and systems and sends it to a database. It collects data from an input and sends it to an output, and gives you the option to transform data with aggregators and processors before it reaches its endpoint.
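The input, processor, aggregator, output flow described above can be pictured with a small conceptual sketch in Python. This is not Telegraf's API or configuration format; every function name and the sample temperature readings are made up for illustration.

```python
from statistics import mean
from typing import Callable, Dict, List

Metric = Dict[str, float]


def read_input() -> List[Metric]:
    """Hypothetical input: two Fahrenheit readings from a sensor."""
    return [{"temp_f": 68.0}, {"temp_f": 71.6}]


def to_celsius(metric: Metric) -> Metric:
    """Hypothetical processor: transforms each metric individually."""
    return {"temp_c": round((metric["temp_f"] - 32) * 5 / 9, 1)}


def average(metrics: List[Metric]) -> Metric:
    """Hypothetical aggregator: reduces a batch of metrics to one."""
    return {"temp_c_mean": round(mean(m["temp_c"] for m in metrics), 1)}


def run_pipeline(
    read: Callable[[], List[Metric]],
    process: Callable[[Metric], Metric],
    aggregate: Callable[[List[Metric]], Metric],
    write: Callable[[Metric], None],
) -> None:
    """Collect from the input, transform, aggregate, then hand to the output."""
    processed = [process(m) for m in read()]
    write(aggregate(processed))


if __name__ == "__main__":
    # Here the "output" simply prints; a real agent would write to a database.
    run_pipeline(read_input, to_celsius, average, print)
```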

New in Grafana 9.1: Service accounts are now GA

With the Grafana 8.5 release, we introduced the concept of service accounts. Now with the Grafana 9.1 release, we’re making service accounts generally available. This is a project that came out of technical necessity, but it has given us the opportunity to reflect on API tokens and machine-to-machine interaction across Grafana Labs.

How much does RPKI ROV reduce the propagation of invalid routes?

Earlier this year, Job Snijders and I published an analysis that estimated the proportion of internet traffic destined for BGP routes with ROAs. The conclusion was that the majority of internet traffic goes to routes covered by ROAs and is thus eligible for the protection that RPKI ROV offers. However, ROAs alone are useless if only a few networks are rejecting invalid routes.
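For readers unfamiliar with how a route ends up "invalid", the following is a minimal Python sketch of route origin validation along the lines of RFC 6811. It is not the authors' code or any router's implementation: the `ROA` class, the `validate` function, and the documentation prefixes and private-use AS numbers in the example are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from ipaddress import ip_network
from typing import Iterable


@dataclass(frozen=True)
class ROA:
    """A simplified Route Origin Authorization (hypothetical structure)."""
    prefix: str       # e.g. "192.0.2.0/24"
    max_length: int   # longest prefix length the holder authorizes
    asn: int          # AS number authorized to originate the prefix


def validate(route_prefix: str, origin_asn: int, roas: Iterable[ROA]) -> str:
    """Return "valid", "invalid", or "not-found" for a BGP route."""
    route = ip_network(route_prefix)
    covered = False
    for roa in roas:
        roa_net = ip_network(roa.prefix)
        # A ROA covers the route if the route's prefix falls inside the ROA prefix.
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue
        covered = True
        # A covering ROA matches if the length and origin AS are both authorized.
        # AS 0 never matches: it marks a prefix that should not be routed.
        if (route.prefixlen <= roa.max_length
                and roa.asn == origin_asn
                and roa.asn != 0):
            return "valid"
    return "invalid" if covered else "not-found"


if __name__ == "__main__":
    roas = [ROA("192.0.2.0/24", 24, 64500)]
    print(validate("192.0.2.0/24", 64500, roas))     # authorized origin
    print(validate("192.0.2.0/24", 64501, roas))     # wrong origin AS
    print(validate("198.51.100.0/24", 64500, roas))  # no covering ROA
```

A network that performs ROV drops routes whose state is "invalid" and keeps the rest, which is why ROAs only protect traffic to the extent that networks along the path actually apply this check.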