The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Alerts Don't Suck: YOUR Alerts Suck!

Join Leon Adato, Kentik's Principal Technical Evangelist, for "Alerts Don't Suck: YOUR Alerts Suck." In this engaging talk, Leon shares a personal anecdote about generating a staggering 772 tickets, twice, in just 15 minutes, setting the stage for an enlightening exploration of alert management. Leon discusses the common misconceptions and pitfalls of alerts in network observability, distinguishing between effective and ineffective alert strategies. He demystifies the concepts of monitoring versus observability, offering practical advice on creating alerts that genuinely add value and drive action. Whether you're an IT professional, network engineer, or anyone interested in improving your alerting philosophy, this talk is packed with actionable insights, humor, and real-world examples. Leon's hard-won advice might just transform how you approach alerting and optimize your monitoring systems.

How To Save Money On Your Observability Costs

In today's digital age, the complexity and scope of dynamic system architectures are expanding at an unprecedented rate. As a result, IT teams find themselves grappling with the challenge of monitoring and addressing conditions across multi-cloud environments. As that complexity grows, IT operations, DevOps, and SRE teams are looking for better observability across these multifaceted computing environments.

My First KubeCon - Tales of the K8s community, DE&I, sustainability, and OTel

I went to my first KubeCon ever this past week. If you’re not familiar with KubeCon, it is a conference centered on Kubernetes, a Cloud Native Computing Foundation (CNCF) open source project. As my first KubeCon ever, it was an adventure built around community, education, kindness, and of course, a love for Kubernetes technology.

Set and scale service level objectives in Grafana Cloud: Introducing Grafana SLO

When we began offering Grafana Cloud Metrics, we set a service level agreement (SLA) for 99.5% of requests to be completed within a few seconds. So we built an alert that would go off if more than 0.5% of requests were slower than a couple of seconds within a five-minute moving window. Sounds reasonable, right?
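
The alert described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the teaser, not Grafana's actual implementation: the class name `SlowRequestAlert`, the 2.0-second latency threshold (standing in for "a couple of seconds"), and the assumption that requests are recorded in timestamp order are all hypothetical choices made for the example.

```python
from collections import deque


class SlowRequestAlert:
    """Fires when too many requests in a moving time window are slow.

    Hypothetical sketch: the defaults mirror the rule in the text
    (five-minute window, 0.5% budget); the 2.0 s threshold is assumed.
    """

    def __init__(self, window_seconds=300.0, latency_threshold=2.0,
                 max_slow_fraction=0.005):
        self.window_seconds = window_seconds
        self.latency_threshold = latency_threshold
        self.max_slow_fraction = max_slow_fraction
        self._events = deque()  # (timestamp, is_slow), oldest first
        self._slow = 0          # number of slow requests in the window

    def record(self, timestamp, latency):
        """Record one request; timestamps must be non-decreasing."""
        is_slow = latency > self.latency_threshold
        self._events.append((timestamp, is_slow))
        self._slow += int(is_slow)
        self._evict(timestamp)

    def _evict(self, now):
        """Drop requests that have fallen out of the moving window."""
        cutoff = now - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            _, was_slow = self._events.popleft()
            self._slow -= int(was_slow)

    def firing(self):
        """Return True if the slow fraction exceeds the allowed budget."""
        total = len(self._events)
        if total == 0:
            return False
        return self._slow / total > self.max_slow_fraction
```

Note that a rule like this reacts to every short burst of slow requests in the window, however briefly the budget is exceeded, which is worth keeping in mind when reading the "Sounds reasonable, right?" that closes the teaser.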