The latest news and information on monitoring for websites, applications, APIs, infrastructure, and other technologies.

How Grafana Labs switched to Karpenter to reduce costs and complexities in Amazon EKS

At Grafana Labs we meet our users where they are. We run our services in every major cloud provider, so they can have what they need, where they need it. But of course, different providers offer different services — and different challenges. When we first landed on AWS in 2022 and began using Amazon Elastic Kubernetes Service (Amazon EKS), we went with Cluster Autoscaler (CA) as our autoscaling tool of choice.

How to Use Tags to Speed Up Troubleshooting

Maybe as a kid, you pretended to have a magic wand. You would say something like, “Show me the answer to this long division question,” then wave your magic wand and wait for the answer. Sadly, mine never seemed to work, whether for math questions or for making magical snacks appear. Now, imagine if you had a magic wand for your application stack where you could ask it a question about your data and it would give you immediate insights.

The Importance of Microservices

What are microservices? Microservices are a software approach that creates applications as a loose coupling of specific services or functions, rather than as a single, “monolithic” program. A microservice architecture increases the speed and reliability with which large, complex applications are delivered. What makes a service a microservice? Microservices are defined not by how they’re coded, but by how they fit into a broader system or solution.

Enrich Kubernetes with New Deployment Tracking Capability

When things go wrong, we’d all love the ability to go back in time, return things to the way they were, and fix whatever issues pop up at the start so they never happen in the first place. This is no different when maintaining complex microservices-based architectures. With any complex system, things are bound to go wrong from time to time.

How to Reduce MTTR: A Complete Guide

Organizations striving to improve their operational efficiency must know how to reduce MTTR, as it plays a key role in today’s fiercely competitive business landscape. Customer satisfaction is a top priority for most businesses, and a late response to customer queries or issues can have a negative impact. To track response and resolution time, businesses measure their MTTR. MTTR is a key metric that shows how much time an organization takes to resolve an incident or issue.

Retrace Keynote: Observability (metrics, traces, logs) - Take back control of your data

In the keynote recorded during our recent user group session NUGGETS 2023, Sanjeev Mittal, GM of Retrace, talks about how organizations are rapidly moving their infrastructure to the cloud and how their costs and complexity are increasing. Our customers are looking for observability solutions that give them control of data and costs.

Tackling Staffing, Funding, and Data Challenges Head-On with TAQA

Join Ed Bailey and TAQA Group's Andrew Ochse as they discuss the diverse services that TAQA offers, look at the challenges with scaling and staffing, and explore in great detail the solutions to classic problems such as insufficient funding, poor data quality, and slow connections linking global sites to their Security Operations Center (SOC).

Custom Container Network Monitoring and Alerting in Kubernetes with Kentik Kube

Discover the power of proactive network monitoring in Kubernetes with Kentik Kube. This demo highlights the critical importance of custom dashboards and alerts in maintaining optimal container performance and service availability. We take you through creating a tailored alert for a checkout service within Kentik Kube. From selecting services to diving deep into performance metrics via the Kentik Data Explorer, we show you how Kentik Kube makes it easy to set up policies that monitor and alert you to Kubernetes network issues as they arise.