
The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

AWS Well-Architected Framework in Serverless: Reliability Pillar

This is part three of the “Well-Architected in Serverless” series where we decipher the AWS Well-Architected Framework (WAF) pillars and translate them into real-life actions. In this article, we will focus on the AWS WAF Reliability (REL) pillar. Read Part 1 and Part 2. We have a Well-Architected webinar coming up!

The Building Company: "Building" the future with OpManager

The Building Company serves the full spectrum of the construction industry including the residential, commercial, and industrial markets. The retail activities of the company are provided through 124 outlets throughout Southern Africa. Operations are located in major centers in South Africa, Namibia, Botswana, and Swaziland, and are managed as either corporate, joint venture, or franchise stores.

Datadog's AWS re:Invent 2020 guide

In Q4 of every year since 2012, AWS has flooded the Las Vegas strip with thousands of AWS staff, partners, and customers for a week of keynote sessions, announcements, workshops, and more. In a year like no other, we’re gearing up for a re:Invent like no other! As a long-time AWS partner, we look forward to re:Invent every year. We enjoy meeting you face to face, sharing the latest in monitoring and security, and learning a thing or two in the process.

Troubleshooting PostgreSQL: How to Use Logs and Metrics to Fix Slow Queries

Imagine some users complaining that querying PostgreSQL is slow (this never happens, right?), and we have to troubleshoot this problem. It could be one of two things: I would normally first check on the environment, specifically PostgreSQL metrics over time. Such monitoring shows whether CPU usage is too high or how many disk reads were buffer reads. PostgreSQL logs also give information about the environment, such as how many statements were run and whether any errors occurred.
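To make the "buffer reads" metric concrete, here is a minimal sketch that computes a buffer cache hit ratio from two samples of the cumulative `blks_hit` and `blks_read` counters that PostgreSQL exposes in the `pg_stat_database` view. The `BlockCounters` type, the `buffer_hit_ratio` function, and the sample numbers are hypothetical names and values chosen for illustration; how the counters are fetched from the database is deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BlockCounters:
    """One sample of the cumulative block counters (hypothetical container)."""
    blks_hit: int   # block reads served from PostgreSQL's shared buffers
    blks_read: int  # block reads that had to go outside shared buffers


def buffer_hit_ratio(before: BlockCounters, after: BlockCounters) -> Optional[float]:
    """Return the share of block reads served from shared buffers between two samples.

    The counters are cumulative, so the difference between two samples gives
    the activity in that interval. Returns None when there was no read
    activity (or the counters were reset), because the ratio is undefined.
    """
    hits = after.blks_hit - before.blks_hit
    reads = after.blks_read - before.blks_read
    total = hits + reads
    if hits < 0 or reads < 0 or total == 0:
        return None
    return hits / total


# Made-up sample values: 900 buffer hits and 100 non-buffer reads in the interval.
earlier = BlockCounters(blks_hit=1000, blks_read=100)
later = BlockCounters(blks_hit=1900, blks_read=200)
ratio = buffer_hit_ratio(earlier, later)
```

Sampling the ratio over an interval, rather than reading the lifetime totals once, is what lets it line up with the period in which users reported slow queries.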

The Secret Ingredient That Converts Metrics Into Insights

Metrics and insight have been the obsession of every sector for decades now. Using data to drive growth has been a staple of boardroom meetings the world over. The promise of a data-driven approach has captured our imaginations. What’s also a subject of these meetings, however, is why investment in data analysis hasn’t yielded results. Directors give the go-ahead to sink thousands of dollars into observability and analytics solutions, with no returns.

Scaling your IT Monitoring Solution: Complete Guide

Every company experiences growth in one form or another. As it grows, the number of challenges facing the IT department grows with it, so it is imperative to be able to scale your IT monitoring solution. The idea is to keep IT infrastructure monitoring easy and smooth.

Finding the Bug in the Haystack with Machine Learning: Logz.io Exceptions in Kibana

Logz.io is releasing its AI-powered Exceptions, a revamped version of our Application Insights, fully embedded in your Kibana Discover experience, to boost your troubleshooting experience and help you find bugs in the log haystack.

What's the Difference Between MTTR, MTTD, MTTF, and MTBF?

We’ve all been there. You’re on an important Zoom call with your team, and someone uses an abbreviation you’re not familiar with. You’ve heard it, but you’re not quite sure exactly what it means. You want to do a quick Google, but you’re sharing your screen! Ugh. Let’s pull apart some of these abbreviations for incident management KPIs (Key Performance Indicators). Now, you won’t find yourself SOL at your next Zoom call with the Support team.
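As a quick orientation, the sketch below computes these KPIs from a small, made-up incident log using their commonly cited definitions. The `Incident` type, the function names, and the timestamps are hypothetical. MTTR in particular has several readings (repair, recovery, respond, resolve); this sketch assumes "mean time to recovery", measured from the start of the failure to resolution.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class Incident:
    failed_at: datetime    # when the failure actually began
    detected_at: datetime  # when an alert or a person noticed it
    resolved_at: datetime  # when service was restored


def mean(deltas: List[timedelta]) -> timedelta:
    """Arithmetic mean of a non-empty list of durations."""
    return sum(deltas, timedelta()) / len(deltas)


def mttd(incidents: List[Incident]) -> timedelta:
    """Mean time to detect: failure start until detection."""
    return mean([i.detected_at - i.failed_at for i in incidents])


def mttr(incidents: List[Incident]) -> timedelta:
    """Mean time to recovery (assumed reading): failure start until resolution."""
    return mean([i.resolved_at - i.failed_at for i in incidents])


def mtbf(incidents: List[Incident], start: datetime, end: datetime) -> timedelta:
    """Mean time between failures: uptime in the window divided by failure count."""
    downtime = sum((i.resolved_at - i.failed_at for i in incidents), timedelta())
    return ((end - start) - downtime) / len(incidents)


def mttf(lifetimes: List[timedelta]) -> timedelta:
    """Mean time to failure: average lifetime of non-repairable units."""
    return mean(lifetimes)


# Made-up data: two incidents inside a 24-hour observation window.
window_start = datetime(2020, 11, 1, 0, 0)
window_end = window_start + timedelta(hours=24)
incidents = [
    Incident(datetime(2020, 11, 1, 2, 0), datetime(2020, 11, 1, 2, 10), datetime(2020, 11, 1, 3, 0)),
    Incident(datetime(2020, 11, 1, 10, 0), datetime(2020, 11, 1, 10, 20), datetime(2020, 11, 1, 10, 30)),
]
```

MTBF applies to systems that are repaired and returned to service, while MTTF describes components that are replaced rather than repaired, which is why `mttf` takes a list of lifetimes instead of an incident log.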

How I started contributing to the Grafana open source project

My name is Karine. I’m a Software Engineer working with a team that provides monitoring solutions to our clients. A good part of my daily work is creating dashboards in Grafana. Since I started working with this tool, I have been impressed by its quality and ease of use. I became even more impressed when I discovered it was an open source tool.

Raw & Real Ep 7 The Tracing You Deserve So You Can Observe

Distributed tracing is key to building and operating reliable services that make your customers happy. Traces pinpoint where failures occur and what causes poor performance. With tracing and observability, you can visualize the entire life cycle of service requests and discover hidden latency, errors, and optimization opportunities monitoring can’t show you. So why doesn’t everybody do it? Setting up tracing is notoriously difficult, but it doesn’t have to be. Honeycomb Instrumentation Engineer Paul Osman has the easy-breezy steps for you to get the tracing you deserve.