
Latest Posts

Kubernetes in Production: Services

We migrated all of our services to Kubernetes about six months ago. At first glance, the task seemed quite simple: deploy a cluster, write application specifications, and that’s it. But since we’re obsessed with stability, we still had to learn how k8s behaves under pressure, so we tested multiple failure scenarios. Most of the questions that arose were network-related. One particular point of concern was how Kubernetes Services function.

PgBouncer monitoring improvements in recent versions

As I wrote in my previous article, “USE, RED and real world PgBouncer monitoring”, PgBouncer’s admin interface has some nice commands that let you collect stats on how things are going and spot problems, if you know where to look. This post is about the new stats added to these commands in recent PgBouncer versions.

USE, RED and real world PgBouncer monitoring

Brendan Gregg’s USE (Utilization, Saturation, Errors) method for monitoring is well known. There are even some monitoring dashboard templates shared on the Internet. There’s also Tom Wilkie’s RED (Rate, Errors, Duration) method, which is said to be better suited to monitoring microservices than USE. We at okmeter.io recently updated our PgBouncer monitoring plugin, and while doing that we tried to comb through everything, using USE and RED as frameworks to do so.

Simple/hard metrics that help reduce MTTR when looking for a root cause

Recently there was a mini-incident in the data center where we host our servers. In the end, it did not affect our service. And thanks to the right operational metrics, we were able to figure out instantly what was happening. But then a thought occurred to me: how we would have been racking our brains trying to understand what was happening without two simple metrics.

Monitoring (with) Elasticsearch: A few more circles of hell

This is the second part of our two-part article series devoted to Elasticsearch monitoring. The heading of this article refers to Dante Alighieri’s “Inferno”, in which Dante offers a tour through the nine increasingly terrifying levels of hell. Our journey into Elasticsearch monitoring was also filled with hardships, but we have overcome them and found solutions for each case.

PostgreSQL: Exploring how SELECT Queries can produce disk writes

We already wrote about monitoring PostgreSQL queries, and at the time we thought that we completely understood how PostgreSQL works with various server resources. Working regularly with PostgreSQL query statistics, we noticed some anomalies and decided to dig a bit deeper to understand them better. Through this process, we found that while the behavior of PostgreSQL is kind of strange at first glance (or at least very peculiar), the clarity of its source code is quite admirable.