
How to run faster Loki metric queries with more accurate results

Today I want to talk about metric queries. More specifically, I want to talk about an important concept that will make your queries run faster, give you more accurate results, and make your Grafana Loki operators (like me) much happier. A metric query in Loki looks like this: The part I want to talk about is the one at the end. Now, if you're like me, have a short attention span, and are already bored, I understand.

Troubleshooting Bad Health Checks on Amazon ECS

Health checks are an important part of running containerized applications in the cloud, and for many applications they are the source of truth for running status. In Amazon Elastic Container Service (ECS), health checks are periodic probes that assess whether containers are functioning. In this blog, we will explore how Lumigo, a troubleshooting platform built for microservices, can help provide insights into container crashes and failed health checks.
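To make "periodic probe" concrete, the sketch below models the settings an ECS container health check takes. The field names and ranges follow the ECS task definition `healthCheck` block (`command`, `interval`, `timeout`, `retries`, `startPeriod`); the `ContainerHealthCheck` class and its methods are hypothetical helpers for illustration, not part of any AWS SDK, and the probe command assumes the image ships `curl` and serves a `/health` endpoint on port 8080.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ContainerHealthCheck:
    """Hypothetical helper mirroring an ECS container definition's healthCheck block."""

    command: list[str]      # first element must be "CMD" or "CMD-SHELL"
    interval: int = 30      # seconds between probes (ECS default: 30)
    timeout: int = 5        # seconds before one probe counts as failed (default: 5)
    retries: int = 3        # failed probes tolerated before the container is UNHEALTHY
    start_period: int = 0   # grace period in seconds; 0 means no grace period

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the settings look valid."""
        problems = []
        if not self.command or self.command[0] not in ("CMD", "CMD-SHELL"):
            problems.append("command must start with 'CMD' or 'CMD-SHELL'")
        if not 5 <= self.interval <= 300:
            problems.append("interval must be between 5 and 300 seconds")
        if not 2 <= self.timeout <= 60:
            problems.append("timeout must be between 2 and 60 seconds")
        if not 1 <= self.retries <= 10:
            problems.append("retries must be between 1 and 10")
        if not 0 <= self.start_period <= 300:
            problems.append("start_period must be between 0 and 300 seconds")
        return problems

    def to_task_definition(self) -> dict:
        """Serialize using the camelCase keys of the ECS task definition JSON."""
        return {
            "command": list(self.command),
            "interval": self.interval,
            "timeout": self.timeout,
            "retries": self.retries,
            "startPeriod": self.start_period,
        }


# Usage: a shell-form probe against a hypothetical /health endpoint.
check = ContainerHealthCheck(command=["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"])
print(check.validate())  # []
```

A common source of "bad" health checks is a probe that fails for reasons unrelated to the app (for example, `curl` missing from the image) or a `start_period` too short for a slow-starting container, so validating these values before deploying can save a round of debugging.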

Save 96% on Data Storage Costs

Users with real-time and other analytic workloads want or need to keep large volumes of historical data to support important activities such as ad hoc historical trend analysis and training AI models. However, storing this much data in a way that keeps it easily queryable becomes prohibitively expensive. As a result, users must trade off data availability and usability against data fidelity and storage costs. That is, until now.

Graphios - Connecting Graphite and Nagios

Graphios simplifies the process of sending Nagios performance data to backend systems like Graphite. With Graphios, users can easily integrate Nagios with Graphite, eliminating the need for complex scripts. This article explores Graphios' functionality, configuration, and installation process, empowering users to efficiently transfer Nagios data for monitoring and analysis.
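The core job Graphios performs, turning Nagios performance data into Graphite metrics, can be sketched in a few lines. This is not Graphios's own code: it is a minimal illustration assuming the standard Nagios perfdata format (`'label'=value[UOM];warn;crit;min;max`) and Graphite's plaintext protocol (`<path> <value> <timestamp>` lines over TCP, conventionally port 2003). The `nagios.<host>.<service>.<label>` naming scheme and the function names are hypothetical.

```python
import re
import shlex
import socket
import time

# Numeric part of a perfdata value, e.g. "0.012" in "0.012s" (the unit is dropped).
_VALUE_RE = re.compile(r"^[-+]?\d+(?:\.\d+)?")


def _sanitize(part: str) -> str:
    """Make a string safe to use as a single Graphite path component."""
    return re.sub(r"[^A-Za-z0-9_-]", "_", part)


def perfdata_to_graphite(host, service, perfdata, timestamp=None, prefix="nagios"):
    """Convert a Nagios perfdata string into Graphite plaintext-protocol lines."""
    ts = int(timestamp if timestamp is not None else time.time())
    lines = []
    # shlex handles quoted labels such as 'disk usage'=42%;80;90
    for token in shlex.split(perfdata):
        label, sep, rest = token.partition("=")
        if not sep:
            continue  # malformed token: no "=" separator
        match = _VALUE_RE.match(rest.split(";", 1)[0])
        if match is None:
            continue  # e.g. "U", which Nagios uses for an unknown value
        path = ".".join(_sanitize(p) for p in (prefix, host, service, label))
        lines.append(f"{path} {float(match.group())} {ts}\n")
    return lines


def send_to_graphite(lines, server="localhost", port=2003):
    """Push plaintext lines to a Graphite (carbon) listener over TCP."""
    with socket.create_connection((server, port), timeout=5) as sock:
        sock.sendall("".join(lines).encode("ascii"))


lines = perfdata_to_graphite(
    "web01.example.com", "HTTP",
    "time=0.012s;1;2;0 'disk usage'=42%;80;90;0;100",
    timestamp=1700000000,
)
# lines[0] == "nagios.web01_example_com.HTTP.time 0.012 1700000000\n"
```

Dots in the host name are replaced with underscores so that each host stays a single node in Graphite's dot-separated metric tree; a real deployment would also need to decide how to handle units (for example, `ms` versus `s`), which this sketch simply discards.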

8 Ways to Meet Enterprise Network Service Level Agreements (SLAs)

Large cloud providers and ISPs offer service level agreements (SLAs) that guarantee uptime, which helps them win business from enterprises that depend on it. These same enterprises often ask IT to make similar guarantees for the performance and uptime of the internal network, its many varied connections, and even the applications. At the same time, IT may have to manage myriad SLAs from all kinds of vendors, including the aforementioned ISPs and cloud providers.
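The arithmetic behind an uptime guarantee is worth keeping at hand when negotiating internal SLAs. The sketch below computes the downtime an uptime percentage allows, assuming a 30-day month by default, and the combined availability of services that must all be up at once, assuming they fail independently. The function names and example percentages are illustrative, not taken from the article.

```python
from functools import reduce


def allowed_downtime_minutes(uptime_pct: float, period_days: float = 30) -> float:
    """Maximum downtime (in minutes) that an uptime percentage permits over a period."""
    if not 0 <= uptime_pct <= 100:
        raise ValueError("uptime_pct must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (100 - uptime_pct) / 100


def composite_availability(*uptime_pcts: float) -> float:
    """Availability of a chain of services that must all be up (independent failures assumed)."""
    fraction = reduce(lambda acc, pct: acc * pct / 100, uptime_pcts, 1.0)
    return fraction * 100


print(round(allowed_downtime_minutes(99.9), 1))       # 43.2
print(round(composite_availability(99.9, 99.95), 3))  # 99.85
```

The second result is the practical point for IT teams: if an internal service depends on an ISP link promising 99.9% and a cloud provider promising 99.95%, the dependencies alone already cap what IT can safely promise at roughly 99.85%.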

Migration from Elasticsearch to OpenSearch

In this tutorial, we will guide you through the process of migrating from Elasticsearch to OpenSearch. OpenSearch is an open-source search and analytics suite that is compatible with Elasticsearch. People choose to migrate for several reasons, such as taking advantage of new features or differences in governance. In the following sections, we will discuss version compatibility considerations and guide you through the migration process.

Cloud Provider Uptime Monitoring: June 2023 Insights

Check out our June 2023 health report on the most popular cloud providers. We analyze the health of each cloud provider based on the number of outages and problems during the month. The data comes from the cloud providers themselves, via their status pages; we normalize it and use it to generate the report.

Upgrade Your Incident Response with IsDown and Squadcast Integration

We're thrilled to announce another significant feature release: the integration of IsDown with Squadcast. This integration adds a powerful tool to your incident management and SaaS outage monitoring toolkit, strengthening your business's reliability and its response to downtime. Squadcast is a top-tier incident management tool that helps improve site reliability engineering (SRE) practices by streamlining incident response.