
iLert

Painless Kubernetes monitoring and alerting

Kubernetes is hard, but let's make monitoring and alerting for Kubernetes simple! At iLert we are creating architectures composed of microservices and serverless functions that scale massively and seamlessly to guarantee our customers uninterrupted access to our services. Like many others in the industry, we rely on Kubernetes to orchestrate our services.

New Features: On-call Reports, On-call Reminder, Terraform Provider, Zapier Integration

We’re proud to introduce the latest addition to iLert’s advanced reporting capabilities. On-call reports give on-call engineers and managers insights into all things on-call and report three metrics. The data can be filtered by date range and schedules. If a user is on two or more schedules at the same time, that time is counted once. This information can be used in various ways.
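The rule that overlapping time is counted once amounts to taking the union of a user's shifts before summing them. The following is a minimal sketch of that idea, not iLert's implementation; the function name `total_on_call` and the sample shifts are made up for illustration.

```python
from datetime import datetime, timedelta


def total_on_call(shifts):
    """Sum on-call time over (start, end) pairs, counting overlaps once."""
    total = timedelta()
    current_start = current_end = None
    for start, end in sorted(shifts):
        if current_end is None or start > current_end:
            # A gap begins: close the merged interval collected so far.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # The shift overlaps or touches the current interval: extend it.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total


# Hypothetical example: one user on two schedules that overlap by four hours.
shifts = [
    (datetime(2020, 1, 6, 8), datetime(2020, 1, 6, 16)),   # schedule A
    (datetime(2020, 1, 6, 12), datetime(2020, 1, 6, 20)),  # schedule B
]
print(total_on_call(shifts))  # 12:00:00
```

Adding the two shifts naively would give 16 hours; merging them first gives the 12 hours the user was actually on call.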

Automating Monitoring & Alerting Infrastructure with Terraform

At iLert we embrace infrastructure as code and try to automate our processes wherever possible. This ranges from nifty little bash scripts to full-blown Terraform projects that spin up whole environments with as little as terraform apply on a CLI. With HashiCorp’s Terraform you can make use of infrastructure as code to provision and manage any cloud, infrastructure, or service.

New Features: Heartbeat Monitoring, Incident Actions, Suggested Responders, Incident Re-routing

You might have noticed that we added a new type of alert source a few months ago - Heartbeat alert sources: A Heartbeat alert source expects a signal (the “heartbeat” ping) at regular intervals and alerts you if it doesn’t receive a ping within the specified interval.
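The heartbeat idea can be sketched in a few lines: remember when the last ping arrived and treat the source as overdue once more than one interval has passed. This is only an illustration under assumed names (`HeartbeatMonitor`, `ping`, `is_overdue`); it does not use or reproduce the iLert API.

```python
import time


class HeartbeatMonitor:
    """Tracks the last ping and reports whether the next one is overdue."""

    def __init__(self, interval_seconds, clock=time.monotonic):
        self.interval = interval_seconds
        self.clock = clock  # injectable so the check can be tested without waiting
        self.last_ping = clock()

    def ping(self):
        # Called whenever the monitored job or device sends its heartbeat.
        self.last_ping = self.clock()

    def is_overdue(self):
        # True once more than one interval has passed since the last ping.
        return self.clock() - self.last_ping > self.interval


# Hypothetical usage with a fake clock instead of real waiting.
now = [0.0]
monitor = HeartbeatMonitor(interval_seconds=60, clock=lambda: now[0])
now[0] = 30.0
monitor.ping()               # heartbeat received at t=30
now[0] = 100.0
print(monitor.is_overdue())  # True: 70 seconds since the last ping
```

A real alert source would run this check on a timer and open an incident when it turns true; the sketch covers only the timeout logic.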

Working with multiple on-call teams using Zabbix and iLert

This post outlines how to use Zabbix and iLert with multiple on-call teams, where each team is responsible for a set of host groups in Zabbix and therefore only receives alerts for the services it is responsible for. But first, let’s start with the basic needs of being on-call.

Monitoring IoT devices using heartbeats and MQTT gateways

When working with IoT (Internet of Things) devices, one of the key issues is keeping track of the health of all installations. Most of the time, especially with smaller devices, the application (firmware) is flashed once during setup and then stays untouched at its deployment location for a long while.

Building Automated Monitoring with Icinga and iLert

How many servers can be managed by one system administrator? This question is hard to answer, since it depends heavily on the tasks that need to be performed. It is clear, however, that the number of servers one engineer can manage has increased tremendously over time and is still growing. Public and private clouds, in combination with automation tools, enable us to automate many daily tasks. In a modern IT infrastructure almost everything can, and should, be automated.

iLert

iLert is an incident and on-call management platform for DevOps teams. iLert helps you to respond to incidents faster by adding on-call schedules, SMS, and voice alerts to your existing monitoring tools.

Release Notes: Stakeholder Engagement, Uptime Monitoring API, Flexible Periods for Schedules, and more

Nowadays, a working digital infrastructure is the lifeblood of almost any organization. The impact of a major IT incident can go far beyond the IT department, affecting a company’s revenue or incurring costs in other areas of the business through service disruption. Therefore, in addition to the technical response to a major incident from the IT department, business stakeholders need to be involved as well, so they can prepare the business response.

Integrating dynamic SaaS hosted Uptime Monitoring into your customer-served Applications

Imagine you are rolling out your application to multiple customers, some of whom might even run it on-premises. Of course you want to know whether your application is running fine and the customer is not experiencing any trouble or downtime. You would not want to ship this validation as part of your own system, since that system might itself fail at some point. That is why you decide to go for a third-party uptime monitoring solution, e.g. Uptime Monitoring.