I'll cover the basics of incident routing using Amixr. Just a quick and useful how-to.

Imagine we're receiving incidents and alerts from Prometheus and need to route them to different Slack channels based on their content and severity.

Our routing plan:

  1. Route performance-related alerts to the SRE team (#SRE Slack channel).
  2. Route application-related alerts to the Dev team (#dev Slack channel).
  3. Route critical alerts to someone from the Dev and SRE teams simultaneously (#incidents Slack channel).

1) Add Alertmanager (Prometheus) integration

2) Write regexps for routes

Routes are regexps that are applied to the whole incident body.

Amixr tries the available routes one by one until it finds a match. It uses Python's regex flavor, so I suggest https://regex101.com/ (with the Python flavor selected) for debugging.

Let's look at the example payload:

{
  "endsAt": "0001-01-01T00:00:00Z",
  "labels": {
    "region": "eu-1",
    "alertname": "TestAlert"
  },
  "status": "firing",
  "startsAt": "2018-12-25T15:47:47.377363608Z",
  "amixr_demo": true,
  "annotations": {},
  "generatorURL": ""
}

Regexps matching this payload include eu-1, \"region\": \"eu-1\", or even \"alertname\": \".*Alert\".
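To check such regexes locally before pasting them into Amixr, a minimal Python sketch could look like this. It assumes the incident body is the JSON payload serialized as text with a space after each colon, and that a route matches when the regex is found anywhere in that text (re.search semantics). Neither detail is confirmed here, so treat it as an illustration rather than a description of Amixr internals.

```python
import json
import re

# The example payload from above, as a Python dict.
payload = {
    "endsAt": "0001-01-01T00:00:00Z",
    "labels": {"region": "eu-1", "alertname": "TestAlert"},
    "status": "firing",
    "startsAt": "2018-12-25T15:47:47.377363608Z",
    "amixr_demo": True,
    "annotations": {},
    "generatorURL": "",
}

# Assumption: the regex is applied to the JSON text of the payload.
# json.dumps uses '": "' between keys and values by default.
body = json.dumps(payload)

patterns = [
    r'eu-1',
    r'\"region\": \"eu-1\"',
    r'\"alertname\": \".*Alert\"',
]


def matches(pattern, text):
    """Return True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None


# Map each pattern to whether it matches the example body.
results = {pattern: matches(pattern, body) for pattern in patterns}
```

If the real body uses different whitespace (for example no space after the colon), a more tolerant pattern such as \"region\":\s*\"eu-1\" may be safer.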

Let's return to Prometheus. Here is an imaginary file with alerting rules:

- alert: Disk is almost empty
  expr: one > another
  for: 5m
  labels:
    severity: "high"
    level: "infra"

- alert: HighCpuLoad
  expr: one > another
  for: 5m
  labels:
    severity: "low"
    level: "infra"

- alert: 500 rate is high
  expr: one > another
  for: 1m
  annotations:
    summary: "Too many 500's!"
    level: "app"

Based on our routing plan, the regexes should look like this:

  1. \"severity\": \"high\" for #incidents Slack channel
  2. \"level\": \"app\" for #dev Slack channel
  3. \"level\": \"infra\" for #SRE Slack channel

The more specific routes should go first! Amixr tries them one by one and stops at the first match, so you don't want a "broad" regex to steal your alerts.
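The effect of route order can be sketched in a few lines of Python. This is a simplified model, assuming first-match semantics as described above and re.search over a JSON-serialized body. The route function, the ROUTES list, and the sample bodies are hypothetical and only mirror the three rules from this post.

```python
import json
import re

# Routes in priority order: the most specific regex goes first.
ROUTES = [
    (r'\"severity\": \"high\"', "#incidents"),
    (r'\"level\": \"app\"', "#dev"),
    (r'\"level\": \"infra\"', "#SRE"),
]


def route(body, routes=ROUTES, default=None):
    """Return the channel of the first route whose regex matches the body."""
    for pattern, channel in routes:
        if re.search(pattern, body):
            return channel
    return default


# Hypothetical incident bodies built from the alerting rules above.
disk_alert = json.dumps({"labels": {"severity": "high", "level": "infra"}})
cpu_alert = json.dumps({"labels": {"severity": "low", "level": "infra"}})
app_alert = json.dumps({"annotations": {"summary": "Too many 500's!", "level": "app"}})

# With the broad "infra" regex first, it would catch the critical disk alert.
wrong_order = list(reversed(ROUTES))
```

Here route(disk_alert) picks #incidents, while route(disk_alert, wrong_order) falls into #SRE because the broader regex is tried first.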

3) Set up routes

Just add all three routes using the web interface.

Or using Terraform:

# Each resource needs a unique name; position defines the matching order.
resource "amixr_route" "route-1-alertmanager-bellheart" {
  integration_id = amixr_integration.alertmanager-bellheart.id
  routing_regex  = "\\\"severity\\\": \\\"high\\\""
  position       = 0
  slack {
    channel_id = data.amixr_slack_channel.slack-channel-incidents.slack_id
  }
}

resource "amixr_route" "route-2-alertmanager-bellheart" {
  integration_id = amixr_integration.alertmanager-bellheart.id
  routing_regex  = "\\\"level\\\": \\\"app\\\""
  position       = 1
  slack {
    channel_id = data.amixr_slack_channel.slack-channel-dev.slack_id
  }
}

resource "amixr_route" "route-3-alertmanager-bellheart" {
  integration_id = amixr_integration.alertmanager-bellheart.id
  routing_regex  = "\\\"level\\\": \\\"infra\\\""
  position       = 2
  slack {
    channel_id = data.amixr_slack_channel.slack-channel-sre.slack_id
  }
}

Use the pre-generated Terraform file (from the "Infrastructure As Code" menu) to get data.amixr_slack_channel and the other objects referenced above.

4) Add escalation policies

Don't forget to add escalation policies so that Amixr notifies the right person for each route. For this example we'll use "Notify user (next each time)", which distributes incidents equally between engineers.
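One way to picture "next each time" is a simple round-robin rotation. The sketch below is only a mental model, assuming the policy cycles through the engineers in a fixed order; the assign function and the names are hypothetical, and Amixr's actual implementation may differ.

```python
from itertools import cycle


def assign(incidents, engineers):
    """Pair each incident with the next engineer in a repeating rotation."""
    rotation = cycle(engineers)
    return [(incident, next(rotation)) for incident in incidents]


# Hypothetical data: four incidents shared between two engineers.
assignments = assign(["inc-1", "inc-2", "inc-3", "inc-4"], ["alice", "bob"])
```

With two engineers and four incidents, each engineer ends up with two incidents.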

Hint: how to get the raw incident payload on the Amixr side?

This is the first post about basic Amixr use-cases. Stay tuned to get more insights ;)