August 2020

What alerts should you have for serverless applications?

Aug 17, 2020 By Yan Cui In Lumigo

A key metric for measuring how well you handle system outages is the Mean Time To Recovery (MTTR): the time it takes you to restore the system to working condition. The shorter the MTTR, the faster problems are resolved, the less impact your users experience, and, hopefully, the more likely they are to keep using your product. The first step in resolving any problem is knowing that you have one.


Debugging AWS Lambda Timeouts

Aug 12, 2020 By Yan Cui In Lumigo

Some time ago, an ex-colleague of mine at DAZN received an alert through PagerDuty. There was a spike in the error rate for one of the Lambda functions his team looked after. He jumped onto the AWS console right away and confirmed that there was indeed a problem. The next logical step was to check the logs to see what the problem was, but he found nothing. And so began an hour-long ghost hunt for clues as to what was failing and why there were no error messages.

