Lumigo

AWS Lambda Extensions: What are they and why do they matter

There is a growing ecosystem of vendors that are helping AWS customers gain better observability into their serverless applications. All of them have been facing the same struggle: how to collect telemetry data about AWS Lambda functions in a way that’s both performant and cost-efficient. To address this need, Amazon is announcing today the release of AWS Lambda Extensions.

This is all you need to know about Lambda cold starts

So much has been written about Lambda cold starts. It’s easily one of the most talked-about, and yet most misunderstood, topics when it comes to Lambda. Depending on who you talk to, you will likely get different advice on how best to reduce cold starts. So in this post, I will share with you everything I have learned about cold starts in the last few years and back it up with some data.

How to Debug Slow Lambda Response Times

When you build your application on top of Lambda, AWS automatically scales the number of “workers” (think containers) running your code based on traffic. And by default, your functions are deployed to three Availability Zones (AZs). This gives you a lot of scalability and redundancy out of the box. When it comes to API functions, every user request is processed by a separate worker. So the API-level concurrency is now handled by the platform.

What alerts should you have for serverless applications?

A key metric for measuring how well you handle system outages is the Mean Time To Recovery, or MTTR. It’s basically the time it takes you to restore the system to working condition. The shorter the MTTR, the faster problems are resolved, the less impact your users experience, and, hopefully, the more likely they are to continue using your product! And the first step to resolving any problem is to know that you have a problem.

Debugging AWS Lambda Timeouts

Some time ago, an ex-colleague of mine at DAZN received an alert through PagerDuty. There was a spike in error rate for one of the Lambda functions his team looks after. He jumped onto the AWS console right away and confirmed that there was indeed a problem. The next logical step was to check the logs to see what the problem was. But he found nothing. And so began an hour-long ghost hunt to find clues as to what was failing and why there were no error messages.

Webinar: Debugging Lambda Performance Issues

One of the most common performance issues in serverless architectures, and specifically in AWS Lambda, is elevated latency from external services such as DynamoDB, Elasticsearch, or Stripe. In this webinar, we will focus on how to monitor, detect, and fix latency issues that arise when our Lambda functions need to talk to other services. Some of the topics we will cover include:

How to Debug AWS Lambda Performance Issues

Ten years ago, Amazon found that every 100ms of latency would cost them roughly 1% in sales. This is a pretty clear statement on the importance of user experience! It’s especially true in today’s ultra-competitive market, where the cost to consumers of switching to another provider is lower than ever. And one of the most common performance issues in serverless architectures is elevated latency from the services we depend on.
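As a rough illustration of the idea in this excerpt, latency from a dependency can be measured by timing each outbound call. The sketch below is not taken from the post itself: the names `timed`, `slow_calls`, and `latencies_ms` are hypothetical, and the in-memory dictionary stands in for whatever metrics backend you actually use.

```python
import time
from contextlib import contextmanager

# Hypothetical in-memory store of latency samples, keyed by service name.
# A real function would ship these to a metrics or tracing backend instead.
latencies_ms = {}


@contextmanager
def timed(service):
    """Record how long the wrapped block takes, in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the wrapped call raises, so failures are timed too.
        elapsed_ms = (time.perf_counter() - start) * 1000
        latencies_ms.setdefault(service, []).append(elapsed_ms)


def slow_calls(service, threshold_ms):
    """Return the recorded samples for a service that exceed the threshold."""
    return [ms for ms in latencies_ms.get(service, []) if ms > threshold_ms]
```

Wrapping a call as `with timed("dynamodb"): ...` records one sample per invocation, and `slow_calls("dynamodb", 100)` then lists the samples that took longer than 100ms.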

Hooray, We're Cool! At least according to Gartner :)

We’re thrilled to announce that Gartner has named Lumigo a Cool Vendor in its recent Cool Vendors in Performance Analysis for Cloud-Native Architectures report by Padraig Byrne, Josh Chessman, Federico De Silva, Pankaj Prasad, and Charley Rich, published May 18, 2020. The report explains something we at Lumigo heartily agree with: in a cloud-first world, the lines between development and operations are blurring.

Unlocking new serverless use cases with EFS and Lambda

Today, the AWS Lambda platform has added a new arrow to its quiver: the ability to integrate natively with Amazon Elastic File System (EFS). Until now, a Lambda function was limited to 512 MB of /tmp directory storage. While this is sufficient for most use cases, it’s prohibitive for workloads such as machine learning, where TensorFlow models are often gigabytes in size and cannot fit into the limited /tmp storage.
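To make the size constraint concrete, here is a minimal sketch of how a function might decide where a model file should live. It assumes the 512 MB /tmp limit quoted above and a hypothetical EFS mount path of `/mnt/models`; the helper names `choose_model_dir` and `model_path` are illustrative and not part of any AWS API.

```python
import posixpath

# /tmp limit quoted in the excerpt above.
TMP_LIMIT_BYTES = 512 * 1024 * 1024

# Hypothetical EFS mount path; the real value depends on how the
# file system is attached in the function's configuration.
EFS_MOUNT_PATH = "/mnt/models"


def choose_model_dir(model_size_bytes):
    """Pick /tmp for models that fit within its limit, otherwise the EFS mount."""
    if model_size_bytes <= TMP_LIMIT_BYTES:
        return "/tmp"
    return EFS_MOUNT_PATH


def model_path(filename, model_size_bytes):
    """Build the full path a model file of the given size would be read from."""
    return posixpath.join(choose_model_dir(model_size_bytes), filename)
```

A 100 MB model would resolve to a path under /tmp, while a 2 GB model would resolve to a path under the assumed EFS mount.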