Putting VPC networking for Lambda to the test

**Update: November 7 **- AWS has rolled out this feature to 8 more regions, including ap-southeast-2, Sydney. The other regions are N. California, Ireland, Paris, Mumbai, Singapore and San Paulo. This roll out brings up the number to 11 regions with Ohio, Frankfurt and Tokyo, which received the feature in late September. With this update, all AWS accounts will see the improvement for their Lambda functions with VPC networking in those regions.

AWS Lambda is the number one service that comes to mind to most of us when we think about serverless. In Lambda you can run code without worrying about provisioning any servers and you only pay for what you use.

Since Lambda was announced in 2014, it has had many new features added, making the service better and better. Faster execution times, 15 minutes executions, the list goes on and on. VPC networking was still not quite there… yet.

How VPC networking for Lambda was

If there was something that would give you a headache, and needed a better solution, it was AWS Lambda support for VPC. By default, Lambda functions are provisioned in a Service VPC, that is, a VPC managed by AWS over which you have no control. Any Lambda in this VPC would have Internet access, meaning your Lambda code could reach any public endpoint.

When accessing resources in your own VPC things got a little bit harder to manage. Configuring your Lambda function to access your VPC means an Elastic Network Interface, or ENI, is provisioned per instance of your Lambda function, and is created and attached during each cold start.

As your application scales, more instances of your Lambda functions are required to serve increasing requests, and more ENIs are provisioned. This causes several issues:

When the Lambda Service scales your function, the first request that gets sent to the new instance suffers the cold start while the ENI is provisioned and attached. This happens even with the first ever invocation.
Even if you configure your Lambda function across multiple subnets and AZs (for high availability), each instance of your Lambda function is attached through the ENI to 1 single subnet/AZ.
As your application scales with demand the number of ENIs in your subnets grows. This has two side effects: you need to manage your ENI service limits (which is a per region limit, not per VPC, making it even harder if you have several VPCs with Lambda access in any given region) and each ENI consumes a private IP from your subnets range. This is not an easy limit to increase.
When you delete a Lambda function you will have to wait until all ENIs have been deleted. This can take a while, and when you provision transient environments with automation it can be painful.

How VPC networking for Lambda will be

On September the 3rd, AWS announced improved VPC networking for AWS Lambda. These improvements change the way Lambda provisions and now shares ENIs across Lambda functions and instances of those Lambda functions.

Now instead of the AWS Lambda service provisioning ENIs and attaching them to every new Lambda instance, it is basically doing the work upfront, and consolidating ENIs where possible. How?

When you create a new Lambda function, the AWS Lambda service will create 1 ENI per subnet in your Lambda configuration and will be ready for the first request.
When you create a Lambda function configured across a number of subnets, ideally across all AZs, and with a Security Group attached to it, the AWS Lambda service will determine if that combination is already in use by another Lambda function, and if so will reuse (share) the already existing ENIs. Zero provisioning and attachment time.
All networking between your Lambda functions and your VPC is managed by the same technology used by the NLBs, which enables massive scaling capabilities without provisioning more ENIs.
When you delete your Lambda function, if another function has the same subnet(s) and security group(s) combination, your Lambda function is deleted without waiting for ENIs to be deleted.
You don’t have to change anything. If your Lambda functions are provisioned through the console, API or CloudFormation, they will continue to work.

Let’s compare old vs new

Now that 3 regions have the new feature available, we can run a comparison and see it firsthand.

Here is what we will do:

Provision identical stacks in Sydney and Ohio regions.
Each region will have a VPC across 3 AZs, with private and public subnets.
A number of Lambda functions, with VPC access, will have different security group configurations: with no overlap, some overlap and full overlap.
To provide a fair comparison a couple of Lambda functions will be provisioned to generate requests (and load) and avoid network differences (I’m in Sydney, Ohio is a bit far away from here)

What we are looking for is the difference in ENI provisioning

When the Lambda function is provisioned
When the Lambda function scales
When the Lambda function is being deleted

You can clone this repo and provision the resources yourself using CloudFormation and Sceptre.

Once all resources are provisioned in Sydney and Ohio, let’s have a look at the Lambda functions and ENIs.

Lambda functions lambda-1-a, lambda-2-a, lambda-2-b, lambda-3-ab have VPC configuration, but there is no ENI provisioned for them. The only ENIs are those of the NAT Gateways.

To stress test these Lambda functions, lambda-stress-entry-point and lambda-stress-invoker will be used. We invoke lambda-stress-entry-point, and it will asynchronously invoke lambda-stress-invoker a number of times, which will invoke one of our lambda functions in the VPC asynchronously, again a number of times. This creates a multiplier effect and helps us simulate concurrent traffic.

To stress test lambda-1-a execute the following:

aws lambda invoke --invocation-type Event --function-name lambda-stress-entry-point --region ap-southeast-2 --payload '{"parallel_lambda_invoke": "20", "invocations_per_lambda": "50", "lambda_name": "lambda-1-a"}' out

See the payload {"parallel_lambda_invoke": "20", "invocations_per_lambda": "50", "lambda_name": "lambda-1-a"}, with 3 values:

arallel_lambda_invoke, or the number of async invocations to the lambda-stress-invoker function
invocations_per_lambda, or the number of async invocations each lambda-stress-invoker will do to our target :ambda function
lambda_name, our target function, the one we want to stress

We can see that 9 (don’t count the NAT Gateway ENIs) ENIs have been provisioned across 3 AZs. That makes it 3 groups of ENIs across all AZs. If we check the Lambda functions, we see 4 different functions, not 3. This is because lambda-1-a and lambda-2-a share the same subnets and security groups configuration, and therefore share the same set of ENIs.

Now let’s stress these 4 lambda functions and see the result.

aws lambda invoke --invocation-type Event --function-name lambda-stress-entry-point --region us-east-2 --payload '{"parallel_lambda_invoke": "20", "invocations_per_lambda": "50", "lambda_name": "lambda-1-a"}' out

If we check the ENI console there has been no additional provisioning of ENIs and remains at 1 ENI per subnet/security group combination. This shows that there will be no cold start due to provisioning a new instance of a Lambda function inside the VPC.

Moreover, we don’t have to worry about ENIs service limits going out of control. Now we can know in advance how many ENIs will be in use with Lambda by calculating the combinations of subnets/security groups in our functions, no matter how much these functions scale. The same applies to the private IPs in our subnets. We can calculate and size our subnets IP ranges upfront and plan for growth.

All lifecycle includes deletion

There is one more thing to verify in terms of improvements. If 2 or more functions share the same subnet and security groups, all but one can be deleted and will not have to wait for the ENIs to be deleted. Only the last one will have to suffer the delay.

This can be very annoying in transient or test environments, where Lambda functions get created and deleted with automation. Having to wait for everything to delete is painful when you are testing.

Let’s try it out. Functions lambda-1-a and lambda-2-a share subnet and security group configuration, so we should be able to delete 1 of them fast.

The Lambda function in the Ohio region deleted immediately as expected, without waiting for the ENI to clean up. It is interesting to see that CloudFormation still displays the message about the ENIs. It may be something we will see changing soon to tell us exactly what is (not) happening.

Conclusion

This is a game changer for those projects that involve Lambda functions that interact with endpoints inside a VPC.

ENI service limits and IP ranges within a VPC are much less likely to cause issues as you scale, and it will be easier to foresee the increase in demand of these as your workload grows.

VPC networking will not have an impact on cold starts any more.

Deleting resources in transient environments will be a lot simpler, and faster, if planned correctly. 2 or more Lambda functions that share subnet(s) and security group(s) configuration will share ENIs, and only the last function to be deleted will have to wait for the ENIs to be deleted.

There is no change required to benefit from this feature when rolled out to a new region. According the original blog post Roll out begins today and continues gradually over the next couple of months across all Region. Unfortunately, it remains unknown when this feature will be available in Sydney. Keep an eye here and we will update as soon as it gets released.

This article was written by Jesus Rodriguez, Cloud Solutions Architect at AC3. Jesus has over 8 years of experience in IT, Certified AWS Solutions Architect Professional, Certified AWS DevOps Professional, and Certified AWS Big Data - Specialty, with a strong focus on implementing, deploying and provisioning secure, high available, scalable and cost optimized applications utilizing the best AWS services for each use case.