Scaling containers on AWS in 2022

Comparing how fast containers scale up under different orchestrators on AWS in 2022

Reading time: about 45 minutes

This all started with a blog post back in 2020, from a tech curiosity: what's the fastest way to scale containers on AWS? Is ECS faster than EKS? What about Fargate? Is there a difference between ECS on Fargate and EKS on Fargate? I had to know this to build better architectures for my clients.

In 2021, containers got even better, and I was lucky enough to get a preview and present just how fast they got at re:Invent!

What about 2022? What's next in the landscape of scaling containers? Did the previous trends continue? How will containers scale this year? What about Lambda? We now have the answer!

Hand-drawn-style graph showing how long it takes to scale from 0 to 3500 containers: Lambda instantly spikes to 3000 and then jumps to 3500, ECS on Fargate starts scaling after 30 seconds and reaches close to 3500 around the four and a half minute mark, EKS on Fargate starts scaling after about a minute and reaches close to 3500 around the eight and a half minute mark, EKS on EC2 starts scaling after two and a half minutes and reaches 3500 around the six and a half minute mark, and ECS on EC2 starts scaling after two and a half minutes and reaches 3500 around the ten minute mark

Tl;dr:

Beware: this benchmark is extremely specific and meant to provide a frame of reference, not perfectly accurate results — the focus here is on making informed architectural decisions, not on squeezing out the most performance!


That's it! If you want to get more insights or if you want details about how I tested all this, read on.



Warning

Friendly warning: the estimated reading time for this blog post is about 45 minutes!

Might I suggest getting comfortable? A cup of tea or water goes great with container scaling.



Preparation

Before any testing can be done, we have to set up.

Limit increases

First up, we will reuse the same dedicated AWS Account we used in 2020 and 2021 — my "Container scaling" account.

To be able to create non-trivial amounts of resources, we have to increase a couple of AWS quotas.
I do not want to showcase exotic levels of performance that only the top 1% of the top 1% of AWS customers can achieve. At the same time, we can't look at out-of-the-box performance since AWS accounts have safeguards in place. The goal of this testing is to see what "ordinary" performance levels all of us can get, and for that we need some quota increases:

That's it — not performance quota increases, but capacity quota increases! To scale a bunch, we need to create a bunch of instances. These are quotas that everybody should be able to get without too much work. Unless explicitly stated, these are all the quota increases I got.

Testing setup

Based on the previous tests in 2020 and 2021, we will scale up to 3 500 containers, as fast as possible: we'll force scaling, by manually changing the desired number of containers from 1 to 3 500.
I think this keeps a good balance: we're not scaling to a small number, but we're not wasting resources scaling to millions of containers.

By forcing the scaling, we're eliminating a lot of complexity: there is a ridiculous number of ways to scale containers! We're avoiding going down some complex rabbit holes: no talking about optimizing CloudWatch or AWS Autoscaling intervals and reaction times, no stressing about the granularity at which the application exposes metrics, no optimizing the app → Prometheus Push Gateway → Prometheus → Metrics Server flow — we're blissfully ignoring many complexities.

For an application to actually scale up, we first must detect that scaling is required (that can happen with a multitude of tools, from CloudWatch Alarms to KEDA to custom application events), we must then decide how much we have to scale (which can again be complex logic — is it a defined step or is it dynamic?), we then have to actually scale up (how is new capacity added, and how does it join the cluster?), and finally we have to gracefully utilize that capacity (how are new instances connected to load balancers, and how do new instances impact the scaling metrics?). This whole process can be very complex, and it is often application-specific and company-specific. We want results that are relevant to everybody, so we will ignore all this and focus on how quickly we can get new capacity from AWS.

We will run all the tests in AWS' North Virginia region (us-east-1), as we've seen in previous years that different AWS regions have the same performance levels. No need to also run the tests in South America, Europe, Middle East, Africa, and Asia Pacific too.

In terms of networking, each test that requires a networking setup will use a dedicated VPC spanning four availability zones (use1-az1, use1-az4, use1-az5, and use1-az6), and all containers will be created across four private /16 subnets. For increased performance and lower cost, each VPC will use three VPC Endpoints: S3 Gateway, ECR API, and ECR Docker API.
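As a quick sanity check on this layout, a single /16 subnet already offers far more private IPs than our 3 500 containers will ever need. A small sketch (the CIDR below is illustrative, not the exact one from the test setup; AWS reserves 5 addresses in every subnet):

```python
import ipaddress

# AWS reserves the first 4 and the last IP address of every subnet
AWS_RESERVED_PER_SUBNET = 5

def usable_ips(cidr: str) -> int:
    """IP addresses actually assignable to containers in one subnet."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET

# Illustrative private /16 subnet -- one of the four in the VPC
print(usable_ips("10.0.0.0/16"), "usable IPs per subnet, for 3500 containers")
```

So even in a single availability zone, IP exhaustion is not a concern for this benchmark.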

In terms of servers, for the tests that require servers, we will use the latest and the greatest: AWS' c6g.4xlarge EC2 instances.
In previous years, we used c5.4xlarge instances, but the landscape has evolved since then. Ideally, we'd like to keep the same server size to accurately compare results between 2020, 2021, and 2022.
This year we have two options: c6i.4xlarge, Intel-based servers with 16 vCPUs and 32 GB of memory, or c6g.4xlarge, ARM-based servers using AWS' Graviton2 processors, also with 16 vCPUs and 32 GB of memory. AWS also announced the next-generation Graviton3 processors and c7g servers, but those are only available in limited preview. Seeing how 16 Graviton vCPUs are both faster and cheaper than 16 Intel vCPUs, c6g.4xlarge sounds like the best option, so that's what we will use for our EC2 instances. To further optimize our costs, we will use EC2 Spot Instances, which are up to 90% cheaper than On-Demand EC2 instances. More things have to happen on the AWS side when a Spot Instance is requested, but the scaling impact should not be significant, and the cost savings are a big draw.

The operating system landscape has evolved too. In previous years we used the default Amazon Linux 2 operating system, but in late 2020, Amazon launched an open-source operating system focused on containers: Bottlerocket OS. Since then, Bottlerocket matured and grew into an awesome operating system! Seeing how Bottlerocket OS is optimized for containers, we'll run Bottlerocket as the operating system on all our EC2 servers.

How should we measure how fast containers scale?
In the past years, we used CloudWatch Container Insights, but that won't really work this year: the best metrics we can get are minute-level metrics. Last year we had services scale from 1 to 3 500 in a couple minutes and with minute-level data we won't get proper insights, will we?
To get the most relevant results, I decided we should move the measurement directly into the container: the application running in the container should record its own start time! That gives us the best possible metric: we will know exactly when each container started.
In past years, we used the poc-hello-world/namer-service application: a small web application that returns hello world. For this year, I extended the code a bit based on our idea: as soon as the application starts, it records the time it started at! It then does the normal web stuff — configuring a couple of web routes using the Flask micro-framework.
Besides the timestamp, the application also records details about the container (name, unique id, and whatever else was easily available) and sends all this data synchronously to Honeycomb and asynchronously to CloudWatch Logs — by using two providers, we are protected against failures or errors. We'll use Honeycomb as a live UI with proper observability that allows us to explore, and CloudWatch Logs as the definitive source of truth.
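Here's a minimal sketch of that measurement idea — the field names and the metadata environment variable are illustrative, not the exact ones from the real app, and the real app ships the event to Honeycomb and CloudWatch Logs instead of printing it:

```python
import json
import os
import socket
import time
import uuid

# Captured at import time, i.e. the moment the container's process starts --
# this timestamp is the one metric the whole benchmark hinges on.
STARTED_AT = time.time()

def startup_event() -> dict:
    """The startup record shipped to Honeycomb (sync) and CloudWatch Logs (async)."""
    return {
        "started_at": STARTED_AT,
        "container_name": socket.gethostname(),
        "run_id": str(uuid.uuid4()),  # unique id, to deduplicate the two sinks
        # ECS and Fargate expose container metadata through this endpoint variable
        "metadata_uri": os.environ.get("ECS_CONTAINER_METADATA_URI_V4", "unknown"),
    }

if __name__ == "__main__":
    print(json.dumps(startup_event()))
```

The important design choice: the timestamp is taken before any web framework or network setup runs, so slow dependencies cannot skew the measurement.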

Now that we have the application defined, we need to put it in a container! I built the app using GitHub Actions and Docker's build-and-push Action, resulting in a 380 MB multi-arch (Intel and ARM) container image. The image was then pushed to AWS' Elastic Container Registry (ECR), which is where all our tests will download it from.

Keeping in line with the "default setup" we are doing, the container image is not at all optimized!
There are many ways to optimize image sizes, there are many ways to optimize image pulling, and there are many ways to optimize image use — we would never get to testing if we keep optimizing. Again, this testing is meant to provide a frame of reference and not showcase the highest levels of performance.
For latency-sensitive workloads, people are optimizing image sizes, reusing layers, using custom schedulers, using base server images (AMIs) that have large layers or even full images already cached, and much more. It's not only that, but container runtimes matter too and performance can differ between say Docker and containerd. And it's not only image sizes and runtimes, it's also container setup: a container without any storage differs from a container with a 2 TB RAM disk, which differs from a container with an EBS volume attached, which differs from a container with an EFS volume attached.
We are testing to get an idea of how fast containers scale, and a lightweight but not minimal container is good enough. Performance in real life will vary.

That's all in terms of common setup used by all the tests in our benchmark! With the base defined, let's get into the setup required for each container service and the results we got!

Kubernetes scaling

Kubernetes — the famous container orchestrator.
Kubernetes has its components divided into two sections: the Control Plane and the Worker Plane. The Control Plane is like a brain: it decides where containers run, what happens to them, and it talks to us. The Worker Plane is the servers on which our containers actually run.

We could run our own Kubernetes Control Plane, with tools like kops, kubeadm, or the newer cluster-api, and that would allow us to optimize each component to the extreme. Or, we could let AWS handle that for us through a managed service: Amazon Elastic Kubernetes Service, or EKS for short. By using EKS, we can let AWS stress about scaling and optimizations and we can focus on our applications. Few companies manage their own Kubernetes Control Planes now, and AWS offers enough customization for our use-case, so we'll use EKS!

For the Worker Plane, we have a lot more options:

For our testing, we'll ignore the less common options, and we'll use both EC2 workers managed through EKS Managed Node Groups and AWS-managed serverless workers through EKS on Fargate.

To get visibility into what is happening on the clusters, we will install two helper tools on the cluster: CloudWatch Container Insights for metrics and logs, and kube-eventer for event insights.

Our application container will use its own AWS IAM Role through IAM Roles for Service Accounts. When using EC2 workers, we will use the underlying node's Security Group and not a dedicated security group for the pod due to the impact that has on scaling — a lot more networking setup has to happen and that slows things down. When using Fargate workers there is no performance impact, so we will configure a per-pod security group in that case.
Containers are not run by themselves, but as part of larger concepts that handle what happens if a container has an error, restarts, or has to be updated. In the ECS world we have Services, and in the Kubernetes world we have Deployments, but both do the same work: they manage containers. For example, a Deployment or a Service of 30 containers will constantly try to make sure 30 containers are running. If a container has an error, it is restarted. If a container dies, it is replaced by a new one. If the application has to be updated, the Deployment will handle the complex logic around replacing each of the 30 running containers with new, updated containers, and so on. In 2021, we saw that multiple Deployments or multiple Fargate Profiles have no impact on scaling speed, so our application's containers will be part of a single Deployment, using a single Fargate Profile. The Kubernetes Pod will use 1 vCPU and 2 GB of memory and will have a single container: our test application.
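With pods of this size, some quick back-of-the-envelope math shows how many c6g.4xlarge instances this test pushes the Worker Plane to (ignoring DaemonSet and system overhead, so the real number is slightly higher):

```python
import math

# c6g.4xlarge: 16 vCPUs and 32 GB of memory
NODE_VCPUS, NODE_MEM_GB = 16, 32
# each pod requests 1 vCPU and 2 GB of memory
POD_VCPUS, POD_MEM_GB = 1, 2

# packing is limited by whichever resource runs out first
pods_per_node = min(NODE_VCPUS // POD_VCPUS, NODE_MEM_GB // POD_MEM_GB)
nodes_needed = math.ceil(3500 / pods_per_node)

print(pods_per_node, "pods per node,", nodes_needed, "nodes for 3500 pods")
```

That's over two hundred EC2 instances to request, start, and join to the cluster — which is why the quota increases above matter.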

There are multiple other configuration options, and if you want to see the full configuration used, you can check out the Terraform infrastructure code in the eks-* folders in vlaaaaaaad/blog-scaling-containers-on-aws-in-2022 on GitHub.


Now, with the setup covered, let's get to testing! I ran all the tests between December 2021 and February 2022, using all the latest versions available at the time of testing.

In terms of yearly evolution, EKS on EC2 remained pretty constant — in the below graph there are actually 3 different lines for 3 different years, all overlapping:

Hand-drawn-style graph showing yearly evolution for EKS: the 2020, 2021, and 2022 lines are all overlapping each other. Scaling starts after the two minute mark and reaches 3500 around the seven minute mark


But that is not the whole story! For 2022 we get a couple new options when using EC2 workers: an alternative scaler and IPv6 support.


Until late 2021, the default way of scaling EC2 nodes was cluster-autoscaler — there were other scalers too, but nothing as popular or as widely adopted. Cluster Autoscaler is an official Kubernetes project, with support for scaling worker nodes on a bunch of providers — from the classic big 3 clouds all the way to Hetzner or Equinix Metal. Cluster Autoscaler works on homogeneous node groups: we have an AutoScaling Group with instances of the exact same size and type, and Cluster Autoscaler sets the number of desired instances. If new instances are needed, the number is increased. If instances are unused, servers are cleanly drained and the number is decreased. For our benchmark, we're using a single AutoScaling Group with c6g.4xlarge instances.

For folks running Kubernetes on AWS, Cluster Autoscaler is great, but not perfect.
Cluster Autoscaler does its own node management, lifecycle management, and scaling, which means a lot of EC2 AutoScaling Group (ASG) features have to be disabled. For example, ASGs have support for launching different instance sizes as part of the same group: if one instance with 16 vCPUs is not available, ASGs can figure out that two instances with 8 vCPUs are available, and launch those to satisfy the requested capacity. That feature and many others (predictive scaling, node refreshes, rebalancing) have to be disabled.
Cluster Autoscaler also knows little about AWS and what happens with EC2 nodes — it configures the desired number of instances, and then waits to see if desired_number == running_number. In cases of EC2 Spot exhaustion, for example, it takes a while for Cluster Autoscaler to figure out what happened.
By design, and by having to support a multitude of infrastructure providers, Cluster Autoscaler works close to the lowest common denominator: barebones node groups. This approach also forces architectural decisions: by only supporting homogeneous node groups, a bunch of node groups are defined and applications are allocated to one of those node groups. Does your team have an application with custom needs? Too bad, it needs to fit in a pre-existing node group, or it has to justify the creation of a new node group.

To offer an alternative, Karpenter was built and it was released in late 2021. Karpenter, instead of working with AutoScaling Groups that must have servers of the same size (same RAM and CPU), directly calls the EC2 APIs to launch or remove nodes. Karpenter does not use AutoScaling Groups at all — it manages EC2 instances directly! This is a fundamental shift: there is no longer a need to think about server groups! Each application's needs can be considered individually.
Karpenter will look at what containers have to run and at what EC2 instances AWS offers, and make the best decision on what server to launch. Karpenter will then manage the server for its full lifecycle, with full knowledge of what AWS does and what happens with each server. All this detailed information gives Karpenter an advantage in resource-constrained or complex environments — say, not enough EC2 Spot capacity, or complex hardware requirements like GPUs or specific types of processors: Karpenter knows why an EC2 instance could not be launched and does not have to wait to confirm that indeed desired_number == running_number. That sounds great, and it sounds like it will impact scaling speeds!

That said, how should we compare Cluster Autoscaler with Karpenter? Given free rein to scale from 1 container to 3 500 containers, Karpenter will choose the best option: launching the biggest instances possible. While that may be a fair way to compare them, the results would not be directly comparable.
I decided to compare Cluster Autoscaler (with one ASG of c6g.4xlarge) with Karpenter also limited to launching just c6g.4xlarge instances. This is the worst possible case for Karpenter and the absolute best case for Cluster Autoscaler, but it should give us enough information about how the two compare.

Surprisingly, even in this worst possible case, EKS on EC2 using Karpenter is faster than EKS on EC2 using Cluster Autoscaler:

Hand-drawn-style graph showing the scaling performance from 0 to 3500 containers. EKS on EC2 with cluster-autoscaler starts around the two and a half minute mark, reaches 3000 containers around the six and a half minute mark, and 3500 containers around the seven minute mark. EKS on EC2 with Karpenter is faster: it starts around the same two and a half minute mark, but reaches 3000 containers a minute earlier, around the five and a half minute mark, and reaches 3500 containers around the same seven minute mark



Another enhancement we got this year is the support for IPv6, released in early 2022. There is no way to migrate an existing cluster, which means that for our testing we have to create a new IPv6 EKS cluster in an IPv6 VPC. You can see the full code used in the eks-on-ec2-ipv6 folder in the vlaaaaaaad/blog-scaling-containers-on-aws-in-2022 repository on GitHub.

As AWS pointed out in their announcement blog post, IPv6 reduces the work that the EKS' network plugin (amazon-vpc-cni-k8s) has to do, giving us a nice bump in scaling speed, for both Cluster Autoscaler and Karpenter:

Hand-drawn-style graph showing the scaling performance from 0 to 3500 containers. There are four lines, in two clusters. The first cluster is EKS on EC2 using Karpenter, where we can see IPv6 is around 30 seconds faster than IPv4. The second cluster is EKS on EC2 using cluster-autoscaler, where IPv6 is again faster than IPv4

I envision people will start migrating towards IPv6 and Karpenter, but it will be a slow migration: both are fundamental changes! Migrating from IPv4 to IPv6 requires a complete networking revamp, with multiple components and integrations affected. Migrating from Cluster Autoscaler to Karpenter is easier, as it can be done gradually and in-place (workloads that fit Karpenter-only clusters are rare), but taking full advantage of Karpenter requires deeply understanding what resources applications need — no more putting applications in groups.



Now, let's move to serverless Kubernetes workers!
As mentioned above, Fargate is an alternative to EC2s: no more stressing about servers, operating systems, patches, and updates. We only have to care about the container image!

This year, EKS on Fargate is neck-and-neck with EKS on EC2 in terms of scaling:

Hand-drawn-style graph showing the scaling performance from 0 to 3500 containers. EKS on Fargate starts around the one-minute mark and reaches close to 3500 containers around the eight minute mark. EKS on EC2 with Karpenter and EKS on EC2 with cluster-autoscaler both start around the two minute mark, and reach 3500 containers around the seven minute mark, but EKS on EC2 with Karpenter scales faster initially

Looking at the above graph does not paint the full picture, though. If we look at the yearly evolution of EKS on Fargate, we see how much AWS improved this, without any effort required from the users:

Hand-drawn-style graph showing yearly evolution for EKS on Fargate: in 2020 it took about 55 minutes to reach 3500 containers. In 2021, it takes around 20 minutes, and in 2022 it takes a little over 8 minutes

Massive improvements! If we had built and run an application using EKS on Fargate in 2020, we would have scaled to 3 500 containers in about an hour. Without any changes, in 2021 the same scaling would be done in 20 minutes. In 2022, the same application would finish scaling in less than 10 minutes! In under 2 years, without any effort on our side, we went from 1 hour to 10 minutes!

I do see people migrating from EKS on EC2 to EKS on Fargate, but in small numbers. EKS on Fargate has no support for Spot pricing, which makes Fargate an expensive proposition compared with EC2 Spot Instances. EC2 Spot Instances are cheaper by up to 90% (and ECS on Fargate Spot by up to 70%), but they may be interrupted with a 2-minute warning. For a lot of containerized workloads, that is not a problem: a container gets interrupted and is quickly replaced by a new container. No harm done, and way lower bills.

The most common setup I see for Kubernetes on AWS is a combination of workers. EKS on Fargate is ideal for long-running and critical components like the AWS Load Balancer Controller, Karpenter, or Cluster Autoscaler. EKS on EC2 is ideal for interruption-sensitive workloads like stateful applications. What I am seeing most often with my customers is the largest part of the worker plane using EKS on EC2 Spot, which is good enough for most applications.


Keep in mind that these are default performance results, with an extreme test case, and with manual scaling! Performance levels will differ depending on your applications and what setup you're running.

One would think the upper limit on performance is EKS on EC2 with the servers all ready to run containers — the servers won't have to be requested, created, and started; containers would just need to start:

Hand-drawn-style graph showing the scaling performance from 0 to 3500 containers. All the previous graphs are merged into this graph, in a mess of lines

That gets close, but even that could be optimized by having the container images already cached on the servers and by tuning the networking stack!
There are always optimizations to be done, and more performance can always be squeezed out. For example, I spent 3 months with a customer tuning, experimenting, and optimizing Cluster Autoscaler for their applications and their specific workload. After all the work was done, my customer's costs decreased by 20% while their end-users saw a 15% speed-up! For them and their scale, it was worth it. For other customers, spending this much time would represent a lot of wasted effort.

Always make sure to consider all the trade-offs when designing! Pre-scaled, pre-warmed, and finely tuned EKS on EC2 servers may be fast, but the associated costs will be huge — both pure AWS costs, but also development time costs, and missed opportunity costs 💸

Elastic Container Service scaling

Amazon's Elastic Container Service — the pole-position orchestrator that needed competition to become awesome.

Just like Kubernetes, Amazon Elastic Container Service, or ECS for short, has its components divided into two sections: the Control Plane and the Worker Plane. The Control Plane is like the brain: it decides where containers run, what happens to them, and it talks to us. The Worker Plane is the servers on which our containers actually run.

For ECS, the Control Plane is proprietary: AWS built it, AWS runs it, AWS manages it, and AWS develops it further. As users, we can create a Control Plane by creating an ECS Cluster. That's it!

For the Worker Plane we have more options, with the first five options being the same as the options we had when using Kubernetes as the orchestrator:

For our testing, we'll focus on the most common options, and we'll use both EC2 workers that we manage ourselves and AWS-managed serverless workers through Fargate.

To get visibility into what is happening on the clusters, we don't have to install anything; we just have to enable CloudWatch Container Insights on the ECS cluster.

Our application container will use its own dedicated AWS IAM Role and its own dedicated Security Group. Containers are not run by themselves, but as part of larger concepts that handle what happens if a container has an error, restarts, or has to be updated. In the Kubernetes world we have Deployments, and in the ECS world we have Services, but both do the same work: they manage containers. For example, a Deployment or a Service of 30 containers will always try to make sure 30 containers are running. If a container has an error, it is restarted. If a container dies, it is replaced by a new one. If the 30 containers have to be updated, the Service will handle the complex logic around gradually replacing each of them with new, updated containers, always making sure at least 30 containers are running. In 2021, we saw that ECS scales faster when multiple ECS Services are used, so our application's containers will be part of multiple Services, all in the same ECS Cluster. The ECS Task will use 1 vCPU and 2 GB of memory and will have a single container: our test application.

There are multiple other configuration options, and if you want to see the full configuration used, you can check out the Terraform infrastructure code in the ecs-* folders in vlaaaaaaad/blog-scaling-containers-on-aws-in-2022 on GitHub.



Now, with the setup covered, let's get to testing! I ran all the tests between December 2021 and April 2022, using all the latest versions available at the time of testing.


In 2020, I did not test ECS on EC2 at all. In 2021, I tested ECS on EC2, but the performance was not great. This year, the announcement for improved capacity provider auto-scaling gave me hope, and I thought we should re-test ECS on EC2.

Scaling ECS clusters with EC2 workers as part of an AutoScaling Group managed by Capacity Providers is very similar to cluster-autoscaler from the Kubernetes world: based on demand, EC2 instances are added to or removed from the cluster. Not enough space to run all the containers? New EC2 instances are added. Too many underutilized EC2 instances? Instances are cleanly removed from the cluster.

In 2021, we saw that ECS can scale a lot faster when using multiple Services — there's a bit of extra configuration that has to be done, but it's worth it. We first have to figure out the number of ECS Services that will scale the fastest:

Hand-drawn-style graph showing the scaling performance from 0 to 3500 containers, with different numbers of services. They all start scaling around the two and a half minute mark, with ECS on EC2 with 1 Service reaching about 1600 containers after ten minutes, with 5 Services reaching 3500 containers after ten minutes, with 7 Services also reaching 3500 containers after ten minutes, and with 10 Services reaching 2500 containers after ten minutes

Now that we know the ideal number of Services, we can focus on tuning Capacity Providers. A very important setting for the Capacity Provider is the target capacity, which can be anything between 1% and 100%.
If we set a low target capacity of 30%, that means we are keeping 70% of the EC2 capacity free — ready to host new containers. That's awesome from a scaling performance perspective, but terrible from a cost perspective: we are overpaying by 70%! Using a larger target of, say, 95% offers better cost efficiency (only 5% unused space), but it means scaling will be slower since we have to wait for new EC2 instances to start. How does this impact scaling performance? How much slower would scaling be? To figure it out, let's test:

Hand-drawn-style graph showing the scaling performance from 0 to 3500 containers. ECS on EC2 with 5 Services and 30% Target Capacity starts scaling after about thirty seconds, and reaches 3500 containers just before the six minute mark. ECS on EC2 with 5 Services and 80% Target Capacity starts scaling around the two and a half minute mark and reaches 3500 containers after about ten minutes. ECS on EC2 with 5 Services and 95% Target Capacity starts scaling around the two and a half minute mark and reaches about 1600 containers after ten minutes of scaling
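The trade-off behind these numbers is easy to model: with a target capacity of T%, the Capacity Provider tries to keep (100 − T)% of the cluster as idle headroom, and containers can start on that headroom immediately while everything else waits for new EC2 instances. A simplified sketch (the cluster size is illustrative):

```python
def instant_capacity(target_capacity_pct: float, cluster_pod_slots: int) -> int:
    """Pod slots kept free -- and thus instantly usable -- at a given target capacity."""
    headroom = 1 - target_capacity_pct / 100
    return round(cluster_pod_slots * headroom)

# Say the cluster currently has room for 1000 containers in total:
for target in (30, 80, 95):
    print(f"target {target}% -> {instant_capacity(target, 1000)} containers start instantly")
```

This is why the 30% line on the graph takes off after thirty seconds while the 95% line barely moves: the headroom you pay for is the headroom you scale into.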


In the real world, I mostly see ECS on EC2 used when ECS on Fargate is not enough — for things like GPU support, high-bandwidth networking, and so on.


Let's move to ECS on Fargate — serverless containers!

As mentioned above, Fargate is an alternative to EC2s: no more stressing about servers, operating systems, patches, and updates. We only have to care about the container image!

Fargate differs massively between ECS and EKS! ECS is AWS-native and serverless by design, which means ECS on Fargate can move faster and it can fully utilize the power of Fargate. Besides the "default" Fargate — On-Demand Intel-based Fargate by its full name — ECS on Fargate also has support for Spot (up to 70% discount, but containers may be interrupted), ARM support (faster and cheaper than the default), Windows support, and additional storage options.

In 2021, we saw that ECS on Fargate can scale a lot faster when using multiple Services — there's a bit of extra configuration to do, but it's worth it. We first have to figure out the number of ECS Services that scales the fastest:

Hand-drawn-style graph showing the scaling performance from 0 to 3500 containers, with different numbers of services. ECS on Fargate with 2 Services takes about eight minutes, with 3 Services it takes about six minutes, with 5 Services it takes around five minutes, and with 7 Services it takes around the same five minutes

For our test application, the ideal number seems to be 5 Services: our application is split into 5 Services, each ECS Service launching and managing 700 containers. If we use fewer than 5 Services, performance is lower. If we use more than 5 Services, performance does not improve.
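The split described above can be sketched with a few lines of Python; the cluster and Service names below are hypothetical, and the boto3 call is one plausible way to push the per-Service counts out:

```python
# Spread one deployment across several ECS Services so each Service
# scales its own slice of containers in parallel.

def split_desired_count(total: int, services: int) -> list[int]:
    """Split `total` containers across `services`, spreading any remainder."""
    base, extra = divmod(total, services)
    return [base + (1 if i < extra else 0) for i in range(services)]


def scale_out(cluster: str, service_prefix: str, total: int, services: int):
    import boto3  # lazy import: the splitting helper needs no AWS access

    ecs = boto3.client("ecs")
    for i, count in enumerate(split_desired_count(total, services)):
        ecs.update_service(
            cluster=cluster,
            service=f"{service_prefix}-{i}",
            desiredCount=count,
        )


# 3500 containers across 5 Services -> 700 containers per Service.
print(split_desired_count(3500, 5))
```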

Amazing performance from ECS on Fargate! If we compare this year's best result with the results from previous years, we get a fabulous graph showing how much ECS on Fargate has evolved:

Hand-drawn-style graph showing the scaling performance from 0 to 3500 containers. In 2020 ECS on Fargate took about 55 minutes to reach 3500 containers. In 2021, it takes around 12 minutes, and in 2022 it takes a little over 5 minutes

If we had built and run an application using ECS on Fargate in 2020, we would have scaled to 3 500 containers in about 60 minutes. Without any effort or changes, in 2021 the same scaling would be done in a little over 10 minutes. Again without changing a single line of code, in 2022 the same application would finish scaling in a little over 5 minutes! In less than 2 years, without any effort, we went from 1 hour to 5 minutes!


To further optimize our costs, we can run ECS on Fargate Spot, which is discounted by up to 70% — but AWS can interrupt our containers with a 2-minute warning. For our testing, and for a lot of real-life workloads, we don't care if a container gets interrupted and then replaced by another container. The up to 70% discount is… appealing, but AWS mentions that Spot might be slower due to additional work that has to happen on their end. Let's test and see if there's any impact:
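One way to mix Spot and On-Demand is a capacity provider strategy on the Service; the sketch below uses placeholder cluster, Service, and task definition names, and the 1:3 weighting is an arbitrary example, not the setup from the tests:

```python
# Mix On-Demand and Spot Fargate capacity in a single ECS Service.

def spot_share(on_demand_weight: int, spot_weight: int) -> float:
    """Fraction of (non-base) tasks ECS aims to place on Fargate Spot."""
    return spot_weight / (on_demand_weight + spot_weight)


def create_spot_heavy_service():
    import boto3  # lazy import: the helper above needs no AWS access

    ecs = boto3.client("ecs")
    return ecs.create_service(
        cluster="scaling-test",
        serviceName="app-spot",
        taskDefinition="app:1",
        desiredCount=700,
        capacityProviderStrategy=[
            # keep a small always-On-Demand floor, put the rest 3:1 on Spot
            {"capacityProvider": "FARGATE", "weight": 1, "base": 10},
            {"capacityProvider": "FARGATE_SPOT", "weight": 3},
        ],
    )


print(spot_share(1, 3))  # 0.75
```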

Hand-drawn-style graph showing the scaling performance from 0 to 3500 containers. The lines for ECS on Fargate On-Demand using 5 Services and ECS on Fargate Spot using 5 Services are really close, with the Spot line having a small bump

What a surprise! As per AWS, "customers can expect a greater variability in performance when using ECS on Fargate Spot" and we are seeing exactly that: for the test I ran, ECS on Fargate Spot was just a smidge faster than ECS on Fargate On-Demand. Spot performance and availability vary, so make sure to account for that when architecting!


That said, how sustained is this ECS on Fargate scaling performance? We can see that scaling slows down considerably as we get close to our target of 3 500 containers. Is that because there are only a few remaining containers to be started, or because we are hitting a performance limitation? Let's test what happens when we try to scale to 10 000 containers!

Hand-drawn-style graph showing the scaling performance from 0 to 3500 containers. There are lines for ECS on Fargate with 2, 3, 5, and 7 Services, and they all scale super-fast to 3400-ish containers and then slow down. There is a tall line for ECS on Fargate to 10000 containers with 5 Services taking close to ten minutes to scale, also slowing down when reaching the top

Ok, ECS on Fargate has awesome and sustained performance!



That is not all though! This year, we have even more options: ECS on Fargate ARM and ECS on Fargate Windows.

In late 2021, AWS announced ECS on Fargate ARM, which takes advantage of AWS' Graviton2 processors. ECS on Fargate ARM is both faster and cheaper than the "default" ECS on Fargate On-Demand, which uses Intel processors. There is no ECS on Fargate ARM Spot option right now, so ECS on Fargate Spot remains the most cost-effective option.

To run a container on ARM processors — be they AWS' Graviton processors or Apple Silicon in the latest Macs — we have to build a container image for ARM architectures. In our case, this was easy: we add a single line to Docker's build-and-push Action to build a multi-architecture image for both Intel and ARM processors: platforms: linux/amd64, linux/arm64. It's that easy! The image is then pushed to ECR, which has supported multi-architecture images since 2020.
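For reference, the relevant workflow steps could look roughly like this; the action versions and image tag are illustrative, and only the platforms line is the addition discussed above:

```yaml
# Illustrative excerpt of a GitHub Actions job building a multi-arch image
- uses: docker/setup-qemu-action@v1      # emulation for cross-building ARM
- uses: docker/setup-buildx-action@v1    # buildx performs the multi-arch build
- uses: docker/build-push-action@v2
  with:
    push: true
    tags: my-registry/app:latest         # placeholder ECR registry and tag
    platforms: linux/amd64,linux/arm64   # the one added line: Intel plus ARM
```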

With the image built and pushed, we can test how ECS on Fargate ARM scales:

Hand-drawn-style graph showing the scaling performance from 0 to 3500 containers. ECS on Fargate is a bit slower than ECS on Fargate ARM, with about 20 seconds of difference between the two

Interesting, ECS on Fargate ARM is faster! AWS' Graviton2 processors are faster, and I expected that to be visible in the application processing time, but I did not expect it to have an impact on scaling too. Thinking about it, it makes sense though: a faster processor helps with container image extraction and application startup. Even better!


Also in late 2021, AWS announced ECS on Fargate Windows, which can run Windows Server containers. Since Windows has licensing fees, the pricing is a bit different: there is an extra OS licensing fee, and, while billing is still done per-second, there is a 15-minute minimum.
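The billing minimum is easy to model; a quick sketch of the per-second-with-a-floor behavior described above (the pricing model only, not real Fargate rates):

```python
# Windows tasks on Fargate bill per second, but for at least 15 minutes.

MIN_BILLED_SECONDS = 15 * 60  # the 15-minute minimum

def billed_seconds(runtime_seconds: int) -> int:
    """Seconds actually billed for a Windows task that ran `runtime_seconds`."""
    return max(runtime_seconds, MIN_BILLED_SECONDS)

# A task that runs for 2 minutes is still billed for the full 15 minutes;
# a task that runs for an hour is billed for exactly an hour.
print(billed_seconds(120))   # 900
print(billed_seconds(3600))  # 3600
```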

Some folks would dismiss ECS on Fargate Windows, but it is a major announcement! People who have specific Windows-only dependencies can now easily adopt containerized applications or, for the first time, run Windows containers serverlessly on AWS. Windows support is great news for folks running complex .NET applications that cannot be moved to Linux: they can now move with way higher velocity!

To build Windows container images, we have to make a couple of changes.
First, we have to use a Windows base image for our container. For our Python web app it's easy: the official Python base images have had Windows support since 2016. Unfortunately, Docker's build-and-push Action has no support for building Windows containers. To get the image built, we'll have to run the docker build commands manually. Since GitHub Actions has native support for Windows runners, this is straightforward.
Like all our images, the Windows container image is pushed to ECR, which has supported Windows images since 2017.

I tried running our application and it failed: the web server could not start. As of right now, our gunicorn web server does not support Windows. No worries, we can use a drop-in alternative: waitress! This leads to a small difference in the test application code between Windows and Linux, but no fundamental changes.

Because I used Honeycomb for proper observability, I was able to discover there is one more thing we have to do for the best scaling results: push non-distributable artifacts! You can read the whole story in a guest post I wrote on the Honeycomb blog, but the short version is that, right now, Windows container images are special and an extra configuration option has to be enabled.
Windows has complex licensing and the "base" container image is a non-distributable artifact: we are not allowed to distribute it! That means that when we build our container image in GitHub Actions and then run docker push to upload our image to ECR, only some parts of the image are pushed — the parts that we are allowed to share, which in our case are our application code and the dependencies for our app. If we open the AWS Console and look at our image, we see that only 76 MB were pushed to ECR.
When ECS on Fargate wants to start a Windows container with our application, it first has to download the container image: our Python application and its dependencies, totaling about 76 MB, will be downloaded from ECR, but the base Windows Server image of about 2.7 GB will be downloaded from Microsoft. Unless everything is perfect in the universe and on the Internet between AWS and Microsoft, download performance can vary wildly!
As per Steve Lasker, PM Architect at Microsoft Azure, Microsoft recognizes this licensing constraint has caused frustration and is working to remove the constraint and default configuration. Until then, for the highest consistent performance, both AWS and Microsoft recommend setting a Docker daemon flag for private-use images: --allow-nondistributable-artifacts. With this flag set, the full image totaling 2.8 GB is pushed to ECR — both the base image and our application code. When ECS on Fargate has to download the container image, it downloads the whole thing from nearby ECR.
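The flag can also be set persistently through the Docker daemon configuration; a hypothetical /etc/docker/daemon.json, with a placeholder account ID and region in the ECR registry hostname, would look like this:

```json
{
  "allow-nondistributable-artifacts": [
    "111111111111.dkr.ecr.us-east-1.amazonaws.com"
  ]
}
```

With this in place, docker push to that registry uploads the Windows base layers too, instead of skipping them.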

With this extra flag set, and with the full image pushed to ECR, we can test ECS on Fargate Windows and get astonishing performance:

Hand-drawn-style graph showing the scaling performance from 0 to 3500 containers. ECS on Fargate and ECS on Fargate ARM both start around the 30 second mark and reach close to 3500 containers around the four to five minute mark. ECS on Fargate Windows starts just before the six minute mark, reaching 3500 containers in about eleven minutes

ECS on Fargate Windows is slower to start — about 5 minutes of delay compared to about 30 seconds of delay when using Linux containers — but that was expected: Fargate has to do licensing stuff and Windows containers are, for good reasons, bigger. After that initial delay, ECS on Fargate Windows is scaling just as fast, which is awesome!



I am seeing a massive migration to ECS on Fargate — it's so much easier!
Since ECS on Fargate launched in 2017, for the best-case scenario, approximate pricing per vCPU-hour got a whopping 76% reduction from $0.05 to $0.01, and pricing per GB-hour got a shocking 89% reduction from $0.010 to $0.001. Since I started testing in 2020, ECS on Fargate got 12 times faster!

While initially a slow and expensive service, ECS on Fargate grew into an outstanding one. I started advising my clients to move smaller applications to ECS on Fargate in late 2019, and that recommendation has grown stronger each year. I think ECS on Fargate should be the default choice for any new container deployments on AWS. As a bonus, running in other data centers instantly became an option, without any effort, through ECS Anywhere! That enables some amazingly easy cross-cloud SaaS scenarios and instant edge computing use-cases.

That said, ECS on Fargate is not an ideal fit for everything: there is limited support for CPU and memory sizes, limited storage support, no GPU support yet, no dedicated network bandwidth, and so on. Using EC2 workers with ECS on EC2 offers a lot more flexibility and power — say, servers with multiple TBs of memory, hundreds of CPUs, and dedicated network bandwidth in the range of hundreds of gigabits. It's always tradeoffs!

Again, keep in mind that these are results with default settings, an extreme test case, and manual, forced scaling! Performance levels will differ depending on your applications and what setup you're running! This post on the AWS Containers Blog gets into more details, if you're curious.

Hand-drawn-style graph showing the scaling performance from 0 to 3500 containers, using ECS. There are a lot of lines and it's messy

App Runner

App Runner is a higher-level service released in May 2021.

For App Runner, we don't see any complex Control Plane and Worker Plane separation: we tell App Runner to run our services, and it does that for us!
App Runner is an easier way of running containers, further building on the experience offered by ECS on Fargate. If you're really curious how this works under the covers, an awesome deep-dive can be read here, and AWS has a splendid networking deep-dive here.

For our use case, there is no need to create a VPC since App Runner does not require one (but App Runner can connect to resources in a VPC). To test App Runner, we create an App Runner service configured to run our 1 vCPU 2 GB container using our container image from ECR. That's it!

To force scaling, we can edit the "Minimum number of instances" to equal the "Maximum number of instances", and we quickly get the result:
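The same min-equals-max trick can be done through boto3; a hedged sketch with a placeholder configuration name and the 25-container service cap modeled as a helper:

```python
# Force App Runner to scale by pinning MinSize to MaxSize in a new
# auto scaling configuration, then attaching it to the service.

def capped_containers(requested: int, service_limit: int = 25) -> int:
    """App Runner ran at most 25 containers per service at the time."""
    return min(requested, service_limit)


def force_scale(service_arn: str, containers: int):
    import boto3  # lazy import: the helper above needs no AWS access

    apprunner = boto3.client("apprunner")
    config = apprunner.create_auto_scaling_configuration(
        AutoScalingConfigurationName="force-scale",
        MinSize=capped_containers(containers),
        MaxSize=capped_containers(containers),  # Min == Max forces scaling
    )
    return apprunner.update_service(
        ServiceArn=service_arn,
        AutoScalingConfigurationArn=config["AutoScalingConfiguration"][
            "AutoScalingConfigurationArn"
        ],
    )


# Asking for 30 containers still tops out at the 25-container service limit:
print(capped_containers(30))  # 25
```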

Hand-drawn-style graph showing the scaling performance from 0 to 30 containers instead of 3500 containers. There are 2 lines for ECS on Fargate using 2 and 5 services: they both start around the 30 second mark and go straight up. App Runner has a shorter line that starts around the one minute mark, goes straight up, and stops abruptly at 25 containers

App Runner starts scaling a bit later than ECS on Fargate, but then scales just as fast. Scaling finishes quickly, as App Runner supports a maximum of 25 containers per service. There is no way to run more than 25 containers in a single service, but multiple services could be used.

Don't be fooled by the seemingly small number! Each App Runner container can use a maximum of 2 vCPUs and 4 GB of RAM, for a grand total of 50 vCPUs and 100 GB of memory possible in a single service. For many applications, this is more than enough, and the advantage of AWS managing things is not to be underestimated!
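The arithmetic behind that total, using the biggest App Runner instance size available at the time:

```python
# Total capacity of one fully scaled App Runner service.

MAX_CONTAINERS = 25   # per-service container limit at the time
VCPU_EACH = 2         # biggest instance size: 2 vCPUs
MEMORY_GB_EACH = 4    # biggest instance size: 4 GB

def service_capacity(containers: int = MAX_CONTAINERS):
    """Total (vCPUs, GB of memory) for a fully scaled service."""
    return containers * VCPU_EACH, containers * MEMORY_GB_EACH

print(service_capacity())  # (50, 100)
```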

In the future, I expect App Runner will continue to mature, and I think it might become the default way of running containers sometime in 2023 — but those are my hopes and dreams. With AWS managing capacity for App Runner, there are a lot of optimizations that AWS could implement. We'll see what the future brings!

Keep in mind that these are default performance results, with manual, forced scaling! Performance levels will differ depending on your applications and what setup you're running! App Runner also requires way less effort to set up and has some awesome scaling features 😉

Lambda

Lambda is different: it's Function-as-a-Service, not containers.

In previous years I saw no need to test Lambda, as AWS publishes the exact speed at which Lambda scales. In our Northern Virginia (us-east-1) region, that is an initial burst of 3 000 instances in the first minute, and then 500 instances every minute after that. No need to test when we know the exact results we are going to get: Lambda will scale to 3 500 instances in 2 minutes!
This year, based on countless requests, I thought we should include Lambda, even if only to confirm AWS' claims of performance. Lambda instances are not directly comparable with containers, but we'll get to that in a bit.
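The documented behavior is simple enough to model; a rough sketch of the burst-then-steady curve (the step timing is approximate — we'll see shortly how the real steps look):

```python
import math

# Rough model of Lambda's documented scaling in us-east-1 at the time:
# an initial burst of 3 000 instances, then 500 more per minute, capped
# at the concurrency the traffic actually demands.

def lambda_instances(minutes: float, target: int,
                     burst: int = 3000, per_minute: int = 500) -> int:
    """Instances available `minutes` after the flood of traffic starts."""
    return min(target, burst + per_minute * math.floor(minutes))

print(lambda_instances(0, 3500))    # 3000 - the initial burst
print(lambda_instances(2, 3500))    # 3500 - done within two minutes
print(lambda_instances(7, 10_000))  # 6500
```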

Lambda is a fully managed event-driven Function-as-a-Service product. In plain language, when events happen (HTTP requests, messages put in a queue, a file gets created) Lambda will run code for us. We send Lambda the code, say "run it when X happens", and Lambda will do everything for us, without any Control Plane and Worker Plane separation — we don't even see the workers directly!

The code that Lambda will run can be packaged in two ways: as a .zip file archive or as a container image.

To adapt our application for Lambda, we first have to figure out what event our Lambda function will react to.
We could configure Lambda to have a bunch of containers waiting, ready to accept traffic, but that is not the same thing as Lambda creating workers for us when traffic spikes. For accurate results, I think we need at least a semi-realistic scenario in place.
Looking at the many integrations Lambda has, I think the easiest one to use for our testing is the Amazon API Gateway integration: API Gateway will receive HTTP requests, run our code for each request, and return the response.

Now that we know what will call our Lambda, we can adapt our test application code. I decided to re-write our test application: while Lambda can run whatever code we send it (including big frameworks with awesome performance), I see no need for a big framework for our use-case. I re-wrote the code — still in Python, but without the Flask micro-framework — packaged it into a .zip file, and configured it as the source for my Lambda function.

To scale the Lambda function, since there is no number we can manually edit, we have to create a lot of events: flood the API Gateway with a lot of HTTP requests. I did some experiments, and the easiest option seems to be Apache Benchmark running on multiple EC2 instances with a lot of sustained network bandwidth. We run Apache Benchmark on each EC2 instance and send a gigantic flood of requests to API Gateway, which in turn sends them to a lot of Lambdas.
Since Lambda publishes its scaling performance targets, we know that Lambda will scale to our target of 3 500 containers in just 2 minutes. That's not enough time — we have to scale even higher! When we wanted to confirm that ECS on Fargate has sustained scaling performance, we scaled to 10 000 containers in about 10 minutes. That seems like a good target, right? Let's scale Lambda!

There are multiple other configuration options, and if you want to see the full configuration used, you can check out the Terraform infrastructure code in the lambda folder in vlaaaaaaad/blog-scaling-containers-on-aws-in-2022 on GitHub.


Now, with the setup covered, let's get to testing! I ran all the tests between January and February 2022, using all the latest versions available at the time of testing.

Hand-drawn-style graph showing the scaling performance from 0 to 10000 containers. ECS on Fargate starts around the 30 second mark and grows smoothly until 10000 around the ten minute mark. The Lambda line spikes instantly to 3000 containers, and then spikes again to 3500 containers. After that, the Lambda line follows a stair pattern, spiking an additional 500 containers every minute

Lambda followed the advertised performance to the letter: an initial burst of 3 000 containers in the first minute, and 500 containers each minute after that. I am surprised by Lambda scaling in steps — I expected the scaling to be spread over each minute, not to have those spikes when each minute starts.


Funnily enough, at about 7 minutes after the scaling command, both Lambda and ECS on Fargate were running almost the same number of containers: 6 500, give or take. That does not tell us much though — these are pure containers launched. How would this work in the real world? How would this impact our applications? How would this scaling impact response times and customer happiness?

Lambda and ECS on Fargate work in different ways and direct comparisons between the two cannot be easily done.
Lambda has certain container sizes, and ECS on Fargate has other container sizes, with even more sizes in the works. Lambda has one pricing for x86 and one for ARM; ECS on Fargate has two pricing options for x86 and one for ARM. Lambda is more managed and tightly integrated with AWS, which means, for example, that there is no need to write code to receive a web request or to get a message from an SQS Queue, but ECS on Fargate can more easily use mature and validated frameworks. Lambda handles only 1 HTTP request per container at a time, while ECS on Fargate can handle as many as the application can, at the cost of more complex configuration. And so on and so forth.
I won't even attempt to compare them here.



And that is not all — it gets worse/better!
We all know that ECS on Fargate and EKS on Fargate limits can be increased, and we even saw that in the 2021 tests. In the middle of testing Lambda, I got some surprising news: Lambda's default scaling limits of 3 000 burst and 500 sustained can be increased!
Unlike the previous limit increases (which were actually capacity limit increases), this is a performance limit increase and is not a straightforward limit increase request. It's not something that only the top 1% of the top 1% of AWS customers can achieve, but it's not something that is easily available either. With a legitimate workload and some conversations with the AWS teams and engineers through AWS Support, I was able to get my Lambda limits increased by a shocking amount: from the default initial burst of 3 000 and a sustained rate of 500, my limits were increased to an initial burst of 15 000 instances, and then a sustained rate of 3 000 instances each minute 🤯
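Plugging both sets of limits into the burst-then-steady model shows how dramatic the difference is; a quick back-of-the-napkin calculation:

```python
import math

# Minutes until Lambda reaches `target` concurrent instances, under a
# simple burst-then-steady-rate model of its scaling limits.

def minutes_to_reach(target: int, burst: int, per_minute: int) -> int:
    if target <= burst:
        return 0  # the initial burst covers the whole target
    return math.ceil((target - burst) / per_minute)

# Default limits vs the increased limits:
print(minutes_to_reach(10_000, burst=3_000, per_minute=500))     # 14
print(minutes_to_reach(10_000, burst=15_000, per_minute=3_000))  # 0
print(minutes_to_reach(50_000, burst=15_000, per_minute=3_000))  # 12
```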

To see these increased limits in action, we have to scale even higher. A test to 10 000 containers is useless:

Hand-drawn-style graph showing the scaling performance from 0 to 10000 containers. The same graph as before, with an additional line going straight up from 0 to 10000, instantly

Did you notice the vertical line? Yeah, scaling to 10 000 is not a great benchmark when Lambda has increased limits to burst to 15 000.

To properly test Lambda with increased limits, we have to go even higher! If we want to scale for the same 10 minute duration, we have to scale up to 50 000 containers!
At this scale, things start getting complicated. To support this many requests, we also have to increase the API Gateway traffic limits. I talked to AWS Support and, after validating my workload, we got the Throttle quota per account, per Region across HTTP APIs, REST APIs, WebSocket APIs, and WebSocket callback APIs limit increased from the default 10 000 to our required 50 000.

With the extra limits increased for our setup, we can run our test and see the results. Prepare to scroll for a while:

Hand-drawn-style graph showing the scaling performance from 0 to 50000 containers. It is a comically tall graph, requiring a lot of scrolling. ECS on Fargate starts around the 30 second mark and grows smoothly until 10000 around the same ten minute mark. There is an additional line for Lambda with increased limits which goes straight up to 12000-ish containers and then spikes again to 18000-ish. After that, the line follows the same stair pattern, every minute spiking an additional 3000 containers

Yeah… that is a lot.

We are talking about large numbers here: 3 500, 10 000, and now 50 000 containers. We are getting desensitized, and I think it would help to put these numbers in perspective. The biggest Lambda size is 6 vCPUs with 10 GB of memory.
With the default limits, Lambda scales to 3 000 containers in a couple of seconds — that means 30 TB of memory and 18 000 vCPUs in a couple of seconds.
With a legitimate workload and increased limits, as we just saw, we are now living in a world where we can instantly get 150 TB of RAM and 90 000 vCPUs for our apps 🤯
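The napkin math behind those totals, assuming every instance uses the biggest Lambda size:

```python
# Fleet capacity for max-size Lambda instances (6 vCPUs, 10 GB each).

VCPUS_EACH = 6
MEMORY_GB_EACH = 10

def fleet_capacity(instances: int):
    """Total (vCPUs, TB of memory) for a fleet of max-size Lambdas."""
    return instances * VCPUS_EACH, instances * MEMORY_GB_EACH / 1000

print(fleet_capacity(3_000))   # (18000, 30.0)  - default burst limits
print(fleet_capacity(15_000))  # (90000, 150.0) - increased burst limits
```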

Acknowledgments

First, massive thanks go to Farrah and Massimo, and everybody at AWS that helped with this! You are all awesome and the time and care you took to answer all my annoying emails is much appreciated!
I also want to thank Steve Lasker and the nice folks at Microsoft that answered my questions around Windows containers!

Special thanks go to all my friends who helped with this — from friends being interviewed about what they would like to see and their scaling use-cases, all the way to friends reading rough drafts and giving feedback. Thank you all so much!

Because I value transparency, and because "oh, just write a blog post" fills me with anger, here are a couple of stats: