State of Cloud Costs


Organizations face significant challenges in increasing the efficiency of their growing cloud spending, even as the flexibility and variety of available cloud services offer many opportunities for optimization. Cloud environments are complex and dynamic due to the breadth of services and the drive to adopt new technologies, such as Arm-based processors and GPUs that enable AI capabilities. These complexities make it difficult for organizations to fully understand the factors contributing to their cloud costs.

For this report, we analyzed AWS cloud cost data from hundreds of organizations. We explored how their use of emerging and previous-generation technologies, patterns of cloud resource usage, and participation in AWS discount programs all contribute to their cloud costs. Our findings suggest that while organizations have opportunities for cost optimization in each of these dimensions, identifying and attaining those gains can be challenging in a complex and ephemeral landscape.


Fact 1

Spending on GPU instances now makes up 14 percent of compute costs

Organizations that use GPU instances have increased their average spending on those instances by 40 percent—from 10 percent of their EC2 compute costs to 14 percent—in the last year. GPUs’ capacity for parallel processing makes them critical for training LLMs and executing other AI workloads, where they can be more than 200 percent faster than CPUs.

GPU-based EC2 instance types generally cost more than instances that don’t use GPUs. But the most widely used type—the G4dn, used by 74 percent of GPU adopters—is also the least expensive. This suggests that many customers are experimenting with AI, applying the G4dn to their early efforts in adaptive AI, machine learning (ML) inference, and small-scale training. We expect that as these organizations expand their AI activities and move them into production, they will be spending a larger proportion of their cloud compute budget on GPU.

Organizations now spend an average of 14 percent of their EC2 compute costs on GPU instances

Fact 2

Arm spending as a proportion of compute costs has doubled in the past year

We observed that, on average, organizations that use Arm-based instances spend 18 percent of their EC2 compute costs on them, twice as much as they did a year ago. Instance types based on the Arm processor use up to 60 percent less energy than comparable EC2 instances and often provide better performance at a lower cost.

The most common type of Arm-based instance we see in use is T4g, which is used by about 65 percent of organizations. These instances are powered by Graviton2 processors and provide up to 40 percent better price performance than their x86-64-based T3 counterparts.

Arm-based instances still account for only a minority of EC2 compute spending, but the increase we’ve seen over the last year has been steady and sustained. This suggests that organizations are beginning to update their applications to take advantage of more efficient processors and slow the growth of their overall compute spend.

Organizations spend twice as much of their compute allocation on Arm compared to one year ago

Fact 3

Container costs comprise one third of EC2 spend

Organizations use about 35 percent of their EC2 compute spend for running containers, up from 30 percent a year ago. This includes EC2 instances deployed as Kubernetes control or worker nodes in self-managed clusters, as well as instances that run in ECS and EKS clusters. Across all of the customers we analyzed, about one quarter allocate more than 75 percent of their EC2 spend to run containers.

On average, organizations spend over a third of their EC2 costs running containers

We expect to see continued growth in the proportion of cloud spend allocated to containers as organizations increasingly benefit from the associated efficiencies, including streamlined deployments, improved dependency management, and more efficient use of infrastructure. But they’ll also be challenged to manage the added complexity of attributing costs based on ephemeral, shared infrastructure and provisioning container infrastructure in a cost-efficient way.

Fact 4

More than 80 percent of container spend is wasted on idle resources

Our research shows that 83 percent of container costs are associated with idle resources. Cluster idle, which is the cost of overprovisioning cluster infrastructure, accounts for about 54 percent of container costs. The remaining 29 percent is workload idle, which comes from resource requests that are larger than their workloads require.

Idle resources account for a majority of container costs

We don’t expect that wasted container spend can be eliminated entirely. It’s difficult for development teams to accurately forecast each new application’s resource requirements, and this makes it difficult to efficiently allocate those resources. And resource needs often change based on the nature and utilization of the workloads. Organizations can autoscale their cluster infrastructure and individual workloads, but autoscaling is complex—teams can optimize scaling parameters based on workload traffic patterns, but efficiency improvements are often marginal and elusive.

Fact 5

Previous-generation technologies are still widely used

AWS’s current infrastructure offerings commonly both outperform their previous-generation versions and cost less. However, our data shows that—while organizations are making efforts to modernize—in the case of EC2 instance types and EBS volume types, the older technologies still have a significant presence in many of their environments.

We found that 83 percent of organizations still use previous-generation EC2 instance types, which is down from 89 percent one year ago. These organizations spend on average about 17 percent of their EC2 budget on them.

Most organizations still use previous-generation EC2 instance types

In the case of EBS, the current generation of volumes (gp3) costs about 20 percent less than gp2 volumes, but organizations still spend more on the older volumes. Spending on gp2 volumes represents 58 percent of the average organization’s EBS spend, down from 68 percent a year ago.

Organizations spend more of their EBS budget on gp2 than gp3

While we expect to continue to see gp2 volumes in use for the near future, we predict that organizations will gradually decrease their reliance on them. The challenges of migrating (including the complexity of moving large volumes of data, the required cross-team collaboration, and the difficulty of predicting how workloads will perform on newer-generation technologies) all contribute to the slow rate of adoption. However, the cost reductions and performance gains offered by newer EC2 and EBS versions, and by even newer technologies in the future, will continue to be a motivation to migrate.

Fact 6

Cross-AZ traffic makes up nearly half of data transfer costs

Our research found that, on average, organizations spend almost as much on sending data from one availability zone (AZ) to another as they do on all other types of data transfer combined—including VPNs, gateways, ingress, and egress. Cross-AZ traffic may be unavoidable in some scenarios, such as when an application’s high-availability architecture requires that instances be deployed in more than one AZ. It may also be an inevitable side effect of organizational changes that come as teams, services, and applications scale.

Wherever the costs come from, their impact is substantial: 98 percent of organizations are affected by cross-AZ charges. This may indicate a near-universal opportunity to optimize cloud costs, such as by colocating related resources within a single AZ whenever availability requirements allow.

In some cases, cloud providers have stopped charging for certain types of data transfer. It’s difficult to predict how these changes might evolve, but if providers relax data transfer costs further, future cross-AZ traffic may become less of a factor in cloud cost efficiency.

Cross-AZ traffic makes up nearly half of data transfer costs

Fact 7

A decreasing percentage of organizations use commitment-based discounts

Cloud service providers offer discounts on many of their services—for example, AWS has discount programs for Amazon EC2, Amazon RDS, Amazon SageMaker, and others. Most organizations opt in to these programs, committing to a certain amount of future spend or usage of the service. But our data shows a decreasing proportion of organizations participating—67 percent, compared to 72 percent last year.

Adoption of commitment-based discount programs is high, but decreasing

Further, we see relatively low engagement in these discount programs: only 29 percent of organizations purchase enough discounts to cover more than half of their eligible cloud spend. This underutilization of discounts suggests that organizations are hesitant to commit up front to a specific amount of usage or spend, possibly because they cannot forecast their resource needs confidently enough. They may also face difficulty making discount-purchasing decisions due to a lack of clarity on which teams are responsible for these decisions and who owns the affected resources. We see an opportunity for optimization, where most organizations can leverage discounts to improve cost efficiency as they gain a fuller understanding of the usage patterns behind their cloud costs.

Most organizations don't purchase enough discounts to cover even half of their eligible spend

Fact 8

More than four times as many organizations use Savings Plans vs. Reserved Instances

AWS users have two options for discounting their EC2 costs: Savings Plans, in which customers commit to a certain amount of EC2 spend, and Reserved Instances, in which they commit to an amount of usage of a specific instance type in a specific availability zone. Savings Plans are more flexible, and we found that most organizations (59 percent) take advantage of this and apply Savings Plans to at least some of their EC2 costs. Far fewer organizations, just 15 percent, use Reserved Instances. This could suggest that organizations are more confident about how much they’ll need to spend on EC2 than they are about which instance types they’ll need to deploy and where.

Savings Plans are more widely used than Reserved Instances


Methodology

Findings are based on data collected between May 2023 and April 2024.

Population

For this report, we compiled cloud cost data from a sample of organizations that have used Datadog Cloud Cost Management to analyze their AWS bill.

Fact 1

We calculated the percentage of each organization's monthly amortized EC2 spend that was used to run GPU instances and averaged that percentage across all organizations that had any amount of EC2 compute spend.
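
This calculation averages per-organization percentages rather than taking one percentage of pooled spend, so a small organization counts as much as a large one. The following Python sketch illustrates the difference. The record layout, organization names, and dollar figures are hypothetical and are not taken from the report's data.

```python
from statistics import mean

# Hypothetical monthly amortized EC2 spend per organization, in dollars.
# "gpu" is the part of "total" that was spent on GPU instances.
monthly_ec2_spend = {
    "org-a": {"total": 1000.0, "gpu": 300.0},
    "org-b": {"total": 9000.0, "gpu": 0.0},
    "org-c": {"total": 500.0, "gpu": 50.0},
}


def gpu_share_percent(spend):
    """Percentage of one organization's EC2 spend used for GPU instances."""
    return 100.0 * spend["gpu"] / spend["total"]


def average_gpu_share(spend_by_org):
    """Average the per-organization percentages, skipping orgs with no EC2 spend."""
    shares = [
        gpu_share_percent(spend)
        for spend in spend_by_org.values()
        if spend["total"] > 0
    ]
    return mean(shares)


def pooled_gpu_share(spend_by_org):
    """Percentage of all pooled EC2 spend used for GPU instances, for contrast."""
    total = sum(spend["total"] for spend in spend_by_org.values())
    gpu = sum(spend["gpu"] for spend in spend_by_org.values())
    return 100.0 * gpu / total
```

With these made-up figures, the average of per-organization shares is about 13.3 percent, while the pooled share is about 3.3 percent, because the largest organization spends nothing on GPU instances.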

Fact 2

We analyzed data from all organizations that ran EC2 compute instances in the months shown. We calculated the percentage of their monthly amortized EC2 spend that was used to run any instances based on the following instance types: A1, C6g, C7g, G5g, Hpc7g, I4g, Im4gn, Is4gen, M6g, M7g, R6g, R7g, T4g, X2gd.

Fact 3

We calculated the percentage of each organization's monthly amortized EC2 spend that was used to run containerized EC2 instances (those that allocate any portion of their CPU or memory to containers) and averaged that percentage across all organizations that had any amount of EC2 compute spend.

Fact 4

We calculated the proportions of workload idle, cluster idle, and utilized spend on containerized EC2 instances and averaged those proportions across all organizations running containerized instances.
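
One way to picture this split is sketched below in Python. It assumes a single resource dimension (for example, CPU cores), assumes that usage never exceeds requests, and treats cluster idle as provisioned capacity that no workload requested and workload idle as requested capacity that went unused. The function names and sample numbers are hypothetical, and the report does not describe its exact formula.

```python
def idle_breakdown(provisioned, requested, used, cost):
    """Split an instance's cost into utilized, workload idle, and cluster idle.

    provisioned: capacity of the instance (for example, CPU cores)
    requested:   capacity requested by the containers scheduled on it
    used:        capacity those containers actually consumed
    cost:        amortized cost of the instance for the period
    """
    if provisioned <= 0 or not (0 <= used <= requested <= provisioned):
        raise ValueError("expected 0 <= used <= requested <= provisioned")
    unit_cost = cost / provisioned
    return {
        "utilized": used * unit_cost,
        "workload_idle": (requested - used) * unit_cost,
        "cluster_idle": (provisioned - requested) * unit_cost,
    }


def proportions(breakdown):
    """Convert one organization's cost breakdown into fractions of its total."""
    total = sum(breakdown.values())
    return {name: value / total for name, value in breakdown.items()}


def average_proportions(breakdowns):
    """Average each category's proportion across organizations."""
    if not breakdowns:
        raise ValueError("at least one breakdown is required")
    shares = [proportions(b) for b in breakdowns]
    return {
        name: sum(share[name] for share in shares) / len(shares)
        for name in shares[0]
    }
```

For example, an instance with 16 cores provisioned, 8 requested, and 4 used at a cost of 160 would split into 40 utilized, 40 workload idle, and 80 cluster idle.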

Fact 5

In the first graph, the percentage value for each month represents the proportion of organizations using EC2 instances that spent any amount in that month on any of the following instance types:

  • C1, C3, C4, Cr1
  • D2
  • G2
  • Hs1
  • I2
  • M1, M2, M3, M4
  • R3, R4
  • T1, T2
  • X1

In the second graph, the percentages of gp2 and gp3 spend are averaged across all organizations that spent any amount on EBS during the months shown.

Fact 6

For this fact, we averaged the proportion of different types of transfer charges across all organizations that incurred amortized data transfer costs in April 2024. We used values from the aws_datatransfer_type tag—which Datadog adds automatically based on values in the organization's Cost and Usage Report (CUR)—to determine cross-AZ, cross-region, internet, and within-zone transfer costs.

Fact 7

The first graph analyzes data from all organizations that spent any amount on any of the following AWS products eligible for discount programs: Amazon EC2, AWS Lambda, AWS Fargate, Amazon SageMaker, Amazon RDS, Amazon ElastiCache, Amazon DynamoDB, Amazon Redshift, Amazon Neptune, Amazon Elasticsearch, and Amazon MemoryDB. Each month’s data shows the percentage of those organizations whose cloud bill for that month includes any of the following line item types: SavingsPlanRecurringFee, SavingsPlanUpfrontFee, Fee, RIFee, DiscountedUsage, SavingsPlanCoveredUsage.
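
The participation check described above could be sketched in Python as follows. The input format (pairs of organization and line item type for one month) and the function name are assumptions made for illustration, and the input is assumed to be already limited to organizations that spent on eligible products.

```python
# Line item types named in the methodology as evidence of discount participation.
DISCOUNT_LINE_ITEM_TYPES = frozenset({
    "SavingsPlanRecurringFee",
    "SavingsPlanUpfrontFee",
    "Fee",
    "RIFee",
    "DiscountedUsage",
    "SavingsPlanCoveredUsage",
})


def participation_rate(line_items):
    """Percentage of organizations with at least one discount line item.

    line_items: iterable of (organization, line_item_type) pairs for one month.
    """
    all_orgs = set()
    participating = set()
    for org, line_item_type in line_items:
        all_orgs.add(org)
        if line_item_type in DISCOUNT_LINE_ITEM_TYPES:
            participating.add(org)
    if not all_orgs:
        raise ValueError("no line items supplied")
    return 100.0 * len(participating) / len(all_orgs)


# Hypothetical sample: two of four organizations show a discount line item.
sample_line_items = [
    ("org-a", "Usage"),
    ("org-a", "SavingsPlanCoveredUsage"),
    ("org-b", "Usage"),
    ("org-c", "RIFee"),
    ("org-d", "Usage"),
]
```

An organization counts as participating as soon as one matching line item appears, regardless of how much of its spend the discount covers. The second graph measures coverage separately.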

The second graph shows data from organizations that spent any amount on products eligible for discount programs to show the average percentage of spend on these products that was covered by a discount program.

Fact 8

This fact shows the proportion of organizations using Savings Plans and Reserved Instances as a percentage of the total number of organizations using EC2 instances. These discount programs are not exclusive, and an organization can participate in both of them.
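
Because the two programs are not exclusive, the two percentages can overlap and do not need to sum to any particular total. A small Python sketch with hypothetical organization identifiers shows how an organization in both programs is counted in each percentage. The function name and sample sets are assumptions for illustration only.

```python
def adoption_percentages(ec2_orgs, savings_plan_orgs, reserved_instance_orgs):
    """Percentage of EC2-using organizations in each discount program."""
    ec2 = set(ec2_orgs)
    if not ec2:
        raise ValueError("no organizations using EC2")
    # Only organizations that use EC2 are part of the population.
    sp = ec2 & set(savings_plan_orgs)
    ri = ec2 & set(reserved_instance_orgs)
    return {
        "savings_plans": 100.0 * len(sp) / len(ec2),
        "reserved_instances": 100.0 * len(ri) / len(ec2),
        "both": 100.0 * len(sp & ri) / len(ec2),
    }


# Hypothetical population of 20 organizations using EC2.
ec2_users = [f"org-{i}" for i in range(20)]
sp_users = [f"org-{i}" for i in range(12)]      # org-0 through org-11
ri_users = [f"org-{i}" for i in range(10, 13)]  # org-10 through org-12

result = adoption_percentages(ec2_users, sp_users, ri_users)
```

Here `org-10` and `org-11` appear in both sets, so they contribute to the Savings Plans percentage and to the Reserved Instances percentage.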

Licensing

Report: CC BY-ND 4.0

Images: CC BY-ND 4.0