At AWS re:Invent 2025, Amazon Web Services (AWS) announced serverless storage for Amazon EMR Serverless, a new capability that eliminates the need to configure local disks for Apache Spark workloads. This reduces data processing costs by up to 20% while eliminating job failures from disk capacity constraints.
With serverless storage, Amazon EMR Serverless automatically handles intermediate data operations, such as shuffle, on your behalf. You pay only for compute and memory—no storage charges. By decoupling storage from compute, Spark can release idle workers immediately, reducing costs throughout the job lifecycle. The following image shows the serverless storage for EMR Serverless announcement from the AWS re:Invent 2025 keynote:
The challenge: Sizing local disk storage
Running Apache Spark workloads requires sizing local disk storage for shuffle operations—where Spark redistributes data across executors during joins, aggregations, and sorts….
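To make the sizing problem concrete, the following is a minimal sketch in plain Python (not Spark code) of what a shuffle does: each record is routed to a partition by hashing its key, and the rows destined for each partition are what would be staged as intermediate data. The function names, the CRC32-based partitioner, and the fixed 100-byte record size are assumptions made for illustration only; Spark's actual partitioner and on-disk format differ.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Deterministic hash partitioner (an illustrative stand-in, not Spark's own)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def shuffle_write(records, num_partitions):
    """Group (key, value) records by target partition, roughly as a map task
    does before staging shuffle data for downstream tasks."""
    buckets = defaultdict(list)
    for key, value in records:
        buckets[partition_for(key, num_partitions)].append((key, value))
    return buckets


def bytes_per_partition(buckets, record_size_bytes):
    """Estimate intermediate data volume per partition (assumed fixed record size)."""
    return {p: len(rows) * record_size_bytes for p, rows in buckets.items()}


# Skewed input: one "hot" key accounts for 90% of the records.
records = [("hot", i) for i in range(900)] + [(f"k{i}", i) for i in range(100)]

buckets = shuffle_write(records, num_partitions=4)
sizes = bytes_per_partition(buckets, record_size_bytes=100)

for partition, size in sorted(sizes.items()):
    print(f"partition {partition}: {size} bytes of shuffle data")
```

Because all records with the same key land in the same partition, a single hot key concentrates most of the intermediate data in one place. A disk sized for the average partition would be too small for that partition, while a disk sized for the worst case would sit mostly idle elsewhere.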

