Every day, AWS customers process millions of compressed files in Amazon S3, from small ZIP archives to multi-gigabyte datasets. While decompressing a single file is straightforward, processing thousands of files efficiently requires complex orchestration, error handling, and infrastructure management.
Consider this scenario: Your organization receives over 10,000 compressed files daily from partners, ranging from 5 MB to 50 GB in size. Traditional approaches force you to choose among suboptimal options: downloading files locally despite bandwidth constraints and storage limits, running always-on Amazon EC2 instances that incur idle costs, writing custom AWS Lambda functions constrained by 10 GB of ephemeral storage and 15-minute timeouts, or maintaining complex orchestration code with significant overhead.
This post presents a serverless solution that uses AWS Step Functions to automatically route files to optimal compute resources, using Lambda for files under 1 GB and EC2 for…
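The size-based routing decision can be sketched as a small helper whose logic a Step Functions Choice state would mirror. The 1 GB threshold comes from the text; the function and label names here are illustrative assumptions, not part of the actual solution's code.

```python
# Sketch of the size-based routing rule described above.
# The 1 GB cutoff is from the post; names are illustrative.
GB = 1024 ** 3
LAMBDA_MAX_BYTES = 1 * GB  # files below this size go to Lambda


def choose_compute(size_bytes: int) -> str:
    """Return which compute target should decompress a file of this size."""
    return "lambda" if size_bytes < LAMBDA_MAX_BYTES else "ec2"


print(choose_compute(5 * 1024 ** 2))  # a 5 MB archive routes to Lambda
print(choose_compute(50 * GB))        # a 50 GB archive routes to EC2
```

In the actual state machine, this comparison would typically live in a Choice state using a `NumericLessThan` rule on the object's size, so no extra Lambda invocation is needed just to pick a branch.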