Secure Apache Spark writes to Amazon S3 on Amazon EMR with dynamic AWS KMS encryption | Amazon Web Services
When processing data at scale, many organizations use Apache Spark on Amazon EMR to run shared clusters that handle workloads across tenants, business units, or classification levels. In such multi-tenant environments, different datasets often require distinct AWS Key Management Service (AWS KMS) keys to enforce strict access controls and meet compliance requirements. At the same time, operational efficiency might drive these organizations to consolidate their data pipelines. Instead of running separate Spark jobs for each dataset, it could be more efficient to run a single job on Amazon EMR that processes inputs once and writes multiple outputs to Amazon Simple Storage Service (Amazon S3), each encrypted with its own KMS key.

Although consolidating multiple datasets in one Spark job reduces orchestration overhead and simplifies code maintenance, you might encounter challenges with encryption configurations. By default, the EMRFS and S3A file system clients cache…
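The rest of the caching discussion is truncated above, but one widely used way to give each destination its own KMS key without fighting the filesystem cache is S3A's per-bucket configuration, which binds encryption settings to the bucket name instead of mutating a shared configuration mid-job. The bucket names and key ARNs below are placeholders, and exact property names vary by Hadoop version (newer releases use `fs.s3a.encryption.*` in place of `fs.s3a.server-side-encryption*`); this is a sketch of the general pattern, not necessarily the approach this post goes on to develop:

```
# spark-defaults.conf — per-bucket S3A SSE-KMS settings
# (hypothetical bucket names and key ARNs; on EMR, EMRFS is
# configured separately via the emrfs-site classification)

# Writes to s3a://tenant-a-data/... are encrypted with tenant A's key
spark.hadoop.fs.s3a.bucket.tenant-a-data.server-side-encryption-algorithm  SSE-KMS
spark.hadoop.fs.s3a.bucket.tenant-a-data.server-side-encryption.key        arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-A

# Writes to s3a://tenant-b-data/... use a different key
spark.hadoop.fs.s3a.bucket.tenant-b-data.server-side-encryption-algorithm  SSE-KMS
spark.hadoop.fs.s3a.bucket.tenant-b-data.server-side-encryption.key        arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-B
```

Because S3A resolves per-bucket options when it creates the filesystem instance for each bucket, a single Spark job can write one DataFrame to `s3a://tenant-a-data/` and another to `s3a://tenant-b-data/`, and each write picks up its own key even though the filesystem client caches one instance per bucket.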

https://aws.amazon.com/blogs/big-data/secure-apache-spark-writes-to-amazon-s3-on-amazon-emr-with-dynamic-aws-kms-encryption/