Harness the Power of Genomic Data Analysis with AWS HealthOmics Analytics and Amazon EMR | Amazon Web Services

Amazon EMR release 6.13.0 integrates with AWS HealthOmics, making it easier to analyze large-scale genomic variant data. HealthOmics Analytics provides tabular access to genetic variant datasets and annotations, which researchers and data scientists can query with Amazon Athena or process with EMR jobs. EMR is a big data platform that runs frameworks such as Apache Spark and gives users more control over compute environments and analytics workflows. Combining HealthOmics Analytics stores with EMR supports analyses such as genotype-phenotype association studies and population-scale variant analysis.

To start querying genomic data, users first create a HealthOmics variant store, annotation store, or both, and import data into it. IAM permissions come next: users create the default EMR roles, EMR_EC2_DefaultRole and EMR_DefaultRole, and attach inline policies that let those roles interact with Lake Formation and access the genetic data securely. Users also need to create a data lake administrator in Lake Formation, configure the permission settings, and grant permissions on the genomic data.
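
As a rough illustration of the kind of inline policy described above, the following Python sketch assembles an IAM policy document that lets a role request Lake Formation credentials and read table metadata from the Glue Data Catalog. The action list and the `Resource` wildcard are assumptions made for illustration, not taken from the original post, so the exact permissions should be checked against the AWS documentation before use.

```python
import json


def build_lake_formation_policy() -> dict:
    """Return an IAM policy document with an assumed, illustrative action list."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowLakeFormationDataAccess",
                "Effect": "Allow",
                # Assumed set of actions; narrow or extend to match your setup.
                "Action": [
                    "lakeformation:GetDataAccess",
                    "glue:GetDatabase",
                    "glue:GetTable",
                    "glue:GetPartitions",
                ],
                # A wildcard keeps the sketch short; scope it down in practice.
                "Resource": "*",
            }
        ],
    }


# Serialize the document so it can be pasted into the IAM console or CLI.
policy_json = json.dumps(build_lake_formation_policy(), indent=2)
print(policy_json)
```

The resulting JSON could then be attached to EMR_EC2_DefaultRole as an inline policy, for example through the IAM console.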

Once permissions are configured, users can create an EMR cluster running Amazon EMR release 6.13.0 or later with Spark 3.4.1 or later. They can then SSH into the cluster and query the HealthOmics Analytics store from the Spark SQL shell, for example to describe a table or to count the variants that match specific criteria.
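
The two kinds of query mentioned above (describing a table and counting variants that match a filter) might look like the following sketch. The database name `omics_db`, the table name `variant_table`, and the filter column `contigname` are hypothetical placeholders rather than values from the original post; running DESCRIBE first shows the real column names in a given store.

```python
def build_queries(database: str, table: str, contig: str) -> dict:
    """Build a DESCRIBE statement and a filtered COUNT statement as strings."""
    qualified = f"`{database}`.`{table}`"
    return {
        "describe": f"DESCRIBE {qualified}",
        # `contigname` is an assumed column name; confirm it with DESCRIBE.
        "count": (
            f"SELECT COUNT(*) AS variant_count FROM {qualified} "
            f"WHERE contigname = '{contig}'"
        ),
    }


def run_queries(spark, queries: dict) -> int:
    """Run both statements on an existing SparkSession; return the count."""
    spark.sql(queries["describe"]).show(truncate=False)
    return spark.sql(queries["count"]).collect()[0]["variant_count"]


# Print the statements so they can be pasted into the Spark SQL shell.
queries = build_queries("omics_db", "variant_table", "chr1")
for name, statement in queries.items():
    print(f"{name}: {statement}")
```

On the cluster, the printed statements can be entered directly in the Spark SQL shell, or `run_queries` can be called from a PySpark session with its `spark` object. The string formatting here does no escaping, so it is only suitable for trusted inputs.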

With EMR 6.13.0, researchers and data scientists can use HealthOmics Analytics data in their EMR clusters, which opens up more use cases and scaling options for genomic analytics. EMR can be deployed on Amazon EC2, on Amazon EKS, or on AWS Outposts, and the EMR pricing page provides cost estimates. Querying data in the HealthOmics Analytics store incurs no additional cost. Finally, users should terminate the EMR cluster after use to clean up resources.
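
The cleanup step can also be scripted. The sketch below wraps the EMR `terminate_job_flows` API call in a small helper. The `RecordingClient` class is a hypothetical stand-in that only records calls so the example runs without AWS credentials, and the cluster ID `j-EXAMPLE12345` is a placeholder. In practice, a real client from `boto3.client("emr")` would be passed instead.

```python
class RecordingClient:
    """Hypothetical stand-in for an EMR client; records calls, sends nothing."""

    def __init__(self):
        self.calls = []

    def terminate_job_flows(self, JobFlowIds):
        # Mirrors the keyword argument used by the real EMR API call.
        self.calls.append(list(JobFlowIds))


def terminate_cluster(emr_client, cluster_id: str) -> str:
    """Request termination of one EMR cluster and return the submitted ID."""
    # EMR cluster IDs start with "j-"; reject obviously wrong input early.
    if not cluster_id.startswith("j-"):
        raise ValueError(f"not an EMR cluster ID: {cluster_id!r}")
    emr_client.terminate_job_flows(JobFlowIds=[cluster_id])
    return cluster_id


# Dry run against the stand-in client with a placeholder cluster ID.
client = RecordingClient()
terminate_cluster(client, "j-EXAMPLE12345")
print(client.calls)
```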

Article Source
https://aws.amazon.com/blogs/industries/unlock-powerful-genomic-insights-with-aws-healthomics-analytics-and-amazon-emr/