Leverage Data Preparation in AWS Glue Studio to Integrate Data and Enhance Collaboration | Amazon Web Services


AWS has announced the general availability of data preparation authoring in AWS Glue Studio Visual ETL. The new feature gives business users and data analysts a no-code data preparation experience and lets them run data integration jobs at scale on AWS Glue for Spark. The visual experience is meant to simplify the data cleaning and transformation work that data scientists and analysts do ahead of analytics and machine learning.

Users can apply hundreds of pre-built transformations to automate data preparation tasks without writing any code. Business analysts can collaborate with data engineers on data integration jobs: data engineers use Glue Studio’s visual flow-based view to define connections to the data and set the order of the data flow. Users can also import existing data cleaning and preparation “recipes” from AWS Glue DataBrew into the new AWS Glue data preparation experience.

To set up Visual ETL, the AWSGlueConsoleFullAccess IAM managed policy must be attached to the users and roles that will access AWS Glue. This policy grants full access to AWS Glue and read access to Amazon S3 resources. Once the appropriate IAM role permissions are defined, users can create a visual ETL job in AWS Glue Studio.
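The permission step above can also be scripted. The following is a minimal sketch, assuming an IAM client object that exposes boto3's attach_role_policy and attach_user_policy calls; the helper name attach_glue_console_policy and the role name in the usage comment are hypothetical, and a real setup may also need an IAM role for the job itself, which this sketch does not cover.

```python
"""Sketch: attach the AWSGlueConsoleFullAccess managed policy to roles and users."""

# ARN of the AWS managed policy named in the text.
GLUE_CONSOLE_POLICY_ARN = "arn:aws:iam::aws:policy/AWSGlueConsoleFullAccess"


def attach_glue_console_policy(iam_client, role_names=(), user_names=()):
    """Attach the policy to each given role and user.

    iam_client is expected to behave like boto3.client("iam").
    Returns a list of (principal_type, name) tuples that were processed.
    """
    attached = []
    for role in role_names:
        iam_client.attach_role_policy(
            RoleName=role, PolicyArn=GLUE_CONSOLE_POLICY_ARN
        )
        attached.append(("role", role))
    for user in user_names:
        iam_client.attach_user_policy(
            UserName=user, PolicyArn=GLUE_CONSOLE_POLICY_ARN
        )
        attached.append(("user", user))
    return attached


# Usage against a real account (requires boto3 and credentials that are
# allowed to modify IAM; "GlueStudioAnalystRole" is a hypothetical name):
#
#   import boto3
#   iam = boto3.client("iam")
#   attach_glue_console_policy(iam, role_names=["GlueStudioAnalystRole"])
```

Passing the client in as a parameter keeps the helper easy to dry-run with a stand-in object before touching a real account.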

The workflow starts with extraction: create an Amazon S3 source node, select the S3 dataset, configure the node, and preview the data in the .csv file. Users then transform the data by adding a data preparation recipe, starting a data preview session, and authoring the recipe interactively, one transformation step at a time.
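To make the idea of a recipe concrete, here is a toy illustration in plain Python of an ordered list of transformation steps applied to CSV rows and then previewed. It is a conceptual sketch only and does not use the AWS Glue or DataBrew APIs; the step names, the parameter names, the sample data, and the functions apply_recipe and preview are all assumptions made up for this example.

```python
import csv
import io


def remove_missing(rows, column):
    """Drop rows whose value in `column` is empty or whitespace."""
    return [row for row in rows if row[column].strip()]


def upper_case(rows, column):
    """Upper-case the value in `column` for every row."""
    return [{**row, column: row[column].upper()} for row in rows]


def rename(rows, source, target):
    """Rename column `source` to `target`, keeping column order."""
    return [
        {(target if key == source else key): value for key, value in row.items()}
        for row in rows
    ]


# Hypothetical step names mapped to the functions that implement them.
STEP_HANDLERS = {
    "remove_missing": remove_missing,
    "upper_case": upper_case,
    "rename": rename,
}


def apply_recipe(rows, recipe):
    """Apply each recipe step in order and return the transformed rows."""
    for step in recipe:
        handler = STEP_HANDLERS[step["operation"]]
        rows = handler(rows, **step["parameters"])
    return rows


def preview(csv_text, recipe, limit=5):
    """Parse CSV text, apply the recipe, and return the first `limit` rows."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return apply_recipe(rows, recipe)[:limit]


# Made-up sample standing in for the .csv file read from Amazon S3.
SAMPLE_CSV = "id,city,amount\n1,seattle,10\n2,,20\n3,boston,30\n"

# An ordered recipe: each step sees the output of the previous one.
RECIPE = [
    {"operation": "remove_missing", "parameters": {"column": "city"}},
    {"operation": "upper_case", "parameters": {"column": "city"}},
    {"operation": "rename", "parameters": {"source": "amount", "target": "order_amount"}},
]

for row in preview(SAMPLE_CSV, RECIPE):
    print(row)
```

Because the recipe is just ordered data, steps can be added, removed, or reordered and the preview rerun, which mirrors the interactive authoring loop described above.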

Users can share their prepared data with data engineers, who can extend it with advanced visual ETL flows and custom code for integration into production data pipelines. The AWS Glue data preparation authoring experience is now available in all commercial AWS Regions where AWS Glue DataBrew is available.

For more information, users can visit the AWS Glue Developer Guide and provide feedback through AWS re:Post for AWS Glue or their usual AWS support contacts.

Article Source
https://aws.amazon.com/blogs/aws/integrate-your-data-and-collaborate-using-data-preparation-in-aws-glue-studio/