Introducing End-to-End Data Lineage Visualization (Preview) in Amazon DataZone

Amazon DataZone is a data management service for cataloging, discovering, analyzing, sharing, and governing data within an organization. It provides a unified data portal where engineers, data scientists, product managers, analysts, and business users can access data from different sources and collaborate on data-driven insights.

One of the new features in Amazon DataZone is data lineage, which offers an end-to-end view of data movement over time. This capability helps users visualize and understand data provenance, track change management, perform root cause analysis for data errors, and answer questions about how data moves from source to target. The feature automatically captures lineage events from the Amazon DataZone catalog and from systems outside the platform, then stitches them together into a comprehensive view of data lineage.

Manually documenting data origins and relationships is time-consuming and can lead to inconsistencies that undermine users’ confidence in the data. Data lineage in Amazon DataZone builds trust by showing where the data originated, how it changed, and how it was consumed over time. Lineage that is configured programmatically can trace data from its raw form in Amazon S3, through ETL transformations in AWS Glue, to its consumption in tools such as Amazon QuickSight.
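To make the S3-to-Glue-to-QuickSight example concrete, the following is a minimal sketch of lineage modeled as a directed graph, with an upstream walk of the kind used for root cause analysis. It is not how Amazon DataZone stores or queries lineage internally; the node names and the `upstream` helper are hypothetical and exist only for illustration.

```python
from collections import deque

# Hypothetical lineage edges: each key feeds the nodes listed as its value.
EDGES = {
    "s3://raw-bucket/orders": ["glue-job:clean_orders"],
    "glue-job:clean_orders": ["glue-table:orders_clean"],
    "glue-table:orders_clean": ["quicksight:orders_dashboard"],
}


def upstream(node, edges):
    """Return every node that directly or indirectly feeds `node`, nearest first."""
    # Invert the edge map so each node points at its direct parents.
    parents = {}
    for source, targets in edges.items():
        for target in targets:
            parents.setdefault(target, []).append(source)

    # Breadth-first walk from the node back toward its sources.
    seen, order, queue = set(), [], deque([node])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


if __name__ == "__main__":
    for name in upstream("quicksight:orders_dashboard", EDGES):
        print(name)
```

Starting from the dashboard, the walk visits the cleaned table, then the Glue job, then the raw S3 data, which is the order an analyst would follow when tracing a data error back to its source.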

With data lineage in Amazon DataZone, users can spend less time mapping data assets and relationships, troubleshooting pipelines, and implementing data governance practices. The feature brings all lineage information together through APIs and presents it in a graphical view that helps users make better decisions, identify data issues, and understand the root cause of problems.

To get started with data lineage in Amazon DataZone, users can programmatically hydrate lineage information either by calling the platform’s API or by sending OpenLineage events from existing pipeline components. The system automatically captures lineage states and subscriptions for producers and consumers so that data usage and relationships can be tracked. As new lineage information is submitted, the system maps identifiers to assets in the catalog and creates versions, so that different states can be viewed over time.
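As a rough illustration of sending an OpenLineage event, the sketch below builds a minimal run event and hands it to a client object. It assumes the boto3 `datazone` client offers a `post_lineage_event` call that takes `domainIdentifier` and an `event` payload; verify this against the current boto3 documentation before relying on it. The namespace, producer URI, schema URL version, job and dataset names, and domain ID are all placeholders rather than values from the article.

```python
import json
import uuid
from datetime import datetime, timezone


def build_run_event(job_name, inputs, outputs, event_type="COMPLETE",
                    namespace="example-namespace"):
    """Build a minimal OpenLineage run event as a plain dictionary."""
    return {
        "eventType": event_type,
        "eventTime": datetime.now(timezone.utc).isoformat(),
        # Placeholder URI identifying the code that emitted the event.
        "producer": "https://example.com/my-pipeline",
        # Assumed spec version; use the one your pipeline actually targets.
        "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent",
        "run": {"runId": str(uuid.uuid4())},
        "job": {"namespace": namespace, "name": job_name},
        "inputs": [{"namespace": namespace, "name": name} for name in inputs],
        "outputs": [{"namespace": namespace, "name": name} for name in outputs],
    }


def post_event(client, domain_id, event):
    """Serialize the event and submit it through the supplied client.

    `client` is expected to behave like boto3.client("datazone"); the
    post_lineage_event call and its parameter names are assumptions.
    """
    payload = json.dumps(event).encode("utf-8")
    return client.post_lineage_event(domainIdentifier=domain_id, event=payload)


if __name__ == "__main__":
    event = build_run_event("clean_orders", ["raw_orders"], ["orders_clean"])
    print(json.dumps(event, indent=2))
    # To submit for real (requires boto3 and AWS credentials):
    #   import boto3
    #   post_event(boto3.client("datazone"), "<your-domain-id>", event)
```

Passing the client in as an argument keeps the event-building logic free of AWS dependencies, so it can be exercised locally before any event is sent.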

Users can explore data lineage through an interactive visualization that shows relationships at the asset, process, or job level. Different roles within an organization, such as marketing analysts, data engineers, administrators, and managers, can use data lineage to understand where data originated, what impact a change will have, and which transformations occur as data moves.

Data lineage is available in preview in all regions where Amazon DataZone is generally available. Costs depend on storage usage and API requests, both of which are included in the platform’s pricing model. For more information about data lineage, refer to the Amazon DataZone User Guide.

Article Source
https://aws.amazon.com/blogs/aws/introducing-end-to-end-data-lineage-preview-visualization-in-amazon-datazone/