Data Lineage Overview
Using data without knowing its lineage is risky
Data lineage traces the origins, movements, and joins of your data to provide insight into its quality. Misunderstanding data can result in using the wrong data, which leads to suboptimal decision-making. Data lineage tools often use a graphical interface to show the data’s journey: from inception to how it’s used (ETL, databases, business intelligence, etc.), what it depends on, where it’s joined with other data, and whether it has been changed or updated. Data lineage tools give you more control over your data by allowing error tracking and adjustments when needed. These tools can also facilitate process changes, metadata management, and data governance.
Most data lineage tools show the path of a data set only while that data is in the purview of the tool. Waterline’s AI-driven Data Catalog data lineage path includes:
The Missing Link - Incomplete Data Lineage
User-defined and imported data lineage, the lineage we get from human knowledge and from established IT processes, is often incomplete.
The Waterline Data AI-driven data catalog maps your entire data estate. First, it imports data lineage from Apache Atlas and Cloudera Navigator. Second, it includes user-defined data lineage while importing. Third, the data catalog augments incomplete data lineage with inferred lineage, an area where traditional lineage tools fall short. Finally, it reveals “dark data” lineage. Every organization has dark data lying around. It has accreted over time, and no one remembers how it got there or what it is. Organizations can either undertake time-consuming, manual investigations or leverage automated (and much faster!) inferred data lineage tools that establish the data’s validity or inaccuracies.
Waterline Data’s heuristic, data-centric approach displays lineage so you can easily see which lineage is inferred and which is factual, that is, based on imported and user-defined sources.
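To make the distinction between factual and inferred lineage concrete, here is a minimal sketch of a lineage graph in Python. It is not Waterline Data’s implementation or API: the `LineageGraph` class, the source labels (`imported`, `user_defined`, `inferred`), and the data set names are all hypothetical and chosen only for illustration. Each edge records where the lineage information came from, so a walk upstream can report which links are factual and which are inferred.

```python
from collections import defaultdict

# Hypothetical provenance labels (illustrative only, not a vendor API).
# Lineage from imported or user-defined sources is treated as factual.
FACTUAL_SOURCES = {"imported", "user_defined"}


class LineageGraph:
    """Toy lineage graph: each edge points from an upstream data set to a downstream one."""

    def __init__(self):
        # downstream name -> list of (upstream name, source label)
        self._parents = defaultdict(list)

    def add_edge(self, upstream, downstream, source):
        """Record that `downstream` was derived from `upstream`, and how we know."""
        self._parents[downstream].append((upstream, source))

    def upstream(self, dataset):
        """Return every ancestor edge of `dataset` as (upstream, downstream, kind) tuples."""
        edges, seen, stack = [], set(), [dataset]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            for parent, source in self._parents.get(current, []):
                kind = "factual" if source in FACTUAL_SOURCES else "inferred"
                edges.append((parent, current, kind))
                stack.append(parent)
        return edges


# Made-up data sets: two factual links and one inferred link.
graph = LineageGraph()
graph.add_edge("crm_export", "customers_clean", "imported")
graph.add_edge("customers_clean", "sales_report", "user_defined")
graph.add_edge("legacy_dump", "customers_clean", "inferred")

for parent, child, kind in graph.upstream("sales_report"):
    print(f"{parent} -> {child} [{kind}]")
```

Keeping the source label on each edge, rather than on each data set, lets a single data set have both factual and inferred parents, which is the situation the paragraph above describes.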
Metadata and Data Lineage Interdependence
While metadata management allows you to find pertinent data, including ETL processes or subject matter, data lineage enables users to understand the quality and value of any given data. Data lineage in isolation doesn’t give you enough information to use a data set. It is only when data lineage is paired with metadata management, search capabilities, peer reviews, etc. that you can responsibly use company data. While data lineage is only one component in determining a data set’s viability, it’s a big component.
The best analogy for understanding the importance of a data lineage tool is the internet itself. One can search the internet for Bigfoot sightings, a flat Earth, and the date of the next apocalypse. Yet a reader can’t judge the validity of web content without understanding where it came from. One web site usually references another site, which references yet another. Imagine the value of judging the validity of any site’s claim by quickly seeing the lineage of an unconventional claim.
It’s exactly the same concept when searching an organization’s data lake. You can search an organization’s data, but unless you know where the data came from, where and how it’s used, and its other characteristics, basing decisions on it is a gamble.
Collaboration with Data Catalog
Waterline Data’s AI-Driven Data Catalog adds even more context, enabling users to make intelligent decisions about data use. Users can add comments, write reviews, and rate each data set to further inform their organization of the legitimacy and efficacy of each data set.
Improving your data lineage operations can mean the difference between basing business decisions on false or misleading data and basing them on factual data. Waterline Data enables you to make decisions based on data that you can trust.
Data lineage analysis is only one key component in validating the quality of any data. Unless data lineage is paired with other key decision criteria, users and their organizations could be making suboptimal decisions.
Read more about Waterline Data’s Fingerprinting technology inferring data lineage here.