There are two major aspects to the data cataloging process. The first is to help data professionals discover, organize, and curate data. The second is to expose the newly organized data for business professionals to use.
Waterline Data Fingerprinting™ automates this process, making it easy to connect the right people to the right data.
How it works
Automatically and incrementally “fingerprint” data at scale by analyzing source data
Use machine learning to automatically tag data and match data fingerprints to glossary terms. Match any remaining unmatched terms through crowdsourcing
Review suggested tags, accepting or rejecting each one, and automate data access control through tag-based security
Search for data through the Waterline GUI or through integrations with third-party applications
Use objective profiling information along with subjective crowdsourced input to rate data quality
Crowdsource annotations and ratings to collaborate and share “tribal knowledge” about your data
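To make the steps above concrete, here is a minimal sketch of the fingerprint-and-match idea in Python. It is an illustration under stated assumptions, not Waterline's actual algorithm: the "fingerprint" here is simply the relative frequency of value shapes in a column, matching uses cosine similarity, and the names `fingerprint`, `suggest_tag`, and the `threshold` value are hypothetical.

```python
import math
import re
from collections import Counter


def shape(value: str) -> str:
    """Reduce a value to its character shape: digits become 9, ASCII letters become A."""
    digits_masked = re.sub(r"[0-9]", "9", value)
    return re.sub(r"[A-Za-z]", "A", digits_masked)


def fingerprint(values):
    """Hypothetical fingerprint: relative frequency of each value shape in a column."""
    counts = Counter(shape(v) for v in values)
    total = sum(counts.values())
    return {pattern: count / total for pattern, count in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse frequency vectors."""
    dot = sum(weight * b.get(pattern, 0.0) for pattern, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def suggest_tag(column_values, glossary, threshold=0.8):
    """Return (term, score) for the best glossary match, or (None, score)
    when nothing clears the threshold and the field needs human input."""
    column_fp = fingerprint(column_values)
    best_term, best_score = None, 0.0
    for term, reference_fp in glossary.items():
        score = cosine(column_fp, reference_fp)
        if score > best_score:
            best_term, best_score = term, score
    if best_score >= threshold:
        return best_term, best_score
    return None, best_score


# Assumed glossary: reference fingerprints built from a few example values.
glossary = {
    "ssn": fingerprint(["123-45-6789", "987-65-4321"]),
    "email": fingerprint(["a@b.com"]),
}
matched = suggest_tag(["111-22-3333", "444-55-6666"], glossary)
unmatched = suggest_tag(["hello"], glossary)
```

In this sketch, a suggestion that clears the threshold would go to a data steward to accept or reject, and a `None` result marks a field left for crowdsourced tagging.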
Get value in days, not weeks or months
You have thousands of datasets with millions of distinct data fields across your company, and that number is growing every day. Manually documenting your catalog isn’t an option! Waterline Data automatically catalogs all your data assets so you get value from your data catalog right out of the box.
Reduces manual tagging of data by over 80%. Waterline Data Fingerprinting™ combines big data analysis, machine learning and human curation to automatically catalog data and data lineage at scale
Data stewards accept or reject automatically suggested tags, and the system learns from those decisions to fine-tune and improve the matching algorithm
Works natively on Hadoop and Spark to easily scale to handle all your data
Works seamlessly across a wide variety of data sources (relational, files, Hadoop, etc.) because you never know where the most important data is located
Self-service accelerates time to value
You’re a business professional, and when you have questions, you need reliable answers. But where is the right data, and whom do you ask? Waterline consolidates your tribal data knowledge and makes it easy to share with others so you and your colleagues can quickly find the data you need.
Easy-to-use web search interface designed specifically for the business user to search a catalog of trusted, curated data
Search directly from your existing data wrangling and visualization tools integrated through our REST APIs
Add annotations and view the comments of other users to capture tribal knowledge and establish trusted data sources
Automatically propagates data tags so users can easily find similar data
Govern your data with agility
Data governance isn’t one-size-fits-all. We provide the appropriate level of governance for whatever type of data is being managed.
Simplifies data governance by delivering a truly scalable, automated and repeatable process for identifying sensitive data, capturing data lineage, and ensuring proper data use and access
Ensures proper data access for sensitive data and integrates directly with Apache Ranger and Cloudera Sentry to enable tag-based access control
Allows data stewards to manage tagging rules, curate the data catalog, and manage proper access to data
Sensitive data tags are automatically propagated to other similar data in other locations using the Waterline Data Fingerprinting™ technology
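As a rough illustration of how propagated tags and tag-based access control fit together, the Python sketch below copies tags to similar columns and then checks access against a tag policy. Everything in it is an assumption made for illustration: the functions `propagate` and `can_access`, the column names, and the policy format are hypothetical, and none of it reflects the Apache Ranger or Cloudera Sentry APIs.

```python
def propagate(tags_by_column, similar_pairs):
    """Copy tags from each tagged source column to the column listed as similar to it.

    `similar_pairs` is assumed to come from some similarity step (for example,
    fingerprint matching). Propagation is one step only: the tags copied are
    those the source had originally.
    """
    result = {column: set(tags) for column, tags in tags_by_column.items()}
    for source, target in similar_pairs:
        result.setdefault(target, set()).update(tags_by_column.get(source, ()))
    return result


def can_access(user_roles, column_tags, policy):
    """Allow access only if every policy-restricted tag permits one of the user's roles."""
    for tag in column_tags:
        allowed_roles = policy.get(tag)
        if allowed_roles is None:
            continue  # tag carries no restriction in this sketch
        if not (set(user_roles) & allowed_roles):
            return False
    return True


# Hypothetical catalog state: one column tagged as sensitive by a data steward.
tags = {"hr.employees.ssn": {"PII"}}
similar = [("hr.employees.ssn", "crm.contacts.tax_id")]
propagated = propagate(tags, similar)

# Hypothetical policy: only the "steward" role may read PII-tagged columns.
policy = {"PII": {"steward"}}
analyst_allowed = can_access({"analyst"}, propagated["crm.contacts.tax_id"], policy)
steward_allowed = can_access({"steward"}, propagated["crm.contacts.tax_id"], policy)
```

The point of the sketch is that once a tag reaches a similar column, the same policy applies there without anyone writing a new rule for that location.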