The only Data Catalog with Automated Data Fingerprinting

Download the Free Sandbox

The data cataloging process has two major aspects. The first helps data professionals discover, organize, and curate data; the second exposes the newly organized data for business professionals to use.

Waterline Data Fingerprinting™ automates this process, making it easy to connect the right people to the right data.

How it works

Data Professionals



Automatically and incrementally “fingerprint” data at scale by analyzing source data



Use machine learning to automatically tag data fingerprints and match them to glossary terms. Resolve any unmatched terms through crowdsourcing



Review suggested tags, accepting or rejecting each one, and automate data access control through tag-based security
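
To make the first two steps concrete, here is a minimal sketch of fingerprinting a column and matching it to a glossary term. It is an illustration only and not Waterline's actual algorithm: the three profile features, the similarity measure, the 0.8 threshold, and the names `Fingerprint`, `fingerprint`, `similarity`, and `suggest_tag` are all assumptions made for this example. The human review step is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fingerprint:
    """Toy profile of one column (hypothetical, not Waterline's format)."""
    digit_ratio: float     # share of characters that are digits
    mean_length: float     # average value length in characters
    distinct_ratio: float  # distinct values divided by total values


def fingerprint(values):
    """Profile a non-empty list of column values."""
    values = [str(v) for v in values]
    if not values:
        raise ValueError("cannot fingerprint an empty column")
    total_chars = sum(len(v) for v in values) or 1
    digits = sum(ch.isdigit() for v in values for ch in v)
    return Fingerprint(
        digit_ratio=digits / total_chars,
        mean_length=sum(len(v) for v in values) / len(values),
        distinct_ratio=len(set(values)) / len(values),
    )


def similarity(a, b):
    """Return 1.0 for identical fingerprints, lower as they diverge."""
    length_gap = abs(a.mean_length - b.mean_length) / max(
        a.mean_length, b.mean_length, 1.0
    )
    gap = (
        abs(a.digit_ratio - b.digit_ratio)
        + length_gap
        + abs(a.distinct_ratio - b.distinct_ratio)
    ) / 3
    return 1.0 - gap


def suggest_tag(column_values, glossary, threshold=0.8):
    """Suggest the best-matching glossary tag, or None if below threshold."""
    fp = fingerprint(column_values)
    best_tag, best_score = None, 0.0
    for tag, reference in glossary.items():
        score = similarity(fp, reference)
        if score > best_score:
            best_tag, best_score = tag, score
    if best_score < threshold:
        return None, best_score  # left for crowdsourced matching
    return best_tag, best_score


# Hypothetical glossary built from a few already-tagged sample columns.
glossary = {
    "phone_number": fingerprint(["415-555-0100", "212-555-0199"]),
    "first_name": fingerprint(["Alice", "Bob", "Carol"]),
}
tag, score = suggest_tag(["650-555-0123", "408-555-0177"], glossary)
```

A column that falls below the threshold comes back with no tag, which corresponds to the terms the page says are matched through crowdsourcing.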

Business Professionals



Search for data through the Waterline GUI or through integration with third-party applications



Use objective profiling information along with subjective crowdsourced input to rate data quality



Crowdsource annotations and ratings to collaborate and share “tribal knowledge” about your data

Get value in days, not weeks or months

You have thousands of datasets with millions of distinct data fields across your company, and that number is growing every day. Manually documenting your catalog isn’t an option! Waterline Data automatically catalogs all your data assets so you get value from your data catalog right out of the box.

Reduces manual tagging of data by over 80%. Waterline Data Fingerprinting™ combines big data analysis, machine learning, and human curation to automatically catalog data and data lineage at scale.

Data stewards accept or reject automatically suggested tags, and the system learns from those decisions to fine-tune and improve the matching algorithm

Works natively on Hadoop and Spark to easily scale to handle all your data

Works seamlessly across a wide variety of data sources (relational, files, Hadoop, etc.) because you never know where the most important data is located

Self-service accelerates time to value

You’re a business professional, and when you have questions you need reliable answers. But where is the right data, and whom do you ask? Waterline consolidates your tribal data knowledge and makes it easy to share with others so you and your colleagues can quickly find the data you need.

Easy-to-use web search interface designed specifically for the business user to search a catalog of trusted, curated data

Search directly from your existing data wrangling and visualization tools integrated through our REST APIs

Add annotations and view the comments of other users to capture tribal knowledge and establish trusted data sources

Automatically propagates data tags so users can easily find similar data

Govern your data with agility

Data governance isn’t one-size-fits-all. We provide the appropriate level of governance for whatever type of data is being managed.

Simplifies data governance by delivering a truly scalable, automated and repeatable process for identifying sensitive data, capturing data lineage, and ensuring proper data use and access

Ensures proper access to sensitive data and integrates directly with Apache Ranger and Cloudera Sentry to enable tag-based access control

Allows data stewards to manage tagging rules, curate the data catalog, and manage proper access to data

Sensitive data tags are automatically propagated to other similar data in other locations using the Waterline Data Fingerprinting™ technology
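
As a rough illustration of tag-based access control, the sketch below denies access to a field when it carries a sensitive tag that none of the user's roles is cleared for. It does not use the Apache Ranger or Cloudera Sentry APIs, and it is not how Waterline implements the feature: the policy table, the tag and role names, and the `can_read` function are all hypothetical.

```python
# Hypothetical policy: sensitive tag -> roles allowed to read data with it.
SENSITIVE_TAG_POLICY = {
    "ssn": {"compliance"},
    "salary": {"hr", "compliance"},
}


def can_read(user_roles, field_tags, policy=SENSITIVE_TAG_POLICY):
    """Allow access only if every sensitive tag on the field is cleared
    by at least one of the user's roles. Tags absent from the policy are
    treated as non-sensitive."""
    roles = set(user_roles)
    for tag in field_tags:
        allowed_roles = policy.get(tag)
        if allowed_roles is not None and not (roles & allowed_roles):
            return False
    return True
```

Because the check is keyed on tags rather than on individual tables or files, a sensitive tag that is propagated to similar data in another location brings the same restriction with it.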

Ready to unlock the value of your data?

Download the Sandbox