Connect the right people to the right data
Help data professionals discover, organize and curate data
Expose the newly organized data for business professionals to use
Identify sensitive data to better comply with data regulations like GDPR
How It Works
Automatically and incrementally “fingerprint” data and infer data lineage at scale by analyzing actual data values in relational, cloud and Hadoop data sources.
Use machine learning to automatically suggest tags and match data fingerprints to business glossary terms
Analysts and data stewards accept or reject suggested tags, while machine learning fine-tunes the tagging process and improves the matching algorithm
Map your compliance policies to your data assets: acceptable use, legal holds, expiry.
Generate ongoing mandated compliance reporting
Automate data access control via tag-based security
Search for data through the Waterline GUI or through integrations with third-party applications
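To make the policy-mapping step concrete, here is a minimal sketch in Python of how compliance policies (acceptable use, legal holds, expiry) might be attached to tagged data assets and rolled up into a report. The `Policy`, `Asset` and `compliance_report` names and their fields are illustrative assumptions, not part of the Waterline product.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Policy:
    """Hypothetical compliance policy keyed by a catalog tag."""
    tag: str
    acceptable_use: str
    retention_days: Optional[int] = None  # expiry; None means no expiry rule
    legal_hold: bool = False


@dataclass
class Asset:
    """Hypothetical catalog entry for a dataset."""
    name: str
    tags: set
    created: date


def compliance_report(assets, policies, today):
    """Return one (asset, tag, action) row per policy that applies to an asset."""
    rows = []
    for asset in assets:
        for policy in policies:
            if policy.tag not in asset.tags:
                continue
            expired = (
                policy.retention_days is not None
                and today > asset.created + timedelta(days=policy.retention_days)
            )
            if policy.legal_hold:
                action = "retain (legal hold)"  # a legal hold overrides expiry
            elif expired:
                action = "expire"
            else:
                action = "ok"
            rows.append((asset.name, policy.tag, action))
    return rows
```

Because policies are keyed by tag rather than by dataset, newly tagged assets in this sketch pick up the matching policy without any per-dataset configuration.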
Rate & Collaborate
Create subjective crowdsourced ratings and reviews which, in combination with objective profiling metadata, provide users with a view into data quality and usefulness.
Get value in days, not weeks or months
You have thousands of datasets with millions of distinct data fields across your company, and that number is growing every day. Manually documenting your data isn’t an option! Waterline Data automatically catalogs all your data assets (Hadoop, cloud, relational, etc.) so you can spend more time using data and less time looking for it.
Reduces manual tagging of data by over 80%. Waterline Data Fingerprinting™ combines big data analysis, machine learning and human curation to automatically catalog data and data lineage at scale
Native Big Data, Native Cloud Storage
Works natively on Hadoop and Spark to easily scale to handle all your data. Directly connects to cloud storage like Amazon S3, Azure Blob Storage and Google Cloud Storage.
Data stewards accept or reject automatically suggested tags, and the system learns, fine-tunes and improves the matching algorithm
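As a rough illustration of the fingerprint, suggest and curate loop, the Python sketch below summarizes a column by the shapes of its values, suggests tags whose confirmed examples look similar, and takes steward accept/reject decisions into account. Waterline Data Fingerprinting™ itself is not described here, so the value-shape histogram, the `TagSuggester` class and the 0.5 threshold are stand-in assumptions chosen only to show the idea.

```python
import re
from collections import Counter, defaultdict


def fingerprint(values):
    """Summarize a column as the relative frequency of its value shapes."""
    shapes = Counter()
    for value in values:
        # Digits become "9" and letters become "a"; punctuation is kept as-is.
        shape = re.sub(r"[A-Za-z]", "a", re.sub(r"[0-9]", "9", str(value)))
        shapes[shape] += 1
    total = sum(shapes.values())
    return {shape: count / total for shape, count in shapes.items()}


def similarity(fp_a, fp_b):
    """Histogram overlap between two fingerprints, from 0.0 to 1.0."""
    return sum(min(fp_a[s], fp_b[s]) for s in fp_a.keys() & fp_b.keys())


class TagSuggester:
    """Toy matcher (an assumption, not the product's algorithm)."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.accepted = defaultdict(list)  # tag -> fingerprints stewards confirmed
        self.rejected = defaultdict(list)  # tag -> fingerprints stewards turned down

    def feedback(self, tag, values, accepted):
        """Record a steward's accept or reject decision for a column."""
        target = self.accepted if accepted else self.rejected
        target[tag].append(fingerprint(values))

    def suggest(self, values):
        """Return (tag, score) pairs, best first, for a column of values."""
        fp = fingerprint(values)
        suggestions = []
        for tag, examples in self.accepted.items():
            score = max(similarity(fp, ex) for ex in examples)
            veto = max(
                (similarity(fp, ex) for ex in self.rejected.get(tag, [])),
                default=0.0,
            )
            # Suggest only if the column looks more like confirmed examples
            # than like columns a steward has already rejected for this tag.
            if score >= self.threshold and score > veto:
                suggestions.append((tag, score))
        return sorted(suggestions, key=lambda pair: pair[1], reverse=True)
```

In this sketch a rejection suppresses the tag only for columns that resemble the rejected one, so clean matches keep being suggested while borderline ones drop out as curation continues.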
Any Data Source
Works seamlessly across a wide variety of data sources (relational, files, Hadoop, cloud, etc.) because you never know where the most important data is located
Self-service Data Catalog
You’re a business professional, and when you have questions you need reliable answers. But where is the right data, and who do you ask? Waterline consolidates your tribal data knowledge and makes it easy to share with others so you and your colleagues can quickly find the data you need.
Easy-to-use web search interface designed specifically for the business user to search a catalog of trusted, curated data
Easy to integrate
Search directly from your existing data wrangling and visualization tools integrated through our REST APIs
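For a sense of what searching the catalog from another tool could look like, the Python sketch below builds an HTTP search request using only the standard library. The `/api/search` path, the `q` and `tag` parameters and the bearer-token header are hypothetical placeholders; the actual endpoints and authentication scheme would come from the product's REST API documentation.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint path, used for illustration only.
SEARCH_PATH = "/api/search"


def build_search_request(base_url, query, tag=None, token=None):
    """Build (but do not send) a GET request for a catalog search."""
    params = {"q": query}
    if tag:
        params["tag"] = tag  # hypothetical filter on a catalog tag
    url = base_url.rstrip("/") + SEARCH_PATH + "?" + urlencode(params)
    request = Request(url, headers={"Accept": "application/json"})
    if token:
        # Bearer-token auth is an assumption; use whatever scheme the server expects.
        request.add_header("Authorization", "Bearer " + token)
    return request
```

A calling tool would pass the result to `urllib.request.urlopen` and parse the JSON body; the shape of that response depends on the real API and is not assumed here.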
Ratings and Reviews
Add annotations and view the comments of other users to capture tribal knowledge and establish trusted data sources
Automatically propagates data tags so users can easily find similar data
Govern your data with agility
Data governance isn’t one-size-fits-all. We provide the appropriate level of governance for whatever type of data is being managed.
Simplifies data governance by delivering a truly scalable, automated and repeatable process for identifying sensitive data, capturing data lineage, and ensuring proper data use and access
GDPR governance modules generate reports that show where data subject to GDPR is located and how it is used, to demonstrate a proper compliance process to regulators
Data access control
User and role management ensures proper data access for sensitive data and integrates directly with Apache Ranger and Cloudera Sentry to enable tag-based access control
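Tag-based access control can be pictured with a small Python sketch: access rules are written once per tag, and any column carrying that tag inherits them. The `TagPolicy` class and `can_access` function are illustrative assumptions and do not reflect the Apache Ranger or Cloudera Sentry policy formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagPolicy:
    """Hypothetical rule: only these roles may read data carrying this tag."""
    tag: str
    allowed_roles: frozenset


def can_access(user_roles, column_tags, policies):
    """Allow access only if every governed tag on the column permits one of the user's roles."""
    by_tag = {policy.tag: policy for policy in policies}
    for tag in column_tags:
        policy = by_tag.get(tag)
        if policy is None:
            continue  # tags without a policy do not restrict access in this sketch
        if not (set(user_roles) & policy.allowed_roles):
            return False
    return True
```

For example, with a single policy restricting the `PII` tag to the `steward` role, an analyst is denied any column tagged `PII` but can still read columns that carry only ungoverned tags.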
Usage Audit and Monitoring
Auditing provides full traceability for how all users have tagged, curated, commented and searched for data within the data catalog