
Waterline Data Secures IP Protection for Unique Content-Based Data Lineage Discovery and Tracking

Only Waterline’s AI-Driven Data Catalog Ensures Accurate Decision Making Based on Fully Traced Data Lineage

Mountain View, Calif., January 30, 2019 - Waterline Data, a global leader in data cataloging solutions and applications, today announced it has been granted a patent for Content-Based Lineage Discovery and Tracking, another key aspect of its unique Fingerprinting™ and automated tagging technology, which provides for faster and easier discovery of the vast amounts of data stored in data warehouses, cloud services and databases across the enterprise.

Data lineage is critical to data-driven enterprises for several reasons:

  • When an analyst or data scientist finds a data set that meets their needs, a critical aspect of deciding whether they can trust this data set is understanding where the data came from
  • Regulations such as BCBS Rule 239 require enterprises to document data lineage of critical data
  • Developers need to understand how their changes impact downstream systems
  • Operations staff need to understand the implications of data corruption, data integration bugs, and human errors


Currently, lineage is captured by most data integration tools and by some modern platform audit tools such as Cloudera Navigator or Apache Atlas. Unfortunately, in most enterprises, a data set can go through dozens or even hundreds of hops, many of which do not capture lineage information. If any one of those hops is missing, these critical questions cannot be answered. Given the variety of tools used to move data (everything from file system tools, FTP and other file transfer protocols, and downloads to custom programs and scripting languages), the only clue to where the data came from is often the data content itself. Waterline’s AI-driven Data Catalog, with its now-patented Content-Based Lineage Discovery and Tracking, is the only solution on the market that can fill in those missing hops, combining:

  • User-Defined Steps: Approved data curators can authenticate data manually
  • Imported Data Lineage: Waterline provides a set of REST APIs for importing lineage from any source, along with a growing list of adapters, including adapters for Apache Atlas and Cloudera Navigator
  • Inferred Data Relationships: Waterline Data Fingerprinting technology uses machine learning to infer the lineage of any given data set
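
To illustrate how the three kinds of lineage listed above can complement one another, here is a minimal Python sketch. It is not Waterline's implementation or API: the function names (merge_lineage, trace), the data set names, and the edge format are all hypothetical, chosen only to show that a chain of hops becomes traceable once edges from every source are merged into a single graph.

```python
from collections import defaultdict

def merge_lineage(*edge_sources):
    """Merge (source, target, origin) edges into a target -> [(source, origin)] map."""
    upstream = defaultdict(list)
    for edges in edge_sources:
        for source, target, origin in edges:
            upstream[target].append((source, origin))
    return upstream

def trace(upstream, dataset):
    """Walk upstream from a data set, returning its hops in breadth-first order."""
    hops, seen, queue = [], {dataset}, [dataset]
    while queue:
        current = queue.pop(0)
        for source, origin in upstream.get(current, []):
            hops.append((source, current, origin))
            if source not in seen:
                seen.add(source)
                queue.append(source)
    return hops

# Hypothetical edges, one per lineage kind named in the list above.
imported = [("crm.accounts", "staging.accounts", "imported")]
user_defined = [("staging.accounts", "mart.customers", "user-defined")]
inferred = [("mart.customers", "sandbox/customers_copy.csv", "inferred")]

lineage = merge_lineage(imported, user_defined, inferred)
hops = trace(lineage, "sandbox/customers_copy.csv")
for source, target, origin in hops:
    print(f"{source} -> {target} ({origin})")
```

In this toy example, dropping the inferred edge would leave the sandbox copy with no upstream at all, which is the "missing hop" situation described above.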


Waterline Data’s Content-Based Lineage Discovery and Tracking is available through Waterline’s AI-driven Data Catalog. The driving force behind it is the company’s patented Fingerprinting technology, which works on the concept that a column of data has a distinctive signature, or a fingerprint, that incorporates its technical metadata, content, format and context. By examining this signature, an AI system like Waterline Data’s AI-driven Data Catalog can identify what that data is, determine the other columns that share similar fingerprints, and connect the data to a business term or label for easy discovery and analysis. Waterline’s Fingerprinting technique is the industry’s only data catalog technology to combine AI and machine learning with best-in-class crowdsourcing and big data scalability to deliver a modern data catalog that meets today’s enterprise needs.
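
The idea of comparing column signatures can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the patented Fingerprinting technique: it considers only content and format (the press release also names technical metadata and context), uses plain Jaccard overlap in place of machine learning, and every name in it (shape, fingerprint, similarity, the sample columns) is hypothetical.

```python
import re

def shape(value):
    """Reduce a value to its format: digits become 9, letters become A."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value))

def fingerprint(values):
    """Toy fingerprint: the distinct values and the distinct formats of a column."""
    return {"values": set(values), "shapes": {shape(v) for v in values}}

def jaccard(a, b):
    """Size of the intersection divided by size of the union (0.0 if both empty)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def similarity(fp_a, fp_b):
    """Average of content overlap and format overlap, in the range [0, 1]."""
    content = jaccard(fp_a["values"], fp_b["values"])
    fmt = jaccard(fp_a["shapes"], fp_b["shapes"])
    return 0.5 * content + 0.5 * fmt

# Hypothetical columns: a ZIP code column, a partial copy of it, and a phone column.
customers_zip = fingerprint(["94043", "10001", "60614", "73301"])
copy_zip = fingerprint(["94043", "10001", "60614"])
phone = fingerprint(["650-555-0100", "212-555-0199"])

score_copy = similarity(customers_zip, copy_zip)   # high: shared values and format
score_phone = similarity(customers_zip, phone)     # zero: nothing in common
print(score_copy, score_phone)
```

A high score between two columns in different data sets is the kind of content-based clue that could suggest one was derived from the other when no tool recorded the hop.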

The patent application number is 15/350,843 with Waterline Data founder and CTO Alex Gorelik as the inventor. Earlier this month, Waterline Data announced another patent it had secured for automated tagging. Over his career, Alex Gorelik has been granted over 20 patents in Data Management.

“As an executive at public companies and as a data practitioner, I have been involved in countless debates about whose numbers or data is right. Enterprises can now settle those debates by leveraging Waterline Data’s patented Content-Based Data Lineage Discovery and Tracking technology to understand where data came from,” said Waterline Data founder and CTO Alex Gorelik. “This is critical technology to help enterprises make good decisions based on authoritative data and to avoid hefty penalties by achieving regulatory compliance.”

Enterprise data leaders can see Waterline Data in action by registering for a personalized live demo.


About Waterline Data:

Waterline Data automates data discovery, compliance and the ability to take action on data by using a powerful combination of artificial intelligence, machine learning, ratings and reviews, and tribal knowledge to deliver an AI-driven Data Catalog. Our customers spend less time searching for data and more time using it to derive value while complying with data governance mandates such as GDPR. The company is funded by Menlo Ventures, Jackson Square Ventures, Partech Ventures, and Infosys, and implemented in large enterprises around the globe. Founded in 2013, the company is headquartered in Mountain View, California. For more, visit us via Twitter or LinkedIn.