Strata + Hadoop World New York, October 16, 2014 — Waterline Data Science today announced the launch of Waterline Data Inventory to enable data self-service on Hadoop, allowing users to find, understand, and help govern Hadoop data.
Solving the problem of the lack of data self-service in the Hadoop data lake
Companies are deploying Hadoop “data lakes” to provide unprecedented access to data for data science and analytics to uncover new business insight. However, Hadoop’s advantages of frictionless ingest and flexible schema on read, coupled with the lack of data governance, present problems for users trying to find and understand the data. Waterline Data Inventory addresses these problems by building a complete inventory of data assets in Hadoop and by opening access to Hadoop data through data self-service. As a result, data scientists can be more productive, business analysts can easily augment reporting and BI with Hadoop data without coding, and data governance teams can start controlling Hadoop data.
“There is no point building a predictive model of the wrong column, and without a data inventory, you don’t know if you have the wrong column,” said John Mount, co-author of the book Practical Data Science with R. A data inventory is also valuable for Hadoop data governance, according to Sunil Soares, author of Big Data Governance.
Alex Gorelik, Waterline Data Science founder and CEO, states, “A major complaint with Hadoop is that once you’ve loaded the data, extracting value is like finding a needle in a stack of needles. Waterline Data Inventory lets business users find the best needle in the stack of needles, without having to write code, and without having to wrangle the entire stack. That’s our secret sauce, and key to delivering faster time to value and broad Hadoop adoption.”
Waterline Data Science is one of twelve finalists in the Startup Showcase at the Strata + Hadoop World conference this week in New York City. The Showcase is a competition of “best of the best” innovative Big Data startups, to be judged by a panel of investors.
According to Gorelik, “Given the importance of addressing the lack of data self-service in Hadoop, over a dozen companies – including Hadoop distributors, large enterprises, and Internet giants – collaborated closely with Waterline as strategic advisors throughout the product development and beta lifecycle, sharing their use cases, reviewing our designs and helping us shape the right solution.”
“We’re extremely excited to invest in Waterline Data Science after incubating the company at Menlo,” said Venky Ganesan, Managing Director at Menlo Ventures. “Waterline’s product is the linchpin in making Hadoop consumable for the enterprise, and will provide much-needed data self-service as the market expands,” he added.
About Waterline Data Science
Waterline Data Science is an early-stage Big Data software company, founded in December 2013 and backed by Menlo Ventures and Sigma West. The inspiration for the name “Waterline” came from the metaphor of the Big Data Lake. Waterline solves the challenges of data self-service for the Hadoop data lake. It’s easy to get data into Hadoop, but it’s not easy to get it out in a self-service manner and derive business value from it. The idea behind Waterline is that data self-service for Hadoop should mean finding the data you need easily, without having to dive for it: you should be able to Hadoop “above the waterline.”