Waterline Data Science Joins Hortonworks Technology Partner Program

Strata + Hadoop World New York, October 16, 2014: Waterline Data Science today announced at Strata + Hadoop World New York that it has joined the Hortonworks® Technology Partner program. Hortonworks is the leading contributor to and provider of Apache™ Hadoop®. Waterline Data Science will integrate Hortonworks Data Platform (HDP) with Waterline Data Inventory to enable data self-service on Hadoop, allowing users to find, understand, and help govern Hadoop data.

By joining the Hortonworks Technology Partner program, Waterline Data Science will work to enable and accelerate the deployment of a modern data architecture, integrating with the Hortonworks Data Platform—the industry’s only 100-percent open source Hadoop distribution, explicitly architected, built, and tested for enterprise-grade deployments.

Solving the lack of data self-service in the Hadoop data lake

Companies are deploying Hadoop “data lakes” to provide unprecedented access to data for data science and analytics and to uncover new business insight. But Hadoop’s advantages, namely frictionless ingest, flexible schema on read, and the absence of up-front data governance, also make it hard for users to find and understand the data. Waterline Data Inventory addresses these problems by building a complete inventory of data assets in Hadoop and by opening access to Hadoop data through data self-service. As a result, data scientists can be more productive, business analysts can easily augment reporting and BI with Hadoop data without coding, and data governance teams can start controlling Hadoop data.

“There is no point building a predictive model of the wrong column, and without a data inventory, you don’t know if you have the wrong column,” said John Mount, co-author of the book Practical Data Science with R. A data inventory is also valuable for Hadoop data governance, according to Sunil Soares, author of Big Data Governance.

Alex Gorelik, founder and CEO of Waterline Data Science, said: “A major complaint with Hadoop is once you’ve loaded the data, extracting value is like finding a needle in a stack of needles. Waterline Data Inventory lets business users find the best needles in the stack of needles, without having to write code, and without having to wrangle the entire stack. That’s our secret sauce, and key to deliver faster time to value and broad Hadoop adoption.”

Hortonworks Data Platform was built by the core architects, builders and operators of Apache Hadoop and includes all of the components needed to manage a cluster at scale and uncover business insights from existing and new big data sources. With a YARN-based architecture, HDP runs multiple workloads, applications and processing engines on a single cluster with optimal efficiency. A reliable, secure and multi-use enterprise data platform, HDP is an important component of the modern data architecture, helping organizations mine, process and analyze large batches of unstructured data sets to make more informed business decisions.

“Hortonworks is dedicated to expanding and empowering the Apache Hadoop ecosystem, accelerating innovation and adoption of 100-percent open source enterprise Hadoop,” said John Kreisa, vice president of strategic marketing at Hortonworks. “We welcome Waterline Data Science to the Hortonworks Technology Partner Program and look forward to working with them to help strengthen Hadoop’s role as the foundation of the next-generation data architecture.”

About Waterline Data Science

Waterline Data Science is an early-stage Big Data software company, founded in December 2013 and backed by Menlo Ventures and Sigma West. The inspiration for the name “Waterline” came from the metaphor of the Big Data Lake. Waterline solves the challenges of data self-service for the Hadoop data lake. It’s easy to get data into Hadoop, but it’s not easy to get it out in a self-service manner and derive business value from it. The idea behind Waterline is that data self-service for Hadoop should let you find the data you need easily, without having to dive for it: you should be able to work with Hadoop “above the waterline.”