With the rapid growth of big data and prevalent use of data catalogs, there is a strong likelihood of a Gartner Magic Quadrant for Data Catalogs in the near future. Based on existing Gartner Magic Quadrant reports, we predict that a Gartner Magic Quadrant for Data Catalogs will include the following seven key criteria.
#1: Collaboration/Data Democracy
Fundamentally, data catalogs will be ubiquitous among all data stakeholders. Not only is the amount of data increasing exponentially, but the number of departments and individuals within an organization who use that data is also growing rapidly. A data catalog platform needs to be intuitive for data neophytes yet powerful enough for data experts. The platform also needs to enhance the data user’s ability to access the most valuable (and trusted) data from an organization’s data lake.
#2: Complying with Compliance
In my many years of working directly with large enterprises on compliance, I have yet to hear a CDO or CIO say that compliance wasn’t one of their top priorities. The chances of compliance regulations becoming less complex and less pervasive are practically nil. The advent and evolution of further state, country, regional and even global regulations are as inevitable as “death and taxes.” As part of overall data governance and compliance initiatives, data catalogs are necessary to discover, tag, categorize and secure all data, but especially sensitive data, which includes not only PII but also any other information about a customer.
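To make the "discover and tag" step concrete, here is a minimal sketch of pattern-based tagging of sensitive columns. The patterns, the `tag_column` and `tag_table` helpers, the 80% threshold and the sample data are all assumptions made for illustration; they do not describe any vendor's actual detection logic, which would be far richer.

```python
import re

# Hypothetical patterns for illustration only; real catalogs use richer detection.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}


def tag_column(values, threshold=0.8):
    """Return sorted tags whose pattern matches at least `threshold` of non-empty values."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return []
    tags = []
    for tag, pattern in PATTERNS.items():
        hits = sum(1 for v in non_empty if pattern.match(v))
        if hits / len(non_empty) >= threshold:
            tags.append(tag)
    return sorted(tags)


def tag_table(columns):
    """Map each column name to the sensitive-data tags detected in its values."""
    return {name: tag_column(values) for name, values in columns.items()}


# Made-up sample data: column name -> list of string values.
sample = {
    "contact": ["ann@example.com", "bob@example.org", ""],
    "tax_id": ["123-45-6789", "987-65-4321"],
    "city": ["Austin", "Leeds"],
}
print(tag_table(sample))
```

Columns that come back with a tag could then be routed to stricter access controls, while untagged columns stay broadly available.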
#3: Scaling and Automation
Although some companies still manually code their data, they’re a dying breed because as data demands grow, manual coding processes introduce more human errors. There’s no doubt that manual coding is going the way of the flip phone. Instead, data catalog vendors will have to leverage artificial intelligence and machine learning to keep up with this avalanche of data. The vendors who want to score in the upper right quadrant will need capabilities like Waterline’s Fingerprinting technology, which removes virtually all human intervention.
#4: Big Data Management Tools
A data catalog’s ability to analyze data lineage may be consolidated with metadata management as a big data management tool, but data lineage functionality stands out because data professionals and business users rely on it to ascertain the data’s trust factor. If users can’t trust their data, they can’t trust their decisions. Overall, big data management tools will also come under close scrutiny for their data democratization capabilities.
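As a rough illustration of what a lineage lookup involves, the sketch below walks a small dependency graph to find the original sources behind a report. The `LINEAGE` mapping, the dataset names and both helper functions are hypothetical, and a real catalog would derive these edges from metadata rather than a hand-written dictionary.

```python
from collections import deque

# Hypothetical lineage edges: each dataset maps to the datasets it was derived from.
LINEAGE = {
    "sales_dashboard": ["sales_summary"],
    "sales_summary": ["orders_clean", "customers_clean"],
    "orders_clean": ["orders_raw"],
    "customers_clean": ["crm_export"],
}


def upstream_sources(dataset, lineage):
    """Return every dataset that `dataset` depends on, directly or indirectly."""
    seen = set()
    queue = deque(lineage.get(dataset, []))
    while queue:
        parent = queue.popleft()
        if parent in seen:
            continue  # already visited; also guards against cycles
        seen.add(parent)
        queue.extend(lineage.get(parent, []))
    return seen


def root_sources(dataset, lineage):
    """Return only the upstream datasets that have no recorded parents."""
    return {d for d in upstream_sources(dataset, lineage) if not lineage.get(d)}


print(sorted(root_sources("sales_dashboard", LINEAGE)))
```

A user deciding whether to trust `sales_dashboard` can see at a glance that it ultimately rests on two raw sources, and can judge those sources accordingly.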
#5: Data Rationalization
Although data rationalization could be considered a big data management tool, I think a Gartner Magic Quadrant for Data Catalogs would separate data rationalization into its own category. Data rationalization identifies overlapping and replicated data. The importance of being able to identify, merge and delete duplicate, and thus extraneous, data is paramount in allowing users to find the most current, best-sourced and most accurate data for their organizational decision-making.
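One simple way to picture the "replicated data" half of rationalization is to hash each dataset's contents and group the datasets whose hashes match. The sketch below assumes small in-memory tables and uses made-up names (`content_hash`, `find_duplicates`, the sample datasets). It only catches exact copies regardless of row order; detecting partially overlapping data would need a different technique.

```python
import hashlib
from collections import defaultdict


def content_hash(rows):
    """Hash a dataset's rows so that row order does not affect the result."""
    digest = hashlib.sha256()
    for row in sorted(repr(tuple(r)) for r in rows):
        digest.update(row.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def find_duplicates(datasets):
    """Group dataset names whose contents are identical (exact duplicates only)."""
    groups = defaultdict(list)
    for name, rows in datasets.items():
        groups[content_hash(rows)].append(name)
    return sorted(sorted(names) for names in groups.values() if len(names) > 1)


# Made-up sample datasets: name -> list of rows.
datasets = {
    "crm_2023": [("a", 1), ("b", 2)],
    "crm_copy": [("b", 2), ("a", 1)],  # same rows, different order
    "orders": [("x", 9)],
}
print(find_duplicates(datasets))
```

Each group returned is a candidate for merging or deletion, leaving one copy for users to find.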
#6: Data Equality
In a perfect world, there would be a few standardized data schemas for data ingestion. Because there are so many data formats, the quality of a data catalog may be determined by how efficiently it ingests data. Gartner will likely examine a data catalog vendor’s ability to turn the plethora of data sources into something palatable for the average data stakeholder.
#7: Interoperability
As a society, we agree that “No man is an island,” and that holds true for data catalog platforms. The upper right corner of the Magic Quadrant will include applications that integrate with other vendors’ data governance, ingestion, data lake management and other applications within a given organization. The data catalog vendors who establish interoperability with other key big data application vendors are the ones who will benefit, because they’ll be part of a customer’s seamless software ecosystem instead of leaving the customer to play systems integrator for several standalone applications.
If and when Gartner’s Magic Quadrant for Data Catalogs arrives, Waterline Data is sure to be in the upper right quadrant. The Waterline data catalog is proficient in all seven of these categories, plus Waterline has the experience of implementing several successful data catalog initiatives for some of the top companies in Europe, Australia and the U.S. Whether this Magic Quadrant is created or not, there is no doubt that the data catalog’s value and pervasiveness within big data infrastructures make it worthy of Gartner’s analysis.