Earlier this week, the US Department of Labor issued the results of its monthly Job Openings and Labor Turnover Survey. It showed a growing gap between job openings and hirings, suggesting that the skills mismatch that has been eroding the growth potential of many businesses for years isn’t going away anytime soon. Meanwhile, within the Big Data sector specifically, new research suggests that 40% of companies are unable to find the Big Data experts they need to put all their hoarded data to work.
One of our customers, a global electronics manufacturer, was no different. They had a massive data lake—a dumping ground, to be exact—into which everything from sales and marketing to operations and product lines poured new data every day. Access to this data was controlled by four data stewards.
That’s right. Four data stewards serving as the bridge between 3,000 data and business analysts and many petabytes of data. And even though the stewards’ combined knowledge of what data existed and where it lived covered roughly 80% of the data lake, that knowledge was siloed. The stewards knew their data well, but they couldn’t possibly keep up with all the analyst requests. And while each steward knew his or her corner of the data lake (if data lakes have corners), none of them could know about all the data beyond their purview that might have been cross-correlated for new, added value.
Having only four data stewards also created a severe bottleneck that slowed the flow of data to where and when it was needed in the enterprise. On one hand, this bottleneck served a critical function in ensuring proper data governance: the stewards could locate the data the analysts wanted and create partitions within the lake for the analysis to take place, which instilled control over sensitive data that might have required masking. On the other hand, it wasn’t true self-service analytics. I’d argue that while they were offering a kind of self-service for their data lake, when it came to actually locating or gaining access to the data, it was back to the decades-old IT request line.
In short, one solution might have been to hire more data stewards and data scientists. But as I mentioned, such people just aren’t available, and they are expensive to hire anyway. Besides, our customer’s data management needs would only grow over time, and all of them would have to conform to emerging regulations like GDPR. (Trust me, more are on the way.) Let’s face it: you can’t count on simply hiring more data stewards over the long term. With the amount of data most organizations are pumping into the lake, it would take years to get new hires up to speed.
It soon became abundantly clear to our soon-to-be customer that they needed to automate some of the data stewards’ duties. Their vision? To let users find the data they wanted and gain access through automatic provisioning. To give users access to easily identifiable data that has been intelligently catalogued by a combination of sophisticated algorithms, tribal knowledge, and machine learning. This removes the bottleneck. This pushes data into governance faster, so regulatory-compliant information can move swiftly through the organization. This is data profoundly impacting the organization. And this is what we helped them do.
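To make the idea concrete, here is a minimal sketch of what automated cataloguing plus tag-driven provisioning can look like: pattern rules flag sensitive columns, and access decisions follow from the tags. All names here (the patterns, `tag_column`, `provision`, the clearance levels) are hypothetical illustrations, not Waterline Data’s actual product API.

```python
import re

# Hypothetical rule set: sample values matching these patterns mark a
# column as sensitive. Real catalogs combine many such rules with
# steward ("tribal") knowledge and machine learning.
SENSITIVE_PATTERNS = {
    "email": re.compile(r"^[\w.+-]+@[\w-]+\.[\w.]+$"),
    "ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}

def tag_column(sample_values):
    """Tag a column by matching its sample values against known patterns."""
    tags = set()
    for label, pattern in SENSITIVE_PATTERNS.items():
        if sample_values and all(pattern.match(v) for v in sample_values):
            tags.update({label, "sensitive"})
    return tags

def provision(column_tags, user_clearance):
    """Grant raw access, masked access, or deny, based on catalog tags."""
    if "sensitive" not in column_tags:
        return "raw"
    return "masked" if user_clearance == "analyst" else "denied"

# Build a tiny catalog from sampled column values (hypothetical data).
catalog = {
    "contact_email": tag_column(["a.smith@example.com", "b.jones@example.com"]),
    "units_sold": tag_column(["1200", "875"]),
}
```

With a catalog like this, an analyst’s request is answered immediately (`provision(catalog["contact_email"], "analyst")` returns `"masked"`, while `units_sold` comes back raw) instead of waiting in a steward’s queue, which is the bottleneck the customer wanted removed.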
We like to say we helped this customer accelerate the value of their data. Click here to see how Waterline Data can accelerate the value of your data, too.