Data Governance Todd Goldman February 2, 2017

The Real Cost and Risk of Redundant Data

How big of a problem is data redundancy? If your company is like most, it is much bigger than you think.

You see, data sticks to businesses the way flies stick to flypaper. Once the cost of storage began its steep descent in 2005, cloud providers were able to come along and offer lots of space at low cost.

Last year, Veritas surveyed 10,000 organizations, and 83% of IT decision makers said their organizations were “data hoarders.” This is neither a surprise nor a bad thing. Gartner says 90% of that data is undiscoverable and useless. (Something Waterline Data helps correct.) Much of that dark data could also pose security, compliance and other corporate risks that far outweigh the data’s potential value. (Waterline Data resolves that, too.)

These are issues that I’m happy to say people are talking about, and I’m even happier to see many organizations starting to do something about them. But there’s another issue that few people are talking about, let alone acting on: a lot of that data is also redundant, and it’s costing organizations far more than they think. Storage costs may be shrinking, but you still pay for storage, and data hoarders pay a lot more. The volume of stored data is growing all the time, and even if it grows at a steady rate, the associated license and storage costs can grow exponentially.

But the cost depends on what kind of storage you’re talking about. If organizations don’t fully understand the value of their data and where the redundancies are (and I would say very, very few do), how are they supposed to know (1) which data is “hot” and needs to be housed in expensive, high-performance data warehousing tools, (2) which data is “warm” and can be offloaded onto cheaper, lower-performance storage, and (3) which data is “cold” and can be eliminated altogether? Sure, these companies may be comfortable with what they currently spend to hoard as much potentially valuable data as possible. And in a way they’re right: much of the data that isn’t being used now could prove extremely valuable further down the road. It’s an investment, and a wise one. But why should hot, warm and cold data all cost the same to keep? And why pay for redundant backups of data that will never be valuable?
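To make the hot/warm/cold distinction concrete, here is a minimal sketch in Python that assigns each data set a tier based on how recently it was accessed. The `Dataset` class, the `classify` function and the 30-day and 365-day thresholds are hypothetical illustrations, not Waterline Data’s method or any product’s API; a real tiering policy would also weigh business value, compliance rules and performance needs. In the spirit of the question about redundant backups, the sketch treats any data set flagged as a redundant copy as cold, that is, a candidate for elimination.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Dataset:
    # Hypothetical record for one data set in a storage inventory.
    name: str
    last_accessed: date
    is_redundant_copy: bool = False


# Hypothetical thresholds (in days); real policies vary by organization.
HOT_MAX_DAYS = 30
WARM_MAX_DAYS = 365


def classify(ds: Dataset, today: date) -> str:
    """Assign a storage tier from access recency and redundancy."""
    if ds.is_redundant_copy:
        # A redundant copy adds cost without adding information,
        # so it is flagged as a candidate for elimination.
        return "cold"
    age_days = (today - ds.last_accessed).days
    if age_days <= HOT_MAX_DAYS:
        return "hot"   # keep in high-performance storage
    if age_days <= WARM_MAX_DAYS:
        return "warm"  # offload to cheaper, lower-performance storage
    return "cold"      # candidate for elimination


# Example usage with made-up inventory entries.
today = date(2017, 2, 2)
inventory = [
    Dataset("sales_2017_q1", date(2017, 1, 30)),
    Dataset("sales_2016", date(2016, 9, 15)),
    Dataset("legacy_crm_export", date(2014, 3, 1)),
    Dataset("sales_2017_q1_copy", date(2017, 1, 30), is_redundant_copy=True),
]
for ds in inventory:
    print(ds.name, classify(ds, today))
```

Checking the redundancy flag before access recency is a deliberate choice here: a duplicate that is read often is still a duplicate, so recency alone would keep paying premium prices for it.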

Think about the money you spend storing your data. Now think about how much of that data is never even touched; for some organizations, that can easily be as much as 50%. Now think about redundant data: (1) how much of it you have (typically 10-30%, not counting the proliferation of data sets that are 90% redundant), (2) how much it costs in dollars, and (3) how much it costs in risk. Yes, risk: potentially your biggest expense of all. Data should be treated as an asset, which means it should also be managed as a potential liability. In some industries, for example, keeping data around longer than necessary exposes your organization to lawsuits that could easily be avoided by properly managing the lifecycle of your data.
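A back-of-the-envelope estimate helps put that 10-30% redundancy range into dollars. The sketch below simply multiplies total stored volume by the redundant fraction and an all-in annual cost per terabyte. The `redundancy_cost` function, the 500 TB volume and the $1,000 per TB-year figure are made-up inputs for illustration only; substitute your own. It captures direct storage spend only, not the risk cost described above.

```python
def redundancy_cost(total_tb: float, cost_per_tb_year: float,
                    redundant_fraction: float) -> float:
    """Annual storage spend attributable to redundant data."""
    if not 0.0 <= redundant_fraction <= 1.0:
        raise ValueError("redundant_fraction must be between 0 and 1")
    return total_tb * redundant_fraction * cost_per_tb_year


# Hypothetical inputs: 500 TB at an all-in cost of $1,000 per TB per year.
TOTAL_TB = 500
COST_PER_TB_YEAR = 1_000

# Evaluate both ends of the typical 10-30% redundancy range.
for fraction in (0.10, 0.30):
    cost = redundancy_cost(TOTAL_TB, COST_PER_TB_YEAR, fraction)
    print(f"{fraction:.0%} redundant: ${cost:,.0f} per year")
```

With these assumed inputs, the two ends of the range work out to $50,000 and $150,000 per year, before counting license, database or risk-related costs.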

The bottom line is that data redundancy is a much larger hidden cost than most organizations realize. The flip side, however, is that organizations are already bearing this cost, which means it is also a source of savings and budget. By eliminating the excess storage, database and associated costs of data redundancy, organizations can mine a significant vein of savings to fund other data-related projects.

In effect, data redundancy presents more than just cost and risk. It also presents opportunity.