Big Data | Todd Goldman | February 28, 2017

The Top 5 Reasons Why Big Data in the Cloud Will Outstrip the Growth of Big Data On-Premises

When it comes to integrating Big Data into your business and deriving value from data, it’s all about ease of deployment. At the end of the day, and as those of you who have been keeping up with my blog posts already know, Hadoop is just too damn hard. Anything that makes a big data solution easier to deploy will win out. Empirical evidence supports this position: the number of requests we receive at Waterline for cloud-based deployments, as opposed to on-premises deployments, has jumped significantly in the last two quarters. At least from the perspective of this vendor, whose customers manage large quantities of data, there is a clear, measurable increase in demand for cloud deployments of big data projects.


Based on customer feedback (and in no particular order), here are the top 5 factors driving the move from on-premises solutions to cloud-based ones:


  1. The physical implementation of a cluster is too much effort. Why buy a cluster of servers when you can just go to AWS or Azure and spin up a bunch of servers? You don’t have to order the hardware, power it, or even cable it up. Most of the time, getting the physical environment running is hard enough on its own, and that doesn’t even include getting the actual software up and running.
  2. Lack of people with the right skills and experience. This problem plagues big data. The value of cloud deployment is that cloud vendors are continually chipping away at big data’s ease-of-use problem by providing more automation. With the ability to spin clusters up and down automatically, these suppliers significantly reduce the need for people with deep expertise in running a massive computing cluster. This matters because people with that kind of expertise have become harder to find.
  3. Lower perceived risk and faster time to action. Big data projects are inherently risky, which is exactly why the cloud is such a big advantage for them. You don’t know up front whether you are actually going to find something in your data. With cloud vendors, you can spin up a cluster, do some work, and spin it back down if you don’t find anything, without incurring much overall project risk. Better yet, if you do find something, you can quickly spin up more systems to scale your project without losing time buying systems, buying software, and implementing everything. Note that scaling up and down does not work for all use cases. Sometimes, because of the nature of the project or the data, you have to ramp up systems in the cloud and keep them running. Even then, it is still a lot easier to get that done in the cloud these days, which of course also reduces risk.
  4. Incremental cost vs. big up-front investment. Directly related to the risk point above is the associated cost. Cloud deployments of big data let consumers pay only for the services they actually use. The good news, of course, is that if you are running an experimental project and fail fast, the cost is much lower than if you had bought and paid for all of the equipment, only to have the project shut down.
  5. Sharing large data sets. One issue we often see at Waterline is the desire of businesses to share data sets, either with their customers or internally within the organization. However, moving those data sets around is a challenge, and even sharing them within an organization can be problematic because adding new users puts additional load on a system. For example, if business unit A wants access to business unit B’s data, there might not be enough compute power to support the extra users. If the data is sitting in the cloud, however, it is much easier to add capacity without having to duplicate the data. And even if the data does have to be duplicated, that can be done quickly and easily within the cloud.


While I can’t fully explain this phenomenon, one thing is clear: there is something in the water that is making it cloudy.


P.S. I apologize in advance for my crappy pun.