What Does a CTO Do When a 60PB Hadoop Cluster Devours the IT Budget?

Pepperdata

Jan 29, 2021

In 2019, the CTO of a large global bank recognized a problem: their data kept growing, the costs of their Hadoop cluster were escalating rapidly, and those costs had started eating into the annual IT budget.

Learn why enterprise clients use Pepperdata products and services: https://www.pepperdata.com/

#HadoopCluster #ITBudget #Pepperdata

Moving off Hadoop, or a “lift-and-shift,” was out of the question. They needed a way to cap their costs and growth without hurting their ability to stay competitive in the market.

So, just as a brief background: Chuck Yarbrough, Hitachi Vantara. Hitachi Vantara is an infrastructure, data management, and services company. I joined Hitachi Vantara through the acquisition of Pentaho, so on the data management side. I've got a long history in data and analytics, and I'm super happy to be here talking about Hadoop and some of the challenges and some of the benefits.

And I'm so excited about this. You know, on the agenda we show Hadoop history, and I think it's interesting because I was pretty fortunate. I had joined Pentaho at a time when big data was really coming into its own, and Hadoop had been around a little bit. People were beginning to look at it, and understand it, and want to know more about it. And it was really cool, and it was enabling us to do things we'd never been able to do before. And in my background, from an analytics perspective, I had been involved in the early days of reporting and analysis tools, you know, think business intelligence applications and data warehousing.

I managed data warehouses in Silicon Valley back in the day, and we always had one challenge: how much data we could actually make available. Business users always wanted more. I had multiple conversations with people where we'd say, “Hey, you can have three years' worth of data, and that's it.” And they'd be like, “Well, no. I need seven.” And, you know, you argue about it, but the reality is there are limitations. Big data and the advent of Hadoop really enabled us to go way beyond what we were limited to in prior architectures. So, it was kind of fun. It was a great time to begin to look at things.

And this notion of bringing the data and the compute, you know, the storage and compute, together was what really enabled a massive change in the industry, so that we could scale to areas that we hadn't been able to before.

But that leaves us here, where we're talking about a really big bank, you know, really good people, lots of data, like 60 petabytes of data. And I think, as the title says, it was literally eating the budget, right? The cost began to get pretty big. So, with that, you know, I think it's an opportunity to talk about how we can help people get control of these escalating costs.

That's a great one. And I actually came to Hadoop from the other direction, from the compute side. I got in pretty early, but I came out of a traditional high-performance computing background, where, you know, our concerns were not necessarily keeping large amounts of data online. It was: how do we get the data to the compute fast enough? So, we would have these massive interconnects, InfiniBand and 10-gigabit interconnects, in order to try to get data to the compute as fast as we could.

So, when I heard about this Hadoop thing, I was like, wow, okay, bringing the compute to the data, that's a pretty cool idea. So, that was my approach to it: being able to scale the compute because we were no longer limited by our storage bandwidth, plus this data replication idea. For me, it was like, oh, this is really neat. And one of the big disconnects for scaling, coming out of my industry, was between the amount of storage and the amount of compute.

And being able to move data between the two, putting them together, and then watching Hadoop scale as it has, has been very exciting. And yeah, initially, you know, the scaling around Hadoop definitely seemed to be stronger on the compute side, because people already had pretty large volumes of data. But you had companies like Pentaho and others that allowed people to actually look at these large volumes of data and extract intelligence from them...

Check out our blog: https://www.pepperdata.com/blog/

Connect with us:
Visit Pepperdata Website: https://www.pepperdata.com/
Follow Pepperdata on LinkedIn: https://www.linkedin.com/company/pepperdata
Follow Pepperdata on Twitter: https://twitter.com/pepperdata
Like Pepperdata on Facebook: https://www.facebook.com/pepperdata/