An Inside View of Mainstream Enterprise Hadoop Adoption

Nicole Hemsoth
June 1st 2015

Few organizations have holistic insight into how the overall Hadoop ecosystem is trending. From the analysts to the Hadoop vendors themselves, the picture tends to be somewhat incomplete, whether because of a single-vendor view of adoption or because market data has gone stale in the time it took to analyze it.

But for a company like Concurrent, the vendor behind the application development platform Cascading, which lets users build data-oriented applications on top of Hadoop in a more streamlined way, there are no blind spots when it comes to seeing the writing on the Hadoop wall. And Concurrent has had plenty of time to watch the Hadoop story unfold in full. Beginning in 2008 and through some big-name use cases at web-scale companies, the small company (now 25 folks, 60% of whom are engineers) saw the first flush of the Hadoop boom and rode the tide, with a combined total of close to $15 million in investments.

Beyond what Concurrent does for end users of Hadoop who want to build and deploy applications on the framework, as well as monitor and track their performance via the new Driven tooling rolled out this month, part of what makes the company interesting at a high level is that it has some unique insight into how companies are using Hadoop at scale.

As Gary Nakamura, CEO of Concurrent, tells The Platform, while they do watch what the analyst groups say about Hadoop’s rise through the mainstream enterprise ranks, their insight into what’s actually happening in the market through the use of their Cascading (and now Driven) product lines extends across all three of the major Hadoop distribution vendors as well as the open source version. While they do not have specific numbers to share, aside from an approximate 290,000 downloads of Cascading per month, Nakamura says there are distinct adoption trends they have been tracking that show healthy (although not meteoric) growth of Hadoop adoption for production workloads in mainstream enterprise settings.

By mainstream, Nakamura means large to mid-sized companies in telco, finance, healthcare, and retail. These users take a “very pragmatic approach that tends to follow milestones with 0-24 months being the experimental phase” before shifting out to build larger clusters and add more applications to the ranks. “We have many mainstream enterprise users who are somewhere in that 24 months to seven years category with an average node count for Hadoop workloads being somewhere between 100 and 200, although we also have users in the mainstream [not the LinkedIn, Twitter, and similar companies] with around 1500 nodes.”

“Beyond the startups and bleeding edge, for the mainstream world, this pragmatism extends to wanting to show a particular ROI for specific projects, then they move out to look at how they can move other applications. For mainstream users though, this is a measured approach in part because this is not a trivial expense. And even though the chasm will close slower than people think with Hadoop adoption, it will happen. If you look at MapR, Cloudera, and Hortonworks, they are getting a lot of new logos each quarter, the growth is there, but mainstream companies are very measured in how they are looking at Hadoop, especially at that 0-24 month stage, where a lot of them are,” Nakamura explains.

The leading edge companies tend to jump immediately into the deep end, but for the high-value companies that are in the infant stages (where the majority of mainstream companies are now, according to Nakamura), it’s about proving an ROI on a Hadoop investment. The Driven product that was announced this month allows for complete monitoring and visualization of the health and performance of individual jobs as well as cluster-level metrics, in part to give these mainstream enterprises something that aids in their ability to show the value of the jobs they’ve pushed to Hadoop. Oftentimes these are legacy applications that are moved in pieces, one by one to start, before more applications are moved with the help of Cascading for building out the broader Hadoop strategy.

“These mainstream users tend to come to us very well informed about Hadoop. They know what they are doing. Their questions by the time they get to us are more clarifying: how many deployments are there, are there similar cases of migrating business-level use cases to Hadoop that we’ve worked with before. They do not want to be the guinea pigs, in other words.” Nakamura notes that the driver for Hadoop adoption among these users is not coming from the top down (there are no directives from CIOs demanding a Hadoop strategy). Instead, users are seeing clear opportunities for their workloads to run on Hadoop and want to be able to use Cascading to build and deploy new applications, then confirm progress using the new Driven tool, especially since accountability with so many different stakeholders is critical.

Once larger mainstream enterprise users get beyond the initial growing pains (past 24 months), Nakamura says they start to see how they can take advantage of the reusable components and connectors of Cascading, which lets them roll developments made on one application into another. “We see this as the path to growing Hadoop at these enterprises. They can create new applications much quicker to do the second, third, then before they know it, forty applications. We have users now that in three years have started this way and now have 800 applications in production.”

Nakamura does not disagree that the steep adoption curve seen over the last couple of years has leveled off, but says that with so many companies in that early 0-24 month stage, he expects another rush around the bend.