Cascading 3.0 Future-Proofs Data-Centric Application Development on Hadoop

Joyce Wells, Database Trends and Applications
May 13, 2014

Concurrent, Inc., the company behind Cascading, an open source application development framework for building data applications on Hadoop, has announced Cascading 3.0, which CEO Gary Nakamura says will give enterprises the flexibility to build their data-oriented applications on Hadoop once, and then run the applications on the platform that best meets their business needs.

Cascading is focused on providing reliable, reusable tools for building data products while also giving users with varying skill sets the freedom to solve problems, said Nakamura.

“Cascading 3.0 will allow applications to execute on whatever fabric that we support, and that end users want to run on, through our new query planner – that means that an application that was written 2 years ago and that solves a particular business problem for an end user can very quickly be migrated over to a newer fabric like Apache Tez or Apache Spark,” Nakamura said. “Enterprise users can write an application once and deploy on whatever fabric they would like depending on what the business problem is.”

The added migration flexibility is critical for the Hadoop community, says Nakamura. For existing customers, it means they can move to new computation platforms with very little effort. Longer term, it matters for mainstream adoption: the rapid innovation inside Hadoop is keeping some enterprises on the sidelines, concerned that the platform is too complex and that an application built now will need a complete rebuild if the underlying platform changes.

“What we are providing is a standard way to develop data-centric applications without the risk of having to rewrite those applications when distributions or the providers of the computation engines underneath it change direction one day,” Nakamura said.

Cascading 3.0 will ship with support for local in-memory, Apache MapReduce, and Apache Tez. Shortly after, support will be added for Apache Spark, Apache Storm and others through the new pluggable and customizable query planner.

Third-party products, data applications, frameworks and dynamic programming languages built on Cascading will benefit from this portability, according to the company. Cascading offers compatibility with all major Hadoop vendors and service providers including Altiscale, Amazon EMR, Cloudera, Hortonworks, Intel, MapR and Qubole, as well as others.

Cascading is used by enterprise Java developers first and foremost, but Concurrent offers interfaces that allow users working with R, MicroStrategy, or SAS to take their predictive models and deploy them on Hadoop. “We also have a SQL interface so SQL end users and anyone who knows how to program with SQL can leverage Cascading,” said Nakamura.

Concurrent also recently announced strategic industry partnerships with Hortonworks and Databricks, and introduced Driven, its flagship product, which provides application performance management for data-centric applications from development through production.

Cascading version 3.0 will be available in early summer and freely licensable under the Apache 2.0 License Agreement. Concurrent also offers standard and premium support subscriptions for enterprise use.