Cascading

Cascading is a feature-rich API for defining and executing complex, scale-free, and fault-tolerant data processing workflows on an Apache Hadoop cluster.

The processing API lets developers quickly assemble complex distributed processes without having to “think” in MapReduce, and schedule them efficiently based on their dependencies and other available metadata. Complex jobs tend to start simple, so Cascading supports simple data processing applications as well, and can absorb additional complexity as applications develop and mature.

Because Cascading is an alternative API to MapReduce, and not an independent system or architecture, it fits easily into an organization's development process, allowing developers to quickly create unit and integration tests for all aspects of their applications.

Cascading allows developers to use an easy-to-visualize source-pipe-sink paradigm through a Java API instead of writing multiple MapReduce jobs. With Cascading, developers can create new operations or reuse existing ones, chain these operations into data processing workflows, and save the results to an output data set.

assembly

Cascading allows developers to chain together a series of pipes and filters that read data from data sources and feed it to data sinks.

flow

The Cascading planner converts this assembly of pipes into a cluster-executable processing flow consisting of managed MapReduce jobs.

cluster

Finally, this flow is executed on a remote Hadoop cluster in the most efficient manner possible.
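
To make the source-pipe-sink idea concrete, here is a minimal sketch in plain Java. It is a toy, in-memory model only: it does not use the Cascading API, it runs no MapReduce jobs, and every name in it (PipeSketch, Pipe, then, splitWords, dropShort, countInto, run) is hypothetical and chosen for illustration. It shows only how small operations can be chained into an assembly that reads from a source and writes to a sink.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Illustrative only: none of these names come from the Cascading API.
public class PipeSketch {

    // A hypothetical "pipe": one step that turns a list of records into another.
    interface Pipe<A, B> extends Function<List<A>, List<B>> {
        // Chain this pipe with the next one, yielding a longer assembly.
        default <C> Pipe<A, C> then(Pipe<B, C> next) {
            return input -> next.apply(this.apply(input));
        }
    }

    // Operation: split each line into lower-case words.
    static Pipe<String, String> splitWords() {
        return lines -> {
            List<String> words = new ArrayList<>();
            for (String line : lines) {
                for (String word : line.toLowerCase().split("\\s+")) {
                    if (!word.isEmpty()) {
                        words.add(word);
                    }
                }
            }
            return words;
        };
    }

    // Filter: keep only words with at least minLength characters.
    static Pipe<String, String> dropShort(int minLength) {
        return words -> {
            List<String> kept = new ArrayList<>();
            for (String word : words) {
                if (word.length() >= minLength) {
                    kept.add(word);
                }
            }
            return kept;
        };
    }

    // Sink: collect the words into a sorted word-count table.
    static Map<String, Integer> countInto(List<String> words) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    // Source -> assembly of pipes -> sink.
    static Map<String, Integer> run(List<String> source) {
        Pipe<String, String> assembly = splitWords().then(dropShort(3));
        return countInto(assembly.apply(source));
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList(
                "the quick brown fox",
                "a lazy dog and the fox");
        System.out.println(run(source));
    }
}
```

In this sketch the whole assembly runs in a single JVM. The point of a planner such as the one described above is that the same kind of declarative chain can instead be translated into jobs that run on a cluster.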

Cascading works with both on-premise and cloud-deployed clusters, and many companies use it in production on Amazon EC2 and Elastic MapReduce. Both the MultiTool and CloudFront LogAnalyzer were developed with Cascading.

Cascading is open source and dual-licensed under the GPL and OEM/Commercial licenses. We offer both OEM/Commercial licenses and Developer Support options for Cascading.

Read more about Cascading's features or thumb through the Cascading 1.0 User Guide. Visit the community site for more.

Resources

News & Events

  • Oct. 09, 2009 — Cascading now has support for HBase and the JDBC API.
  • Jun. 17, 2009 — OSBRIDGE: Building Scale Free Applications with Hadoop and Cascading
  • Jun. 06, 2009 — Cloud Computing - From Getting Started and Tools, to Large Scale Architecture Design
  • May. 27, 2009 — SAM SIG: Hadoop architecture, MapReduce patterns, and best practices w/Cascading
Cascading

Cascading is software for fault-tolerant data processing.

support

Concurrent provides licensing, indemnification, and support for Cascading.

training

For advanced Hadoop and Cascading training, visit our partner, Scale Unlimited.