Cascading

Cascading is a feature-rich API for defining and executing complex, scale-free, and fault-tolerant data processing workflows on an Apache Hadoop cluster.

The processing API lets developers quickly assemble complex distributed processes without having to “think” in MapReduce, and efficiently schedule them based on their dependencies and other available metadata. Complex jobs tend to start simple, so Cascading supports simple data processing applications as well and can absorb additional complexity as applications are developed and mature.
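As a rough illustration of what scheduling by dependency means, the following plain-Java sketch orders a set of jobs so that each one runs only after the jobs it depends on. It does not use Cascading's API; the class `DependencyOrder`, the method `order`, and the job names are hypothetical, and the technique shown is an ordinary topological sort (Kahn's algorithm), not a description of how Cascading schedules work internally.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DependencyOrder {
    // deps maps each job name to the jobs that must finish before it can start.
    // Hypothetical helper for illustration; not part of Cascading.
    static List<String> order(Map<String, List<String>> deps) {
        // Number of unfinished prerequisites per job (TreeMap keeps output deterministic).
        Map<String, Integer> remaining = new TreeMap<>();
        // Reverse edges: for each job, the jobs waiting on it.
        Map<String, List<String>> dependents = new HashMap<>();
        for (Map.Entry<String, List<String>> e : deps.entrySet()) {
            remaining.putIfAbsent(e.getKey(), 0);
            for (String prerequisite : e.getValue()) {
                remaining.putIfAbsent(prerequisite, 0);
                remaining.merge(e.getKey(), 1, Integer::sum);
                dependents.computeIfAbsent(prerequisite, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        // Start with every job that has no prerequisites.
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : remaining.entrySet()) {
            if (e.getValue() == 0) ready.add(e.getKey());
        }
        List<String> result = new ArrayList<>();
        while (!ready.isEmpty()) {
            String job = ready.remove();
            result.add(job);
            // Finishing a job may unblock the jobs that depend on it.
            for (String next : dependents.getOrDefault(job, Collections.emptyList())) {
                if (remaining.merge(next, -1, Integer::sum) == 0) ready.add(next);
            }
        }
        if (result.size() != remaining.size()) {
            throw new IllegalStateException("cyclic dependency between jobs");
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("report", Arrays.asList("join"));
        deps.put("join", Arrays.asList("importLogs", "importUsers"));
        System.out.println(order(deps));
    }
}
```

In this sketch the two import jobs have no prerequisites, so a scheduler would be free to run them in parallel before the join.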

Because Cascading is an alternative API to MapReduce, not an independent system or architecture, it fits easily into an organization's development process and lets developers quickly create unit and integration tests for all aspects of their application.

Cascading allows developers to use an easy-to-visualize source-pipe-sink paradigm through a Java API instead of writing multiple MapReduce jobs. With Cascading, developers can create new operations or reuse existing ones, chain these operations into data processing workflows, and save the results to an output data set.
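The source-pipe-sink idea can be sketched in plain Java without any cluster. The example below is a conceptual model only and does not use Cascading's classes: `PipeSketch`, `Stage`, `chain`, and `countInto` are hypothetical names invented for illustration. It reads lines from an in-memory source, passes them through two chained operations, and collects word counts in a sink.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

public class PipeSketch {
    // A stage transforms one list of records into another.
    // Hypothetical stand-in for illustration; not Cascading's Pipe class.
    interface Stage extends Function<List<String>, List<String>> {}

    // Operation: split each line into lowercase words.
    static final Stage SPLIT_WORDS = lines -> {
        List<String> words = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.toLowerCase().split("\\s+")) {
                if (!word.isEmpty()) words.add(word);
            }
        }
        return words;
    };

    // Filter: drop words shorter than three characters.
    static final Stage DROP_SHORT = words -> {
        List<String> kept = new ArrayList<>();
        for (String word : words) {
            if (word.length() >= 3) kept.add(word);
        }
        return kept;
    };

    // Chain stages so the output of each one feeds the next.
    static Stage chain(Stage... stages) {
        return input -> {
            List<String> current = input;
            for (Stage stage : stages) current = stage.apply(current);
            return current;
        };
    }

    // Sink: aggregate the final records into a sorted word-count map.
    static Map<String, Integer> countInto(List<String> words) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : words) counts.merge(word, 1, Integer::sum);
        return counts;
    }

    public static void main(String[] args) {
        // Source: two lines of text held in memory.
        List<String> source = Arrays.asList("the quick brown fox", "a lazy dog is the dog");
        Stage assembly = chain(SPLIT_WORDS, DROP_SHORT);
        System.out.println(countInto(assembly.apply(source)));
    }
}
```

Each stage here is reusable on its own, which mirrors the point above about building new operations and recombining existing ones into larger workflows.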

assembly

Cascading allows developers to chain a series of pipes and filters together that read data from data sources and feed it to data sinks.

flow

The Cascading planner converts this assembly of pipes into a cluster-executable processing flow consisting of managed MapReduce jobs.

cluster

Finally, this flow is executed on a remote Hadoop cluster in the most efficient manner possible.

Cascading works with both “on-premise” and “cloud” deployed clusters, and many companies use it in production on Amazon EC2 and Elastic MapReduce. Both the MultiTool and CloudFront LogAnalyzer were developed with Cascading.

Distributions certified with Cascading

Licensing

Cascading is open source and licensed under the GPL by default. Alternative licenses are available if the GPL is unsuitable for your use.

More on Cascading

Read more about Cascading's features or thumb through the Cascading User Guide. Visit the community site for more.

 

Cascading Support

Concurrent provides licensing, indemnification, and support for Cascading.

Consulting and Training Services

Advanced Cascading consulting, training, and mentoring are available.