Features & Capabilities

Processing APIs

Define complex data flows and create sophisticated data-oriented frameworks. These frameworks can be Maven-compatible libraries or domain-specific languages (DSLs) for scripting.

Integration APIs

Create and test rich functionality before tackling complex integration problems. Integration points can be developed and tested before plugging them into a production data flow.

Scheduler APIs

Use in conjunction with Riffle lifecycle annotations to schedule unit-of-work operations from any third-party application.

Physical Planner

Develop applications based on data workflows and assemblies, and let Cascading automatically create MapReduce jobs that get deployed for execution.

Flexible Source & Sink Taps

Easily read and write data from any source and in any format, and change the location of files based on the deployment environment.
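
As a rough illustration of switching file locations per deployment environment, the sketch below resolves a dataset path against an environment-specific base location. It uses only the Java standard library and is not Cascading's Tap API; the class name, the `deploy.env` property, and the base URIs are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class TapLocationSketch {
    // Hypothetical base locations, one per deployment environment.
    private static final Map<String, String> BASE_LOCATIONS = new HashMap<>();
    static {
        BASE_LOCATIONS.put("local", "file:///tmp/data/");
        BASE_LOCATIONS.put("production", "hdfs://namenode:8020/data/");
    }

    // Resolve a relative dataset path against the base location of the given environment.
    static String resolve(String environment, String relativePath) {
        String base = BASE_LOCATIONS.get(environment);
        if (base == null) {
            throw new IllegalArgumentException("Unknown environment: " + environment);
        }
        return base + relativePath;
    }

    public static void main(String[] args) {
        // The "deploy.env" system property is an assumption made for this sketch.
        String environment = System.getProperty("deploy.env", "local");
        System.out.println(resolve(environment, "logs/input.txt"));
    }
}
```

The flow definition would refer only to the relative path, so the same application could run against local files in testing and a cluster file system in production.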

Built-in Testability

Validate and debug your pipe-assemblies, custom operations, and data flows before deploying the application into production.

Standard Relational Operations

Apply relational data manipulation operations like Select and Join to unstructured data using Pipe operations such as Each, GroupBy, CoGroup, and Every.
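
To make the relational vocabulary concrete, here is a minimal plain-Java sketch of a select, a group-and-aggregate, and an inner join over in-memory records. It is only an analogy written with the standard library and does not use Cascading's Pipe classes; the `Order` and `Customer` types and all method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class RelationalSketch {
    // Hypothetical record types for illustration only.
    static class Order {
        final String customer;
        final int amount;
        Order(String customer, int amount) { this.customer = customer; this.amount = amount; }
    }

    static class Customer {
        final String name;
        final String region;
        Customer(String name, String region) { this.name = name; this.region = region; }
    }

    // Select: keep only the orders at or above a minimum amount.
    static List<Order> select(List<Order> orders, int minAmount) {
        return orders.stream()
                .filter(o -> o.amount >= minAmount)
                .collect(Collectors.toList());
    }

    // Group and aggregate: total order amount per customer, sorted by customer name.
    static Map<String, Integer> totalByCustomer(List<Order> orders) {
        return orders.stream()
                .collect(Collectors.groupingBy(o -> o.customer, TreeMap::new,
                        Collectors.summingInt(o -> o.amount)));
    }

    // Inner join: pair each order with its customer's region, dropping unmatched orders.
    static List<String> join(List<Order> orders, List<Customer> customers) {
        Map<String, String> regionByName = new HashMap<>();
        for (Customer c : customers) {
            regionByName.put(c.name, c.region);
        }
        List<String> joined = new ArrayList<>();
        for (Order o : orders) {
            String region = regionByName.get(o.customer);
            if (region != null) {
                joined.add(o.customer + "," + region + "," + o.amount);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<Order> orders = Arrays.asList(
                new Order("ann", 5), new Order("bob", 7), new Order("ann", 20));
        List<Customer> customers = Arrays.asList(new Customer("ann", "EU"));

        System.out.println(select(orders, 6).size());
        System.out.println(totalByCustomer(orders));
        System.out.println(join(orders, customers));
    }
}
```

Loosely speaking, the filter step plays the role of a per-tuple operation, the grouping step that of a group followed by an aggregator, and the join that of combining two streams on a shared key.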

Include XML Operations

Use XML operations to tidy up HTML and XML data streams when analyzing them as part of a workflow.

Scriptable Interface

Call Cascading APIs from any Java-compatible scripting language to instantiate Cascading classes and create pipes, assemblies, and flows.
