The new features in Cascading 2.1 (and Cascading 2.0) make it even easier to use and more powerful for developing Big Data applications.
New Processing Primitives
Merge and HashJoin pipes are optimized for faster processing of data streams. A HashJoin is commonly referred to as a map-side join.
Use the new local mode processing planner to run Cascading completely in memory on a local computer. Useful for development, testing, and interactive data exploration against local data sets.
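To make the map-side join idea concrete, here is a minimal sketch in plain Java. It is not the Cascading HashJoin API: the class name `HashJoinSketch`, the `hashJoin` method, and the two-column row layout are all assumptions made for illustration. It assumes the smaller data set fits in memory, so it can be loaded into a hash table while the larger data set is streamed past it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: plain Java, not the Cascading HashJoin API.
final class HashJoinSketch {

    // Inner-joins two lists of {key, value} rows on the key at index 0.
    // The small side is loaded into an in-memory hash table and the
    // streamed side is read once, row by row.
    static List<String[]> hashJoin(List<String[]> streamed, List<String[]> small) {
        Map<String, List<String[]>> table = new HashMap<>();
        for (String[] row : small) {
            table.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row);
        }

        List<String[]> joined = new ArrayList<>();
        for (String[] left : streamed) {
            List<String[]> matches = table.get(left[0]);
            if (matches == null) {
                continue; // no partner on the small side, drop the row
            }
            for (String[] right : matches) {
                joined.add(new String[] {left[0], left[1], right[1]});
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<String[]> orders = new ArrayList<>();
        orders.add(new String[] {"u1", "book"});
        orders.add(new String[] {"u2", "pen"});
        orders.add(new String[] {"u3", "lamp"});

        List<String[]> users = new ArrayList<>();
        users.add(new String[] {"u1", "Ada"});
        users.add(new String[] {"u2", "Linus"});

        for (String[] row : hashJoin(orders, users)) {
            System.out.println(String.join(" ", row));
        }
    }
}
```

Because the streamed side is never sorted or grouped, this style of join avoids a shuffle, which is why it only pays off when one side is small enough to hold in memory.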
Simplified Integration APIs
Easier to write custom Cascading integrations to pull data from and push data into SQL and NoSQL data stores.
Simpler Unit Testing
Static test helpers support alternative testing frameworks and allow developers to write tests that are independent of the underlying platform, whether local or Hadoop.
Extend HashJoin to enable more sophisticated algorithms such as Bloom filters, or implement custom spill strategies within a HashJoin or CoGroup to scale beyond available memory.
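One way a Bloom filter can help a join is as a cheap pre-filter: build the filter from the keys of the small side, then discard rows from the large side whose keys cannot possibly match before they reach the join. The sketch below is plain Java and does not use any Cascading class. The name `BloomFilterSketch`, its constructor parameters, and the double-hashing scheme are assumptions chosen for brevity, not a tuned implementation.

```java
import java.util.BitSet;

// Illustrative only: a small Bloom filter, not part of the Cascading API.
final class BloomFilterSketch {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    BloomFilterSketch(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // Derives the i-th bit position from two base hashes (double hashing).
    private int index(String key, int i) {
        int h1 = key.hashCode();
        int h2 = (h1 >>> 16) | 1; // forced odd so the step is never zero
        return Math.floorMod(h1 + i * h2, size);
    }

    void add(String key) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(index(key, i));
        }
    }

    // false means "definitely absent"; true means "possibly present".
    boolean mightContain(String key) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(key, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BloomFilterSketch filter = new BloomFilterSketch(1024, 3);
        filter.add("u1");
        filter.add("u2");
        for (String key : new String[] {"u1", "u2", "u9"}) {
            System.out.println(key + " -> " + filter.mightContain(key));
        }
    }
}
```

A key that was added always reports `true`, so no matching row is ever lost. A key that was never added may occasionally report `true` as well, so the filter narrows the input to the join but does not replace it.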
Create custom types, serialization, comparators, and hashing to improve efficiency and integration with external systems.
Restart flows within an application by defining checkpoints on intermediate data, then resume from a checkpoint instead of regenerating data from the beginning.
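The checkpoint-and-resume idea can be sketched in a few lines of plain Java. This is not the Cascading API: `CheckpointSketch`, its `step` method, and the in-memory `Map` that stands in for durable storage are all hypothetical. Each named step saves its result, and a later run that shares the same store reuses the saved result instead of recomputing it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Illustrative only: models checkpointed steps, not the Cascading API.
final class CheckpointSketch {
    // Stand-in for durable storage; a real system would use files.
    private final Map<String, String> store;

    // Counts how many steps actually ran during this run.
    int stepsExecuted = 0;

    CheckpointSketch(Map<String, String> store) {
        this.store = store;
    }

    // Returns the saved result for this step if one exists;
    // otherwise runs the work and saves its result under the step name.
    String step(String name, Supplier<String> work) {
        String saved = store.get(name);
        if (saved != null) {
            return saved;
        }
        stepsExecuted++;
        String result = work.get();
        store.put(name, result);
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> store = new HashMap<>();

        // First run: the "parse" step executes and is checkpointed.
        CheckpointSketch firstRun = new CheckpointSketch(store);
        firstRun.step("parse", () -> "parsed");

        // Second run: "parse" is read back, only "aggregate" executes.
        CheckpointSketch secondRun = new CheckpointSketch(store);
        String parsed = secondRun.step("parse", () -> "parsed");
        String total = secondRun.step("aggregate", () -> parsed + "+agg");
        System.out.println(total + " (steps run: " + secondRun.stepsExecuted + ")");
    }
}
```

The trade-off is extra storage and write time for each checkpoint, in exchange for not repeating expensive upstream work after a failure.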