News & Events

BigDataCamp 2010 - Videos Now Online

BigDataCamp 2010 was a huge success this year, with well over 250 registrants in attendance, making this unconference nearly the same size as the first Hadoop Summit in 2008.

All of the BigDataCamp workshop videos are now online. In particular, our founder, Chris K Wensel, presented on Cascading. Check it out if you missed it live.


BigDataCamp 2010

Concurrent, Inc. will be one of the sponsors for this year’s BigDataCamp the night before the Hadoop Summit.

BigDataCamp is an unconference for data engineers, enterprise architects, developers, analysts, and data mining and business intelligence professionals working with or interested in learning more about Hadoop. Amazon Web Services is also sponsoring the event and providing free Amazon Web Services credits to be used during the workshop part of the event. BigDataCamp is designed for users of Hadoop, MapReduce, and related technologies to exchange ideas in a loosely defined format. It will take place on June 28, 2010 in Santa Clara, CA, the evening prior to the annual Hadoop Summit.

BigDataCamp will be led by Dave Nielsen, co-organizer of the popular CloudCamp series of unconferences. Pre-defined topics will include best practices in application development and advanced analytics and will be presented in the form of a workshop with free Amazon Web Services credits for use with Amazon Elastic MapReduce. Other topics will be determined by conference attendees through majority-vote rule.

BigDataCamp is free but limited to 150 attendees. BigDataCamp attendees will receive a 30% discount on registration for the Hadoop Summit.  More information and registration details can be found at http://www.bigdatacamp.org.


Cascading 1.1.0 Now Available

We are happy to announce that Cascading 1.1.0 is now publicly available for download.

This release features many performance and usability enhancements while remaining backwards compatible with 1.0.

Specifically:

  • Performance optimizations with all join types
  • Numerous job planner optimizations
  • Dynamic optimizations when running in Amazon Elastic MapReduce and S3
  • API usability improvements
  • Support for TSV, CSV, and custom delimited text files
  • Support for manipulating and serializing non-Comparable custom Java types
  • Debug levels supported by the job planner

For a detailed list of changes see: CHANGES.txt

Along with this release are a number of extensions created by the Cascading user community.

Among these extensions are:

  • Bixo - a data mining toolkit
  • DBMigrate - a tool for migrating data between RDBMSs and Hadoop
  • Apache HBase, Amazon SimpleDB, and JDBC integration
  • JRuby- and Clojure-based scripting languages for Cascading
  • Cascalog - a robust interactive extensible query language

This release will run against Hadoop 0.18.3, 0.19.x, and 0.20.x, including on Amazon Elastic MapReduce.

Note that the tests will not compile or run against Hadoop 0.18.3 due to package changes since that version.

 


Karmasphere Studio Ships with Cascading Support

The recently released Karmasphere Studio 1.2 now includes support for Cascading 1.0 in the free community download.

Karmasphere Studio is an IDE and Debugger for Hadoop MapReduce application developers that also includes integration with the Amazon Web Services platform.

With Cascading support built directly into the IDE and debugger, developers can develop and debug complex Hadoop jobs even more quickly.

Also worthy of note, Karmasphere recently received $5M Series A funding.


Cascading 1.1 RC1 Now Available

Cascading 1.1 RC1 is now available for download from the Cascading community site downloads page.

See the announcement for links to the detailed changes.


Case Study: Razorfish User Segmentation with Cascading and Amazon Elastic MapReduce

Amazon recently published a case study on how Razorfish “segments users and customers based on the collection and analysis of non-personally identifiable data from browsing sessions”.

From the case study:

Mark Taylor, Program Director at Razorfish, said, “With our implementation of Amazon Elastic MapReduce and Cascading, there was no upfront investment in hardware, no hardware procurement delay, and no additional operations staff was hired. We completed development and testing of our first client project in six weeks. Our process is completely automated. Total cost of the infrastructure averages around $13,000 per month. Because of the richness of the algorithm and the flexibility of the platform to support it at scale, our first client campaign experienced a 500% increase in their return on ad spend from a similar campaign a year before.”

Read more about how Razorfish uses Cascading to process big data.


Cascading now has support for HBase and the JDBC API.



OSBRIDGE: Building Scale Free Applications with Hadoop and Cascading

A rapid introduction to Hadoop architecture, MapReduce patterns, and best practices with Cascading.

Description

Far more applications are suited to Apache Hadoop than many developers realize.

In this presentation, we hope to give attendees enough information on how Hadoop works, how MapReduce can be leveraged to perform common and well understood data processing operations, and how the Cascading open-source project helps developers rapidly build sophisticated Hadoop applications that can be simply tested locally and executed remotely.

http://opensourcebridge.org/sessions/111


Cloud Computing - From Getting Started and Tools, to Large Scale Architecture Design

Do you have a technical background in other areas and want to get started in cloud computing? Do you want to find out more about MapReduce and scaling up flexibly on a larger number of CPUs? What tools should a newcomer use to get started quickly, or an experienced development group use to develop large-scale enterprise systems? Do you want to better understand how one architects systems in the cloud?

This training session covers basic cloud concepts along with the advantages of cloud computing over traditional computing. To make things tangible, it walks through the mechanics of getting an existing, small sample application up and running on a cloud provider. The session then covers how such a system could incrementally scale up to dozens or hundreds of CPUs. Architecture principles and decisions will show how to design and scale systems from small to very large.

Also presented is what it takes to get an application running on a distributed cluster infrastructure, and how that differs from traditional methods. As an experienced user of MapReduce/Hadoop and other scale-free technologies, Chris will illustrate the concepts with real-life use cases that he has worked on.

The training seminar will end with time for questions and an open discussion. Bring your questions and challenges.

http://www.sfbayacm.org/events/2009-06-06.php


SAM SIG: Hadoop architecture, MapReduce patterns, and best practices w/Cascading

Abstract: A rapid introduction to Hadoop architecture, MapReduce patterns, and best practices with Cascading.

Hadoop is an open source implementation of the Google MapReduce processing model and has been widely embraced by startups and established companies like Yahoo! and Amazon. Cascading, also an open source project, is an alternative API to MapReduce that allows developers to rapidly create sophisticated applications on the Hadoop platform.

Unfortunately, the MapReduce model can be very complex to work with when performing tasks that developers take for granted in relational-style databases, such as joins and secondary sorting of grouped values.
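To make the point about joins concrete, here is a minimal sketch of a join written by hand in the MapReduce style. It is plain Python rather than Hadoop code, and the datasets, the tags, and the `map_phase`, `shuffle`, and `reduce_phase` helpers are all hypothetical names chosen for illustration.

```python
from collections import defaultdict

# Hypothetical inputs: two datasets that share a user id.
users = [(1, "ann"), (2, "bob")]
orders = [(1, "book"), (2, "lamp"), (1, "pen")]


def map_phase():
    """Emit (key, tagged value) pairs so the reducer can tell the sources apart."""
    for user_id, name in users:
        yield user_id, ("user", name)
    for user_id, item in orders:
        yield user_id, ("order", item)


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Pair every user name with every order that shares its key."""
    for key in sorted(groups):
        names = [value for tag, value in groups[key] if tag == "user"]
        items = [value for tag, value in groups[key] if tag == "order"]
        for name in names:
            for item in items:
                yield key, name, item


joined = list(reduce_phase(shuffle(map_phase())))
print(joined)
```

Even in this toy form, the join logic is spread across tagging, grouping, and pairing steps, where a relational database would need a single join clause.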

Further, integrating Hadoop with external systems requires a deep knowledge of its internals. Yet this is where Hadoop clusters offer the most value: off-loading data cleansing and data migration tasks from traditional tools and expensive, load-sensitive systems.

Cascading is an API that replaces the “Map” and “Reduce” primitives and their associated Key/Value algebra with functions, filters, and aggregators, and links them all together with a familiar columns and records model. It also provides key processing primitives familiar to developers.
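As a rough illustration of the functions, filters, and aggregators idea, the sketch below chains the three kinds of operation over records with named columns. It is plain Python and does not use or imitate the Cascading API. The sample records and the `drop_missing`, `add_section`, and `total_ms_by` helpers are hypothetical.

```python
from collections import defaultdict

# Hypothetical records in a columns-and-records model.
records = [
    {"user": "ann", "url": "/home", "ms": 120},
    {"user": "bob", "url": "/home", "ms": 340},
    {"user": "ann", "url": "/cart", "ms": 95},
    {"user": "bob", "url": "/cart", "ms": None},
]


def drop_missing(record):
    """Filter: keep only records that carry a timing value."""
    return record["ms"] is not None


def add_section(record):
    """Function: derive a new 'section' column from the 'url' column."""
    return {**record, "section": record["url"].strip("/")}


def total_ms_by(field, rows):
    """Aggregator: group records on a column and sum the 'ms' column."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[field]] += row["ms"]
    return dict(totals)


# Link the operations together: filter, then function, then aggregator.
pipeline = map(add_section, filter(drop_missing, records))
totals = total_ms_by("section", pipeline)
print(totals)
```

Each step reads and writes named columns, so no step has to know how keys and values were packed by the step before it.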

In this presentation, we will present the Hadoop architecture, how MapReduce influences that architecture and is used for common tasks, and how Cascading helps developers rapidly build sophisticated data processing and orchestration applications that can be very simply tested and executed.

Bio: Chris K Wensel has been a Software and Systems Architect for over 15 years. He is the founder of Concurrent Inc., and the author of the Cascading data processing open-source project. He’s also a Principal at Scale Unlimited, a professional services company offering commercial training and consulting for Hadoop and related large architectures.

Over the last 7 years he has deployed large and sophisticated data processing applications for use by companies providing geo-spatial, web content, and financial data services in both the traditional enterprise data-center and on Amazon EC2.

Location

Cubberley Community Center
4000 Middlefield Road, Room H-1
Palo Alto, CA

http://www.sdforum.org/index.cfm?fuseaction=Calendar.eventDetail&eventID=13367


Cloud Computing Paradigms: MapReduce, Hadoop, Cascading

Cloud computing promises to make a significant impact on engineering computing paradigms and application design in a number of important arenas. Amazon, with its Elastic Compute Cloud and the subsequent Elastic MapReduce, is only one of many providers offering cloud computing.

This talk will provide an overview of common programming methods in the cloud, including MapReduce, Hadoop, and Cascading. Hadoop is an open source implementation of the Google MapReduce processing model which has been widely embraced by startups and established companies like Yahoo! and Amazon. Cascading is another open source project which provides an alternative API to MapReduce, and which allows developers to rapidly create sophisticated applications on the Hadoop platform.

http://www.californiaconsultants.org/Events.cfm/item/114

 


Cascading 1.0 has been released

Cascading 1.0 has been released. Visit the Cascading community site for more information.


Cascading

Cascading is software for fault tolerant data processing.

Cascading Support

Concurrent provides licensing, indemnification, and support for Cascading.

Consulting and Training Services

For advanced Cascading consulting, training, and mentoring.