Hortonworks Keen on Cascading-Tez Combo

Apr 21, 2014
Alex Woodie

In the future, it will be easier to build big data applications, and they’ll run faster and utilize more real-time data than today’s apps, too. Two vendors working to make that future a reality, Hortonworks and Concurrent, today announced they’ll work together to build and assemble the next generation of Hadoop apps running on YARN, Tez, and Apache Spark.

Hortonworks and Concurrent have been partners for some time. As one of the central Hadoop players, Hortonworks is well aware of Concurrent and its open-source Cascading development framework, which abstracts away the difficult part of writing MapReduce applications with an easy-to-use Java API and library.

Cascading is one of the success stories of first-gen Hadoop apps. Concurrent boasts more than 6,000 commercial deployments of its Cascading framework, and says customers like Nokia, Kohl’s, and Twitter are using it to simplify development of MapReduce apps on Hadoop. The product is being downloaded about 130,000 times per month, putting it on the cusp of big data rock star status.

With the upcoming launch of Cascading 3.0 in June, Concurrent will add support for Tez and Apache Spark, giving customers powerful new options for developing Hadoop applications beyond the MapReduce paradigm. Hortonworks, which already supports Tez with its Hadoop distribution HDP 2.1 and is currently offering a tech preview of Spark, likes where Cascading is headed and wants to get ahead of customer demands, according to John Kreisa, vice president of corporate strategy for Hortonworks.

“Given the clear adoption patterns we’re seeing with Hadoop around building various data centric apps and the desire to put those apps into production, it made sense to deepen the relationship [with Concurrent],” Kreisa tells Datanami. “We know our customers want to develop apps. We know Cascading is popular–we see it in our user base. So it just made sense for us to take this next step and include it directly in the platform to accelerate the adoption of Hadoop.”

Previously, the two companies worked together to certify the integration and testing of HDP and Cascading, but it was up to customers to obtain the Cascading code and ensure that it worked. Under the expanded pact, Hortonworks will distribute the Cascading software development kit (SDK) as part of HDP and provide level one and level two technical support for customers; Concurrent will provide level three support.

In early June, Hortonworks will include support for the forthcoming release of Cascading 3.0 as a tech preview in the HDP sandbox environment. It will become generally available (GA) in late summer or early fall, says Tim Hall, vice president of product management for Hortonworks.

Hortonworks is a big believer in how Concurrent is building support for Tez into Cascading 3.0. “Tez is a significant leap forward,” Hall says. “It’s one of the critical things Hortonworks has been investing in from the open community for Hadoop, which is moving this from a batch-centric, mostly serialized approach to accessing data on Hadoop–that was MapReduce 1–and shifting this to a mixed workload environment that runs on YARN.”

The recent launch of HDP 2.1, which enabled Hive to use either the legacy MapReduce execution engine or the new Tez engine, is Hortonworks’ contribution. “Concurrent is going to follow that lead and go down that path as well with the Cascading SDK,” Hall says. “We will likely invest in working with the open source community to move some of these other tools from legacy MapReduce 2 to the next generation, which is Tez.”

Cascading 3.0 will also support Apache Spark, the in-memory framework that’s gaining a ton of momentum as yet another replacement for MapReduce. Hortonworks, whose developers largely spearheaded the development of Tez, is taking a bit of a wait-and-see approach regarding Cascading and its Spark prospects.

“One of the interesting things about the Cascading SDK is it does provide some additional libraries on top of the Java API. One of the ones we’re most interested in is Scalding libraries, which provides us a Scala interface,” Hall says. “Obviously having that access point, and seeing what the interest is in the community of Scala and the relationship of that Scalding SDK and how it does or does not work with Spark, will be something we’ll be looking at very closely with our customers.”

Cloudera Launches Data Scientist Certification

Mar 28, 2014
John Rath

Concurrent’s application platform for big data is certified with the Intel Data Platform, Datameer 4.0 introduces “flip side” view for big data analytics workflow, and Cloudera addresses the industry need for big data skills with a certification program and big data challenge for data scientists.

Cloudera launches certified data scientist program. Cloudera announced the industry’s first hands-on data science certification, called Cloudera Certified Professional: Data Scientist (CCP:DS). Consisting of an essentials exam and data science challenge, the new program helps developers, analysts, statisticians, and engineers get experience with relevant big data tools and techniques and validate their abilities while helping prospective employers identify elite, highly skilled data scientists. The demand for data scientists is at an all-time high, and they possess a rare combination of engineering capabilities, statistical skills, and subject matter expertise that is difficult to find. Once candidates have passed the Data Science Essentials exam, they must then successfully complete a Cloudera Data Science Challenge, offered twice annually. Cloudera’s second Data Science Challenge opens on March 31 and is about detecting anomalies in Medicare claims.

Concurrent Cascading certified for Intel Big Data platform. Enterprise big data application platform company Concurrent announced the compatibility of its Cascading application platform for big data applications using Hadoop with the Intel Data Platform. Because Hadoop applications draw on an enterprise’s most valuable asset, its data, a secure infrastructure is crucial. Enter the Intel Data Platform, which incorporates and builds on an open source software platform to provide distributed processing and data management for enterprise applications analyzing massive amounts of diverse data. This certification offers the right fit for businesses seeking to make the most of their Big Data by combining the productivity benefits of Cascading with the security, performance and manageability advantages of Intel Data Platform. “Today’s announcement demonstrates Concurrent’s continued mission of making enterprise data application development easy for the masses and enabling businesses to operationalize their data,” said Gary Nakamura, CEO at Concurrent. “Now our users can rely on the Intel Data Platform’s next-generation analytics and secure infrastructure combined with all the power and benefits of Cascading to drive business differentiation.”

Datameer introduces ‘Flip Side’ view for big data analytics. Big data analytics company Datameer introduced Datameer 4.0 and the first end-to-end big data analytics workflow that allows for visual insights at every step of analysis. With Datameer’s new “flip side” view to its spreadsheet interface, users can get instant visual feedback at any point in the big data analytics process. This allows them to check initial results and adjust appropriately as they go, further shortening the time to insight and freeing up time for teams to tackle bigger challenges. The new product consists of a visual profile of the data after every step of the process, including integration, enrichment, transformation and advanced analytics. “We are consistently pushing the envelope to make big data analysts as productive as possible, and studies show the fastest way to digest information is visually,” said Stefan Groschupf, CEO of Datameer. “With Datameer 4.0, analysts no longer need to wait until the final visualization to gain insights from their data. This new paradigm means companies will realize meaningful ROI on their big data analytics projects faster than ever before.”

Concurrent rounds out data-driven app dev framework with performance management tool

Maria Deutscher, SiliconANGLE
February 7, 2014

The rapid growth in unstructured information is transforming the entire enterprise IT stack, from the underlying infrastructure to the business software end users depend on to be productive. But while the journey to re-architect the data center is well under way, with the Hadoop distribution race already in full swing, the industry is only now beginning to deliver on the potential of Big Data applications, which Tresata CEO Abhi Mehta considers the next frontier of analytics.

“The value in the ecosystem sits really high up the stack, in what we call advanced analytics applications,” Mehta remarked in an interview on theCUBE last year. He predicts that within four years, the “massive race to zero in large parts of the historical data analytics stack where billions of dollars are currently being made” will shift spending to “the highest point of the stack in what we called analytics applications, predictive analytics software that works in Hadoop.” And Tresata isn’t the only vendor eyeing a slice of this rapidly emerging market.

A five-year-old startup called Concurrent is making big strides with Cascading, an open source Java framework that simplifies the development of data-driven apps on Hadoop, the batch processing platform. The firm claims that the software is downloaded more than 130,000 times a month and used by thousands of organizations, including Twitter, eBay and The Climate Corp.

Concurrent this week announced a complementary application performance management solution that aims to accelerate time to market and streamline maintenance for Cascading users. Branded as Driven, the cloud-based service provides visibility into data flows and program logic at runtime to enable test-driven development while allowing practitioners to keep tabs on information quality.

“The release of Driven further enables enterprise users to develop data oriented applications on Apache Hadoop in a more collaborative, streamlined fashion. Driven is the key to unlock enterprises’ ability to drive differentiation through data. There’s a lot more to come – this is only the beginning,” commented Chris Wensel, the founding CTO of Concurrent.