All posts by Pierre-Yves Poli

Concurrent Announces New Capabilities for its Application Development Framework for Data Applications on Hadoop

Apr 24, 2014
Richard Harris

http://appdevelopermagazine.com/1354/2014/4/14/Concurrent-Announces-New-Capabilities-for-its-Application-Development-Framework-for-Data-Applications-on-Hadoop/

Concurrent, an enterprise data application platform company, and Hortonworks, a provider of enterprise Apache Hadoop, have announced that the Concurrent Cascading SDK will now be integrated and delivered with the Hortonworks Data Platform (HDP). In addition, Hortonworks will certify, support and deliver the Cascading application development framework for data applications on Hadoop.

Hortonworks will certify, support and deliver the Cascading SDK with HDP, guaranteeing the ongoing compatibility of Cascading-based applications across future releases of HDP through continuous compatibility testing and direct HDP customer support for Cascading.

Upcoming releases of Cascading will also support Apache Tez, which enables Hadoop projects to meet demands for faster response times and to deliver near real-time big data processing. Tez is a general data-processing fabric and MapReduce replacement that provides a framework for executing a complex topology of tasks. In addition, Tez executes on top of Apache Hadoop YARN, a sub-project of Hadoop that separates resource management from processing components. YARN fundamentally enables a broader array of interaction patterns for data stored in HDFS beyond MapReduce and makes Hadoop 2.0 a more general data processing platform.

Companies that already use Cascading, Lingual, Scalding or Cascalog, or any of the other dynamic programming language APIs and frameworks built on top of Cascading, can now migrate to newer versions of HDP that support Apache Tez, with zero investment required to take advantage of this improved processing environment.

Apr 23, 2014
Arnal Dayaratna, Ph.D.

http://cloud-computing-today.com/2014/04/23/concurrent-and-hortonworks-partner-to-integrate-cascading-with-the-hortonworks-data-platform/

Concurrent and Hortonworks recently revealed a deepening of their strategic relationship whereby the Cascading SDK will now be integrated into the Hortonworks Data Platform. Moreover, Hortonworks will certify, deliver and support Cascading, the application framework for developing Hadoop-based applications. A Java-based, open source alternative to hand-coding MapReduce, Cascading provides developers with a framework for constructing complex, repeatable data processing tasks within a Hadoop cluster. Cascading features an abstraction layer that uses plumbing metaphors such as taps, pipes, data flows, cascades and sinks to allow developers to design, visualize and execute jobs and processes on Hadoop-based data without having to master the intricacies of MapReduce.

Forthcoming releases of Cascading will support Apache Tez, an initiative that represents the next step after the addition of YARN to Hadoop and that allows Hadoop-based applications to "meet demands for fast response times and extreme throughput at petabyte scale."

The partnership between Concurrent, the developer of Cascading, and Hortonworks represents a huge coup for Concurrent, given that the collaboration stands to rapidly accelerate Cascading's adoption in enterprise environments. Hortonworks, meanwhile, benefits from packaging its Hadoop distribution with Cascading, one of the industry's most well-respected frameworks for big data management and application development, which boasts enterprise users such as Twitter, LinkedIn, eBay and Nokia.

The obvious question now is whether Concurrent will finalize similar partnerships with other Hadoop vendors such as Cloudera and MapR, or whether Concurrent's partnership with Hortonworks enables the latter to improve its positioning in the battle for Hadoop market share, particularly in light of Cloudera's remarkable $900 million capital raise and partnership with Intel.

Hortonworks Partners with Concurrent for Hadoop Development

Apr 23, 2014
David Ramel

http://adtmag.com/Articles/2014/04/23/hortonworks-concurrent-partnership.aspx?Page=1

Hortonworks Inc. and Concurrent Inc. announced this week they are partnering to make Hadoop development easier and quicker by combining the former’s data platform with the latter’s Cascading application development framework.

As part of the expanded partnership, Hortonworks, a leading force in Hadoop development, will certify, support and include the open source Cascading SDK with the Hortonworks Data Platform (HDP). Also, future Cascading releases will support Apache Tez, an alternative to the original MapReduce programming model used with Hadoop, which has often been criticized as slow, non-interactive and batch-oriented. Tez supports more interactive queries, faster response times and extreme throughput at huge scale.

“Users of Cascading will now be able to rapidly build data-centric applications that take advantage of the Tez providing users with highly interactive operational applications on top of Hadoop,” said John Kreisa in a Hortonworks blog post announcing the partnership.

Kreisa claimed that Cascading is “the most widely used application development framework for data applications on Hadoop.” He said Hortonworks will guarantee “the ongoing compatibility of Cascading-based applications across future releases of HDP with continuous compatibility testing and direct HDP customer support for Cascading.” The companies will team up to provide different levels of support to Cascading users.

Hortonworks and Concurrent yesterday presented a webinar in which they described how to accelerate Big Data development with Cascading and HDP.

In the webinar, Concurrent executive Supreet Oberoi explained the genesis of the Cascading framework produced by his company, which was founded in 2008 by Chris Wensel, an early Hadoop pioneer who started the first Silicon Valley meetup for the then-budding technology.

Wensel thought Hadoop was powerful, Oberoi said, but he saw some challenges with the technology.

“The first one was that people who are used to developing data applications, they think in terms of business objects, and making them think in terms of maps and reduce is unintuitive,” Oberoi said. “The second point is that even though these APIs were written in Java, the level of complexity will preclude many of the existing Java community developers from using those APIs.”

The third, Oberoi continued, was the realization that different use cases require different execution fabrics, and that being coupled to one prevented developers from using better or more appropriate technologies that might come along in the future. With those challenges in mind, Wensel developed the Cascading API.

“As a Java-based framework, Cascading fits naturally into JVM-based languages, including Scala, Clojure, JRuby, Jython and Groovy,” noted this site’s editor at large John K. Waters in a blog post last winter. “And the Cascading community has created scripting and query languages for many of these languages.”

Jules S. Damji explained in a Hortonworks blog post Monday that Wensel developed the API “with the sole purpose of enabling developers to write enterprise big data applications without the know-how of the underlying Hadoop complexity and without coding directly to the MapReduce API. Instead, he implemented high-level logical constructs, such as Taps, Pipes, Sources, Sinks, and Flows, as Java classes to design, develop, and deploy large-scale big data-driven pipelines.”

Damji further explained how to get started with Hadoop and Cascading with some simple examples and pointed developers to a tutorial for more detailed information.

Big data app development brought to you by Hortonworks, Cascading

Apr 23, 2014
Pam Baker

http://www.fiercebigdata.com/story/big-data-app-development-brought-you-hortonworks-cascading/2014-04-23?utm_source=rss&utm_medium=rss

As with other technologies, there's a huge need for applications, and big data is no exception. Consequently, developers need help getting such applications to market and into play quickly. To that end, Hortonworks has added Concurrent's Cascading SDK to its Hadoop distribution, which helps developers operationalize their data. In addition, Hortonworks will certify, support and deliver Cascading, billed as the most widely used application development framework for data applications on Hadoop.

“As more enterprises realize they are in the business of data, the need for simple, powerful tools for big data application development is a must-have to survive in today’s competitive climate,” said Gary Nakamura, CEO, Concurrent, in a statement to the press. “Our deepened relationship with Hortonworks furthers our commitment to Hadoop and drives new innovation around the development of enterprise data applications.”

Upcoming releases of Cascading will also support Apache Tez, a general data-processing fabric and MapReduce replacement that provides a powerful framework for executing a complex topology of tasks. According to the press release:

“Tez executes on top of Apache Hadoop YARN, a sub-project of Hadoop which separates resource management and processing components. YARN fundamentally enables a broader array of interaction patterns for data stored in HDFS beyond MapReduce and makes Hadoop 2.0 a more general data processing platform.

In addition, thousands of companies that already use Cascading, Lingual, Scalding or Cascalog or any other dynamic programming language APIs and frameworks built on top of Cascading, have the flexibility to seamlessly migrate to newer versions of HDP that support Apache Tez, with zero investment required to take advantage of this improved processing environment.”

It will be interesting to watch how quickly developers take to this and how many new apps show up as a result. My bet is it will be plenty.

Hortonworks, Concurrent Partner to Speed App Development on Hadoop

Apr 22, 2014
John Rath

http://www.datacenterknowledge.com/archives/2014/04/22/hortonworks-and-concurrent-partner-to-speed-data-centric-application-development-on-hadoop/

Hortonworks and Concurrent announced an expansion of a strategic partnership to simplify enterprise application development for data-centric applications. Hortonworks will certify, support and deliver Concurrent’s Cascading development framework for data applications on Hadoop, and the Cascading SDK will be integrated and delivered with the Hortonworks Data Platform (HDP).

The partnership underscores the timely importance of simplifying enterprise application development for these new data-centric applications. It benefits users by combining the robustness and simplicity of Cascading with the reliability and stability of Hortonworks Data Platform.

Upcoming releases of Cascading will also support Apache Tez. Tez is a significant development in the Hadoop ecosystem, enabling projects to meet demands for faster response times and delivering near real-time big data processing. Tez is a general data-processing fabric and MapReduce replacement that provides a powerful framework for executing a complex topology of tasks. In addition, thousands of companies that already use Cascading, Lingual, Scalding or Cascalog, or any other dynamic programming language APIs and frameworks built on top of Cascading, have the flexibility to seamlessly migrate to newer versions of HDP that support Apache Tez, with zero investment required to take advantage of this improved processing environment.

“Hadoop unleashes insight and value from enterprise data as a core component of the modern data architecture, integrating with and complementing existing systems,” said John Kreisa, vice president of strategic marketing at Hortonworks. “By expanding our alliance with Concurrent and integrating with the Cascading application platform, Hortonworks’ customers can now drive even more value from their enterprise data by enabling the rapid development of data-driven applications.”

“As more enterprises realize they are in the business of data, the need for simple, powerful tools for big data application development is a must-have to survive in today’s competitive climate,” said Gary Nakamura, CEO at Concurrent. “Our deepened relationship with Hortonworks furthers our commitment to Hadoop and drives new innovation around the development of enterprise data applications.”

Hortonworks boosts Concurrent team up for Big Data applications

Apr 22, 2014
Maria Deutscher

http://siliconangle.com/blog/2014/04/22/hortonworks-boosts-concurrent-team-up-for-big-data-applications/

The elusive promise of the Big Data app economy has inched a little closer to reality on Monday after Hortonworks expanded its partnership with Concurrent to package the startup’s Cascading development framework into its flagship Hadoop distribution.

Available for free under an Apache license, Cascading serves as an abstraction layer between Hadoop's batch processing platform and the applications that use it, allowing enterprise developers to tap into their organizations' vast troves of unstructured information without getting bogged down by the inherent complexity of MapReduce.

"Building applications on top of Hadoop was very difficult. That's why our founder Chris Wensel created a framework so you could have a separate business logic layer from the data layer, and it's written in Java so any Java programmer can pick it up," Gary Nakamura, the CEO of Concurrent, told SiliconANGLE in an exclusive interview on theCUBE at O'Reilly Fluent Conference 2013.

Cascading goes beyond simply making it easier to create data-driven applications: through support for a broad range of enterprise technologies, including SQL and a number of popular data science tools, it eliminates the need for users to change the way they work. "The requirement for the enterprise is not to learn new skills for Hadoop but to leverage existing skills, existing systems and existing investments they already made in their infrastructure," Nakamura explained. Upcoming versions of the framework will also include integration with Apache Tez, an emerging alternative to MapReduce that aims to deliver better performance and lower latency at large scale.

Tez runs on top of the YARN resource management and scheduling technology included in Apache Hadoop 2.0, which constitutes the core of the latest Hortonworks Data Platform (HDP) 2.0. Under the expanded partnership, the distributor is “guaranteeing the ongoing compatibility of Cascading-based applications across future releases” and offering customers dedicated support for the framework.

Boosting business

The partnership makes sense for both companies. Hortonworks is coming under increased pressure to deliver value higher up the stack, and enabling applications on top of Hadoop is one of the best possible ways of accomplishing that. Plus, the integration allows it to catch up with rivals Cloudera and MapR, which have long provided support for Cascading in their respective distributions.

The announcement is also good news for Concurrent. The company's flagship framework is now compatible with all three major Hadoop distributions, making it easily accessible to the overwhelming majority of users. The partnership with Hortonworks is especially significant because the two firms have very similar business models: they both make their flagship products available at no charge and monetize their user bases through value-added solutions. But whereas the Yahoo! spin-off focuses exclusively on professional services, Concurrent sells complementary software such as its recently released Driven application performance management tool. Free while in beta, the cloud-based service provides visibility into data flows and program logic at runtime to enable test-driven development while allowing practitioners to keep tabs on information quality, according to the firm.

Hortonworks Keen on Cascading-Tez Combo

Apr 21, 2014
Alex Woodie

http://www.datanami.com/datanami/2014-04-21/hortonworks_keen_on_cascading-tez_combo.html

In the future, it will be easier to build big data applications, and they’ll run faster and utilize more real-time data than today’s apps, too. Two vendors working to make that future a reality, Hortonworks and Concurrent, today announced they’ll work together to build and assemble the next generation of Hadoop apps running on YARN, Tez, and Apache Spark.

Hortonworks and Concurrent have been partners for some time. As one of the central Hadoop players, Hortonworks is well aware of Concurrent and its open-source Cascading development framework, which abstracts away the difficult part of writing MapReduce applications with an easy-to-use Java API and library.

Cascading is one of the success stories of first-gen Hadoop apps. Concurrent boasts more than 6,000 commercial deployments of its Cascading framework, and says customers like Nokia, Kohl’s, and Twitter are using it to simplify development of MapReduce apps on Hadoop. The product is being downloaded about 130,000 times per month, putting it on the cusp of big data rock star status.

With the upcoming launch of Cascading 3.0 in June, Concurrent will add support for Tez and Apache Spark, giving customers powerful new options for developing Hadoop applications beyond the MapReduce paradigm. Hortonworks, which already supports Tez with its Hadoop distribution HDP 2.1 and is currently offering a tech preview of Spark, likes where Cascading is headed and wanted to get ahead of customer demands, according to John Kreisa, vice president of corporate strategy for Hortonworks.

“Given the clear adoption patterns we’re seeing with Hadoop around building various data centric apps and the desire to put those apps into production, it made sense to deepen the relationship [with Concurrent],” Kreisa tells Datanami. “We know our customers want to develop apps. We know Cascading is popular–we see it in our user base. So it just made sense for us to take this next step and include it directly in the platform to accelerate the adoption of Hadoop.”

Previously, the two companies worked together to certify the integration and testing of HDP and Cascading, but it was up to customers to obtain the Cascading code and ensure that it worked. Under the expanded pact, Hortonworks will distribute the Cascading software development kit (SDK) as part of HDP and provide level one and level two technical support for customers; Concurrent will provide level three support.

In early June, Hortonworks will include support for the forthcoming release of Cascading 3.0 as a tech preview in the HDP sandbox environment. It will become generally available (GA) in late summer or early fall, says Tim Hall, vice president of product management for Hortonworks.

Hortonworks is a big believer in how Concurrent is building support for Tez into Cascading 3.0. “Tez is a significant leap forward,” Hall says. “It’s one of the critical things Hortonworks has been investing in from the open community for Hadoop, which is moving this from a batch-centric, mostly serialized approach to accessing data on Hadoop–that was MapReduce 1–and shifting this to a mixed workload environment that runs on YARN.”

Hortonworks' own contribution on this front was the recent launch of HDP 2.1, which enabled Hive to use either the legacy MapReduce execution engine or the new Tez engine. "Concurrent is going to follow that lead and go down that path as well with the Cascading SDK," Hall said. "We will likely invest in working with the open source community to move some of these other tools from legacy MapReduce 2 to the next generation, which is Tez."

Cascading 3.0 will also support Apache Spark, the in-memory framework that’s gaining a ton of momentum as yet another replacement for MapReduce. Hortonworks, whose developers largely spearheaded the development of Tez, is taking a bit of a wait-and-see approach regarding Cascading and its Spark prospects.

“One of the interesting things about the Cascading SDK is it does provide some additional libraries on top of the Java API. One of the ones we’re most interested in is Scalding libraries, which provides us a Scala interface,” Hall says. “Obviously having that access point, and seeing what the interest is in the community of Scala and the relationship of that Scalding SDK and how it does or does not work with Spark, will be something we’ll be looking at very closely with our customers.”

Cloudera Launches Data Scientist Certification

Mar 28, 2014
John Rath

http://www.datacenterknowledge.com/archives/2014/03/28/cloudera-launches-data-scientist-certification/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+DataCenterKnowledge+%28Data+Center+Knowledge%29

Concurrent’s application platform for big data is certified with the Intel Data Platform, Datameer 4.0 introduces “flip side” view for big data analytics workflow, and Cloudera addresses the industry need for big data skills with a certification program and big data challenge for data scientists.

Cloudera launches certified data scientist program. Cloudera announced the industry's first hands-on data science certification, called Cloudera Certified Professional: Data Scientist (CCP:DS). Consisting of an essentials exam and a data science challenge, the new program helps developers, analysts, statisticians, and engineers get experience with relevant big data tools and techniques and validate their abilities, while helping prospective employers identify elite, highly skilled data scientists. The demand for data scientists is at an all-time high, and they possess a rare combination of engineering capabilities, statistical skills, and subject matter expertise that is difficult to find. Once candidates have passed the Data Science Essentials exam, they must then successfully complete a Cloudera Data Science Challenge, offered twice annually. Cloudera's second Data Science Challenge opens on March 31 and is about detecting anomalies in Medicare claims.

Concurrent Cascading certified for Intel Big Data platform. Enterprise big data application platform company Concurrent announced the compatibility of its Cascading application platform for big data applications using Hadoop with the Intel Data Platform. Because Hadoop applications leverage an enterprise's most valuable asset, its data, a secure infrastructure is crucial. Enter the Intel Data Platform, which incorporates and builds on an open source software platform to provide distributed processing and data management for enterprise applications analyzing massive amounts of diverse data. This certification offers the right fit for businesses seeking to make the most of their Big Data by combining the productivity benefits of Cascading with the security, performance and manageability advantages of Intel Data Platform. "Today's announcement demonstrates Concurrent's continued mission of making enterprise data application development easy for the masses and enabling businesses to operationalize their data," said Gary Nakamura, CEO at Concurrent. "Now our users can rely on the Intel Data Platform's next-generation analytics and secure infrastructure combined with all the power and benefits of Cascading to drive business differentiation."

Datameer introduces 'Flip Side' view for big data analytics. Big data analytics company Datameer introduced Datameer 4.0, which it describes as the first end-to-end big data analytics workflow that allows for visual insights at every step of analysis. With Datameer's new "flip side" view of its spreadsheet interface, users can get instant visual feedback at any point in the big data analytics process. This allows them to check initial results and adjust appropriately as they go, further shortening the time to insight and freeing up time for teams to tackle bigger challenges. The new view provides a visual profile of the data after every step of the process, including integration, enrichment, transformation and advanced analytics. "We are consistently pushing the envelope to make big data analysts as productive as possible, and studies show the fastest way to digest information is visually," said Stefan Groschupf, CEO of Datameer. "With Datameer 4.0, analysts no longer need to wait until the final visualization to gain insights from their data. This new paradigm means companies will realize meaningful ROI on their big data analytics projects faster than ever before."