All posts by Kim Loughead

Concurrent, Inc. Continues to Expand Supported Ecosystem to Deliver Deep Visibility and Insight for Hadoop Applications, Announces Driven 1.3

August 25, 2015 | News, Press Releases | Concurrent, Driven APM, monitoring big data, monitoring hadoop, performance management hadoop | Kim Loughead

New Release Offers Advanced Team Collaboration, Support for Apache Hive, Cascading 3.0 and Apache Tez, Support for Hadoop Applications

SAN FRANCISCO – Aug. 25, 2015 – Concurrent, Inc., the leader in Big Data application infrastructure, today announced the latest release of Driven, the industry’s leading application performance management product for monitoring and managing Hadoop applications. Driven is built to address the challenges of business-critical Big Data application development and deployment, delivering control and performance management for enterprises seeking to achieve operational excellence on Hadoop.

Driven offers enterprise users – developers, operations and lines of business – unprecedented visibility into applications written in Cascading, Scalding, Cascalog, Apache Hive and MapReduce. It provides deep operational insights, search, segmentation and visualizations for rapid troubleshooting and performance management. To achieve this, Driven collects rich operational metadata in a scalable data repository, enabling users to isolate, control and report on almost any topic relevant to their business, such as application SLAs, KPIs and data lineage.

The latest version of Driven includes:

Advanced application performance analytics: Customizable application views include key statistics about application performance over time. This enhancement also includes anomaly detection – the ability to go back in time to determine when an anomaly happened and to view the current environment to identify the cause of the problem.

Deeper collaboration and sharing: The ability to share a customized analytic, application or status view ensures that teams are referencing the same data when troubleshooting a problem. When sharing a view, users can choose whether to share it only with other team members or with any individual.

Enhanced SLA management: Users now have the option to set SLA thresholds and alerting. For example, users can set duration thresholds to report on all applications that exceed their allotted run-time.

Plug-in agent for Apache Hive and MapReduce: Driven now supports Apache Hive and MapReduce. With Driven’s agent technology, enterprises can seamlessly and transparently collect all the operational intelligence for Apache Hive and MapReduce jobs and tasks, delivering all the rich capabilities and operational analytics offered by Driven.

Cascading 3.0 and Apache Tez Support: This new support enables Cascading 3.0 users to leverage all the capabilities of Driven to manage and monitor applications running on multiple compute fabrics.

Driven is a proven performance management solution that enterprises can rely on to deliver against their data strategies. Benefits of Driven include accelerated application development cycles, immediate application failure diagnosis, improved application performance, easier audit reporting and reduced cluster utilization costs.

Driven can be accessed now for free at http://drivenio.staging.wpengine.com/choose-trial.

Supporting Quotes

“Enterprise needs have not changed, and as Hadoop pushes further into the mainstream, running business critical data processes has challenges. Enterprises are grappling with basic visibility, data governance, compliance and performance management. Driven arms enterprises with the right solution to deliver against their big data strategies and move plans forward.”

– Gary Nakamura, CEO, Concurrent, Inc.

Supporting Resources

Driven: http://drivenio.staging.wpengine.com
Company: http://concurrentinc.com
Contact us: http://concurrentinc.com/contact
Twitter: http://twitter.com/concurrent
LinkedIn: http://www.linkedin.com/company/concurrent-inc.

About Concurrent, Inc.

Concurrent, Inc. is the leader in data application infrastructure, delivering products that help enterprises create, deploy, run and manage data applications at scale. The company’s flagship enterprise solution, Driven, was designed to accelerate the development and management of enterprise data applications. Concurrent is the team behind Cascading, the most widely deployed technology for data applications with more than 500,000 user downloads a month. Used by thousands of businesses including eBay, Etsy, The Climate Corp and Twitter, Cascading is the de facto standard in open source application infrastructure technology. Concurrent is headquartered in San Francisco and online at http://concurrentinc.com.

Media Contact
Danielle Salvato-Earl
Kulesa Faul for Concurrent, Inc.
(650) 922-7287
concurrent@kulesafaul.com

MeetUp | Cascading: A Java Developer’s Companion to the Hadoop World – Nov 11, 2014

November 11, 2014 | Uncategorized | Kim Loughead

Sign-up here: http://www.meetup.com/sfjava/events/200779752

When:
Tuesday, November 11, 2014
6:00 PM

Where:
Twitter Office
1355 Market St #900
San Francisco, CA 94103

What:
Amid all the hype and investment around Big Data technologies, many Java software engineers are asking what it takes to become big data engineers. As a Java professional, toward which path should I steer my career?

Join Dhruv Kumar as he introduces Cascading, an open source application development framework that allows Java developers to build applications on top of Hadoop through its Java API. We’ll provide an overview of the landscape for developing applications on Hadoop and explain why Cascading has become so popular, comparing it to other abstractions such as Pig and Hive. Dhruv will also show you how Java developers can easily get started building applications on Hadoop with live examples of good ol’ Java code.

About Dhruv Kumar
Dhruv Kumar is a Solutions Architect at Concurrent Inc. and has over six years of diverse software development experience in Big Data, Web and High Performance Computing applications. Prior to joining Concurrent, he worked at Terracotta as a Software Engineer. He has an MS degree in Computer Engineering from the University of Massachusetts-Amherst.

Concurrent, Inc. Delivers Performance Management for Apache Hive and MapReduce Applications

November 6, 2014 | Press Releases | Kim Loughead

Everyone is in the Business of Data; New Release of Driven Delivers Deep Visibility and Insight for Data Applications

SAN FRANCISCO – Nov. 6, 2014 – Concurrent, Inc., the leader in data application infrastructure, today introduced a new version of Driven, the industry’s leading application performance management product for the data-centric enterprise. Driven is purpose-built to address the challenges of enterprise application development and deployment for business-critical data applications, delivering control and performance management for enterprises seeking to achieve operational excellence.

Driven offers enterprise users – developers, operations and line of business – unprecedented visibility into their data applications, providing deep insights, search, segmentation and visualizations for service-level agreement (SLA) management – all while collecting rich operational metadata in a scalable data repository. This allows users to isolate, control, report and manage a broad range of data applications, from the simplest to the most complex data processes. Driven is a proven performance management solution that enterprises can rely on to deliver against their data strategies.

The latest version of Driven introduces:

Deeper Visualization into Data Apps: Enhanced support allows users to debug, manage, monitor and search applications more effectively and in real time. Users can also track and store complete history of each application’s performance and operational metadata.
Powerful Search: Fast and rich search capabilities enable users to take the guesswork out of managing Hadoop applications. Driven provides greater control over managing user data processing. It quickly identifies problematic applications and the associated owners, and finds and compares specific applications with previous iterations to ensure that all applications are meeting SLAs.
Operational Insights for SLA Management: Users can now visualize all applications over customizable timelines to manage trending application utilization. Driven quickly segments applications by name, user-defined metadata, teams and organizations for deeper insights.
Segmentation for Greater Manageability: New segmentation support provides greater insights across all applications. Users have the ability to segment applications by tags, names, teams or organization, and easily track them for general Hadoop utilization, SLA management or internal/external chargeback.
Metadata Repository: A scalable, searchable, fine-grained metadata repository easily captures end-to-end visibility of data applications, as well as related data sources, fields and more. By retaining a complete history of applications’ operational telemetry, enterprises can leverage Driven for operational excellence from development to production to compliance-related requirements.
Integration with Existing Systems: Users can leverage the vast capabilities of Driven and deliver runtime metrics and notifications to existing enterprise monitoring systems.
Additional Framework Support: In addition to Cascading, Scalding and Cascalog applications, Driven now supports Apache Hive and native MapReduce processes, allowing enterprises to leverage Driven’s capabilities across a wide variety of application frameworks.

Pricing and Availability
Driven is available as a free service on cascading.io and licensable for production use as an annual subscription. Driven will also soon be available as an enterprise deployment. Sign up to be notified when the self-hosted version becomes available.

Supporting Quotes
“Our developers build applications on Cascading and rely on these applications to run our cloud-based platform for email marketing solutions. With Driven, our developers have unmatched operational visibility and control across all Cascading applications – including real-time monitoring, history and performance tracking over time. We already see the value and promise of Driven, as it’s allowing us to drive differentiation through our data and manage our data applications more efficiently, which, in turn, delivers on our mission to transform email marketing, maximize sales and increase revenue streams for our customers.”
– Johannes Alkjær, lead architect, Mojn

“We remain focused on Driven with this expansion beyond the Cascading ecosystem with the aim to help enterprises make the most of their existing talent and to simplify data project maturity levels – from development through to production. With this release, Driven continues to push the boundaries of operational visibility and performance across Cascading applications and sets a new precedent with new support for Apache projects like Hive.”
– Chris Wensel, founder and CTO, Concurrent, Inc.

Supporting Resources

Driven: http://www.cascading.io/driven
Cascading: http://cascading.org
Company: http://concurrentinc.com
Contact us: http://concurrentinc.com/contact
Twitter: http://twitter.com/concurrent
LinkedIn: http://www.linkedin.com/company/concurrent-inc.
YouTube: http://www.youtube.com/getcascading

About Concurrent, Inc.
Concurrent, Inc. is the leader in data application infrastructure, delivering products that help enterprises create, deploy, run and manage data applications at scale. The company’s flagship enterprise solution, Driven, was designed to accelerate the development and management of enterprise data applications. Concurrent is the team behind Cascading, the most widely deployed technology for data applications with more than 200,000 user downloads a month. Used by thousands of businesses including eBay, Etsy, The Climate Corp and Twitter, Cascading is the de facto standard in open source application infrastructure technology. Concurrent is headquartered in San Francisco and online at http://concurrentinc.com.

Media Contact
Danielle Salvato-Earl
Kulesa Faul for Concurrent, Inc.
(650) 922-7287
concurrent@kulesafaul.com

###

MeetUp | Elasticsearch Meetup at Twitter – October 15, 2014

October 7, 2014 | Events | Kim Loughead

Sign-up here: http://meetu.ps/2zTK24

When:
Wednesday, October 15, 2014
6:30 PM

Where:
Twitter NYC office
340 Madison, 6th Floor
New York, NY

What:
Please join us for our October Elasticsearch meetup at Twitter, featuring speakers from Elasticsearch, Found AS and Concurrent, with food and drinks provided.

Speaker Bios:
Elasticsearch: Costin Leau is an engineer at Elasticsearch, leading the Hadoop efforts. An open-source veteran, Costin led various Spring projects and authored an OSGi spec. He has spoken at various editions of EclipseCon/OSGi DevCon, JavaOne, Devoxx/Javapolis, JavaZone, SpringOne and TSSJS on Java/Spring/Hadoop-related topics.

Found AS: Konrad Beiske is a senior software engineer at Found AS, a company whose primary product is a hosted Elasticsearch service. Konrad holds a Master’s Degree in Computer Science, with an emphasis on databases and distributed systems. He has been focusing on Elasticsearch during the past two years. Konrad gives presentations about Elasticsearch and distributed systems at meetups and conferences, and he writes regularly on the Foundation blog.

Konrad will be presenting on Elasticsearch in Production — things to think about before going into production with your Elasticsearch implementation.

Concurrent: Supreet Oberoi is the Vice President of Field Engineering at Concurrent. Previously, as the Director of Big Data application infrastructure for American Express, he led the development of use cases for fraud, operational risk, marketing and privacy on Big Data platforms. He holds multiple patents in data engineering and has also had leadership roles at Real-Time Innovations, Oracle and Microsoft.

Supreet will be presenting on “Large scale log processing with Cascading & Elastic Search”. Elasticsearch is becoming a popular platform for log analysis with its ELK stack: Elasticsearch for search, Logstash for centralized logging, and Kibana for visualization. With Cascading, the application development platform for building data applications on Apache Hadoop, developers can correlate multiple log and data streams at scale to perform rich and complex log processing before making the results available to the ELK stack. Join Supreet Oberoi from Concurrent, the people behind Cascading, as he explains how Cascading enables efficient and robust development of data applications for Hadoop. In addition, he will talk about the challenges in operationalizing large-scale log processing applications on which businesses can depend.

Concurrent, Inc. to Present at Upcoming Big Data and Developer Industry Events

October 7, 2014 | Press Releases | Kim Loughead

Supreet Oberoi and Ryan Desmond to Deliver Sessions on Cascading and Cascading Lingual at Silicon Valley Code Camp and Big Data TechCon

SAN FRANCISCO – Oct. 7, 2014 – Concurrent, Inc., the leader in data application infrastructure, today announced that Ryan Desmond, solutions architect, will present at the ninth annual Silicon Valley Code Camp, taking place Oct. 11-12 in Los Altos Hills, Calif., and Supreet Oberoi, vice president of field engineering, will deliver two sessions at Big Data TechCon, taking place Oct. 27-29 in Burlingame, Calif.

At Silicon Valley Code Camp, Ryan will provide an introduction to Cascading, the most widely used and deployed application development framework for building data-centric applications that enables organizations to operationalize their data and solve business problems.

At Big Data TechCon, Supreet will deliver a talk on how organizations can increase Hadoop utilization with their data warehouses using Cascading Lingual, an open source project that allows users to utilize existing SQL skills to instantly create and run applications on Hadoop. Additionally, Supreet will discuss Cascading and how organizations can leverage the popular application development framework to future-proof big data investments against emerging technologies.

Concurrent Presentations At-A-Glance

Silicon Valley Code Camp
What: “Application Development on Hadoop Using Cascading”
Who: Ryan Desmond, solutions architect, Concurrent, Inc.
When: Sunday, Oct. 12 at 1:15 p.m. PT
How: Register at https://www.siliconvalley-codecamp.com/Account/CreateAccount

Session Description
Cascading is the most widely used and deployed Java-based application development framework for building Big Data applications on Apache Hadoop. This open source, enterprise development framework allows developers to leverage their existing skillsets such as Java, SQL, Scala and more, to create enterprise-grade applications on Apache Hadoop without having to think in MapReduce, or scripting query languages like Pig and Hive. In this session, Ryan will provide an introduction to Cascading and dive into using it to build applications.

Big Data TechCon
What: “Increase Hadoop Utilization With Your Data Warehouse”
Who: Supreet Oberoi, vice president of field engineering, Concurrent, Inc.
When: Wednesday, Oct. 29 at 11:45 a.m. PT
How: Register at http://www.bigdatatechcon.com/registrationdetails.html

Session Description
The Hadoop ecosystem is an effective platform for conducting data-engineering tasks. However, mission-critical BI applications continue to run on traditional data warehouse platforms. In such enterprise use cases, Hadoop is emerging as a technology that augments existing platforms for processing these data-intensive tasks, requiring organizations to not only migrate workloads onto Hadoop for processing, but also extract data for reporting and analysis.

In this session, Supreet will demonstrate how to leverage SQL to improve utilization of Hadoop with existing data sources. Attendees will learn how to utilize Cascading and Cascading Lingual for Big Data application development, how to integrate Hadoop with an enterprise data warehouse through one SQL statement, and how to quickly integrate BI tools with Hadoop.

What: “Future-Proof Your Big Data Investments With Cascading”
Who: Supreet Oberoi, vice president of field engineering, Concurrent, Inc.
When: Wednesday, Oct. 29 at 3:45 p.m. PT
How: Register at http://www.bigdatatechcon.com/registrationdetails.html

Session Description
New computation fabrics with different interfaces and design patterns are continually being introduced into the market. As a result, if companies wish to leverage new innovations in Big Data platforms, they must constantly break and rebuild their operating models to keep up.

In this session, Supreet will discuss how organizations can future-proof Big Data investments against these emerging technologies with Cascading. Supreet will demonstrate how to build applications on emerging fabrics like Tez and Spark with existing skillsets, as well as give a tour of the Cascading community – a robust ecosystem that extends the power of Cascading applications with various integrations, dynamic programming languages and data source connections.

Supporting Resources
● Company: http://concurrentinc.com
● Cascading: http://cascading.org
● Driven: http://www.cascading.io/driven
● Contact us: http://concurrentinc.com/contact
● Twitter: http://twitter.com/concurrent
● LinkedIn: http://www.linkedin.com/company/concurrent-inc
● YouTube: http://www.youtube.com/getcascading

###

Media Contacts
Danielle Salvato-Earl
Kulesa Faul for Concurrent, Inc.
(650) 922-7287
concurrent@kulesafaul.com

Bossie Awards 2014: The best open source big data tools

September 29, 2014 | News | Kim Loughead

Steven Nunez, InfoWorld
Sep 29, 2014
http://www.infoworld.com/article/2688074/big-data/big-data-164727-bossie-awards-2014-the-best-open-source-big-data-tools

InfoWorld’s top picks in distributed data processing, data analytics, machine learning, NoSQL databases, and the Hadoop ecosystem

Cascading
The learning curve for writing Hadoop applications can be steep. Cascading is an SDK that brings a functional programming paradigm to Hadoop data workflows. With the 3.0 release, Cascading provides an easier way for application developers to take advantage of next-generation Hadoop features like YARN and Tez.

The SDK provides a rich set of commonly used ETL patterns that abstract away much of the complexity of Hadoop, increasing the robustness of applications and making it simpler for Java developers to utilize their skills in a Hadoop environment. Connectors for common third-party applications are available, enabling Cascading applications to tap into databases, ERP, and other enterprise data sources.

— Steven Nunez

Cascading on Apache Tez — Delivering on the promise of next generation compute

September 23, 2014 | News | Kim Loughead

Gary Nakamura, Concurrent, Inc.
Sep 23, 2014
http://hortonworks.com/blog/cascading-on-apache-tez

Concurrent Inc. is a Hortonworks Technology Partner and recently announced that Cascading 3.0 now supports Apache Tez as an application runtime. Cascading is a powerful development framework for building enterprise data applications on Hadoop and is one of the most widely deployed technologies for data applications, with more than 175,000 user downloads a month. Used by thousands of businesses including eBay, Etsy, The Climate Corp and Twitter, Cascading is the de facto standard in data application development on Hadoop.

In this guest blog, Gary Nakamura, CEO at Concurrent, talks about Concurrent’s recent milestone and the road ahead.

The “developer release” of Apache Tez is here, and we are happy to re-affirm our support for the community and the project.

Concurrent, the team behind Cascading, would like to add our congratulations to the Apache Tez community on achieving this milestone. This is an important project for the broader ecosystem, and we expect to see Tez continue to move forward quickly.

What Cascading on Tez Means for ISVs
It’s early days for Cascading on Apache Tez. Simultaneously delivering on performance, scale and reliability is no small feat, but we see Hortonworks and the Apache Tez community delivering on the promise of a next generation compute engine.

Cascading and Tez together represent another important milestone, providing users and independent software vendors (ISVs) the flexibility to quickly build their data apps and then choose the appropriate compute engine for the business problem at hand (in-memory, batch mode, streaming or otherwise).

This week we announced that the latest Cascading 3.0 WIP adds Apache Tez as a supported runtime platform. This was a significant milestone for Cascading in its own right as we delivered a pluggable query planner to make this support possible. With this release, Cascading users can start testing their existing applications on the Apache Tez compute engine.

Thousands of enterprises around the world will welcome a more efficient, high-performance compute engine – one that delivers the reliability and scale that they are accustomed to and one that will allow them to easily and seamlessly migrate their business-critical data applications. Tez has promised this and that commitment stands to benefit the entire Hadoop ecosystem.

What’s Next?
From here, we will work closely with the Tez community to run performance and scalability tests, and capture feedback from new and existing users. We will also work with the broader Cascading community to migrate Scalding, Cascalog, Lingual and Pattern to Apache Tez.

This is a big win for the community, our contributors, partners, ISVs and for enterprises driving their data strategy and next-generation data applications on Hadoop.

We share an unwavering commitment to developer productivity, ease of deployment, ease of manageability, and above all, innovation for the future of data app development.

At the end of the day, we are all in the data business.

Resources for Cascading 3.0 on Apache Tez

Download Cascading 3.0 WIP and its documentation: http://www.cascading.org/wip
Cascading on Apache Tez Notes: https://github.com/cwensel/cascading/tree/wip-3.0/cascading-hadoop2-tez
Sample Applications: https://github.com/Cascading/cascading.samples/tree/wip-3.0

Concurrent, Inc. to Present at DataWeek + API World 2014

September 10, 2014 | Press Releases | Kim Loughead

Supreet Oberoi to Deliver Sessions on Using Cascading to Build Applications for IoT, and Creating Complex Machine Scoring Applications with Cascading Pattern

SAN FRANCISCO – Sept. 10, 2014 – Concurrent, Inc., the leader in data application infrastructure, today announced that Supreet Oberoi, vice president of field engineering, will deliver two sessions at the third annual DataWeek + API World 2014, taking place Sept. 16-17 in San Francisco. This two-day conference and expo is the largest event for engineers and executives to discuss the role of data and API innovation on business, technology and society.

As the Internet of Things (IoT) gains momentum, and the number of connected devices increases exponentially, so, too, does the generation of Big Data. However, with the vast troves of complex Big Data being generated by these connected devices, the IoT is becoming less of a mega trend and more of a mega data problem.

At DataWeek + API World 2014, Supreet will deliver a talk on Cascading, the most widely used and deployed application development framework for building data-driven applications, and how it supports the convergence of IoT and Big Data. Additionally, Supreet will speak on Cascading Pattern, a standards-based scoring engine that leverages Cascading and enables data scientists to use large amounts of small data produced by smart devices and run predictive data models at scale.

Concurrent Presentations At-A-Glance

What: “Using Cascading to Build Data-Driven Applications for the Internet of Things”
Who: Supreet Oberoi, vice president of field engineering, Concurrent, Inc.
When: Wednesday, Sept. 17 at 10:45 a.m. PT
How: Register at http://dataweek.co/register

Session Description
With affordable Micro-Electro-Mechanical Systems (MEMS) manufacturing economics, and the networking protocols solving the kinks to connect low-powered devices, we are on the verge of unleashing the promise of ubiquitous computing with the Internet of Things. However, data-driven applications built to provide context and awareness for these new use cases will have their own unique constraints. While these applications will have to adapt to the optimal technologies of today, they must also be prepared to quickly leverage new innovations in analytics as – and when – they come.

In this session, Supreet will discuss the challenges in mining machine data, and how the Cascading development framework can be used to build data applications – ultimately fulfilling all constraints with IoT applications.

What: “Data Science at Scale! Creating Complex Machine Scoring Applications with Cascading Pattern”
Who: Supreet Oberoi, vice president of field engineering, Concurrent, Inc.
When: Wednesday, Sept. 17 at 4 p.m. PT
How: Register at http://dataweek.co/register

Session Description
Cascading Pattern is an open source project that takes models trained in popular analytics frameworks, such as SAS, Microstrategy, SQL Server, etc., and runs them at scale on Apache Hadoop. With Pattern, developers can use a Java API to create complex machine learning applications, such as recommenders or fraud detection. Pattern effectively lowers the barrier of adoption to Apache Hadoop for developers because developers can use existing skill sets to immediately begin building these complex applications.

In this presentation, Supreet will provide sample code that will show applications using predictive models built in SAS and R, such as anti-fraud classifiers. Additionally, Supreet will compare variations of models for enterprise-class customer experiments.

About the Speaker
Bringing more than 20 years of enterprise software experience in successfully developing transformative information technologies, Supreet Oberoi is vice president of field engineering for Concurrent, Inc.

Previously, Supreet served as Big Data technical evangelist and director of Big Data technical delivery for American Express. Combining business acumen with technical insights and strong execution skills, he developed reference architectures and new enterprise-level capabilities with the Hadoop stack using MapReduce, HBase, Hive, Solr, Mahout, Sqoop and many proprietary “big data” technologies. Prior to American Express, Supreet held several vice president, director and senior-level positions at Agile Software, oneREV, Oracle and RTI.

Supporting Resources

Company: http://concurrentinc.com
Cascading: http://cascading.org
Driven: http://www.cascading.io/driven
Contact us: http://concurrentinc.com/contact
Twitter: http://twitter.com/concurrent
LinkedIn: http://www.linkedin.com/company/concurrent-inc
YouTube: http://www.youtube.com/getcascading

About Concurrent, Inc.
Concurrent, Inc. is the leader in data application infrastructure, delivering products that help enterprises create, deploy, run and manage data applications at scale. The company’s flagship enterprise solution, Driven, was designed to accelerate the development and management of enterprise data applications. Concurrent is the team behind Cascading, the most widely deployed technology for data applications with more than 175,000 user downloads a month. Used by thousands of businesses including eBay, Etsy, The Climate Corp and Twitter, Cascading is the de facto standard in open source application infrastructure technology. Concurrent is headquartered in San Francisco and online at http://concurrentinc.com.

###

Media Contacts
Danielle Salvato-Earl
Kulesa Faul for Concurrent, Inc.
(650) 922-7287
concurrent@kulesafaul.com

Webinar | Developing Applications on Hadoop with Scalding — Sep 18, 2014

August 27, 2014 | Events | Kim Loughead

Date: Thursday, September 18, 2014
Time: 9am Pacific
Register at: http://info.hortonworks.com/WC_YarnReadySeries_Scalding_09.18.14_Webinar

At the center of many data-driven businesses is Scalding. Scalding is a Scala library based on the Cascading framework and is designed to simplify application development on Hadoop and YARN.

Please join us as Jonathan Coveney, Sr. Software Engineer at Twitter, teaches us about Scalding, and how Twitter uses it to perform a variety of tasks such as traffic quality measurement, ad targeting, market insight, and more.

Eight tips for resilient big data apps

August 25, 2014 | News | Kim Loughead

Howard Solomon, IT World Canada
August 25, 2014
http://www.itworldcanada.com/post/eight-tips-for-resilent-big-data-apps

One of the problems with big data applications is they have to handle big data — we’re talking huge data sets.

As Supreet Oberoi, vice-president of Concurrent Inc., maker of the Cascading application development framework, points out in a column for GigaOM, if these applications aren’t tough enough, they may fail in production.

The solution is to build resilient, well-tested applications before they go out the door. “This is a matter of philosophy and architecture as much as technology,” he says, in putting forward eight tips for building big data apps that can hold up to demanding environments:

–Define a blueprint for resilient applications, with a systemic enterprise architecture and methodology for how your company approaches big data applications.

This means answering a number of questions, including where your current architecture is failing;

–Size shouldn’t matter. Apps that are tested only with small-scale datasets may then fail or take too long with larger ones. They have to handle all sizes of data;

–Have a transparent process for finding problems, so developers and operations staff can diagnose and respond to problems when they happen;

–Abstraction and simplicity work. “Resilient applications tend to be future-proof because they employ abstractions that simplify development, improve productivity and allow substitution of implementation technology,” he writes. Developers should be able to build apps without being mired in the implementation details. Then data scientists should be able to use the app and access any type of data source;

–Build in security, auditing and compliance;

–Test-driven development should provide the ability to step through the code, establish invariants, and utilize other defensive programming techniques;

–Be portable. Applications should be designed to run on a variety of platforms and products;

–No black arts. Code should be shared, reviewed and commonly owned by multiple developers, not dependent on one person.

“If companies follow these eight rules, they will create resilient, scalable applications that allow them to tap into the full power of big data,” Oberoi writes.

How many of these rules does your developer team follow — or break?