The Business of Data
Jelani Harper, Dataversity
April 24, 2014
A health insurance company leverages its massive amounts of data to customize customer service for patients by conducting product assessments and medication reconciliations.
An exceedingly large and well-known power supply company makes substantial additions to the Internet of Things by manufacturing jet engines and windmills, and by monitoring the data they produce to schedule maintenance.
An automobile manufacturing company merges manufacturing telemetry with the telemetry of its vehicles to optimize assembly line production.
In any number of industries, teams of Data Scientists, engineers, and operations personnel design recommendation engines for their web sites to boost traffic.
Regardless of the industry, regardless of the area of specialty of a particular enterprise, a number of organizations are involved in the business of Data – the utilization of data and data-driven processes to operate and improve their businesses.
At the core of this process of automating and operationalizing data to create meaningful action that increases business value is application building. It is a necessity for Big Data, conventional data, or any degree of integration of the two. Without it, data might yield insight, but it can’t actually drive any of the aforementioned processes that are revolutionizing the way business is practiced in the 21st century:
“It is very clear that the broader world is moving towards enterprise data applications,” observed Chris Wensel, founder and Chief Technology Officer of Concurrent, whose open source application building framework Cascading is receiving 100,000 downloads a month with 11 percent month-over-month growth and thousands of deployments.
“It’s about businesses taking their assets and their data and building applications that enhance their business processes. It’s very clear that the market is moving on to the next level of maturity, and it has been doing so at a very rapid pace during the past 18 months.”
Without Data Science
Much of the current hype revolving around Data Science and the shortage of viable Data Scientists pertains to the myriad responsibilities these workers have. A large part of that responsibility involves integrating newer technologies (such as Hadoop and various NoSQL offerings) and their forms of data with legacy systems in a way that can produce meaningful action for the business. Part of the appeal of an application building platform such as Cascading is that it is a Java-based API, which means that engineers and developers can utilize whatever Java-compatible language they’ve been using for years to develop applications utilizing all forms of data, big or otherwise.
Consequently, a whole set of skills for manipulating Big Data platforms such as Hadoop, and others related to Data Science, is no longer needed to create actionable applications from data. Organizations don’t necessarily need to wait for Data Scientists to graduate and can instead concentrate on solving their business problems in a much more expedient and economical fashion, while utilizing current personnel. Wensel acknowledged the impact of Cascading’s single API approach:
“What does the enterprise have on the bench? Java developers. They have to learn new APIs [to build data applications] and understand the business problems. Where Cascading comes in is we normalize that API. We give them one API to learn so they can solve multiple business problems and focus on the quality of the business stack underneath and not focus on anything else, while continuing to leverage their existing talent.”
In addition to considerably reducing the skills necessary to manipulate Big Data and transform it with applications that fulfill business objectives, application building platforms also provide the degree of reliability and automation that applications need to function properly. Although scripting languages such as R are useful for creating analytics and building applications, they frequently lack the scalability to handle Big Data sets.
The true advantage of application building is that it effectively operationalizes data, going a step further than the conventional insight of Business Intelligence or analytics to actually provide action. A number of critical prerequisites must be met to operationalize data reliably enough that crucial business processes can hinge upon it. The action must be continuous, self-sustaining, and ideally function in a way that even laymen can use.
Without a comprehensive application building platform, Data Scientists and IT are simply linking together a variety of disparate technologies in a way that may cause latency or, even worse, create the potential for malfunction. With such a platform, IT and operations personnel can utilize a single framework in a language with which they are already familiar to provide the perpetual reliability needed to automate valuable business processes. And with application monitoring tools such as Concurrent’s Driven, which provides detailed insight into the behavior of applications and shortens the time required to address latency issues and malfunctions, those same personnel can ensure that those processes are actually optimized. Concurrent CEO Gary Nakamura commented:
“The skills gap is broadly systemic across most organizations. They all want to access the data, and they all want to build things on top of new data stores like Hadoop, but the challenge is that they’ve been using SQL for the last 30 years or they’ve been building Java applications for the last 20 years. There is a situation where only the elite can do low level things.”
That’s where Cascading comes in: it provides the instruction and the interfaces so that the rest of the organization, with its varying levels of skill, can leverage the data in Hadoop and operationalize it by putting applications into business processes.
In addition to facilitating the business logic and operationalizing processes that are integral for professionals to maximize business value, application building also plays a crucial role in integrating a variety of data sources and management tools. As Big Data technologies mature, there is an emerging trend for enterprises to utilize platforms such as Hadoop to store all of their data and to integrate Big Data with traditional proprietary data in order to conduct comprehensive analytics on aggregated data sets.
Data application frameworks such as Cascading offer a similar capability. Cascading has an integration API through which developers or Data Scientists can use a single tool to access the various types of data, programming languages, and platforms that are pivotal for a particular application. The result is that developers, operations personnel, and engineers can focus on business logic regardless of the different technologies involved. Wensel illustrated the integration capabilities of Cascading with the following example:
“Why can’t you just read data from Oracle and write it in Cassandra with a single application, and not have five applications to do that and have that data be stale the moment you actually get your hands on it? Cascading is designed to solve that problem; it’s first-class. It’s focused on solving any kind of problem where integration with existing legacy systems is a priority.”
Better than BI?
The reality of the influx of data into business and operational processes is that, in more and more instances, organizations are actually becoming involved in the business of data. Data is no longer simply used to enhance their processes, but is instead sold as a product or service, as is the case with geospatial data or gene-sequencing information. In such situations, data applications are a requisite for the business: they are the means by which the product is delivered and by which the business functions.
In this respect, data application building represents the evolution of BI. Organizations are moving beyond simply conducting analysis with data, and are instead using that data to produce specific action that creates business value:
“That’s truly what’s happening here,” Wensel reflected. “BI is being overrun by people who are actually delivering data as a product based on data. They’re taking raw materials, and delivering things. BI is a subset of that.”
In that case, data application building is the larger picture.