TeleNav has built a sophisticated machine learning framework using Cascading. Complex analytics algorithms were broken down into a collection of basic data flows and integrations. Developers build upon these reusable process flows to quickly develop new applications.
To meet these goals, TeleNav chose Cascading, an open source data processing, integration and process scheduling API for creating Enterprise-grade data and analytics applications on Apache Hadoop. Cascading solved all of the immediate manageability issues with MapReduce jobs and parallelized processing so complex, compute-intensive jobs finish in a reasonable amount of time. In contrast to other options, Cascading also had better integration with Java, automatic process flow optimization and the ability to easily transfer data to traditional databases and HBase.
TeleNav used Cascading to develop all the basic flows in its machine learning framework and to integrate with a variety of data sources. Cascading ‘Pipes’ and ‘Taps’¹ are the components of its data processing flows, and Cascading’s ‘Tuple’² concept is used extensively for data handling. The developers also implemented optimized binary files to store interim data, which can then be easily transferred or accessed by multiple flows.
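The relationship between taps, pipes and tuples can be illustrated with a small conceptual sketch. It is written in plain Python and does not use the Cascading API (which is a Java library), nor does it reflect TeleNav's actual code. The names `Record`, `list_tap`, `normalize`, `dedupe` and `run_flow`, as well as the sample rows, are hypothetical and chosen only for illustration.

```python
from typing import Callable, Iterable, Iterator, List, NamedTuple


class Record(NamedTuple):
    """Stand-in for a tuple: one record whose values line up with named fields."""
    name: str
    city: str


def list_tap(rows: Iterable[tuple]) -> Iterator[Record]:
    """Stand-in for a source tap: yields records read from an in-memory list."""
    for name, city in rows:
        yield Record(name, city)


# A pipe is modeled here as a function from a record stream to a record stream.
Pipe = Callable[[Iterable[Record]], Iterator[Record]]


def normalize(records: Iterable[Record]) -> Iterator[Record]:
    """Reusable step: trim whitespace and lowercase every field."""
    for r in records:
        yield Record(r.name.strip().lower(), r.city.strip().lower())


def dedupe(records: Iterable[Record]) -> Iterator[Record]:
    """Reusable step: drop records that were already seen, keeping first occurrences."""
    seen = set()
    for r in records:
        if r not in seen:
            seen.add(r)
            yield r


def run_flow(source: Iterable[Record], pipes: List[Pipe], sink: List[Record]) -> List[Record]:
    """Chain the pipes in order over the source and write the results to the sink."""
    stream = source
    for pipe in pipes:
        stream = pipe(stream)
    sink.extend(stream)
    return sink


# Hypothetical sample data: two spellings of the same place plus one distinct place.
rows = [("Cafe Roma ", "Sunnyvale"), ("cafe roma", "sunnyvale"), ("Book Nook", "Palo Alto")]
result = run_flow(list_tap(rows), [normalize, dedupe], [])
```

Because each step only consumes and produces a stream of records, the same `normalize` or `dedupe` step could be reused in another flow with a different source or sink, which is the kind of reuse the case study describes.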
This Cascading processing model allowed TeleNav to create sophisticated, reusable machine learning frameworks that are independent of the integration complexities that normally bottleneck project development. The result is a flexible, scalable system capable of supporting business needs as they arise.
“Cascading was used to create all the basic flows and data integrations in our machine learning framework,” noted Pramod Narasimha, Senior Software Engineer at TeleNav. “We can now build on top of these reusable process flows easily and quickly. Cascading manages and optimizes the MapReduce jobs automatically.”
Using Cascading, TeleNav has built a flexible, sophisticated machine learning framework that will be the basis for future development.
1 A ‘Pipe’ is a data process flow; a ‘Tap’ is a connection to a data source.
2 A ‘Tuple’ is comparable to a database record, with each value corresponding to a column in the table.
TeleNav is a global leader in location-based applications delivered via a mobile device. The company has won multiple industry awards and was one of the first to launch a GPS navigation and mobile workforce management service on a cell phone in North America.
TeleNav employs a unique architecture where the cell phone is a thin client that communicates with a powerful back-end server. The computational and storage capabilities of TeleNav’s industrial-strength systems provide a richer user experience than self-contained GPS devices. The company acquires, processes and publishes local content that includes hundreds of millions of entities and supports location-based services in the US and international markets. To provide high-value services, it needed to maintain, de-dupe, enrich and classify this data efficiently using sophisticated machine learning techniques.
To process incoming data, TeleNav chose Hadoop for its compute power and wrote raw MapReduce jobs to implement the company’s unique algorithms. However, the individual MapReduce jobs were difficult to manage. TeleNav needed a framework for machine learning algorithms on the MapReduce architecture that was flexible, made data processing flows easy to manage and would scale to support its growing business.