Using Cascading deployed on Amazon Elastic MapReduce, Ion Flux created a scalable gene sequencing and realignment algorithm to analyze massive amounts of data. Ion Flux is able to readily hire Java programmers instead of MapReduce experts and focus on genomic problems rather than infrastructure technicalities.
Ion Flux turned to Amazon Elastic MapReduce and Cascading from Concurrent, Inc. to manage jobs on clusters of as many as ten to thirty nodes, utilizing 200-500 cores, launched in parallel for up to 50 hours.
Ion Flux chose Cascading because it provides the right level of abstraction for production-quality and production-scale use of Hadoop. The company found that Cascading is the only way to attack real-world scale problems and deploy to production rapidly. Cascading is a straightforward API that allows programmers to work in Java rather than writing MapReduce jobs directly. Cascading also provides finer-grained application definition and allows complex, distributed processes to be assembled quickly. With Cascading, programmers can create new operations, reuse past operations and chain them together easily.
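The idea of defining small operations once and chaining them into larger processes can be illustrated with a minimal sketch in plain Java. This is not the Cascading API and not Ion Flux's code: it uses only `java.util.function.Function` from the standard library, and the class name `PipelineSketch` and the operations `NORMALIZE` and `REVERSE_COMPLEMENT` are hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class PipelineSketch {
    // Hypothetical reusable operation: trim whitespace and upper-case a read.
    static final Function<String, String> NORMALIZE =
        s -> s.trim().toUpperCase();

    // Hypothetical reusable operation: reverse-complement a DNA string.
    // Any character other than A, C, G, T is mapped to 'N'.
    static final Function<String, String> REVERSE_COMPLEMENT = s -> {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = s.length() - 1; i >= 0; i--) {
            switch (s.charAt(i)) {
                case 'A': out.append('T'); break;
                case 'T': out.append('A'); break;
                case 'C': out.append('G'); break;
                case 'G': out.append('C'); break;
                default:  out.append('N');
            }
        }
        return out.toString();
    };

    // Chain the existing operations into a new, larger one.
    static final Function<String, String> PIPELINE =
        NORMALIZE.andThen(REVERSE_COMPLEMENT);

    // Apply the chained pipeline to every read in the input list.
    static List<String> run(List<String> reads) {
        return reads.stream().map(PIPELINE).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList(" aacg ", "ttga")));
    }
}
```

In this sketch each operation is a self-contained value, so a new pipeline can reuse `NORMALIZE` without modification. A framework such as Cascading applies the same compositional idea to distributed jobs on Hadoop, which this single-process example does not attempt to show.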
With Cascading, Ion Flux can focus on adding features its customers want instead of optimizing and tuning at the Hadoop level. The company can also launch analytics capabilities quickly to production, since they are already implemented on the same platform. In addition, Ion Flux is able to hire Java programmers rather than MapReduce experts and manage MapReduce jobs easily.
“Given our aggressive development timelines, application complexity and need to constantly modify analytics jobs, Cascading was a perfect fit,” commented Dr. Allen Day, Ion Flux’s founder and CEO. “It is much easier and faster for our developers to work in Java and reuse common operations. Cascading has let us focus on the genomics problem, not technicalities of the underlying computing platform. Cascading empowers us to effectively develop and quickly deliver the services our customers want.”
Ion Flux plans to continue to use Cascading for production-quality and production-scale use of Hadoop as it develops its analysis services. The solution will help the company create new capabilities with fast time to market.
Ion Flux, Inc. provides services that analyze DNA sequence data for researchers and health professionals in genomic medicine. The information resulting from Ion Flux’s analysis is used to address a wide variety of health problems impacted by human genetics. The company, which is a recent startup, has six team members located in Los Angeles, California, and Beijing, China.
Ion Flux’s business is built around several key questions, including how to more effectively and efficiently characterize genetic variation, how to use this information to improve public health, and how to empower the healthcare industry to provide higher-quality care. To meet these goals, Ion Flux needs to process large amounts of data, run complex MapReduce jobs, and continue to build out its analytics capabilities.