COMPARISON OF BIG DATA ANALYTICS TOOLS: A BIOINFORMATICS CASE STUDY

Authors

  • MUHAMMAD SHAHZAD Department of Computer Science, PAF – Karachi Institute of Economics & Technology, Karachi, Pakistan
  • KAMRAN AHSAN Federal Urdu University of Arts Science & Technology, Karachi, Pakistan

Abstract

Due to the exponential growth of genomic sequence data in the biological sciences, accessing and analyzing these data has become an immense challenge for biomedical practitioners and researchers. We are awash in data today across application areas, particularly in healthcare systems. These data sets not only exceed the exabyte and zettabyte scale in volume but also grow in variety and velocity. Scientists continuously face decisions about what to store, what to discard, and how to analyze and extract information within an acceptable time. In the life and biological sciences, next-generation sequencing methods have been strongly affected by the generation of biological Big Data. This diversity of omics information, including genomes, transcriptomes, and epigenomes, will take us to the yottabyte (10^24 bytes) data scale within the next few years. These radical changes in the generation and acquisition of Big Data pose open challenges for the capture, curation, storage, search, sharing, transfer, visualization, and analysis of information. Big data solutions have so far relied on MPI running on HPC systems and MapReduce running on Hadoop clusters. This paper investigates three recent bioinformatics tools that run on Hadoop. We adopt a comparative methodology built around two functions: mapping and dealing with sequence files. For mapping, we give insight into the alignment of reads against a reference genome sequence. For dealing with sequence files, we discuss the sequence file formats that each tool supports. This research will help researchers in the biological sciences choose appropriate Hadoop-based bioinformatics tools for their scientific investigations.

Published

2021-05-28