Motif search using Gibbs sampling: Notes on effectiveness in a distributed environment
College
College of Computer Studies
Department/Unit
Software Technology
Document Type
Conference Proceeding
Source Title
2019 IEEE 11th International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment, and Management, HNICEM 2019
Publication Date
11-1-2019
Abstract
© 2019 IEEE. Motif search is a common problem in bioinformatics in which unique DNA sequences (motifs) of a specific length, embedded in long strands, signify binding sites for transcription factors. In this paper, we present some important notes on implementing motif search with the Gibbs sampling algorithm in a distributed computing environment, based on visualizations of the speed and motif scores of various distributed implementations. For the DNA sequence data, we used open-source mouse genome fragments of lengths 250, 500, and 1000. We built upon previous studies (Perera and Ragel, 2013; Chen and Jiang, 2006) by distributing the motif search workloads (jobs) across 16 CPU cores on 2 computer nodes, instead of the traditional approach of parallelizing on a single computing device with a multicore CPU. Results show that storing the DNA sequences in a list and passing it as a parameter argument gave the fastest execution time compared with implementations that send file dependencies or use in-memory processing.
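For readers unfamiliar with the technique named in the abstract, the following is a minimal single-process sketch of Gibbs sampling for motif search. It is not the authors' implementation: the function names (`profile_from`, `score`, `gibbs_motif_search`), the +1 pseudocounts, the mismatch-based score, and the toy sequences are all assumptions chosen for illustration, and the sketch assumes sequences contain only the letters A, C, G, and T.

```python
import random
from collections import Counter

BASES = "ACGT"

def profile_from(motifs, k):
    """Column-wise base probabilities with +1 pseudocounts (an assumption)."""
    total = len(motifs) + len(BASES)
    profile = []
    for col in range(k):
        counts = Counter(m[col] for m in motifs)
        profile.append({b: (counts[b] + 1) / total for b in BASES})
    return profile

def score(motifs, k):
    """Total mismatches against each column's most common base; lower is better."""
    return sum(
        len(motifs) - max(Counter(m[col] for m in motifs).values())
        for col in range(k)
    )

def gibbs_motif_search(seqs, k, iterations=500, seed=0):
    """Return (best_motifs, best_score) found by one Gibbs sampling run."""
    rng = random.Random(seed)
    # Start from one random k-mer per sequence.
    motifs = []
    for s in seqs:
        p = rng.randrange(len(s) - k + 1)
        motifs.append(s[p:p + k])
    best, best_score = list(motifs), score(motifs, k)
    for _ in range(iterations):
        # Hold one sequence out and build a profile from the others.
        i = rng.randrange(len(seqs))
        profile = profile_from(motifs[:i] + motifs[i + 1:], k)
        # Weight every k-mer of the held-out sequence by its profile probability.
        weights = []
        for p in range(len(seqs[i]) - k + 1):
            w = 1.0
            for col, base in enumerate(seqs[i][p:p + k]):
                w *= profile[col][base]
            weights.append(w)
        # Sample a new motif position in proportion to those weights.
        p = rng.choices(range(len(weights)), weights=weights)[0]
        motifs[i] = seqs[i][p:p + k]
        current = score(motifs, k)
        if current < best_score:
            best, best_score = list(motifs), current
    return best, best_score

if __name__ == "__main__":
    # Toy sequences, invented for this example.
    toy = ["TTACGTAA", "GGACGTCC", "CACGTTTG", "ATGACGTA"]
    print(gibbs_motif_search(toy, k=4, iterations=200, seed=1))
```

Because the sampler is randomized, it is usually restarted several times with different seeds and the lowest-scoring result kept. Independent restarts of this kind are one natural unit of work to spread across cores or nodes, although the abstract does not state how the paper partitions its jobs.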
Digital Object Identifier (DOI)
10.1109/HNICEM48295.2019.9072697
Recommended Citation
Imperial, J. R., Gail Ya-On, C., & Cu, G. G. (2019). Motif search using Gibbs sampling: Notes on effectiveness in a distributed environment. 2019 IEEE 11th International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment, and Management, HNICEM 2019 https://doi.org/10.1109/HNICEM48295.2019.9072697