FICS: Fast DNA/RNA to amino acid alignment using data level parallelism
Date of Publication
2022
Document Type
Bachelor's Thesis
Degree Name
Bachelor of Science in Computer Science Major in Computer Systems Engineering
Subject Categories
Computer Sciences
College
College of Computer Studies
Department/Unit
Computer Technology
Thesis Advisor
Roger Luis T. Uy
Defense Panel Chair
Gregory G. Cu
Defense Panel Member
Clement Y. Ong
Fritz S. Flores
Abstract/Summary
Gene expression analysis is one of the key areas of bioinformatics. It is used to determine the functions of a gene and to discover the effects of external stimuli on an organism. The analysis involves multiple steps: alignment, assembly, quantification, normalization, and modeling. This study focuses only on the first step, the sequence alignment phase, in which reads are mapped to a reference proteome. A frame alignment algorithm is used specifically to map a DNA/RNA sequence to a reference proteome. A non-model organism is an organism for which no proteome model exists, and its reads can be mapped in two ways: de novo mapping or close reference proteome mapping. This study focuses on the close reference mapping of Scylla serrata (mud crab), using Drosophila melanogaster (fruit fly) as the reference proteome model. This requires mapping millions of reads to the whole reference proteome, hence the need to speed up the alignment phase. Since most frame alignment algorithms are implemented sequentially, this study proposes FICS, a DNA/RNA to protein sequence alignment implementation that uses data level parallelism. It includes a conversion of a sequential frame alignment algorithm to the SIMD paradigm and implementations on three different technologies, namely Intel SIMD ISA (AVX2), CUDA, and FPGA. Analysis shows that the Intel SIMD ISA implementation achieved a speedup of 3.5x with an average matrix computation time of 2.5 ms. Its memory consumption peaked at 231 MB, and it required around 42-52 Watts of power during runtime. In contrast, the CUDA implementation of the frame alignment algorithm in the SIMT paradigm resulted in suboptimal speeds, used up to 270 MiB of memory, and drew around 61-63 Watts during runtime. The FPGA implementation covered only the two input data preparation steps, with a speedup of about 13,940 times, a maximum memory consumption of 580 KB, and a power consumption of around 2 Watts.
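To illustrate what mapping a DNA/RNA read onto a protein reference involves, the following is a minimal sketch of the translation step that such an alignment relies on: reading a nucleotide sequence in each of its three forward reading frames and converting codons to amino acids. It is not the FICS implementation. It assumes the standard genetic code, handles forward frames only, and omits the alignment scoring and the SIMD conversion described in the abstract; the names `CODON_TABLE` and `translate_frames` are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
# '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate_frames(read):
    """Translate a DNA/RNA read in its three forward reading frames.

    Hypothetical helper for illustration only. Codons containing
    characters outside T/C/A/G (for example 'N') are translated as 'X'.
    """
    # Treat RNA as DNA by mapping U to T.
    read = read.upper().replace("U", "T")
    frames = []
    for offset in range(3):
        # Step through complete codons only, starting at this frame offset.
        codons = (read[i:i + 3] for i in range(offset, len(read) - 2, 3))
        frames.append("".join(CODON_TABLE.get(c, "X") for c in codons))
    return frames


if __name__ == "__main__":
    # Frame 0: ATG GCC TAA, frame 1: TGG CCT, frame 2: GGC CTA
    print(translate_frames("ATGGCCTAA"))  # ['MA*', 'WP', 'GL']
```

Each translated frame can then be compared against protein sequences in the reference proteome. Because the frames of one read, and the reads themselves, are processed independently in this step, it is the kind of work that lends itself to data level parallelism.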
Language
English
Format
Electronic
Physical Description
[301 leaves]
Keywords
Bioinformatics; Nucleotide sequence
Recommended Citation
Lim, S. W., Lim, S. C., Ting, C. P., & Wong, A. C. (2022). FICS: Fast DNA/RNA to amino acid alignment using data level parallelism. Retrieved from https://animorepository.dlsu.edu.ph/etdb_comtech/4