FICS: Fast DNA/RNA to amino acid alignment using data level parallelism

Date of Publication

2022

Document Type

Bachelor's Thesis

Degree Name

Bachelor of Science in Computer Science Major in Computer Systems Engineering

Subject Categories

Computer Sciences

College

College of Computer Studies

Department/Unit

Computer Technology

Thesis Advisor

Roger Luis T. Uy

Defense Panel Chair

Gregory G. Cu

Defense Panel Member

Clement Y. Ong
Fritz S. Flores

Abstract/Summary

Gene expression analysis is one of the key areas of bioinformatics. It is used to determine the functions of a gene and to discover the effects of external stimuli on an organism. The analysis involves multiple steps: alignment, assembly, quantification, normalization, and modeling. This study focuses only on the first step, the sequence alignment phase, in which reads are mapped to a reference proteome. A frame alignment algorithm is used specifically to map a DNA/RNA sequence to a reference proteome. A non-model organism is an organism for which no proteome model exists, and its reads can be mapped in two ways: de novo mapping or close reference proteome mapping. This study focuses on the close reference mapping of Scylla serrata (mud crab), using Drosophila melanogaster (fruit fly) as the reference proteome model. This requires mapping millions of reads to the whole reference proteome, hence the need to speed up the alignment phase. Since most frame alignment algorithms are implemented sequentially, this study proposes FICS, a DNA/RNA to protein sequence alignment implementation using data level parallelism. It includes a conversion of a sequential frame alignment algorithm to the SIMD paradigm and implementations on three different technologies, namely Intel SIMD ISA (AVX2), CUDA, and FPGA. Analysis shows that the Intel SIMD ISA implementation achieved a speedup of 3.5x with an average matrix computation time of 2.5ms. Furthermore, its memory consumption peaked at 231MB, and it required around 42-52 Watts of power during runtime. On the other hand, the CUDA implementation of the frame alignment algorithm in the SIMT paradigm resulted in suboptimal speeds, using up to 270MiB of memory and drawing around 61-63 Watts during runtime. The FPGA implementation covered only the two input data preparation steps, achieving a speedup of about 13,940x, a maximum memory consumption of 580KB, and a power consumption of around 2 Watts.
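
To illustrate what mapping a DNA/RNA read to a protein involves, the following is a minimal sequential sketch in Python. It is not the FICS implementation or the thesis's frame alignment algorithm: it translates a read in its three forward reading frames with the standard genetic code and scores each frame against a protein using a plain Smith-Waterman recurrence. The function names (translate_frames, local_score, best_frame) and the match, mismatch, and gap scores are assumptions made for illustration only; a full frame alignment algorithm would also allow frame shifts inside a single alignment and would normally use a substitution matrix.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_frames(read):
    """Translate a DNA/RNA read in the three forward reading frames."""
    seq = read.upper().replace("U", "T")  # treat RNA as DNA
    frames = []
    for offset in range(3):
        codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
        # Codons with ambiguous bases (e.g. N) become the placeholder 'X'.
        frames.append("".join(CODON_TABLE.get(c, "X") for c in codons))
    return frames


def local_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score with a linear gap penalty.

    The scoring values are illustrative assumptions, not taken from the thesis.
    """
    prev = [0] * (len(b) + 1)  # previous row of the score matrix
    best = 0
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
        best = max(best, max(cur))
        prev = cur
    return best


def best_frame(read, protein):
    """Return (frame index, score) of the best-scoring forward frame."""
    scores = [local_score(f, protein) for f in translate_frames(read)]
    frame = max(range(3), key=scores.__getitem__)
    return frame, scores[frame]


if __name__ == "__main__":
    print(translate_frames("ATGGCCAAGTTT"))
    print(best_frame("CATGGCCAAGTTT", "MAKF"))
```

In this recurrence each cell depends only on its left, upper, and upper-left neighbours, so the cells along one anti-diagonal of the score matrix are independent of each other. That independence is one common starting point for expressing such matrix computations in SIMD or SIMT form; whether FICS uses this particular layout is not stated in the abstract.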

Abstract Format

html

Language

English

Format

Electronic

Physical Description

[301 leaves]

Keywords

Bioinformatics; Nucleotide sequence

This document is currently not available here.
