Date of Publication
12-1-2022
Document Type
Master's Thesis
Degree Name
Master of Science in Computer Science
Subject Categories
Computer Sciences
College
College of Computer Studies
Department/Unit
Software Technology
Thesis Advisor
Anish Man Singh Shrestha
Defense Panel Chair
Roger Luis Uy
Defense Panel Member
Jennifer Ureta
Anish Man Singh Shrestha
Abstract/Summary
RNA-seq is an experimental technique that uses modern high-throughput sequencing technology to sequence a population of mRNA. A common use of RNA-seq is differential gene expression analysis (DGEA), the process of identifying genes whose expression levels change significantly across conditions. Typical DGEA pipelines require an annotated reference genome or transcriptome and therefore cannot be applied to most organisms, since only a few organisms have been studied extensively enough to have a high-quality annotated reference transcriptome. For organisms without an annotated reference transcriptome, a more complex pipeline is often used. This pipeline involves constructing a de novo transcriptome assembly, that is, reconstructing transcript sequences from the RNA-seq reads, which is computationally expensive. Recently, we proposed a novel alternative in which we directly align the RNA-seq reads to a protein database of a close relative. The alternative pipeline improves speed and memory usage while also improving precision and recall in identifying differentially expressed genes. However, it relies on full sequence alignments, which take time and generate information unnecessary for DGEA. This study replaces full sequence alignment with quasi-mapping, which determines the mapping by rapid look-ups of substrings of a query sequence. We report a further speed-up from this replacement, making our pipeline more than 1000× faster than the assembly-based approach while remaining more accurate. We also compare quasi-mapping to other mapping techniques and show that it is faster, but at the cost of sensitivity.
Language
English
Format
Electronic
Physical Description
44 leaves
Keywords
Nucleotide sequence; Gene expression
Recommended Citation
Santiago, K. L. (2022). DNA-protein quasi-mapping for differential gene expression analysis in non-model organisms. Retrieved from https://animorepository.dlsu.edu.ph/etdm_softtech/7
Embargo Period
12-12-2022