Date of Publication

2023

Document Type

Dissertation/Thesis

Degree Name

Master of Science in Computer Science

College

College of Computer Studies

Department/Unit

Software Technology

Thesis Advisor

Ronald Pascual

Defense Panel Chair

Judith Azcarraga

Defense Panel Member

Ann Franchesca Laguna
Ronald Pascual

Abstract (English)

Previous studies on Filipino automatic speech recognition (ASR) have focused primarily on the Hidden Markov Model (HMM) with Gaussian Mixture Model (GMM) approach. Studies on Bisaya ASR are far more limited, both in resources such as speech corpora and in prior work. Because neural networks require massive amounts of training data, this scarcity has left neural network and end-to-end systems largely unstudied for both languages. An alternative is the hybrid model, which combines neural networks with HMMs. A hybrid model still needs data, but not as much as an end-to-end ASR system. To address these gaps, this study uses the Filipino and Bisaya speech corpus from De La Salle University's healthcare chatbot project. The study also collected, preprocessed, and transcribed additional Filipino speech data. With these data, the study built an HMM-GMM ASR system, similar to those in previous studies, as a baseline, and experimented with phoneme sets, n-grams, language model weights, HMM states, and model enhancement techniques. The best HMM-GMM models for both Filipino and Bisaya used speaker adaptive training (SAT), with word error rates (WER) of 3.96% and 5.41%, respectively. The study also developed a deep neural network (DNN) HMM baseline model and time delay neural network (TDNN) HMM models with symmetric, asymmetric, and subsampled time strides. Among these hybrid models, the best for Filipino is the asymmetric TDNN-HMM model with a 3.48% WER, and the best for Bisaya is the baseline DNN-HMM model with a 5.50% WER. Finally, the study ran three further experiments: 1) the effect of additional data on performance, 2) the performance of the models on actual conversational children's speech, and 3) the performance of cross-language acoustic models.
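The results above are reported as word error rate (WER). As a minimal sketch of how that metric is conventionally computed (word-level edit distance divided by the number of reference words), the following may help; the function name `word_error_rate` and the sample sentences are illustrative assumptions and are not taken from the thesis or its scoring tools.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference word count.

    Illustrative sketch only; the thesis does not specify its scoring tool.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution and one insertion over four reference words.
wer = word_error_rate("the quick brown fox", "the quick brown dog jumps")
print(f"WER = {wer:.2%}")
```

Under this definition, a 3.48% WER corresponds to roughly 3 to 4 word errors per 100 reference words. Because insertions count as errors, WER can exceed 100%.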

Language

English

Embargo Period

8-10-2023
