Date of Publication
2023
Document Type
Dissertation/Thesis
Degree Name
Master of Science in Computer Science
College
College of Computer Studies
Department/Unit
Software Technology
Thesis Advisor
Ronald Pascual
Defense Panel Chair
Judith Azcarraga
Defense Panel Member
Ann Franchesca Laguna
Ronald Pascual
Abstract (English)
Although there have been previous studies on Filipino automatic speech recognition (ASR), they have primarily focused on the Hidden Markov Model (HMM) with Gaussian Mixture Model (GMM) approach. Studies on Bisaya ASR are even more limited in terms of resources, such as speech corpora and previous work. Because neural networks require massive amounts of training data, this scarcity has led to a lack of neural network and end-to-end ASR studies for these languages. An alternative is the hybrid model, which combines neural networks with an HMM; this architecture still requires data, but not as much as an end-to-end ASR system. To address these gaps, this study makes use of the speech corpus from De La Salle University’s healthcare chatbot project for the Filipino and Bisaya languages. The study also collected, preprocessed, and transcribed additional Filipino speech data. Using these data, the study developed an HMM-GMM ASR system, similar to those in previous studies, as a baseline, and experimented with phoneme sets, n-grams, language model weights, HMM states, and model enhancement techniques. The best HMM-GMM models for both Filipino and Bisaya used speaker adaptive training (SAT), achieving word error rates (WER) of 3.96% and 5.41%, respectively. The study also developed a deep neural network (DNN) HMM baseline model and time delay neural network (TDNN) HMM models with symmetric, asymmetric, and subsampled time strides. For Filipino, the best model is the asymmetric TDNN-HMM model, with a 3.48% WER. For Bisaya, the best model is the baseline DNN-HMM model, with a 5.50% WER. Finally, the study conducted further experiments on 1) the effect of additional data on performance, 2) the performance of the models on actual conversational children’s speech, and 3) the performance of cross-language acoustic models.
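Since the results above are reported as word error rate, a minimal sketch of the standard metric may help when reading them: WER = (S + D + I) / N, the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the recognizer's hypothesis, divided by the number of reference words N. This is the general textbook definition, not the thesis's own scoring script; the function name `word_error_rate` and the example utterance are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dist[i][j] = minimum number of word edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match if cost == 0)
            )
    return dist[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # Hypothetical Filipino utterance: one substitution out of five words
    ref = "masakit ang ulo ko po"
    hyp = "masakit ang tiyan ko po"
    print(f"WER = {word_error_rate(ref, hyp):.2%}")  # prints "WER = 20.00%"
```

On this scale, the 3.48% WER reported for the Filipino TDNN-HMM model corresponds to roughly 3.5 word errors per 100 reference words.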
Language
English
Recommended Citation
Ing, J. (2023). Filipino and Bisaya ASR System Using TDNN-HMM Towards Application in a Healthcare Chatbot. Retrieved from https://animorepository.dlsu.edu.ph/etdm_softtech/9
Upload Full Text
2023_Ing_PageswithSignature.pdf (892 kB)
2023_Ing_Chapter1.pdf (69 kB)
2023_Ing_Chapter2.pdf (115 kB)
2023_Ing_Chapter3.pdf (235 kB)
2023_Ing_Chapter4.pdf (102 kB)
2023_Ing_Chapter5.pdf (360 kB)
2023_Ing_Chapter6.pdf (71 kB)
2023_Ing_References.pdf (84 kB)
Embargo Period
8-10-2023