Date of Publication

4-2023

Document Type

Dissertation

Degree Name

Doctor in Information Technology

Subject Categories

Computer Sciences | Educational Technology | Software Engineering

College

College of Computer Studies

Department/Unit

Software Technology

Thesis Advisor

Ronald M. Pascual

Defense Panel Chair

Arnulfo Azcarraga

Defense Panel Member

Joel Ilao
Charibeth Cheng
Federico Ang
Ronald Pascual

Abstract/Summary

With the end in view of supporting the literacy initiatives of the Philippine education system, this study develops methods for the automatic assessment of oral reading fluency from children's read speech in the Filipino language. Specifically, it designs methods for automatically extracting and analyzing the prosodic features of such speech. To this end, a four-fold set of research activities was conducted toward an automated oral reading fluency assessment system: 1) building a children's Filipino speech corpus, 2) designing methods of extracting and analyzing prosodic features, 3) developing methods of automatically assessing oral reading fluency, and 4) evaluating the performance of the developed methods. The dataset consisted of 192 audio files totaling 11 hours, 48 minutes, and 13 seconds. The audio files were recordings of children aged 6 to 11 years reading grade-appropriate passages in the Filipino language. Human raters manually annotated the files under two schemes: a two-level scheme (fluent or nonfluent) and a three-level scheme (independent, instructional, or frustration level). Audio and prosodic features were extracted and used as predictor variables for training and testing the machine learning models. In the machine learning classification experiments, the SVM achieved validation accuracies of 81.18% and 87.71% for the three-level and two-level fluency schemes, respectively. The two schemes used different predictor variables. For the three-level scheme, the variables were DSP- and ASR-computed speech rate and Levenshtein distance, while for the two-level scheme, they were total duration, Levenshtein distance, out-of-vocabulary words, DSP-computed articulation rate, and ASR-computed speech rate. The Mel-frequency and gammatone cepstrum coefficients, spectral audio features, and wavelet features did not contribute significantly to prediction performance.
By contrast, the LSTM deep learning method achieved validation accuracies of 55.08% and 79.61% for the three- and two-level fluency schemes, respectively. To further improve prediction accuracy, it is recommended that more predictor features be identified, such as other types of reading miscues and pause features. More reading data may also be gathered to balance the distribution of fluency classes in the dataset and to enable deep learning methods to discover robust predictor features and improve performance.
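
As an illustration of the kind of ASR-based predictor variables named above, the following is a minimal Python sketch, not the dissertation's implementation. It assumes that Levenshtein distance is computed at the word level between the reference passage and an ASR transcript, that out-of-vocabulary words are transcript words absent from the passage, and that speech rate is measured in words per minute; the abstract does not specify these details, and the function names and the sample sentence are hypothetical.

```python
def word_levenshtein(reference, hypothesis):
    """Word-level edit distance between two token lists (assumed granularity)."""
    # prev[j] holds the distance between the reference prefix seen so far
    # and the first j hypothesis tokens.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def speech_rate_wpm(num_words, total_seconds):
    """Words per minute over the whole recording (assumed unit)."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    return num_words * 60.0 / total_seconds


def fluency_features(passage, transcript, total_seconds):
    """Collect hypothetical predictor variables for one recording."""
    ref = passage.lower().split()
    hyp = transcript.lower().split()
    passage_vocab = set(ref)
    return {
        "total_duration_s": total_seconds,
        "levenshtein": word_levenshtein(ref, hyp),
        # Assumption: a word is out-of-vocabulary if it is not in the passage.
        "oov_words": sum(1 for word in hyp if word not in passage_vocab),
        "speech_rate_wpm": speech_rate_wpm(len(hyp), total_seconds),
    }


# Hypothetical example: the reader skips "ay" and adds "po".
features = fluency_features("ang bata ay masaya", "ang bata masaya po", 2.0)
```

A feature dictionary of this kind could then be converted to a numeric vector and passed to a classifier such as an SVM; DSP-computed measures like articulation rate would require signal-level processing that this sketch does not attempt.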

This study is relevant to addressing the issue of poor reading performance among Filipino children. It has created a children's read speech corpus in the Filipino language, which will eventually form part of a larger dataset aimed at addressing the limited availability of children's Filipino speech corpora. It has also identified relevant and non-relevant predictor features for automatically classifying oral reading fluency, and used these features as inputs in developing fluency classification methods. The speech corpus, fluency predictor features, and classification techniques based on DSP- and ASR-based feature extraction developed in this study will serve as a framework for building an automated oral reading fluency assessment system.

Abstract Format

html

Language

English

Format

Electronic

Keywords

Speech processing systems; Automatic speech recognition; Oral reading—Evaluation; Reading—Ability testing—Philippines; Filipino language—Versification

Embargo Period

4-24-2025

Available for download on Thursday, April 24, 2025
