Development, implementation and testing of language identification system for seven Philippine languages
College
College of Computer Studies
Department/Unit
Computer Technology
Document Type
Article
Source Title
Philippine Journal of Science
Volume
144
Issue
1
First Page
81
Last Page
89
Publication Date
6-1-2015
Abstract
Three language identification (LID) approaches, namely acoustic, phonotactic, and prosodic, are explored for Philippine languages. Gaussian Mixture Models (GMM) are used for the acoustic and prosodic approaches. The acoustic features used were Mel Frequency Cepstral Coefficients (MFCC), Perceptual Linear Prediction (PLP), Shifted Delta Cepstra (SDC), and Linear Prediction Cepstral Coefficients (LPCC). Pitch, rhythm, and energy are used as prosodic features. Phone Recognition followed by Language Modelling (PRLM) and Parallel Phone Recognition followed by Language Modelling (PPRLM) are used for the phonotactic approach. After establishing that the acoustic approach using a 32nd order PLP GMM-EM achieved the best performance among the combinations of approach and feature, three LID systems were built: a 7-language LID, a pair-wise LID, and a hierarchical LID, with average accuracies of 48.07%, 72.64%, and 53.99%, respectively. Among the pair-wise LID systems, the highest accuracy is 92.23% for Tagalog and Hiligaynon, and the lowest accuracy is 52.21% for Bicolano and Tausug. In the hierarchical LID system, the accuracy for Tagalog, Cebuano, Bicolano, and Hiligaynon reached 80.56%, 80.26%, 78.26%, and 60.87%, respectively. The LID systems that were designed, implemented, and tested are best suited for language verification or for language identification systems with a small number of target languages that are closely related, such as Philippine languages. © 2015, Science and Technology Information Institute. All rights reserved.
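As an illustration of the GMM-based acoustic approach named in the abstract, the sketch below scores a sequence of feature frames against one diagonal-covariance GMM per language and picks the language with the highest total log-likelihood. This is a minimal sketch under stated assumptions, not the authors' implementation: the one-model-per-language maximum-likelihood decision rule, the function names, the two-dimensional toy features, and all parameter values are hypothetical, and EM training and PLP feature extraction are omitted.

```python
import math


def gmm_log_likelihood(frame, weights, means, variances):
    """Log p(frame | GMM) for a diagonal-covariance Gaussian mixture."""
    component_logs = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for x, m, v in zip(frame, mu, var):
            log_p += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        component_logs.append(log_p)
    # Log-sum-exp over components for numerical stability.
    peak = max(component_logs)
    return peak + math.log(sum(math.exp(c - peak) for c in component_logs))


def identify_language(frames, models):
    """Return the best-scoring language and the per-language scores."""
    scores = {
        lang: sum(gmm_log_likelihood(f, *params) for f in frames)
        for lang, params in models.items()
    }
    return max(scores, key=scores.get), scores


# Toy models: (weights, means, variances). Values are made up for
# illustration and are NOT taken from the paper.
toy_models = {
    "tagalog": ([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]),
    "hiligaynon": ([0.5, 0.5], [[4.0, 4.0], [5.0, 5.0]], [[1.0, 1.0], [1.0, 1.0]]),
}
toy_frames = [[0.2, -0.1], [0.9, 1.1], [0.5, 0.4]]

best, scores = identify_language(toy_frames, toy_models)
print(best)
```

Restricting `models` to two entries gives a pair-wise decision of the kind the abstract reports; in practice the GMM parameters would first be estimated from training speech with EM.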
Recommended Citation
Laguna, A. B., & Guevara, R. L. (2015). Development, implementation and testing of language identification system for seven Philippine languages. Philippine Journal of Science, 144 (1), 81-89. Retrieved from https://animorepository.dlsu.edu.ph/faculty_research/3346
Disciplines
Computer Sciences
Keywords
Computational linguistics; Automatic speech recognition