Using self-organizing maps and regression to solve the acoustic-to-articulatory inversion as input to a visual articulatory feedback system

Document Type

Master's Thesis

Degree Name

Master of Science in Computer Science


College of Computer Studies


Computer Science

Thesis Adviser

Solomon See

Defense Panel Member

Joel Ilao
Jocelynn Cu


Visual articulatory feedback (VAF) systems provide visual representations of the user's articulations as feedback and have been shown to help in second language learning and in speech therapy for people with hearing impairment. However, one of their current limitations is that they do not give feedback on how to correct articulations. This can be overcome by using acoustic-to-articulatory inversion to show which articulators are being used vis-à-vis the correct articulators. Research on this problem has continued for the last forty years because of its one-to-many nature and its non-linearity. However, most existing approaches are not easily applicable to VAFs and focus on speech produced by people without hearing impairment. By using a combination of Self-Organizing Maps (SOMs) and regression models, this research aims to solve the acoustic-to-articulatory inversion problem as input to the creation of a VAF for the hearing impaired. The models were created using acoustic and articulatory data from the MOCHA-TIMIT database. Acoustic data are represented using Mel Frequency Cepstral Coefficients (MFCCs), while articulatory data are represented using the Cartesian coordinates of the different articulators. Video and audio data from people with hearing impairment were also collected for testing. Using the models created, the values of the articulatory parameters were derived from the audio data. In addition, the specific articulators that should be adjusted to produce the target sound were identified. Visualization was done by plotting the Cartesian coordinates of the articulatory features and overlaying a side view of the vocal tract on them. Results showed that the inversion methodology described here does not yield a significant improvement over existing methods. People with hearing impairment have different strengths and weaknesses with regard to the sounds they pronounce (the first subject is good at 'o' vowels, while the second is good at 'e' vowels).
Also, there are only slight differences between the predicted articulatory positions of the hearing impaired and the target. However, the sounds produced by the hearing impaired have a nasal quality, which the model does not capture due to lack of data.
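The abstract does not specify how the SOMs and the regression models are combined. The Python sketch here assumes one common arrangement, not necessarily the thesis's: a SOM trained on MFCC frames partitions the acoustic space, and a separate linear (ridge) regression per SOM node maps MFCCs to articulatory (x, y) coordinates. It further assumes that MFCCs are already extracted (e.g. 13 per frame), that each articulator contributes one (x, y) pair, and that an articulator is flagged for correction when its mean distance from the target exceeds a chosen threshold. All function names, articulator labels, and parameter values are hypothetical.

```python
# Hypothetical sketch: SOM partitioning of acoustic space + per-node ridge regression.
# This illustrates the general idea only; it is not the thesis's exact method.
import numpy as np


def train_som(data, rows=4, cols=4, epochs=10, lr0=0.5, seed=0):
    """Train a rectangular self-organizing map on the rows of `data` (e.g. MFCC frames)."""
    rng = np.random.default_rng(seed)
    n_nodes = rows * cols
    # (row, col) position of every node on the map grid, used by the neighborhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialize node weight vectors from randomly chosen training frames.
    idx = rng.choice(len(data), n_nodes, replace=len(data) < n_nodes)
    weights = data[idx].astype(float)
    sigma0 = max(rows, cols) / 2.0
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        for i in rng.permutation(len(data)):
            x = data[i]
            frac = step / total
            lr = lr0 * (1.0 - frac)               # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3  # shrinking neighborhood radius
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
            h = np.exp(-((grid - grid[bmu]) ** 2).sum(axis=1) / (2.0 * sigma**2))
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights


def best_matching_units(weights, X):
    """Index of the closest SOM node for every row of X."""
    return np.argmin(((X[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2), axis=1)


def _ridge(A, Y, lam):
    # Closed-form ridge regression: (A^T A + lam I)^-1 A^T Y.
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ Y)


def fit_local_regressions(weights, X, Y, lam=1e-3):
    """Fit one linear (ridge) map from acoustic X to articulatory Y per SOM node."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    fallback = _ridge(Xb, Y, lam)              # global model for sparsely populated nodes
    bmus = best_matching_units(weights, X)
    models = []
    for k in range(len(weights)):
        mask = bmus == k
        enough = mask.sum() >= Xb.shape[1]
        models.append(_ridge(Xb[mask], Y[mask], lam) if enough else fallback)
    return models


def predict(weights, models, X):
    """Route each acoustic frame to its SOM node and apply that node's regression."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    bmus = best_matching_units(weights, X)
    return np.array([Xb[i] @ models[k] for i, k in enumerate(bmus)])


def flag_articulators(pred, target, names, threshold):
    """Names of articulators whose mean (x, y) distance from the target exceeds threshold."""
    diff = (pred - target).reshape(len(pred), len(names), 2)
    mean_dist = np.linalg.norm(diff, axis=2).mean(axis=0)
    return [n for n, d in zip(names, mean_dist) if d > threshold]
```

Piecewise linear models over SOM nodes are one way to approximate a non-linear acoustic-to-articulatory mapping; on their own they do not resolve the one-to-many ambiguity, since each frame is still mapped to a single articulatory configuration.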

Shelf Location

Archives, The Learning Commons, 12F Henry Sy Sr. Hall

Physical Description

1 computer optical disc ; 4 3/4 in.

This document is currently not available here.