Wavelet analysis of speaker-dependent speech features
Date of Publication
2001
Document Type
Master's Thesis
Degree Name
Master of Science in Computer Science
College
College of Computer Studies
Department/Unit
Computer Science
Thesis Adviser
Clement Y. Ong
Abstract/Summary
Speaker-dependent speech features are usually estimated using the Short-Time Fourier Transform (STFT). However, because speech signals are non-stationary, the fixed-size window function used by the STFT cannot provide adequate resolution in both time and frequency.
In this study, a Discrete Wavelet Transform (DWT) algorithm was used to analyze speech signals. The transform was designed to use an Order-3 B-Spline wavelet as its basis function. At each decomposition level of the wavelet transform, the time resolution is halved and the frequency resolution is doubled, which addresses the time-frequency resolution problem. Algorithms for the extraction of speaker-dependent speech features were also developed. To obtain the energy feature of speech, the energy equation was extended to include the computation of energy across all scales. To obtain the fundamental (pitch) frequency, the pitch period was measured by locating the occurrences of glottal closures in the scales of the wavelet transform. Instead of using all the scales for pitch period estimation, one algorithm was designed to use the first two adjacent scales and another was designed to use only one scale.
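The per-scale energy computation described above can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the thesis: an orthonormal Haar wavelet is substituted for the Order-3 B-Spline wavelet to keep the code dependency-free, the signal is cut into non-overlapping frames, and the names `haar_dwt`, `scale_energies`, and `energy_matrix` are hypothetical.

```python
import math


def haar_dwt(signal, levels):
    """Decimated, orthonormal Haar DWT (a stand-in for the B-spline wavelet).

    Returns (approximation, details), where details[k] holds the detail
    coefficients of level k + 1. Each level halves the number of samples.
    """
    approx = list(signal)
    details = []
    scale = 1.0 / math.sqrt(2.0)
    for _ in range(levels):
        if len(approx) < 2 or len(approx) % 2:
            raise ValueError("length must stay even and >= 2 at every level")
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) * scale for a, b in pairs])
        approx = [(a + b) * scale for a, b in pairs]
    return approx, details


def scale_energies(signal, levels):
    """Sum of squared coefficients per detail scale, plus the residual
    energy left in the final approximation."""
    approx, details = haar_dwt(signal, levels)
    per_scale = [sum(c * c for c in d) for d in details]
    return per_scale, sum(c * c for c in approx)


def energy_matrix(signal, frame_len, levels):
    """One row per non-overlapping frame, one column per scale
    (the framing scheme is an assumption of this sketch)."""
    rows = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        per_scale, _ = scale_energies(signal[start:start + frame_len], levels)
        rows.append(per_scale)
    return rows
```

Because the Haar transform used here is orthonormal, the per-scale energies plus the residual approximation energy add up to the energy of the original frame, which gives a simple sanity check for any such decomposition.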
Based on the analysis of these algorithms, it was observed that the energy matrix obtained by the energy vector extraction algorithm characterizes the intensity of the speaker's voice across time. Both pitch period estimation algorithms are based on the detection of glottal closure instants (GCI) in voiced sounds. The first algorithm correlates the first two scales of the wavelet transform, while the second uses only one scale in its measurement. Overall estimation error rates of 2.4% for the first algorithm and 7.5% for the second were obtained.
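The idea of locating glottal closure instants by combining two adjacent scales can be sketched as follows. This is not the thesis's algorithm, whose details are not given in the abstract. It assumes a Haar-like undecimated step detector in place of the B-spline wavelet, a point-wise product of the two scales as a simple stand-in for the correlation step, and a relative threshold for peak picking. The names `step_response`, `detect_gci`, and `pitch_from_gci`, and the parameters `widths` and `rel_threshold`, are hypothetical.

```python
from statistics import median


def step_response(x, width):
    """Haar-like undecimated response: sum of the `width` samples from n
    onward minus the sum of the `width` samples before n. Samples too
    close to either edge are left at 0."""
    out = [0.0] * len(x)
    for n in range(width, len(x) - width + 1):
        out[n] = sum(x[n:n + width]) - sum(x[n - width:n])
    return out


def detect_gci(x, widths=(2, 4), rel_threshold=0.5):
    """Candidate glottal closure instants: local maxima of the product of
    two adjacent scales that reach a fraction of the largest product."""
    fine = step_response(x, widths[0])
    coarse = step_response(x, widths[1])
    # An abrupt change produces same-sign responses at both scales,
    # so their product is large and positive there.
    product = [a * b for a, b in zip(fine, coarse)]
    peak = max(product, default=0.0)
    if peak <= 0.0:
        return []
    floor = rel_threshold * peak
    return [
        n for n in range(1, len(product) - 1)
        if product[n] >= floor
        and product[n] > product[n - 1]
        and product[n] >= product[n + 1]
    ]


def pitch_from_gci(gci, sample_rate):
    """Pitch period (in samples) as the median spacing between successive
    instants, and the corresponding fundamental frequency in Hz."""
    if len(gci) < 2:
        raise ValueError("need at least two instants to measure a period")
    period = median(b - a for a, b in zip(gci, gci[1:]))
    return period, sample_rate / period
```

As a usage illustration, a sawtooth such as `[(n % 80) / 80 for n in range(800)]` serves as a crude model of voiced speech with an abrupt closure every 80 samples. Passing it to `detect_gci` and then to `pitch_from_gci` with an assumed sampling rate of 8000 Hz recovers the 80-sample period. Real speech would need voiced/unvoiced detection and more careful thresholding.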
Abstract Format
html
Language
English
Format
Accession Number
TG03745
Shelf Location
Archives, The Learning Commons, 12F Henry Sy Sr. Hall
Physical Description
1 v. (various foliations) ; 28 cm.
Keywords
Wavelets (Mathematics); Speech processing systems; Automatic speech recognition; Voice frequency
Recommended Citation
Wong, J. O. (2001). Wavelet analysis of speaker-dependent speech features. Retrieved from https://animorepository.dlsu.edu.ph/etd_masteral/3206