Wavelet analysis of speaker-dependent speech features

Date of Publication

2001

Document Type

Master's Thesis

Degree Name

Master of Science in Computer Science

College

College of Computer Studies

Department/Unit

Computer Science

Thesis Adviser

Clement Y. Ong

Abstract/Summary

Speaker-dependent speech features are usually estimated using the Short-Time Fourier Transform (STFT). However, because speech signals are non-stationary, the fixed-size window function used by the STFT cannot provide adequate resolution in both time and frequency.

In this study, a Discrete Wavelet Transform (DWT) algorithm was used to analyze speech signals. The transform was designed to use an order-3 B-spline wavelet as its basis function. At each decomposition level of the wavelet transform, the time resolution is halved and the frequency resolution is doubled, which addresses the time-frequency resolution problem. Algorithms for extracting speaker-dependent speech features were also developed. To obtain the energy feature of speech, the energy equation was extended to compute energy across all scales. To obtain the fundamental pitch frequency, the pitch period was measured by locating glottal closures in the scales of the wavelet transform. Rather than using all scales for pitch period estimation, one algorithm was designed to use the first two adjacent scales and another to use only one scale.
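As a rough illustration of the level-by-level halving and of an energy measure taken across scales, the sketch below decomposes a signal and sums the squared detail coefficients at each scale. It is not the thesis's method: it substitutes a Haar wavelet for the order-3 B-spline wavelet to stay dependency-free, and the function names `haar_dwt_step`, `decompose`, and `energy_per_scale` are hypothetical. The thesis's extended energy equation is not given in this record, so the plain sum of squares is an assumption.

```python
import math


def haar_dwt_step(signal):
    """One decomposition level with a Haar wavelet (assumption: the thesis
    used an order-3 B-spline wavelet instead). Returns (approx, detail),
    each half as long as the input."""
    scale = 1.0 / math.sqrt(2.0)
    starts = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * scale for i in starts]
    detail = [(signal[i] - signal[i + 1]) * scale for i in starts]
    return approx, detail


def decompose(signal, levels):
    """Apply haar_dwt_step repeatedly to the approximation.
    Returns (details, approx): one detail sequence per scale, finest first,
    plus the final approximation."""
    details = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_dwt_step(approx)
        details.append(detail)
    return details, approx


def energy_per_scale(details):
    """Sum of squared detail coefficients at each scale (assumed energy measure)."""
    return [sum(c * c for c in d) for d in details]


if __name__ == "__main__":
    # Synthetic test tone, 16 samples; not speech data.
    sig = [math.sin(2 * math.pi * i / 8) for i in range(16)]
    details, approx = decompose(sig, 3)
    print([len(d) for d in details])  # number of coefficients halves per level
    print(energy_per_scale(details))
```

Computing the same per-scale energies over successive short frames of a recording would give one energy vector per frame, which stacked over time forms a matrix of energy by scale and time.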

Based on the analysis of these algorithms, it was observed that the energy matrix obtained by the energy vector extraction algorithm characterizes the intensity of the speaker's voice across time. Both pitch period estimation algorithms are based on detecting glottal closure instants (GCI) in voiced sounds. The first algorithm correlates the first two scales of the wavelet transform, while the second uses only one scale in its measurement. Overall estimation error rates of 2.4% for the first algorithm and 7.5% for the second were obtained.
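The step from detected glottal closure instants to a pitch estimate is simple arithmetic: the pitch period is the spacing between consecutive GCIs, and the fundamental frequency is its reciprocal. The sketch below assumes the GCIs are already available as increasing sample indices. It does not reproduce either of the thesis's detection algorithms, and the function name `pitch_from_gci` and the sample values are hypothetical.

```python
def pitch_from_gci(gci_samples, sample_rate_hz):
    """Convert GCI sample indices into pitch periods and fundamental frequencies.

    gci_samples: strictly increasing sample indices of glottal closures
                 (assumed to come from some earlier detection step).
    Returns (periods_in_samples, f0_in_hz), one entry per consecutive GCI pair.
    """
    periods = []
    for earlier, later in zip(gci_samples, gci_samples[1:]):
        if later <= earlier:
            raise ValueError("GCI indices must be strictly increasing")
        periods.append(later - earlier)
    # F0 in Hz = samples per second / samples per pitch period.
    f0 = [sample_rate_hz / p for p in periods]
    return periods, f0


if __name__ == "__main__":
    # Illustrative values only: closures every 80 samples at 8000 Hz.
    periods, f0 = pitch_from_gci([100, 180, 260, 340], 8000)
    print(periods, f0)
```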

Abstract Format

html

Language

English

Format

Print

Accession Number

TG03745

Shelf Location

Archives, The Learning Commons, 12F Henry Sy Sr. Hall

Physical Description

1 v. (various foliations) ; 28 cm.

Keywords

Wavelets (Mathematics); Speech processing systems; Automatic speech recognition; Voice frequency

This document is currently not available here.
