Spectral subtraction speech enhancement integrated to automatic speech recognition system implemented in FPGA
Date of Publication
2012
Document Type
Master's Thesis
Degree Name
Master of Science in Electronics and Communications Engineering
College
Gokongwei College of Engineering
Department/Unit
Electronics and Communications Engineering
Thesis Adviser
Edwin Sybingco
Roderick Y. Yap
Defense Panel Chair
Alexander Abad
Defense Panel Member
Reggie Gustilo
Cesar A. Llorente
Abstract/Summary
Achieving good accuracy has been one of the challenges in the design of automatic speech recognition (ASR) systems. In this study, spectral subtraction speech enhancement is added to the acoustic front end of an ASR system. The two-word-vocabulary ASR system with speech enhancement was first modelled in MATLAB. The system starts with framing, windowing, and the FFT of the input speech signal. Noise is then estimated by averaging the first 8 output frames of the FFT. The estimated noise magnitude spectrum is subtracted from the spectrum of the original speech signal. To further enhance the speech, noise flooring is included in the design: the noise estimate is multiplied by a factor beta (b), and the result is substituted for the original speech during silent periods. The hardware was modelled in VHDL, closely following the MATLAB design, and the VHDL design was then implemented on an FPGA. Both the MATLAB and FPGA models were evaluated in terms of the correlation between the original clean speech and the enhanced speech, and in terms of recognition accuracy. After several tests, it was concluded that the optimum beta (b) for the spectral subtraction is 0.01. In MATLAB, the average correlation obtained for SNRs from -3.4 to 34.8 dB is 83.7%, while 80.28% was recorded for the FPGA. The average recognition rates for the MATLAB and FPGA models are 45.58% and 48.5%, respectively. The system tolerates background noise within 0 to 68.6 dB while maintaining a recognition accuracy of 75% and above.
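The processing chain described in the abstract (framing, windowing, FFT, noise estimation from the first 8 frames, magnitude subtraction, and noise flooring with beta = 0.01) can be illustrated with a minimal Python sketch. This is not the thesis's MATLAB or VHDL design. The frame length, hop size, Hamming window, the naive DFT used in place of an FFT, and all function names are assumptions made for illustration. The flooring rule shown (replace any bin whose subtracted magnitude falls below beta times the noise estimate) is one common reading of the noise-flooring step and may differ from the author's exact silence-period logic.

```python
import cmath
import math

def frames(signal, frame_len, hop):
    """Split the signal into overlapping frames; a trailing partial frame is dropped."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def hamming(n):
    """Hamming window of length n (window choice is an assumption)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]

def dft(x):
    """Naive O(N^2) DFT, standing in for the FFT to keep the sketch dependency-free."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def spectral_subtract(signal, frame_len=32, hop=16, noise_frames=8, beta=0.01):
    """Return (enhanced spectra per frame, noise magnitude estimate per bin).

    frame_len and hop are hypothetical values; noise_frames=8 and beta=0.01
    follow the figures quoted in the abstract.
    """
    win = hamming(frame_len)
    spectra = [dft([s * w for s, w in zip(f, win)])
               for f in frames(signal, frame_len, hop)]
    mags = [[abs(c) for c in spec] for spec in spectra]

    # Noise estimate: per-bin mean magnitude over the first noise_frames frames.
    lead = mags[:noise_frames]
    noise = [sum(col) / len(lead) for col in zip(*lead)]

    enhanced = []
    for spec, mag in zip(spectra, mags):
        out = []
        for c, m, nz in zip(spec, mag, noise):
            # Subtract the noise magnitude, then apply the floor beta * noise
            # (assumed flooring rule; see the note above).
            clean = max(m - nz, beta * nz)
            # Reuse the noisy phase, as is usual in spectral subtraction.
            phase = c / m if m > 0 else 1.0
            out.append(clean * phase)
        enhanced.append(out)
    return enhanced, noise
```

Calling `spectral_subtract(samples)` on a list of samples whose opening frames contain only background noise returns one enhanced complex spectrum per frame. An inverse transform with overlap-add would be needed to recover a time-domain signal, which this sketch leaves out.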
Language
English
Format
Electronic
Accession Number
CDTG005270
Shelf Location
Archives, The Learning Commons, 12F Henry Sy Sr. Hall
Physical Description
1 computer optical disc ; 4 3/4 in.
Recommended Citation
Orillo, J. F. (2012). Spectral subtraction speech enhancement integrated to automatic speech recognition system implemented in FPGA. Retrieved from https://animorepository.dlsu.edu.ph/etd_masteral/4302