Spectral subtraction speech enhancement integrated to automatic speech recognition system implemented in FPGA

Date of Publication

2012

Document Type

Master's Thesis

Degree Name

Master of Science in Electronics and Communications Engineering

College

Gokongwei College of Engineering

Department/Unit

Electronics and Communications Engineering

Thesis Adviser

Edwin Sybingco
Roderick Y. Yap

Defense Panel Chair

Alexander Abad

Defense Panel Member

Reggie Gustilo
Cesar A. Llorente

Abstract/Summary

Meeting a good accuracy in speech recognition systems had been one of the challenges in automatic speech recognition (ASR) designs. In this study, a spectral subtraction speech enhancement is added to the acoustic front end of an ASR system. The two word vocabulary ASR system with speech enhancement was first modelled in MATLAB. The system starts with the framing, windowing and FFT of the input speech signal. Noise is then estimated from the output of the FFT by averaging the first 8 output frames of the FFT. The estimated noise spectrum magnitude is subtracted from the original speech signal. And to totally enhance the speech, noise flooring is included in the design. A factor beta (b) is multiplied to the noise estimate and is substituted to the original speech during silence period. The hardware modelling was done using VHDL which practically followed the MATLAB design. After the VHDL design was realized it was then implemented on the FPGA. Both the MATLAB and FPGA models are evaluated in terms of the correlation of the original clean speech and enhanced speech, and the recognition accuracy. After several testing it was concluded that the optimum beta (b) to use for the spectral subtraction is 0.01. In MATLAB the average correlation obtained for SNR -3.4 to 34.8 dB is 83.7% while 80.28% was recorded for FPGA. The average recognition rate on the other hand for the MATLAB and FPGA is 45.58% and 48.5% respectively. Also the tolerable background noise that the system could handle is within 0 to 68.6 dB background noise with recognition accuracy of 75% and above.

Abstract Format

html

Language

English

Format

Electronic

Accession Number

CDTG005270

Shelf Location

Archives, The Learning Commons, 12F Henry Sy Sr. Hall

Physical Description

1 computer optical disc ; 4 3/4 in.

Upload Full Text

wf_no

This document is currently not available here.

Share

COinS