WordFlag: Flagging inappropriate words in recorded speech through word spotting

Date of Publication

2009

Document Type

Bachelor's Thesis

Degree Name

Bachelor of Science in Computer Science

Subject Categories

Computer Sciences

College

College of Computer Studies

Department/Unit

Computer Science

Thesis Adviser

Jocelyn Cu

Defense Panel Member

Rafael Cabredo

Karlo Campos

Abstract/Summary

Speech analytics is one of the most important methods used in the call center and telecommunications fields to analyze call content in order to improve customer satisfaction and overall business performance. One of the technologies used in speech analytics is word spotting, which is the identification of specific words in speech.

Current speech analytics systems usually focus on providing solutions for business analysis and product-related issues based on customer speech or feedback. Most Automatic Speech Recognition (ASR) systems that use word spotting have specific methods for disregarding the speaker's other words, which are considered insignificant in call analysis. However, given the large number of agents that companies need to hire and the Philippines' tight competition with other countries in the worldwide contact center or call center industry, there is a need to support agent training, including addressing how unnecessary and inappropriate words affect customer-agent interaction.

This research focuses on the design and development of an isolated word spotting system that automatically flags inappropriate words in a speech recording to aid the call center agent training process. WordFlag is trained to flag 65 words from a predefined list and uses recordings from different speakers, since it is designed to be speaker-independent. The system incorporates preprocessing through noise reduction, modified isolated word endpoint detection and segmentation, Mel-frequency cepstral coefficient (MFCC) feature extraction, modified Hidden Markov Models (HMMs), and word-based recognition. System tests of WordFlag show an overall average recognition rate of 41.25%. Words trained with additional semantic variations also showed a higher recognition rate (48.3%) than words trained without variations (31.2%). Future projects may improve the corpus data and apply phoneme-based recognition to compare the performance of similar systems.
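The abstract names the pipeline stages without describing them. As a rough illustration of two of them, endpoint detection and HMM-based isolated-word recognition, the following Python sketch uses short-time energy thresholding and the forward algorithm. It is not WordFlag's implementation: the thesis describes modified endpoint detection and modified HMMs whose details are not given here, and it works on MFCC features, whereas this sketch assumes each segment has already been reduced to discrete observation symbols (for example, vector-quantized MFCC frames). All names (`find_word_segments`, `WordHMM`, `flag_segments`) and the toy labels `word_a`/`word_b` are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Set, Tuple


def find_word_segments(samples: Sequence[float], frame_len: int,
                       threshold: float) -> List[Tuple[int, int]]:
    """Energy-based endpoint detection (generic textbook version, not the
    thesis's modified method): return (start, end) sample indices of runs of
    frames whose mean energy exceeds `threshold`. Trailing samples that do
    not fill a whole frame are ignored."""
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    n_frames = len(samples) // frame_len
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len
        if energy > threshold and start is None:
            start = i * frame_len                      # word onset
        elif energy <= threshold and start is not None:
            segments.append((start, i * frame_len))    # word offset
            start = None
    if start is not None:                              # word runs to the end
        segments.append((start, n_frames * frame_len))
    return segments


def _log(p: float) -> float:
    """Log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def _log_sum_exp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


class WordHMM:
    """Hypothetical discrete-observation HMM for one vocabulary word."""

    def __init__(self, start: List[float], trans: List[List[float]],
                 emit: List[List[float]]):
        self.start = start   # start[i]: P(first state = i)
        self.trans = trans   # trans[i][j]: P(next state j | state i)
        self.emit = emit     # emit[i][k]: P(symbol k | state i)

    def log_likelihood(self, obs: Sequence[int]) -> float:
        """Forward algorithm in log space: log P(obs | model).
        `obs` must be non-empty."""
        n = len(self.start)
        alpha = [_log(self.start[i]) + _log(self.emit[i][obs[0]])
                 for i in range(n)]
        for o in obs[1:]:
            alpha = [
                _log_sum_exp([alpha[i] + _log(self.trans[i][j])
                              for i in range(n)])
                + _log(self.emit[j][o])
                for j in range(n)
            ]
        return _log_sum_exp(alpha)


def flag_segments(segment_obs: List[Sequence[int]], models: Dict[str, WordHMM],
                  flag_list: Set[str]) -> List[Tuple[str, bool]]:
    """Label each segment with its most likely word model and report
    whether that word is on the flag list."""
    results = []
    for obs in segment_obs:
        word = max(models, key=lambda w: models[w].log_likelihood(obs))
        results.append((word, word in flag_list))
    return results


# Toy usage: two single-state word models over a 2-symbol alphabet.
models = {
    "word_a": WordHMM([1.0], [[1.0]], [[0.9, 0.1]]),
    "word_b": WordHMM([1.0], [[1.0]], [[0.1, 0.9]]),
}
print(find_word_segments([0.0] * 4 + [1.0] * 4 + [0.0] * 4 + [0.5] * 4, 2, 0.1))
# prints [(4, 8), (12, 16)]
print(flag_segments([[0, 0, 0], [1, 1]], models, {"word_a"}))
# prints [('word_a', True), ('word_b', False)]
```

Scoring each segment against one model per vocabulary word and picking the most likely one mirrors the word-based (rather than phoneme-based) recognition the abstract describes; a real system would also need a rejection threshold or filler model so that words outside the 65-word list are not forced onto the closest flagged word.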

Abstract Format

html

Language

English

Format

Print

Accession Number

TU19880

Shelf Location

Archives, The Learning Commons, 12F, Henry Sy Sr. Hall

Physical Description

1 v. (various foliations) ; 28 cm. + one computer optical disc.

Keywords

Speech processing systems
