Pattern matching refinements to dictionary-based code-switching point detection

College

College of Computer Studies

Department/Unit

Information Technology

Document Type

Conference Proceeding

Source Title

Proceedings of the 26th Pacific Asia Conference on Language, Information and Computation, PACLIC 2012

First Page

229

Last Page

236

Publication Date

12-1-2012

Abstract

This study presents the development and evaluation of pattern matching refinements (PMRs) for automatic code-switching point (CSP) detection. With all PMRs applied, evaluation showed an accuracy of 94.51%, an improvement over the reported accuracy of dictionary-based approaches, which ranges from 75.22% to 76.26% (Yeong and Tan, 2010). A 100-sentence Tagalog-English corpus served as the test bed. Analyses showed that the dictionary-based approach with part-of-speech checking yielded an accuracy of only 79.76%, and that two linguistic phenomena, (1) intra-word code-switching and (2) common words, caused the low accuracy. The devised PMRs, namely (1) common word exclusion, (2) common word identification, and (3) common n-gram pruning, address these phenomena and improve accuracy. The work can be extended using audio files and machine learning with larger language resources. © 2012 The PACLIC.
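
The abstract does not spell out the detection rules, so the following is only a minimal sketch of the general idea behind dictionary-based CSP detection, not the authors' implementation. It assumes two tiny hypothetical word lists (`TAGALOG`, `ENGLISH`), tags each token by dictionary lookup, and marks a switch point wherever the language tag changes. Words found in both lists (for example "at", which is English "at" and Tagalog "and") inherit the tag of the preceding token, which is one plausible way to keep common words from producing spurious switch points. Part-of-speech checking, intra-word code-switching, and n-gram pruning are not modeled.

```python
# Toy word lists: illustrative assumptions, not the dictionaries used in the paper.
TAGALOG = {"ako", "ng", "sa", "bumili", "kahapon", "at"}
ENGLISH = {"i", "bought", "a", "book", "at", "the", "mall", "yesterday"}


def tag_language(token, previous_tag):
    """Return "TL", "EN", or the previous tag when the lookup is inconclusive."""
    word = token.lower()
    in_tagalog = word in TAGALOG
    in_english = word in ENGLISH
    if in_tagalog and in_english:
        # Common word: appears in both lists, so defer to the running context.
        return previous_tag
    if in_tagalog:
        return "TL"
    if in_english:
        return "EN"
    # Unknown word: also defer to the running context.
    return previous_tag


def detect_switch_points(sentence):
    """Return (tags, points); points are token indices where the language changes."""
    tags = []
    previous_tag = None
    for token in sentence.split():
        tag = tag_language(token, previous_tag)
        tags.append(tag)
        if tag is not None:
            previous_tag = tag

    points = []
    last_tag = None
    for index, tag in enumerate(tags):
        if tag is None:
            continue  # no language evidence yet at the start of the sentence
        if last_tag is not None and tag != last_tag:
            points.append(index)
        last_tag = tag
    return tags, points


if __name__ == "__main__":
    print(detect_switch_points("Bumili ako ng book sa mall kahapon"))
    print(detect_switch_points("book at mall"))
```

In this sketch the second call reports no switch points, because "at" is treated as a common word and takes the English tag of "book" instead of being read as Tagalog.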

Disciplines

Computer Sciences

Keywords

Computational linguistics; Code switching (Linguistics)
