Pattern matching refinements to dictionary-based code-switching point detection
College
College of Computer Studies
Department/Unit
Information Technology
Document Type
Conference Proceeding
Source Title
Proceedings of the 26th Pacific Asia Conference on Language, Information and Computation, PACLIC 2012
First Page
229
Last Page
236
Publication Date
12-1-2012
Abstract
This study presents the development and evaluation of pattern matching refinements (PMRs) to automatic code-switching point (CSP) detection. With all PMRs applied, evaluation showed an accuracy of 94.51%. This is an improvement over the reported accuracy rates of dictionary-based approaches, which range from 75.22% to 76.26% (Yeong and Tan, 2010). In our experiments, a 100-sentence Tagalog-English corpus was used as the test bed. Analyses showed that the dictionary-based approach with part-of-speech checking yielded an accuracy of only 79.76%, and identified two notable linguistic phenomena as the causes of the low accuracy: (1) intra-word code-switching and (2) common words. The devised PMRs, namely (1) common word exclusion, (2) common word identification, and (3) common n-gram pruning, address these phenomena and improved accuracy. The work can be extended using audio files and machine learning with larger language resources. © 2012 The PACLIC.
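The abstract does not spell out the detection procedure. The following is a minimal Python sketch, under our own assumptions, of a generic dictionary-based CSP detector with a simple common-word exclusion step. It shows how a word listed in both dictionaries (a "common word") can produce spurious switch points. The toy lexicons, the function names `tag_language` and `detect_csps`, and the rule of skipping words found in both dictionaries are hypothetical illustrations, not the authors' implementation. Part-of-speech checking, intra-word switching, and n-gram pruning are not modeled.

```python
from typing import List, Set

# Hypothetical toy lexicons for illustration only; a real system would load
# full Tagalog and English dictionaries. "at" appears in both (Tagalog "and",
# English preposition), so it is a common word.
TAGALOG: Set[str] = {"kumain", "ako", "at", "natulog", "sa"}
ENGLISH: Set[str] = {"at", "hotel"}


def tag_language(token: str, tl: Set[str], en: Set[str]) -> str:
    """Label a token by dictionary lookup: TL, EN, COMMON (both), or UNK."""
    in_tl, in_en = token in tl, token in en
    if in_tl and in_en:
        return "COMMON"
    if in_tl:
        return "TL"
    if in_en:
        return "EN"
    return "UNK"


def detect_csps(tokens: List[str], tl: Set[str], en: Set[str],
                exclude_common: bool = True) -> List[int]:
    """Return indices i where tokens[i] starts a new language segment.

    Unknown tokens are always skipped. When exclude_common is True, common
    words are also skipped, so they cannot trigger a switch point.
    """
    points: List[int] = []
    prev = None
    for i, tok in enumerate(tokens):
        lang = tag_language(tok.lower(), tl, en)
        if lang == "UNK" or (exclude_common and lang == "COMMON"):
            continue  # no reliable language evidence from this token
        if prev is not None and lang != prev:
            points.append(i)
        prev = lang
    return points


if __name__ == "__main__":
    sentence = "kumain ako at natulog sa hotel".split()
    print(detect_csps(sentence, TAGALOG, ENGLISH, exclude_common=False))
    print(detect_csps(sentence, TAGALOG, ENGLISH, exclude_common=True))
```

For the toy sentence, the naive lookup reports switch points at indices 2, 3, and 5, because the common word "at" is treated as a third label. With exclusion, only index 5 (the switch to "hotel") remains. This is one simple way to neutralize common words; the paper's own refinements may differ.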
Recommended Citation
Oco, N., & Roxas, R. (2012). Pattern matching refinements to dictionary-based code-switching point detection. Proceedings of the 26th Pacific Asia Conference on Language, Information and Computation, PACLIC 2012, 229-236. Retrieved from https://animorepository.dlsu.edu.ph/faculty_research/588
Disciplines
Computer Sciences
Keywords
Computational linguistics; Code switching (Linguistics)