Incorporation of WordNet features to n-gram features in a language modeler
College
College of Computer Studies
Department/Unit
Software Technology
Document Type
Conference Proceeding
Source Title
Proceedings of the 22nd Pacific Asia Conference on Language, Information and Computation, PACLIC 22
First Page
179
Last Page
188
Publication Date
12-1-2008
Abstract
N-gram language modeling is a popular technique used to improve the performance of various NLP applications. However, it still faces the "curse of dimensionality" problem: the word sequences on which a model is tested are likely to differ from those seen during training (Bengio et al., 2003). To address this issue, an approach that incorporates WordNet into a trigram language modeler has been developed. WordNet is used to generate proxy trigrams that can reinforce the fluency of the given trigrams. Evaluation on English text in the business news domain showed a significant decrease in model perplexity, indicating that the new method is capable of addressing the issue. The modeler was also used as a tool to rank parallel translations produced by multiple Machine Translation systems, where it achieved a 6-7% improvement over the base approach (Callison-Burch and Flournoy, 2001) in correctly ranking parallel translations. © 2008 by Kathleen L. Go and Solomon L. See.
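The idea in the abstract (reinforcing trigram counts with synonym-based "proxy" trigrams, measuring perplexity, and ranking candidate translations by fluency) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a tiny hand-written synonym table as a stand-in for WordNet, an add-alpha smoothed trigram model, and a hypothetical `proxy_weight` that down-weights proxy evidence. All names (`SYNONYMS`, `ProxyTrigramModel`, `rank_translations`) are illustrative.

```python
import math
from collections import Counter
from itertools import product

# HYPOTHETICAL stand-in for WordNet: a tiny hand-written synonym table.
# A real system would look synonyms up in WordNet; this table is an assumption.
SYNONYMS = {
    "firm": ["company"], "company": ["firm"],
    "profit": ["earnings"], "earnings": ["profit"],
    "rose": ["increased"], "increased": ["rose"],
}


def trigrams(tokens):
    """Pad a token list with sentence markers and return its trigrams."""
    padded = ["<s>", "<s>"] + tokens + ["</s>"]
    return [tuple(padded[i:i + 3]) for i in range(len(padded) - 2)]


class ProxyTrigramModel:
    """Add-alpha trigram model whose counts are reinforced by synonym-based proxy trigrams."""

    def __init__(self, sentences, synonyms, proxy_weight=0.5, alpha=1.0):
        self.synonyms = synonyms
        self.proxy_weight = proxy_weight  # hypothetical weight for proxy evidence
        self.alpha = alpha                # add-alpha smoothing constant
        self.counts = Counter()
        self.vocab = {"</s>"}             # words the model can predict
        for sentence in sentences:
            tokens = sentence.split()
            for w in tokens:
                self.vocab.add(w)
                self.vocab.update(synonyms.get(w, []))
            self.counts.update(trigrams(tokens))

    def proxies(self, tri):
        """All trigrams obtained by swapping words for synonyms, excluding tri itself."""
        options = [[w] + self.synonyms.get(w, []) for w in tri]
        return [p for p in product(*options) if p != tri]

    def augmented_count(self, tri):
        """Observed count plus down-weighted counts of the trigram's proxies."""
        proxy_total = sum(self.counts[p] for p in self.proxies(tri))
        return self.counts[tri] + self.proxy_weight * proxy_total

    def prob(self, tri):
        """P(w3 | w1, w2), normalized over the vocabulary so it is a proper distribution."""
        history = tri[:2]
        denom = sum(self.augmented_count(history + (w,)) + self.alpha for w in self.vocab)
        return (self.augmented_count(tri) + self.alpha) / denom

    def perplexity(self, sentence):
        """exp of the negative mean log-probability over the sentence's trigrams."""
        tris = trigrams(sentence.split())
        log_prob = sum(math.log(self.prob(t)) for t in tris)
        return math.exp(-log_prob / len(tris))


def rank_translations(model, candidates):
    """Order candidate translations from most to least fluent (lowest perplexity first)."""
    return sorted(candidates, key=model.perplexity)
```

For example, training on `["the firm reported strong profit", "profit rose this quarter"]` and scoring the unseen `"the company reported strong earnings"` gives a lower perplexity with `proxy_weight=0.5` than with `proxy_weight=0.0` (the plain trigram baseline), because the proxies transfer evidence from `firm`/`profit` to `company`/`earnings`. The augmented counts are renormalized over the whole vocabulary so that perplexities of the two models remain comparable.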
Recommended Citation
Go, K. L., & See, S. L. (2008). Incorporation of WordNet features to n-gram features in a language modeler. Proceedings of the 22nd Pacific Asia Conference on Language, Information and Computation, PACLIC 22, 179-188. Retrieved from https://animorepository.dlsu.edu.ph/faculty_research/513
Disciplines
Computer Sciences
Keywords
Natural language processing (Computer science); Computational linguistics; Translating and interpreting