Autocor: A query based automatic acquisition of corpora of closely-related languages
College
College of Computer Studies
Department/Unit
Information Technology
Document Type
Article
Source Title
PACLIC 21 - The 21st Pacific Asia Conference on Language, Information and Computation, Proceedings
First Page
146
Last Page
154
Publication Date
12-1-2007
Abstract
AutoCor is a method for the automatic acquisition and classification of corpora of documents in closely-related languages. It extends and enhances CorpusBuilder, a system that automatically builds corpora of specific minority languages from a closed corpus. The extension is needed because some documents that CorpusBuilder retrieves as Tagalog are actually written in other closely-related Philippine languages. AutoCor used odds ratio as its query generation method and introduced the concept of common word pruning to differentiate between Tagalog documents and documents in closely-related Philippine languages. The performance of the system with and without pruning is compared, and common word pruning was found to improve the system's precision. © 2007 by Davis Muhajereen D. Dimalen, Rachel Edita O. Roxas.
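The abstract names two techniques, odds-ratio query generation and common word pruning, without giving formulas. The sketch below shows one plausible reading under stated assumptions. It scores each candidate query word with a smoothed log odds ratio of document frequencies in relevant (target-language) versus non-relevant documents. It then applies a hypothetical pruning rule that drops any candidate that also appears in documents of the closely-related language. The function names, the smoothing constant `eps`, and the pruning rule are illustrative assumptions, not the authors' implementation. The word lists are toy tokens, not real corpus data.

```python
import math
from collections import Counter


def doc_freq(docs):
    """Count, for each word, how many documents contain it."""
    df = Counter()
    for doc in docs:
        df.update(set(doc.lower().split()))
    return df


def odds_ratio_scores(relevant, nonrelevant, eps=0.5):
    """Score each word seen in the relevant documents by its log odds ratio.

    P(w|rel) and P(w|nonrel) are document-frequency estimates with additive
    smoothing (eps, an assumed value) so no probability is exactly 0 or 1:
        log( P(w|rel) * (1 - P(w|nonrel)) / ((1 - P(w|rel)) * P(w|nonrel)) )
    """
    df_rel, df_non = doc_freq(relevant), doc_freq(nonrelevant)
    n_rel, n_non = len(relevant), len(nonrelevant)
    scores = {}
    for w in df_rel:
        p = (df_rel[w] + eps) / (n_rel + 2 * eps)
        q = (df_non[w] + eps) / (n_non + 2 * eps)
        scores[w] = math.log((p * (1 - q)) / ((1 - p) * q))
    return scores


def prune_common_words(scores, other_language_docs):
    """Hypothetical common word pruning: drop candidate query words that
    also occur in documents of the closely-related language."""
    shared = set(doc_freq(other_language_docs))
    return {w: s for w, s in scores.items() if w not in shared}


def top_query_terms(scores, k=3):
    """Return the k highest-scoring words; ties are broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:k]]


if __name__ == "__main__":
    # Toy token lists for illustration only.
    tagalog_like = ["ang bata ay kumain", "ang aso ay tumakbo", "kumain ang pusa"]
    related_like = ["ang bata mikaon", "ang iro midagan"]

    scores = odds_ratio_scores(tagalog_like, related_like)
    print(top_query_terms(scores, 3))
    print(sorted(prune_common_words(scores, related_like)))
```

On this toy data, `top_query_terms(scores, 3)` returns `['ay', 'kumain', 'aso']`. The shared word `ang` still receives a positive odds ratio, because it is more frequent on the relevant side. The pruning step removes it, along with `bata`. This illustrates how a pruning pass could keep words that occur in both languages out of the query list, which is the kind of precision gain the abstract reports.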
Recommended Citation
Dimalen, D., & Roxas, R. (2007). Autocor: A query based automatic acquisition of corpora of closely-related languages. PACLIC 21 - The 21st Pacific Asia Conference on Language, Information and Computation, Proceedings, 146-154. Retrieved from https://animorepository.dlsu.edu.ph/faculty_research/962