Autocor: A query based automatic acquisition of corpora of closely-related languages

College

College of Computer Studies

Department/Unit

Information Technology

Document Type

Article

Source Title

PACLIC 21 - The 21st Pacific Asia Conference on Language, Information and Computation, Proceedings

First Page

146

Last Page

154

Publication Date

12-1-2007

Abstract

AutoCor is a method for the automatic acquisition and classification of corpora of documents in closely-related languages. It extends and enhances CorpusBuilder, a system that automatically builds specific minority language corpora from a closed corpus, because some of the Tagalog documents retrieved by CorpusBuilder are actually documents in other closely-related Philippine languages. AutoCor uses the odds ratio query generation method and introduces the concept of common word pruning to differentiate between documents in Tagalog and documents in closely-related Philippine languages. The performance of the system with and without pruning is compared, and common word pruning is found to improve the precision of the system. © 2007 by Davis Muhajereen D. Dimalen, Rachel Edita O. Roxas.
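
The abstract names two techniques, odds ratio query generation and common word pruning, without giving their details. The following is a minimal Python sketch of how the two might fit together, not the authors' implementation. It assumes that odds ratio scores are computed from smoothed document frequencies in a set of target-language documents versus a set of related-language documents, and that pruning simply discards words that occur in both sets. The function names, the smoothing constant, and the parameter `k` are all hypothetical.

```python
import math
from collections import Counter


def doc_frequencies(docs):
    """Count, for each word, how many documents contain it."""
    df = Counter()
    for doc in docs:
        df.update(set(doc.lower().split()))
    return df


def odds_ratio_scores(relevant, non_relevant, smoothing=0.5):
    """Log odds ratio of each word appearing in relevant vs. non-relevant documents.

    Smoothing keeps both probabilities strictly between 0 and 1
    (an assumption of this sketch, not taken from the paper).
    """
    df_rel = doc_frequencies(relevant)
    df_non = doc_frequencies(non_relevant)
    n_rel, n_non = len(relevant), len(non_relevant)
    scores = {}
    for word in set(df_rel) | set(df_non):
        p = (df_rel[word] + smoothing) / (n_rel + 2 * smoothing)
        q = (df_non[word] + smoothing) / (n_non + 2 * smoothing)
        scores[word] = math.log(p * (1 - q) / ((1 - p) * q))
    return scores


def select_query_terms(relevant, non_relevant, k=3, prune_common=True):
    """Pick up to k positively scored words as query terms.

    With prune_common=True, words seen in both document sets are dropped
    before ranking (assumed interpretation of common word pruning).
    """
    scores = odds_ratio_scores(relevant, non_relevant)
    if prune_common:
        common = set(doc_frequencies(relevant)) & set(doc_frequencies(non_relevant))
        scores = {w: s for w, s in scores.items() if w not in common}
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [w for w, s in ranked[:k] if s > 0]
```

Under these assumptions, a word that is frequent in the target language but also shared with a related language can still rank highly by odds ratio alone. Pruning removes it from the query, which is one plausible reason precision would improve when the languages share much of their vocabulary.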
