Use of word and character N-grams for low-resourced local languages

Added Title

International Conference on Asian Language Processing (2018)
IALP 2018

College

College of Computer Studies

Department/Unit

Computer Science

Document Type

Conference Proceeding

Source Title

Proceedings of the 2018 International Conference on Asian Language Processing, IALP 2018

First Page

250

Last Page

254

Publication Date

1-28-2019

Abstract

Language identification is a text classification task that determines the language of a given text. It is often used as a preprocessing step for tasks such as sentiment analysis, mood analysis, and named entity recognition. An accurate language identification engine is especially important for the Philippines, which is home to more than 170 languages but has few language documents and resources. We compare machine learning algorithms such as Naive Bayes, Linear Support Vector Machines (SVM), and Random Forest for the classification of Philippine languages. Results show that the Linear SVM model performed best, with an F1-score of 0.97. © 2018 IEEE.
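
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea: extract word and character n-grams from text and feed their counts to a Naive Bayes classifier. The function and class names, the choice of character trigrams plus word unigrams, the add-one smoothing, and the toy training sentences are all assumptions made for illustration and are not taken from the paper. The Linear SVM and Random Forest models that the paper also evaluates are not shown here.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Return overlapping character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def word_ngrams(text, n=1):
    """Return word n-grams (as tuples) from whitespace-tokenized text."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def features(text):
    # Assumed feature set: character trigrams combined with word unigrams.
    return char_ngrams(text, 3) + word_ngrams(text, 1)


class NaiveBayesLanguageID:
    """Hypothetical multinomial Naive Bayes with add-one smoothing over n-gram counts."""

    def fit(self, texts, labels):
        self.counts = defaultdict(Counter)   # label -> n-gram counts
        self.doc_counts = Counter(labels)    # label -> number of training texts
        for text, label in zip(texts, labels):
            self.counts[label].update(features(text))
        self.vocab = {f for c in self.counts.values() for f in c}
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counter in self.counts.items():
            # Log prior plus smoothed log likelihood of each n-gram.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counter.values()) + len(self.vocab)
            for f in features(text):
                score += math.log((counter[f] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented for this sketch (not the paper's corpus).
train_texts = [
    "magandang umaga sa inyong lahat",
    "salamat sa tulong mo",
    "maayong buntag kaninyong tanan",
    "salamat kaayo sa imong tabang",
]
train_labels = ["tagalog", "tagalog", "cebuano", "cebuano"]

model = NaiveBayesLanguageID().fit(train_texts, train_labels)
print(model.predict("maayong buntag"))
```

Character n-grams tend to help with short or noisy text because they capture sub-word patterns, while word n-grams capture whole-vocabulary cues; combining both is one plausible reading of the title, not a confirmed description of the authors' feature set.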

Digital Object Identifier (DOI)

10.1109/IALP.2018.8629235

Disciplines

Computer Sciences

Keywords

Natural language processing (Computer science); Machine learning
