Neural network-based keyword extraction using word frequency, position, usage and format features

Date of Publication

2013

Document Type

Master's Thesis

Degree Name

Master of Science in Computer Science

College

College of Computer Studies

Department/Unit

Computer Science

Thesis Adviser

Arnulfo Azcarraga

Abstract/Summary

Keywords have become integral to many Knowledge Management Systems, Information Retrieval Systems, and Digital Libraries. They have also become significant in commerce, specifically in providing contextual advertisements for online content. Not all text, however, is annotated with good keywords. Hence, it is important to be able to extract keywords from documents automatically. In this study, the researcher investigates the use of the Backpropagation Neural Network algorithm for keyword extraction from documents. The feasibility of using statistical features such as word frequency, positioning, and usage was further validated, along with additional word formatting features. Rule extraction was performed to examine the relative importance of these statistical features for keyword extraction. Two corpora were used for experimentation: one composed of IEEE journal papers and the other composed of Wikipedia articles. With the exclusion of the TF-IDF feature, the addition of word format features, and post-calibration of the Backpropagation Neural Networks, the resulting models achieved G-Means of 0.75 and 0.77 for the IEEE journal papers and Wikipedia articles, respectively. Finally, analysis of the results showed that word formatting features were much more important to keyword extraction for Wikipedia articles than for IEEE journal papers, confirming the researcher's initial hypothesis that varying writing styles would affect the importance of these features.
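
The abstract reports G-Means without defining the measure. The sketch below assumes the common definition, the geometric mean of sensitivity (recall on keywords) and specificity (recall on non-keywords), which is often preferred when one class is much rarer than the other. The function name `g_mean` and the confusion-matrix counts are hypothetical and are not taken from the thesis.

```python
import math


def g_mean(tp: int, fn: int, tn: int, fp: int) -> float:
    """Geometric mean of sensitivity and specificity (assumed definition)."""
    # Sensitivity: fraction of true keywords that the model recovered.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of non-keywords that the model correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Hypothetical counts for illustration only, not results from the thesis.
score = g_mean(tp=30, fn=10, tn=800, fp=200)
print(f"G-Mean: {score:.2f}")
```

Under this definition, a model that labels every word as a non-keyword scores 0 regardless of its raw accuracy, so the measure rewards balanced performance on both classes.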

Abstract Format

html

Language

English

Format

Print

Accession Number

TG05365

Shelf Location

Archives, The Learning Commons, 12F Henry Sy Sr. Hall

Physical Description

xi, 123 leaves ; col. ill. ; 28 cm. + 1 computer optical disc.

Keywords

Keyword searching; Back propagation (Artificial intelligence)

This document is currently not available here.
