Neural network-based keyword extraction using word frequency, position, usage and format features
Date of Publication
2013
Document Type
Master's Thesis
Degree Name
Master of Science in Computer Science
College
College of Computer Studies
Department/Unit
Computer Science
Thesis Adviser
Arnulfo Azcarraga
Abstract/Summary
Keywords have become integral to many Knowledge Management Systems, Information Retrieval Systems, and Digital Libraries. They have also become significant in commerce, specifically in providing contextual advertisements for online content. Not all text, however, is annotated with good keywords. Hence, it is important to be able to extract keywords from documents automatically. In this study, the researcher investigates the use of the Backpropagation Neural Network algorithm for keyword extraction from documents. The feasibility of using statistical features such as word frequency, position, and usage was further validated, along with additional word formatting features. Rule extraction was performed to examine the relative importance of these features for keyword extraction. Two corpora were used for experimentation: one composed of IEEE journal papers and the other of Wikipedia articles. With the exclusion of the TF-IDF feature, the addition of word format features, and post-calibration of the Backpropagation Neural Networks, the resulting models achieved G-Means of 0.75 and 0.77 for the IEEE journal papers and Wikipedia articles, respectively. Finally, analysis of the results showed that word formatting features were much more important to keyword extraction for Wikipedia articles than for IEEE journal papers, confirming the researcher's initial hypothesis that varying writing styles would affect the importance of these features.
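The abstract reports results as G-Means but does not define the metric. The sketch below assumes the common definition, the geometric mean of sensitivity (keyword recall) and specificity (non-keyword recall), which may differ from the exact formulation used in the thesis. The function name and the confusion-matrix counts are hypothetical and chosen only for illustration.

```python
import math


def g_mean(tp: int, fn: int, tn: int, fp: int) -> float:
    """Geometric mean of sensitivity and specificity (assumed definition)."""
    # Sensitivity: fraction of true keywords that were extracted.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of non-keywords that were correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Hypothetical counts: keywords are rare compared with ordinary words.
balanced_model = g_mean(tp=30, fn=10, tn=900, fp=100)

# A model that never predicts "keyword" is highly accurate on such data,
# yet its sensitivity is zero, so its G-Mean is zero as well.
reject_all_model = g_mean(tp=0, fn=40, tn=1000, fp=0)
```

Under this definition a model cannot score well by ignoring the rare keyword class, which is one plausible reason to prefer the metric over plain accuracy when most words in a document are not keywords.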
Abstract Format
html
Language
English
Format
Accession Number
TG05365
Shelf Location
Archives, The Learning Commons, 12F Henry Sy Sr. Hall
Physical Description
xi, 123 leaves ; col. ill. ; 28 cm. + 1 computer optical disc.
Keywords
Keyword searching; Back propagation (Artificial intelligence)
Recommended Citation
Tensuan, J. (2013). Neural network-based keyword extraction using word frequency, position, usage and format features. Retrieved from https://animorepository.dlsu.edu.ph/etd_masteral/4386