Evaluating keyword selection methods for WEBSOM text archives
College
College of Computer Studies
Department/Unit
Computer Technology
Document Type
Article
Source Title
IEEE Transactions on Knowledge and Data Engineering
Volume
16
Issue
3
First Page
380
Last Page
383
Publication Date
3-1-2004
Abstract
The WEBSOM methodology, proven effective for building very large text archives, includes a method that extracts labels for each document cluster assigned to nodes in the map. However, the WEBSOM method needs to retrieve all the words of all the documents associated with each node. Since maps may have more than 100,000 nodes and the archive may contain up to seven million documents, the WEBSOM methodology needs a faster alternative method for keyword selection. Presented here is such an alternative method that is able to quickly deduce meaningful labels per node in the map. It does this just by analyzing the relative weight distribution of the SOM weight vectors and by taking advantage of some characteristics of the random projection method used in dimensionality reduction. The effectiveness of this technique is demonstrated on news document collections.
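As a rough illustration of the idea summarized in the abstract, the sketch below labels map nodes using only their weight vectors and the random projection matrix, without revisiting any documents. It is not the authors' published algorithm: the scoring rule (a word's back-projected weight at a node minus its mean over all nodes), the Gaussian projection, the toy vocabulary, and every function name are assumptions made for illustration, and projected word-count vectors stand in for trained SOM weight vectors.

```python
import random

def make_projection(vocab_size, dim, seed=0):
    """Assumed Gaussian random projection: one random dim-vector per word."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(vocab_size)]

def project(word_counts, proj):
    """Map a word-count vector into the reduced space (sum of count * word vector)."""
    dim = len(proj[0])
    out = [0.0] * dim
    for w, count in enumerate(word_counts):
        if count:
            for j in range(dim):
                out[j] += count * proj[w][j]
    return out

def word_scores(weight, proj):
    """Approximate each word's weight at a node via a dot product with its random vector."""
    return [sum(wj * pj for wj, pj in zip(weight, row)) for row in proj]

def label_nodes(weights, proj, vocab, top_k=2):
    """Per node, keep the words whose score is highest relative to the mean over all nodes."""
    scores = [word_scores(w, proj) for w in weights]
    n_nodes = len(weights)
    mean = [sum(s[w] for s in scores) / n_nodes for w in range(len(vocab))]
    labels = []
    for s in scores:
        rel = [s[w] - mean[w] for w in range(len(vocab))]
        best = sorted(range(len(vocab)), key=lambda w: rel[w], reverse=True)[:top_k]
        labels.append([vocab[w] for w in best])
    return labels

# Toy data (hypothetical): two nodes, each dominated by two words.
vocab = ["election", "senate", "market", "stocks", "typhoon", "flood"]
proj = make_projection(len(vocab), dim=512)
weights = [
    project([5, 5, 0, 0, 0, 0], proj),  # stand-in for a node about politics
    project([0, 0, 0, 0, 5, 5], proj),  # stand-in for a node about weather
]
labels = label_nodes(weights, proj, vocab)
print(labels)
```

The sketch relies on the fact that high-dimensional random vectors are nearly orthogonal, so the dot product of a node's weight vector with a word's random vector roughly recovers that word's contribution. The cost per node depends on the vocabulary size and the reduced dimension, not on the number of documents mapped to the node.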
Digital Object Identifier (DOI)
10.1109/TKDE.2003.1262193
Recommended Citation
Azcarraga, A. P., Yap, T. N., Tan, J. O., & Chua, T. (2004). Evaluating keyword selection methods for WEBSOM text archives. IEEE Transactions on Knowledge and Data Engineering, 16(3), 380-383. https://doi.org/10.1109/TKDE.2003.1262193
Disciplines
Computer Sciences
Keywords
Text processing (Computer science); Automatic indexing