Date of Publication
2-1-2011
Document Type
Master's Thesis
Degree Name
Master of Science in Computer Science
Subject Categories
Computer Sciences
College
College of Computer Studies
Department/Unit
Computer Science
Thesis Adviser
Arnulfo Azcarraga
Defense Panel Chair
Nelson Marcos
Defense Panel Member
Arnulfo Azcarraga
Charibeth Cheng
Abstract/Summary
Keywords are increasingly useful as users face the challenge of keeping up with the voluminous information they must process every day. The most straightforward way to extract keywords is to compute the term frequencies for each document. But for corpora containing hundreds of thousands of unique terms, the space and computing time required to extract the most relevant terms as keywords severely limit the practical implementation of current keyword extraction techniques. As such, the frequency counts of extracted terms need to be subjected to a data compression scheme. In this research, the random projection method is used to compress the extracted data, and the method allows various clustering and keyword extraction algorithms to be run directly on the compressed data. Several experiments are conducted to assess the effect of the random projection method on the quality and time-space efficiency of k-means clustering and term extraction.
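The compression step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names (`term_frequencies`, `random_projection_matrix`, `project`), the Gaussian projection matrix, the toy documents, and the target dimension `k=4` are all assumptions chosen for illustration.

```python
import math
import random
from collections import Counter


def term_frequencies(docs):
    """Return (vocabulary, rows); each row holds raw term counts for one document."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocab = sorted(set().union(*counts))
    rows = [[c[term] for term in vocab] for c in counts]
    return vocab, rows


def random_projection_matrix(n_terms, k, seed=0):
    """Build an n_terms x k matrix of Gaussian entries scaled by 1/sqrt(k).

    Assumption: a dense Gaussian matrix; the thesis may use a different scheme.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(k)] for _ in range(n_terms)]


def project(rows, matrix):
    """Multiply each term-frequency row by the projection matrix."""
    k = len(matrix[0])
    return [
        [sum(x * matrix[i][j] for i, x in enumerate(row) if x) for j in range(k)]
        for row in rows
    ]


# Hypothetical toy corpus, used only to exercise the functions.
docs = [
    "random projection compresses term frequency vectors",
    "keyword extraction uses term frequency counts",
    "clustering groups similar documents together",
]
vocab, tf = term_frequencies(docs)
R = random_projection_matrix(len(vocab), k=4)
compressed = project(tf, R)  # one 4-dimensional vector per document
```

Each document is now represented by `k` numbers instead of one count per vocabulary term, so a clustering routine such as k-means could be run on `compressed` rather than on `tf`.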
Abstract Format
html
Language
English
Format
Electronic
Electronic File Format
MS WORD
Accession Number
CDTG004899
Shelf Location
Archives, The Learning Commons, 12F, Henry Sy Sr. Hall
Physical Description
1 computer optical disc, 4 3/4 in.
Keywords
Text processing (Computer science); Dimension reduction (Statistics); Document clustering
Recommended Citation
Dy, J. S. (2011). Keyword extraction for very high dimensional datasets using random projection as key input representation scheme. Retrieved from https://animorepository.dlsu.edu.ph/etd_masteral/6649
Embargo Period
4-18-2022