Visualization and analysis of document clusters produced by self-organizing maps
Date of Publication
2013
Document Type
Master's Thesis
Degree Name
Master of Science in Computer Science
College
College of Computer Studies
Department/Unit
Computer Science
Thesis Adviser
Arnulfo Azcarraga
Abstract/Summary
The sheer number of available text documents creates an information-overload problem, making them increasingly difficult to organize and analyze. To alleviate this problem, text document clustering is used to automatically group related documents together. However, documents usually produce very high-dimensional data, which makes processing them resource-intensive. The Random Projection Method (RPM) is shown to reduce the dimensionality of a large document dataset. This dimensionality reduction scheme is then coupled with Self-Organizing Maps (SOM) to organize the documents in the dataset. K-Means clustering is then performed on the SOM units to produce clusters of the documents organized within the SOM. Various SOM-based properties are introduced, along with a method to measure and visualize them. These allow detailed analysis of the clusters and aid in finding outliers in the dataset, overlap between clusters, concentration of documents within clusters, possible subclusters, and the quality of different parts of clusters, among others. Cross-referencing the different property visualizations provides internal validation of the observations. For future work, the SOM-based properties and their visualizations can be applied to interactive document selection, recommendation systems, and quality measurement.
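The pipeline the abstract outlines (random projection, then a SOM, then K-Means over the SOM units) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the thesis's implementation: documents are assumed to already be numeric term vectors, the projection uses a Gaussian random matrix, and the grid size, learning-rate and neighborhood schedules, and all function names (`random_projection`, `train_som`, `kmeans`, `cluster_documents`) are hypothetical choices made for this example.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def random_projection(vectors, k, seed=0):
    """Project d-dimensional vectors to k dimensions (assumed Gaussian matrix)."""
    rng = random.Random(seed)
    d = len(vectors[0])
    # Entries drawn from N(0, 1/k) so squared lengths are preserved in expectation.
    R = [[rng.gauss(0.0, 1.0 / math.sqrt(k)) for _ in range(k)] for _ in range(d)]
    return [[sum(v[i] * R[i][j] for i in range(d)) for j in range(k)] for v in vectors]


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Online SOM training on a rows x cols grid (hypothetical schedule)."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = max(rows, cols) / 2.0
    # Initialize each unit's weight vector from a randomly chosen data sample.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total, t = epochs * len(data), 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for idx in order:
            x = data[idx]
            frac = t / total
            lr = lr0 * (1.0 - frac)                   # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)   # shrinking neighborhood radius
            bmu = min(weights, key=lambda u: sq_dist(weights[u], x))
            for u, w in weights.items():
                # Gaussian neighborhood based on grid distance to the best-matching unit.
                g2 = (u[0] - bmu[0]) ** 2 + (u[1] - bmu[1]) ** 2
                h = math.exp(-g2 / (2.0 * sigma * sigma))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            t += 1
    return weights


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's K-Means; returns a label per point and the final centers."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old center if a cluster becomes empty.
            new_centers.append([sum(col) / len(members) for col in zip(*members)]
                               if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def cluster_documents(doc_vectors, k_dims, rows, cols, n_clusters, seed=0):
    """Reduce, organize on a SOM, cluster the SOM units, then label each document."""
    reduced = random_projection(doc_vectors, k_dims, seed)
    weights = train_som(reduced, rows, cols, seed=seed)
    units = sorted(weights)
    unit_labels, _ = kmeans([weights[u] for u in units], n_clusters, seed=seed)
    unit_cluster = dict(zip(units, unit_labels))
    # Each document inherits the cluster of its best-matching SOM unit.
    doc_units = [min(units, key=lambda u: sq_dist(weights[u], v)) for v in reduced]
    return [unit_cluster[u] for u in doc_units], doc_units
```

Clustering the SOM units rather than the raw documents, as the abstract describes, means K-Means runs over a small, fixed number of prototype vectors; each document's cluster then follows from the unit it maps to, which is also what makes per-unit properties (document counts, outliers, overlap) straightforward to measure on the map.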
Abstract Format
html
Language
English
Format
Accession Number
TG05337
Shelf Location
Archives, The Learning Commons, 12F Henry Sy Sr. Hall
Physical Description
vii, 66 leaves ; 28 cm.
Keywords
Document clustering; Cluster analysis
Recommended Citation
Landrito, M. (2013). Visualization and analysis of document clusters produced by self-organizing maps. Retrieved from https://animorepository.dlsu.edu.ph/etd_masteral/4372