Visualization and analysis of document clusters produced by self-organizing maps
Date of Publication
Master of Science in Computer Science
College of Computer Studies
The problem of information overload, with the huge number of text documents available, makes them increasingly difficult to organize and analyze. To alleviate this problem, text document clustering is used to automatically group related documents together. However, documents usually produce very high-dimensional data, making it resource-intensive to perform data processing on them. The Random Projection Method (RPM) is shown to reduce the dimensionality of a large document dataset. The dimensionality reduction scheme is then coupled with Self-Organizing Maps (SOM) to organize the documents in the dataset. K-Means clustering is then performed on the SOM units to produce clusters of the documents that were organized within the SOM. Various properties based on the SOM are introduced, as well as a method to measure and visualize them. These allowed for detailed analysis of the clusters and aided in finding outliers of the dataset, overlap between clusters, concentration of documents within clusters, possible subclusters, and the quality of different parts of clusters, among others. Cross-referencing between different property visualizations provided internal validation of the observations. For future work, the different SOM-based properties and their visualizations can be used for interactive document selection, recommendation systems, and quality measurement.
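The pipeline described in the abstract (random projection, then SOM training, then K-Means on the SOM units) can be illustrated with a minimal sketch. This is not the thesis implementation: the function names (`random_projection`, `train_som`, `kmeans`, `cluster_documents`), the Gaussian projection matrix, the linearly decaying learning rate and neighborhood width, and all default parameter values are assumptions chosen only to keep the example small and runnable with NumPy.

```python
import numpy as np

def random_projection(X, k, rng):
    """Project rows of X from d to k dimensions with a Gaussian random matrix (assumed variant)."""
    d = X.shape[1]
    R = rng.normal(0.0, 1.0 / np.sqrt(k), size=(d, k))
    return X @ R

def train_som(X, rows, cols, epochs, rng, lr0=0.5, sigma0=None):
    """Train a small rectangular SOM online; returns weights of shape (rows*cols, dim)."""
    n, dim = X.shape
    sigma0 = sigma0 or max(rows, cols) / 2.0
    # Grid coordinates of each SOM unit, used for the neighborhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    W = X[rng.integers(0, n, size=rows * cols)].copy()  # initialize from random samples
    total, t = epochs * n, 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            frac = t / total
            lr = lr0 * (1.0 - frac)                  # assumed linear decay
            sigma = sigma0 * (1.0 - frac) + 1e-3     # assumed linear decay
            bmu = np.argmin(((W - X[i]) ** 2).sum(axis=1))   # best-matching unit
            dist2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
            h = np.exp(-dist2 / (2.0 * sigma ** 2))          # Gaussian neighborhood
            W += lr * h[:, None] * (X[i] - W)
            t += 1
    return W

def kmeans(P, k, rng, iters=50):
    """Plain Lloyd's K-Means on the rows of P; returns one label per row."""
    centers = P[rng.choice(len(P), size=k, replace=False)].copy()
    for _ in range(iters):
        d = ((P[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = P[labels == j].mean(axis=0)
    return labels

def cluster_documents(X, proj_dim=20, som_shape=(5, 5), n_clusters=3, seed=0):
    """Reduce dimensionality, organize with a SOM, then cluster the SOM units."""
    rng = np.random.default_rng(seed)
    Z = random_projection(X, proj_dim, rng)
    W = train_som(Z, *som_shape, epochs=5, rng=rng)
    unit_labels = kmeans(W, n_clusters, rng)
    # Each document inherits the cluster of its best-matching SOM unit.
    bmus = np.array([np.argmin(((W - z) ** 2).sum(axis=1)) for z in Z])
    return unit_labels[bmus], bmus
```

Here `X` stands for a document-term matrix with one row per document (for example, term-frequency vectors). Clustering the SOM units rather than the documents themselves keeps the K-Means step small, since the number of units is typically far below the number of documents.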
Archives, The Learning Commons, 12F Henry Sy Sr. Hall
vii, 66 leaves ; 28 cm.
Document clustering; Cluster analysis
Landrito, M. (2013). Visualization and analysis of document clusters produced by self-organizing maps. Retrieved from https://animorepository.dlsu.edu.ph/etd_masteral/4372