An empirical comparative analysis of clustering algorithms for big data applications
Date of Publication
2017
Document Type
Master's Thesis
Degree Name
Master of Science in Computer Science
College
College of Computer Studies
Department/Unit
Computer Science
Thesis Adviser
Arnulfo P. Azcarraga
Defense Panel Chair
Ryan Samuel M. Dimaunahan
Defense Panel Member
Rafael A. Cabredo
Abstract/Summary
Big data is a loosely defined term for datasets that are too large or too complex to analyze with satisfactory results. Clustering algorithms are one possible solution to this problem, and they can be categorized according to one or more of three clustering objectives: grouping, in which the algorithm aims to classify the dataset into meaningful groups; data summarization, in which the algorithm aims to summarize the data points into a more concise format for easier analysis; and data visualization, in which the dataset is presented in a more understandable format. While there are only three categories into which clustering algorithms can be classified, there are a large number of clustering algorithms, and their performance differs across dataset sizes. The algorithms empirically evaluated and compared in this research include k-means, SOM, DBSCAN, BFR, and BIRCH. It was found that each algorithm has different strengths and weaknesses when classifying scaled-up datasets, and that one can choose the appropriate algorithm based on these strengths and weaknesses.
Language
English
Format
Electronic
Accession Number
CDTG007182
Shelf Location
Archives, The Learning Commons, 12F Henry Sy Sr. Hall
Physical Description
1 computer disc ; 4 3/4 in.
Keywords
Big data; Algorithms
Recommended Citation
Delos Santos, D. T. (2017). An empirical comparative analysis of clustering algorithms for big data applications. Retrieved from https://animorepository.dlsu.edu.ph/etd_masteral/5395