Document classification of Filipino online scam incident text using data mining techniques

Date of Publication


Document Type

Master's Thesis

Degree Name

Master of Science in Information Technology


College of Computer Studies


Information Technology

Thesis Adviser

Marivic S. Tangkeko


The increasing number of online transactions and other internet activities gives rise to the proliferation of online scams. The Philippine National Police Anti-Cybercrime Group (PNP-ACG) reported that the number of complaints rose from a double-digit figure in 2013 to a triple-digit figure in 2017. The challenge of addressing this problem in the Philippines is shared by other developing countries in Southeast Asia and other parts of the world. Since the PNP-ACG was established in 2013, cybercrime data have continued to accumulate but have received little attention in research. Previous studies highlight the importance of taking advantage of data analytics. However, the absence of empirical studies on cybercrime analytics in the country suggests that data analytics is not yet being exploited to facilitate cybercrime investigations. This study uses the Weka text mining tool to draw insights by classifying a given online scam dataset. Weka was chosen because it is a Java-based tool from the University of Waikato, New Zealand, it is free and open source software under the GNU General Public License, and it supports the text mining tasks performed in this study, such as pre-processing and classification. The dataset consists of online scam textual data and some narratives from online scam victims, comprising 82 documents with a total of 14,098 mainly Filipino words, or attributes. J48 Decision Tree, Naïve Bayes, and Sequential Minimal Optimization (SMO) were used to build classification models. The three classifiers were compared in terms of performance and prediction accuracy. The results show that J48 achieves the highest accuracy and the lowest error rate, followed by Naïve Bayes and then the SMO classifier. In addition, the responses during validation reveal that police investigators prefer J48 over the other classifiers because it is easy for them to understand and apply in cybercrime investigations.
This demonstrates how text mining predictive analytics can assist the PNP-ACG in analyzing and identifying online scam criminal behaviors, and it highlights the importance of employing data mining tools in the legal and criminal investigation domains in the Philippines. Future work can use different, more inclusive cybercrime datasets and other classification techniques in Weka or any other data mining tool, and can extend to other data mining tasks such as crime prevention and prediction, clustering, and finding leads, trends, and patterns of criminal activities, among others.

Abstract Format






Accession Number


Shelf Location

Archives, The Learning Commons, 12F Henry Sy Sr. Hall

Physical Description

1 computer disc ; 4 3/4 in.


Computer fraud--Philippines; Computer crimes--Philippines; Fraud--Philippines
