Content and link based web spam detection
Date of Publication
2012
Document Type
Bachelor's Thesis
Degree Name
Bachelor of Science in Computer Science
College
College of Computer Studies
Department/Unit
Computer Science
Thesis Adviser
Arlyn Verina L. Ong
Defense Panel Chair
Clement Y. Ong
Abstract/Summary
Web spam refers to web pages that use various manipulation techniques to artificially raise their rankings in search engine results. These pages illegitimately manipulate the algorithms used by search engines so that they appear to contain trustworthy content and to be the most relevant to what the search engine user needs. Consequently, the quality of search engine results degrades and search engine users are inevitably misled. Human experts can do a good job of identifying spam pages and pages whose content is of doubtful quality. However, it is impractical to rely solely on human effort to classify millions of web pages, since doing so is too costly and time-consuming. Most recently developed approaches to this problem use machine learning to detect web spam: a set of expert-classified pages, labeled either reputable or spam, is given as input to one or more algorithms, which learn from it and then classify other, unclassified pages on the web. While researchers in this field are mainly concerned with identifying new feature sets, the resources required to retrieve these feature sets are disregarded. This study has identified a C4.5 classifier with a feature set containing more content-based features than link-based features of a page as the most efficient web spam detection design in terms of minimizing the required resource utilization, specifically the time complexity, while maintaining the quality of web spam detection.
Abstract Format
html
Language
English
Format
Accession Number
TU16771
Shelf Location
Archives, The Learning Commons, 12F, Henry Sy Sr. Hall
Physical Description
1 v. (various foliations) ; 28 cm.
Recommended Citation
Canete, A., Gervacio, P., Kim, D., & Quinto, R. (2012). Content and link based web spam detection. Retrieved from https://animorepository.dlsu.edu.ph/etd_bachelors/14785