Prediction of disease-disease associations based on relation extraction from biomedical journals using support vector machines

Date of Publication


Document Type

Master's Thesis

Degree Name

Master of Science in Computer Science


College of Computer Studies


Computer Science

Thesis Adviser

Natalie Rose Lim-Cheng

Defense Panel Chair

Joel P. Ilao

Defense Panel Member

Angelyn R. Lao
Charibeth K. Cheng
Merlin Teodosia C. Suarez


Predicting novel associations between biomedical entities, such as genes, drugs, and diseases, can suggest new topics for experiments and new insights in drug design. Because of the massive amount of relevant data available, a computational approach is well suited to this task. Initial data can be taken either from curated databases of biomedical terms and the relations between them, or directly from the text of research articles. Existing studies on predicting associations between diseases from published articles generally use a co-occurrence-based approach, such as extracting the names of diseases and other entities from articles. The weighting scheme for such an approach is based on how many times entity pairs occur together in different documents. This paper describes a semantic analysis-based approach. It extracts biological events and relations between biochemical entities and diseases from texts, and identifies a general association between entities only if instances of a relation between them were extracted. The system had an overall accuracy of 84.35% when tested with five-fold cross-validation on 86 articles from PubMed Central Open Access. The effectiveness of several instance features for improving relation extraction was tested: a 1-token-window bag of words around tokens indicating biomedical entities was found to improve accuracy, while entity distance, token distance, and syntactic dependency subtree had little effect on accuracy.
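
The two ideas contrasted in the abstract, document-level co-occurrence weighting and a 1-token-window bag of words around entity tokens, can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names, the whitespace-style token lists, and the choice to exclude entity tokens from the bag are assumptions made purely for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_weights(docs):
    """Weight each unordered entity pair by the number of documents naming both.

    `docs` is a list of entity-name lists, one list per document
    (a hypothetical input format, not taken from the thesis).
    """
    weights = Counter()
    for entities in docs:
        # Deduplicate within a document so each document counts a pair once.
        for pair in combinations(sorted(set(entities)), 2):
            weights[pair] += 1
    return weights


def window_bag_of_words(tokens, entity_positions, window=1):
    """Count the words within `window` tokens of each entity token.

    Entity tokens themselves are skipped (an assumption). A word that lies
    in the window of two entities is counted once per entity.
    """
    entity_set = set(entity_positions)
    bag = Counter()
    for pos in entity_positions:
        lo = max(0, pos - window)
        hi = min(len(tokens), pos + window + 1)
        for i in range(lo, hi):
            if i not in entity_set:
                bag[tokens[i].lower()] += 1
    return bag
```

For example, `window_bag_of_words(["Insulin", "resistance", "causes", "diabetes", "."], [0, 3])` counts "resistance", "causes", and "." once each. A bag like this could be one of the instance features passed to a classifier such as a support vector machine.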

Abstract Format






Accession Number


Shelf Location

Archives, The Learning Commons, 12F Henry Sy Sr. Hall

Physical Description

1 computer disc ; 4 3/4 in.


Semantic computing; Vector processing (Computer science)
