Information extraction for elegislation
Date of Publication
2010
Document Type
Bachelor's Thesis
Degree Name
Bachelor of Science in Computer Science
Subject Categories
Computer Sciences
College
College of Computer Studies
Department/Unit
Computer Science
Thesis Adviser
Allan Borra
Defense Panel Member
Charibeth Cheng
Rachel Roxas
Abstract/Summary
Information extraction (IE) is the process of transforming the unstructured information in documents into a structured database. This technology allows narrower search results for documents stored in a Document Management System (DMS). An IE system was developed to augment a Blue Ribbon Committee (BRC) DMS for the eParticipation Project. IE architectures were studied and related tools were identified to develop the IE system specifically for the BRC. The IE system is composed of 7 minor modules, namely Sentence Splitter, Tokenizer, Cross Reference, Part of Speech Tagger, Unknown Word, Named Entity Recognition and Preparser; 3 major modules, which are Semantic Tagger, CoReference Resolution and Template Filler; and 2 external modules, which are the Search and Evaluation modules. With the help of, and constant communication with, the Blue Ribbon Committee, the researchers were able to gather documents that helped in creating the system. The output is created and extracted according to the client's preferences, and the system already meets the standards requested by the Blue Ribbon Committee. Overall, the system showed favorable results in the actual testing phase, with an accuracy of 95.42%; when the initial format of the documents was followed, the system's results would be 100% accurate. When the system was presented to the main stakeholders, they remarked that what they had seen was already beyond their expectations and that they were very pleased with the outcome.
Parts of the system could still be improved: training the POS Tagger and the Named Entity Recognition module on the documents being fed to the system, updating the library used to open Word document files, adding documents and templates to the system's process, adding image recognition, updating the web crawler to cover more sources, and improving the search ranking algorithm.
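The modular design described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the stage order, the `Document` class, the function names and the capitalization-based entity rule are all assumptions made for illustration, and only four of the listed modules (Sentence Splitter, Tokenizer, Named Entity Recognition, Template Filler) are represented, each by a toy stand-in.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical container passed from one stage to the next."""
    text: str
    sentences: list = field(default_factory=list)
    tokens: list = field(default_factory=list)    # one token list per sentence
    entities: list = field(default_factory=list)
    template: dict = field(default_factory=dict)


def split_sentences(doc):
    # Naive split on whitespace that follows '.', '!' or '?'.
    doc.sentences = [s for s in re.split(r"(?<=[.!?])\s+", doc.text.strip()) if s]
    return doc


def tokenize(doc):
    # Words and single punctuation marks become separate tokens.
    doc.tokens = [re.findall(r"\w+|[^\w\s]", s) for s in doc.sentences]
    return doc


def recognize_entities(doc):
    # Toy rule (an assumption, not the thesis's method): a run of two or
    # more capitalized tokens is treated as a named entity.
    for sentence in doc.tokens:
        run = []
        for token in sentence + [""]:  # empty sentinel flushes the final run
            if token[:1].isupper():
                run.append(token)
            else:
                if len(run) >= 2:
                    doc.entities.append(" ".join(run))
                run = []
    return doc


def fill_template(doc):
    # Collect the extracted values into a structured record.
    doc.template = {
        "sentence_count": len(doc.sentences),
        "entities": list(doc.entities),
    }
    return doc


# Assumed ordering; the abstract lists the modules but not their sequence.
STAGES = [split_sentences, tokenize, recognize_entities, fill_template]


def run_pipeline(text):
    doc = Document(text)
    for stage in STAGES:
        doc = stage(doc)
    return doc.template
```

Calling `run_pipeline("The committee met on Monday. Senator Juan Cruz filed the report.")` yields a record with two sentences and the single entity "Senator Juan Cruz". Keeping each stage as a function over a shared document object means a stage can be swapped or retrained without touching the others, which is one plausible reading of the minor/major module split.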
Abstract Format
html
Language
English
Format
Accession Number
TU19863
Shelf Location
Archives, The Learning Commons, 12F, Henry Sy Sr. Hall
Physical Description
1 v. (various foliations) : illustrations (some colored) ; 28 cm.
Keywords
Text processing (Computer science); Natural language processing (Computer science); Database management
Recommended Citation
Lim, B., Miranda, A., Trogo, J., & Yap, F. (2010). Information extraction for elegislation. Retrieved from https://animorepository.dlsu.edu.ph/etd_bachelors/11062