Automatic extraction of conceptual relations from children's stories

Date of Publication

2013

Document Type

Master's Thesis

Degree Name

Master of Science in Computer Science

College

College of Computer Studies

Department/Unit

Computer Science

Thesis Adviser

Ethel Ong

Defense Panel Chair

Charibeth Cheng

Defense Panel Member

Natalie Rose Lim Cheng
Ethel Ong

Abstract/Summary

People use storytelling as a natural and familiar means of conveying information and experience to each other. During this interchange, people understand each other because they rely on a large body of shared commonsense knowledge. Computers do not share this knowledge, which creates a barrier in human-computer interaction and in applications that require computers to generate coherent text. To support such tasks, computers must be provided with usable knowledge about the basic relationships between the concepts that we need every day in our world.

This research used GATE, an existing tool, together with custom extraction rules to automatically extract concepts and their relations from existing children's stories and to store them in a knowledge base that story generation systems such as Picture Books, as well as other NLP applications, can use to do their tasks. Sixteen (16) relation types were extracted, specifying descriptions of story elements, character actions, temporal succession and causal chains of events, spatial and functional information about story objects, and world state information in a story. Based on the results of the evaluations, the extractor was found to be inaccurate in identifying relations in a story: it has an overall accuracy of 36% based on precision, recall and F-measure. The main causes of inaccuracy were incomplete and overly general templates, insufficient indicators, the accuracy of the existing tools, and the inability to infer and detect implied relations. Furthermore, the quality and accuracy of the extracted relations decrease as the complexity and length of a story increase.
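As a rough illustration of the approach summarized above, the following Python sketch applies a few surface templates to story text and scores the result with precision, recall and F-measure. It is only a sketch under stated assumptions: the thesis used GATE with custom extraction rules, which are not reproduced in this record, so plain regular expressions stand in for them here. The relation names (IsA, LocationOf, CapableOf, PropertyOf), the templates, the sample story and the gold set are all hypothetical.

```python
import re

# Hypothetical surface templates. These are NOT the thesis's GATE rules;
# they only illustrate the general idea of template-based extraction.
TEMPLATES = [
    ("IsA", re.compile(r"\bthe (\w+) is an? (\w+)", re.IGNORECASE)),
    ("LocationOf", re.compile(r"\bthe (\w+) is in the (\w+)", re.IGNORECASE)),
    ("CapableOf", re.compile(r"\bthe (\w+) can (\w+)", re.IGNORECASE)),
]


def extract_relations(text):
    """Return a set of (relation, concept1, concept2) triples found in text."""
    found = set()
    for relation, pattern in TEMPLATES:
        for match in pattern.finditer(text):
            found.add((relation, match.group(1).lower(), match.group(2).lower()))
    return found


def score(predicted, gold):
    """Compute precision, recall and F-measure of predicted against gold."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Invented sample story and hand-made gold relations, for illustration only.
STORY = "The dog is an animal. The red ball is in the garden. The dog can run."
GOLD = {
    ("IsA", "dog", "animal"),
    ("CapableOf", "dog", "run"),
    ("LocationOf", "ball", "garden"),
    ("PropertyOf", "ball", "red"),
}

if __name__ == "__main__":
    predicted = extract_relations(STORY)
    print(sorted(predicted))
    print(score(predicted, GOLD))
```

In this toy case the templates find the IsA and CapableOf relations but miss both relations about the ball, because the adjective "red" breaks the LocationOf pattern and no template covers properties. This mirrors, in a very small way, the kind of template incompleteness the abstract cites as a cause of inaccuracy.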

Abstract Format

html

Language

English

Format

Print

Accession Number

TG05359

Shelf Location

Archives, The Learning Commons, 12F Henry Sy Sr. Hall

Physical Description

x, 141 leaves ; 28 cm. + 1 computer optical disc.

Keywords

Text data mining; Natural language processing (Computer science)

This document is currently not available here.