Word-streams for representing context in word maps

College

College of Computer Studies

Department/Unit

Information Technology

Document Type

Archival Material/Manuscript

Publication Date

2007

Abstract

The most prominent use of Self-Organizing Maps (SOMs) in text archiving and retrieval is WEBSOM. In WEBSOM, a map is first used to reduce the dimensionality of the huge term-frequency table by training a so-called word-category map. This word-category map is then used to convert the individual documents into their respective document signatures (i.e., histograms of words), which form the basis for training a document map. This document map is the final text archive. WEBSOM has been shown to be a powerful and versatile text archiving system. However, it expends, and arguably wastes, enormous computing resources on computing the left and right context of every word that appears in any document in the text corpus. This paper presents an alternative scheme for incorporating context into the encoding of the words, one that takes full advantage of the computation of the probabilistic centroid inherent in the SOM training algorithm. Several experiments are conducted to compare this new scheme with WEBSOM’s context averaging scheme.
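To make the cost of context averaging concrete, the following is a minimal sketch of how a left/right averaged-context encoding might be computed. It is not the paper's implementation or WEBSOM's code. The function names (`random_codes`, `averaged_context`), the use of Gaussian random code vectors, the window of one word on each side, and the toy sentence are all assumptions made for illustration.

```python
import random


def random_codes(vocab, dim, seed=0):
    """Assign each word a fixed random code vector (an assumed encoding)."""
    rng = random.Random(seed)
    return {w: [rng.gauss(0.0, 1.0) for _ in range(dim)] for w in vocab}


def mean(vectors, dim):
    """Component-wise mean; a zero vector when there is nothing to average."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def averaged_context(tokens, dim=4, seed=0):
    """Return {word: left_average + own_code + right_average}.

    Every occurrence of every word contributes to the averages, which is
    the per-occurrence pass over the corpus that the abstract describes
    as expensive.
    """
    vocab = sorted(set(tokens))
    codes = random_codes(vocab, dim, seed)
    left = {w: [] for w in vocab}
    right = {w: [] for w in vocab}
    for i, w in enumerate(tokens):
        if i > 0:
            left[w].append(codes[tokens[i - 1]])   # code of the preceding word
        if i + 1 < len(tokens):
            right[w].append(codes[tokens[i + 1]])  # code of the following word
    return {w: mean(left[w], dim) + codes[w] + mean(right[w], dim) for w in vocab}


# Hypothetical toy corpus, for illustration only.
tokens = "the cat sat on the mat".split()
features = averaged_context(tokens)
```

In this sketch each word ends up with a vector of length `3 * dim`, which would then serve as a training input for the word-category map. The alternative scheme the abstract proposes instead relies on the centroid computation already performed during SOM training; the record gives no further detail, so it is not sketched here.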

Disciplines

Computer Sciences

Note

Undated; Publication/creation date supplied

Keywords

Context (Linguistics); Self-organizing maps

This document is currently not available here.
