Lexical Space: Unsupervised Learning of Word Representations for the Benefit of Supervised Learning

Jakub Zavrel
ILK / Computational Linguistics, Tilburg University
PO-box 90153, 5000 LE Tilburg, the Netherlands
zavrel@kub.nl
 
When supervised Machine Learning methods are applied to learn an NLP
disambiguation task from annotated corpus examples, we often find that
the amount of training data is quite limited. Especially when the
features of the task are given by individual lexical items and their
conjunctions, the number of combinations quickly rises beyond any
feasible amount of training data we might hope to
collect. Fortunately, however, there are large amounts of raw text
available to learn about the behavior of words.
 

Words that occur in similar contexts in raw text tend to have similar syntacto-semantic properties. These similarities in distribution can be exploited by using vectors of co-occurrence counts as quasi-syntactic word representations (Schuetze 1995; Finch 1993). In this paper, the notion of a Lexical Space is formalized, and an experimental study is presented which examines the effects of various biases and information sources on the organization of the similarity space. The distances in the Lexical Space are then (re)used as a similarity gradient in a Memory-Based Learner for several well-known NLP disambiguation tasks (PP-attachment, POS-tagging, and word sense disambiguation). The results are compared to those obtained with the Modified Value Difference Metric (Stanfill & Waltz, 1986; Cost & Salzberg, 1993), a similarity metric that takes into account only the distribution of words in the supervised training data.
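
As an illustration of the idea of co-occurrence vectors as word representations, the following minimal sketch builds position-tagged neighbor counts from a toy corpus and compares words by cosine distance. The window size, the absence of any weighting, the choice of cosine distance, and the names cooccurrence_vectors and cosine_distance are all assumptions made for this example; the abstract does not specify the construction actually used in the paper.

```python
from collections import Counter, defaultdict
import math


def cooccurrence_vectors(sentences, window=1):
    """Map each word to a Counter of (offset, neighbor) co-occurrence counts.

    Hypothetical helper: the window size and the position-tagged features
    are illustrative assumptions, not the paper's settings.
    """
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            for offset in range(-window, window + 1):
                j = i + offset
                if offset != 0 and 0 <= j < len(tokens):
                    vectors[word][(offset, tokens[j])] += 1
    return vectors


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two sparse count vectors."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # treat an empty vector as maximally distant
    return 1.0 - dot / (norm_u * norm_v)


# Toy "raw text": cat and dog appear in identical contexts.
corpus = [
    "the cat sleeps".split(),
    "the dog sleeps".split(),
    "a cat eats".split(),
    "a dog eats".split(),
]
vectors = cooccurrence_vectors(corpus)
d_cat_dog = cosine_distance(vectors["cat"], vectors["dog"])
d_cat_the = cosine_distance(vectors["cat"], vectors["the"])
print(d_cat_dog < d_cat_the)
```

In this toy corpus, "cat" and "dog" share all of their contexts and so end up closer to each other than "cat" is to "the". A memory-based learner could plug such a distance into its nearest-neighbor comparison of lexical feature values, which is the role the abstract describes for the Lexical Space.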