Scaling Minimal Generalization
Bart Cramer, John Nerbonne
CLCG, Rijksuniversiteit Groningen
In this study, we model phonotactics using minimal generalization, a
stochastic rule-based system proposed by Albright and Hayes (2003), who
applied it successfully to learning the English past tense. Their
system generates rules that generalize over the phonetic features of
the input (in our case, the CELEX database). These rules are hypotheses that
may prove wrong in other parts of the input; hence they are 'stochastic'.
This algorithm maintains the explicitness of rule-based systems, but
adds an element of stochastic comparison. The results from Albright and
Hayes also suggest that the model captures some aspects of cognitive
representation faithfully.
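As a rough illustration of generalizing over phonetic features, the sketch
below intersects the feature bundles of two contexts position by position.
It is a simplified toy, not the procedure of Albright and Hayes (2003): the
feature table, the function names, and the restriction to equal-length
contexts are all assumptions made for this example.

```python
from typing import Dict, List

Features = Dict[str, object]

# Hypothetical toy feature table (illustrative values, not taken from CELEX).
FEATURES: Dict[str, Features] = {
    "p": {"voiced": False, "place": "labial", "manner": "stop"},
    "b": {"voiced": True, "place": "labial", "manner": "stop"},
    "t": {"voiced": False, "place": "coronal", "manner": "stop"},
    "s": {"voiced": False, "place": "coronal", "manner": "fricative"},
}


def generalize(a: str, b: str) -> Features:
    """Return only the feature values that segments a and b share."""
    fa, fb = FEATURES[a], FEATURES[b]
    return {name: value for name, value in fa.items() if fb.get(name) == value}


def generalize_contexts(c1: str, c2: str) -> List[Features]:
    """Generalize two equal-length contexts position by position."""
    if len(c1) != len(c2):
        raise ValueError("this sketch only handles equal-length contexts")
    return [generalize(a, b) for a, b in zip(c1, c2)]


def matches(pattern: List[Features], context: str) -> bool:
    """Check that every segment carries the features the pattern demands."""
    if len(pattern) != len(context):
        return False
    return all(
        FEATURES[seg].get(name) == value
        for spec, seg in zip(pattern, context)
        for name, value in spec.items()
    )


# Generalizing over "sp" and "st" keeps "s" fully specified and reduces the
# second position to the features shared by "p" and "t".
pattern = generalize_contexts("sp", "st")
print(pattern)
print(matches(pattern, "sp"), matches(pattern, "sb"))
```

In this toy, the pattern learned from "sp" and "st" still accepts both
inputs but rejects "sb", because "b" is voiced. A pattern of this kind is a
hypothesis in the sense described above: it may or may not hold for the rest
of the input.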
However, when we applied this methodology to the problem of phonotactics,
it did not immediately generalize well. It accepted well-formed examples
reliably, but was ill-equipped to reject ill-formed strings.
We therefore propose two improvements to the original algorithm: first, to
force it to discriminate more sharply, and second, to take implicit
negative information into account as well. The improved algorithm reduces
the number of rules by a factor of five, and thus improves the transparency
of the output. It also halves the number of errors (both false positives
and false negatives) compared to the original algorithm.
Albright, Adam and Bruce Hayes (2003). "Rules vs. Analogy in English Past
Tenses: A Computational/Experimental Study." Cognition 90, pp. 119-161.