The automatic resolution of "het" in a machine learning approach to Dutch coreference resolution.
Veronique Hoste, Iris Hendrickx and Walter Daelemans
Hogeschool Gent, University of Antwerp
We discuss the automatic detection of pronominal coreferential relations in Dutch. Previous research has shown that automatic pronominal coreference resolution for Dutch suffers mainly from two shortcomings: the inability to distinguish between the referential and the pleonastic use of the pronoun "het", and the inability to recognize anaphors referring to the linguistic gender of the antecedent.
In this talk, we focus on the different uses of the third-person singular neuter pronoun "het" and on the automatic detection of these uses. The KNACK-2002 corpus, a coreferentially annotated corpus of Dutch news magazine texts, served as base material. Two annotators annotated the corpus in parallel, labelling the texts with tags that distinguish the referential use of "het", whether anaphoric or cataphoric (as in 2), from its non-referential use (as in 1).
(1) “Een god van het vuur. Als vice-minister van Defensie heeft Paul Wolfowitz eigenlijk een bescheiden job in de Amerikaanse regering. Hoe komt het dan dat hij zoveel invloed heeft in het witte huis?”
(2) Maar voorzitter Spiritus-Dassesse gelooft niet in het nieuwe plan. Het lijkt teveel op het vorige.
The resulting annotations were used to retrain the pronominal coreference resolution system. We report the results of these experiments for two learning systems: the maximum entropy modeling package Maxent and the memory-based learner TiMBL.
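
To make the memory-based setting concrete, the following is a minimal sketch of nearest-neighbour classification with an overlap distance, the general idea behind memory-based learning. It is not the system described in this talk and does not use the TiMBL API. The feature set (token before "het", token after "het", sentence-initial flag), the function names and the tiny instance base are all illustrative assumptions; the first two instances are derived from examples (1) and (2), the third is invented.

```python
from collections import Counter

# Hypothetical instance base: (features, label).
# Assumed features: token before "het", token after "het",
# and whether "het" is sentence-initial. None of this is taken
# from the actual KNACK-2002 feature set.
TRAIN = [
    (("komt", "dan", "no"), "non-referential"),   # from example (1)
    ((".", "lijkt", "yes"), "referential"),       # from example (2)
    (("regent", ".", "no"), "non-referential"),   # invented: "... regent het."
]


def overlap_distance(a, b):
    """Count the feature positions at which two instances differ."""
    return sum(x != y for x, y in zip(a, b))


def classify(instance, train=TRAIN, k=1):
    """Return the majority label among the k nearest stored instances."""
    ranked = sorted(train, key=lambda ex: overlap_distance(instance, ex[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # An unseen context that shares one feature value with example (1).
    print(classify(("komt", "nu", "no")))
```

A real memory-based learner would typically also weight features by their informativeness and use a far larger instance base; the sketch leaves this out for brevity.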