Data-Oriented Translation

 
In this paper, we present a statistical approach to Machine Translation (MT)
that is based on the Data-Oriented Parsing (DOP) model (Bod 1995). We give a
short introduction to MT and DOP, and then describe a model that combines
these two fields: the Data-Oriented Translation model. In this
model, we use linked subtree-pairs for creating a derivation of a sentence.
Each subtree-pair consists of two linked trees, one in the source language
and one in the target language, and carries a probability. When a derivation
has been formed with the
subtrees in the source language, these subtrees can be substituted with
their linked counterparts in the target language, thus forming a translated
derivation. In turn, this translated derivation can be flattened into a
translated sentence. Since a source sentence typically has many different
derivations, there can be correspondingly many different translations of it.
The probability of a translation is the sum of the probabilities of all the
derivations that yield this translation. We show how the most probable
translation can be computed
by means of Monte Carlo disambiguation, and we discuss some experiments with
the translation of idioms.
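
The last two steps can be illustrated with a minimal sketch. It assumes a toy,
fully enumerated set of derivations with made-up probabilities; the example
sentences, the numbers, and the function names are hypothetical and are not
taken from the experiments. The probability of a translation is obtained by
summing over its derivations, and the Monte Carlo step estimates the most
probable translation by sampling derivations and counting the translations
they yield. In a full system, derivations would be sampled by choosing
subtree-pairs step by step rather than from an explicit list.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy data: each entry is one derivation, given as
# (translation it yields, probability of the derivation).
# Several derivations may yield the same translation.
DERIVATIONS = [
    ("hij schopte de emmer", 0.10),
    ("hij schopte de emmer", 0.15),
    ("hij legde het loodje", 0.20),
    ("hij legde het loodje", 0.25),
    ("hij legde het loodje", 0.30),
]


def translation_probabilities(derivations):
    """Exact probability of each translation: sum over its derivations."""
    totals = defaultdict(float)
    for translation, prob in derivations:
        totals[translation] += prob
    return dict(totals)


def monte_carlo_best(derivations, n_samples=10_000, seed=0):
    """Estimate the most probable translation by sampling derivations.

    Derivations are drawn in proportion to their probability; the
    translation that is yielded most often is returned.
    """
    rng = random.Random(seed)
    translations = [t for t, _ in derivations]
    weights = [p for _, p in derivations]
    sampled = rng.choices(translations, weights=weights, k=n_samples)
    return Counter(sampled).most_common(1)[0][0]


if __name__ == "__main__":
    print(translation_probabilities(DERIVATIONS))
    print(monte_carlo_best(DERIVATIONS))
```

With enough samples, the relative frequency of each sampled translation
approaches its summed probability, so the most frequent translation in the
sample approximates the most probable one without enumerating every
derivation.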
 
 
Arjen Poutsma 
Dept. of Computational Linguistics
Spuistraat 134
1012 VB Amsterdam
mail: poutsma@wins.uva.nl