Perhaps surprisingly, DATR turns out to be an excellent language
for defining finite state transducers (FSTs). A path can be used as
the input tape and a
value as the output tape (recall that the DATR default mechanism
means that extensions to left-hand-side paths are automatically
carried over as extensions to right-hand-side paths, as discussed in
Section 3.3, above). Nodes can be used for
states, or else states can be encoded in attributes that are prefixed
to the current input path. Here, for example, is a very simple
DATR FST that will transduce a path such as <subj 1 sg futr obj 2
sg like> into the value ni ta ku penda (Swahili for I
will like you)
:
S1:
<subj 1 sg> == ni S2:<>
<subj 2 sg> == u S2:<>
<subj 3 sg> == a S2:<>
<subj 1 pl> == tu S2:<>
<subj 2 pl> == m S2:<>
<subj 3 pl> == wa S2:<>.
S2:
<past> == li S3:<>
<futr> == ta S3:<>.
S3:
<obj 1 sg> == ni S4:<>
<obj 2 sg> == ku S4:<>
<obj 3 sg> == m S4:<>
<obj 1 pl> == tu S4:<>
<obj 2 pl> == wa S4:<>
<obj 3 pl> == wa S4:<>.
S4:
<like> == penda.
Although the example is trivial, the technique is both powerful and
useful. Gibbon (1990), for example, has made notably effective use
of it in his treatments of African tone systems.
Much of the computational phonology and morphology of the last decade
and a half has depended on FSTs. Potential
lexical applications come readily to mind - for example, the
orthographic spelling rules for English suffixation (cf.
sky/skies). We give below a small fragment of such an FST in which
+ is used as a morpheme boundary marker. Note the role of
DATR variables in giving concise expression to the rules:
# vars $abc: a b c d e f g h i j k l m n o p q r s t u v w x y z.
# vars $vow: a e i o u.
SPELL:
<> ==
<+> == <>
<$abc> == $abc <>
<e + $vow> == $vow <>.
These axioms then give rise to theorems such as these:
SPELL:
<l o v e> = l o v e
<l o v e + s> = l o v e s
<l o v e + e d> = l o v e d
<l o v e + e r> = l o v e r
<l o v e + l y> = l o v e l y
<l o v e + i n g> = l o v i n g
<l o v e + a b l e> = l o v a b l e.
