The syntax of DATR, like its name and its minimalist philosophy, owes more than a little to that of the unification grammar language PATR (Shieber 1986). With hindsight this may have been a bad design decision since similarity of syntax tends to imply a similarity of semantics. And, as we shall see below, and elsewhere, there is a subtle but important semantic difference.
As a feature-based formalism with a syntax modelled on PATR, it would be reasonable to expect that DATR can be used to describe directed acyclic graphs (DAGs) in a PATR -like fashion. Consider an example such as the following:
DAG1:
<vp agr> == <v agr>
<v agr per> == 3
<vp agr gen> == masc.
This looks like simple reentrancy from which we would expect to be able
to infer:
DAG1:
<vp agr per> = 3.
And, indeed, this turns out to be valid. But matters are not
as simple as the example makes them appear: if DAG1 was
really the DAG it purports to be, then we would also expect to be able
to infer:
DAG1:
<v agr gen> = masc.
But this is not valid, in fact <v agr gen> is undefined. It might be
tempting to conclude from this that the equality operator in DATR is
very different from the corresponding operator in PATR, but this would
be to misunderstand what has happened in this example. In fact, the
semantics of the statement
DAG1:
<vp agr> == <v agr>.
taken in isolation is very similar to the semantics of the corresponding
PATR statement: both assert equality of values associated with the two
paths. The DATR statement is slightly weaker in that it allows the
left-hand-side to be defined when the right-hand-side is undefined. But,
even in DATR , if both sides are defined they must be the same, so, in
principle, the value of the left-hand-side does semantically constrain
the value of the right-hand-side. However, in a DATR description,
specifying explicit values for extensions of the left-hand-side of such
an equality constraint overrides its effect, and thus
does not influence the values on its right-hand-side.
Another difference lies in the fact that DATR subpaths and superpaths can have values of their own:
DAG2:
<v agr> == sing
<v agr per> == 3.
From this little description we can derive the following statements,
inter alia:
DAG2:
<v agr> = sing
<v agr num> = sing
<v agr per> = 3
<v agr per num> = 3.
From the perspective of a standard untyped DAG-encoding language like PATR ,
this is strange. In PATR , if <v agr per> has value 3,
then neither <v agr> nor <v agr per num> can have (atomic) values.
As these examples clearly show, DATR descriptions do not map trivially into (sets of) standard DAGs (although neither are they entirely dissimilar). But that does not mean that DATR descriptions cannot describe standard DAGs. Indeed, there are a variety of ways in which this can be done. An especially simple approach is possible when the DAGs one is interested are all built out of a set of paths whose identity is known in advance (Kilbury et al. 1991). In this case, we can use DATR paths as DAG paths, more or less directly:
PRONOUN2:
<referent> == '<' 'NP' referent '>'.
She2:
<> == PRONOUN2
<case> == nominative
<person> == third
<number> == singular.
From this description, we can derive the following theorems:
She2:
<case> = nominative
<person> = third
<number> = singular
<referent> = < NP referent >.
We can also derive the following un-DAG-like consequences, of course:
She2:
<case person> = nominative
<person number> = third
<referent referent referent> = < NP referent >.
But these nonsensical theorems will be of no concern to a DATR -invoking
NLP system that is able to specify in advance which paths are of
interest for DAG-construction and to ignore all the rest
.
A more sophisticated approach
uses DATR itself to construct a DAG description
(in the notation of your choice) as a value
:
IDEM:
<> ==
<$atom> == $atom <>.
PATH:
<> == '<' IDEM '>'.
LHS_EQ:
<> == PATH '='.
LHS_EQ_RHS:
<> == LHS_EQ "<>".
PRONOUN1:
<dag> == [ LHS_EQ_RHS:<case>
LHS_EQ_RHS:<person>
LHS_EQ_RHS:<number>
LHS_EQ:<referent> PATH:<'NP' referent> ].
She1:
<> == PRONOUN1
<case> == nominative
<person> == third
<number> == singular.
From this description, we can derive the following theorem:
She1:
<dag> = [ < case > = nominative
< person > = third
< number > = singular
< referent > = < NP referent > ].
The sequence of atoms on the right hand side of this equation is
just a sequence of atoms as far as DATR is concerned. But a
grammar or a parser that expects to see DAGs represented as they are
here can interpret the DATR values as easily as it can the contents
of a file. Indeed, it will be interpreting the contents
of a file if DATR has been used to define a lexicon that has
subsequently been compiled out, rather than being accessed directly
by components of the NLP system (see Section 7,
below). We are not, of course, claiming that textual representations
will standardly provide the optimal interface between an
implementation of DATR and the larger NLP system in which it is
embedded (cf., e.g., Duda & Gebhardi 1994).