Representing Text Chunks
Erik F. Tjong Kim Sang
Centrum voor Nederlandse Taal en Spraak
Universitaire Instelling Antwerpen
Universiteitsplein 1
B-2610 Wilrijk
Belgium
Phone: +32 3 820 2765
Fax: +32 3 820 2762
E-mail: erikt@uia.ua.ac.be
Dividing sentences into chunks of words is a useful preprocessing
step for a parsing process. Ramshaw and Marcus have introduced
a "convenient" data representation for chunking tasks by converting
them to tagging tasks. In this paper we will provide empirical
evidence that text chunking programs can profit from this data
representation. We will compare the performance of a Memory-Based
Learning program on the recognition of base NPs when it uses
different data representations: standard bracket structures and
several tagging schemes.
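The sketch below illustrates what it means to convert a chunking task into a tagging task. It turns a bracketed sentence into one tag per word and then recovers the brackets from those tags. The sketch assumes the O/I/B tag set commonly associated with Ramshaw and Marcus, labeled IOB1 here, in which B marks only the first word of a chunk that directly follows another chunk. It also shows a variant, labeled IOB2, that puts B at the start of every chunk. The scheme labels, the "[ ... ]" input format, and all function names are hypothetical choices made for illustration. This is not the program used in the paper.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # (start, end) word indices, end exclusive


def brackets_to_spans(bracketed: str) -> Tuple[List[str], List[Span]]:
    """Split a whitespace-tokenized string such as '[ The cat ] sat'
    into its words and the (start, end) spans of its bracketed chunks."""
    words: List[str] = []
    spans: List[Span] = []
    start: Optional[int] = None
    for token in bracketed.split():
        if token == "[":
            if start is not None:
                raise ValueError("nested '[': base chunks are non-recursive")
            start = len(words)
        elif token == "]":
            if start is None or start == len(words):
                raise ValueError("unmatched or empty chunk")
            spans.append((start, len(words)))
            start = None
        else:
            words.append(token)
    if start is not None:
        raise ValueError("unclosed '['")
    return words, spans


def spans_to_tags(n_words: int, spans: List[Span], scheme: str = "IOB1") -> List[str]:
    """Give every word one tag: O outside a chunk, I inside, B at a chunk start.
    Hypothetical IOB1: B only when a chunk directly follows another chunk.
    Hypothetical IOB2: B at the start of every chunk."""
    if scheme not in ("IOB1", "IOB2"):
        raise ValueError(f"unknown scheme: {scheme}")
    tags = ["O"] * n_words
    prev_end: Optional[int] = None
    for start, end in sorted(spans):  # assumes non-overlapping spans
        for i in range(start, end):
            tags[i] = "I"
        if scheme == "IOB2" or prev_end == start:
            tags[start] = "B"
        prev_end = end
    return tags


def tags_to_spans(tags: List[str]) -> List[Span]:
    """Recover chunk spans from IOB1 or IOB2 tags: B always opens a new
    chunk, I opens one only when no chunk is open, O closes any open chunk."""
    spans: List[Span] = []
    start: Optional[int] = None
    for i, tag in enumerate(tags):
        if tag == "O":
            if start is not None:
                spans.append((start, i))
            start = None
        elif tag == "B":
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "I":
            if start is None:
                start = i
        else:
            raise ValueError(f"unknown tag: {tag}")
    if start is not None:
        spans.append((start, len(tags)))
    return spans


if __name__ == "__main__":
    words, spans = brackets_to_spans("[ The cat ] sat on [ the mat ] [ this morning ] .")
    print(spans_to_tags(len(words), spans, "IOB1"))
    # ['I', 'I', 'O', 'O', 'I', 'I', 'B', 'I', 'O']
    print(spans_to_tags(len(words), spans, "IOB2"))
    # ['B', 'I', 'O', 'O', 'B', 'I', 'B', 'I', 'O']
```

In both schemes a B tag is needed only to separate adjacent chunks such as "the mat" and "this morning". For that reason, the single decoder above recovers the same bracket structure from either tag sequence. This round trip is what allows a tagger, such as a memory-based learner that assigns one tag per word, to be used for chunking.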