Comparing Maximal Entropy Tagging and Memory-based Tagging

Guy De Pauw (CNTS, Antwerpen)
Walter Daelemans (ILK, Tilburg & CNTS, Antwerpen)
Luc Dehaspe (Computerwetenschappen, Leuven)
Luc De Raedt (Computerwetenschappen, Leuven)

In the last few years, many researchers working in the inductive
paradigm (using statistical or machine learning techniques) have
investigated part-of-speech (POS) tagging not only in its own right,
as a first step in text analysis, but also as a prototypical benchmark
task for the type of disambiguation problem that is paramount in
natural language processing: assigning one of a set of possible labels
to a linguistic object given its context.  Techniques that work well
for POS tagging are also likely to work well for a large range of
other NLP problems, such as word sense disambiguation and discourse
segmentation, once reliable annotated corpora for these problems
become available.  We have thus seen a rapid growth in the number of
techniques applied to this task: neural networks, decision trees,
decision lists, (transformation) rule induction, inductive logic
programming, memory-based learning, etc.
 

It is unfortunate that most of this research refrains from thorough, systematic empirical and theoretical comparisons between competing techniques. This paper expands on one of the first such studies (van Halteren et al. 1998), in which trigram tagging, Brill tagging, maximal entropy tagging and memory-based tagging were systematically compared on the LOB corpus. Results indicated that the best system was the MXPOST maximal entropy tagger (Ratnaparkhi, 1996) and that the second best was MBT, the memory-based tagger (Daelemans et al. 1996). In this paper we continue the systematic comparison of these two systems and investigate whether the difference between them is due to the algorithms or to the information sources the systems use.

To study this, we have compared both taggers using exactly the same information. When comparing MBT with MXPOST on MXPOST features, both systems obtained nearly the same generalization accuracy (95.4 for MXPOST versus 95.5 for MBT, trained on 935,000 patterns and tested on 45,000 patterns taken from the Wall Street Journal corpus). Similarly, when comparing MBT and MXPOST on MBT features, both systems obtained the same accuracy (97.2 for MXPOST versus 97.2 for MBT, trained on 931,062 patterns and tested on 115,101 patterns from LOB). We conclude that it is the information sources used, rather than the algorithms, that account for the earlier observed differences in accuracy between the two taggers. Finally, we provide a qualitative analysis of the errors made by both systems and suggest ways of combining the strengths of both approaches.
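
The experimental design described above, holding the information source fixed while varying only the learner, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the feature set in `window_features` is hypothetical (it is neither the MXPOST nor the MBT feature set), the classifier is a plain overlap-metric nearest-neighbour learner standing in for a memory-based tagger rather than MBT itself, the previous tag is taken from the gold annotation for simplicity, and the sentences are toy data.

```python
from collections import Counter

def window_features(words, tags, i):
    """Hypothetical feature set: previous tag, focus word, next word.
    The previous tag is read from the gold annotation (a simplification)."""
    prev_tag = tags[i - 1] if i > 0 else "<s>"
    next_word = words[i + 1] if i + 1 < len(words) else "</s>"
    return (prev_tag, words[i], next_word)

def make_patterns(sentences, extract):
    """Turn tagged sentences into (feature tuple, tag) patterns."""
    patterns = []
    for words, tags in sentences:
        for i in range(len(words)):
            patterns.append((extract(words, tags, i), tags[i]))
    return patterns

def overlap_distance(a, b):
    """Number of feature positions on which two patterns differ."""
    return sum(x != y for x, y in zip(a, b))

class NearestNeighbourTagger:
    """Stand-in memory-based learner: stores all training patterns and
    labels a new pattern by majority vote among its nearest neighbours."""

    def __init__(self):
        self.memory = []

    def train(self, patterns):
        self.memory = list(patterns)

    def classify(self, features):
        best = min(overlap_distance(features, f) for f, _ in self.memory)
        votes = Counter(tag for f, tag in self.memory
                        if overlap_distance(features, f) == best)
        return votes.most_common(1)[0][0]

def accuracy(classifier, patterns):
    """Fraction of patterns whose predicted tag equals the gold tag."""
    correct = sum(classifier.classify(f) == tag for f, tag in patterns)
    return correct / len(patterns)

# Toy data, invented for this sketch.
train_sentences = [
    (["the", "dog", "barks"], ["DT", "NN", "VBZ"]),
    (["the", "cat", "sleeps"], ["DT", "NN", "VBZ"]),
]
test_sentences = [(["the", "bird", "sings"], ["DT", "NN", "VBZ"])]

train_patterns = make_patterns(train_sentences, window_features)
test_patterns = make_patterns(test_sentences, window_features)

tagger = NearestNeighbourTagger()
tagger.train(train_patterns)
acc = accuracy(tagger, test_patterns)
print(acc)
```

Because the feature extractor is passed to `make_patterns` separately from the learner, any other classifier exposing `train` and `classify` could be evaluated on exactly the same patterns, so that a remaining difference in accuracy could be attributed to the algorithm rather than to the information source.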

Daelemans, W., J. Zavrel, P. Berck, S. Gillis. `MBT: A Memory-Based Part of Speech Tagger-Generator'. In: E. Ejerhed and I. Dagan (eds.) Proceedings of the Fourth Workshop on Very Large Corpora, Copenhagen, Denmark, 14-27, 1996.
Halteren, H. van, J. Zavrel, W. Daelemans. `Improving Data Driven Wordclass Tagging by System Combination'. Proceedings of COLING and ACL 1998, Montreal.
Ratnaparkhi, A. `A maximum entropy part of speech tagger'. Conference on empirical methods in natural language processing, University of Pennsylvania, 1996.