Unfortunately, most of this research lacks thorough, systematic empirical and theoretical comparisons between competing techniques. This paper expands on one of the first such studies (van Halteren et al. 1998), in which trigram tagging, Brill tagging, maximum entropy tagging and memory-based tagging were systematically compared on the LOB corpus. Results indicated that the best system was the MXPOST maximum entropy tagger (Ratnaparkhi 1996), and the second best was MBT, the memory-based tagger (Daelemans et al. 1996). In this paper we continue the systematic comparison of the two best systems, and investigate whether the difference in their accuracy is due to the algorithms or to the information sources used by the systems.
To study this, we compared the two taggers using exactly the same information. When comparing MBT with MXPOST on MXPOST features, both systems obtained nearly the same generalization accuracy (95.4% for MXPOST versus 95.5% for MBT, trained on 935,000 patterns and tested on 45,000 patterns taken from the Wall Street Journal corpus). Similarly, when comparing MBT and MXPOST on MBT features, both systems obtained the same accuracy (97.2% for MXPOST versus 97.2% for MBT, trained on 931,062 patterns and tested on 115,101 patterns from LOB). We conclude that it is the information sources used, rather than the algorithms, that account for the differences in accuracy observed earlier between the two taggers. Finally, we provide a qualitative analysis of the errors made by both systems and suggest ways of combining the strengths of both approaches.
Daelemans, W., J. Zavrel, P. Berck, S. Gillis. `MBT: A Memory-Based
Part of Speech Tagger-Generator.' In: E. Ejerhed and I. Dagan (eds.)
Proceedings of the Fourth Workshop on Very Large Corpora, Copenhagen,
Denmark, 14-27, 1996.
Halteren, H. van, J. Zavrel, W. Daelemans. `Improving Data Driven
Wordclass Tagging by System Combination.' Proceedings of COLING and
ACL 1998, Montreal.
Ratnaparkhi, A. `A maximum entropy part of speech tagger.' Conference
on Empirical Methods in Natural Language Processing, University of
Pennsylvania, 1996.