Centre for Computational Linguistics: Projects
Interdisciplinair Centrum voor Recht en Informatica
J. Dumortier, R. Gebruers (CCL), M.-F. Moens & C. Uyttendaele
The SALOMON project was initiated in cooperation with, and is being carried out at, the Interdisciplinary Centre for Law and Information Technology (ICRI, Faculty of Law, K.U. Leuven). Its goal is to contribute to the automation of legal document categorization and legal data extraction in cases where the raw information is available in electronic form. More specifically, the aim is to build a prototype that maps relevant passages from the full text of judicial decisions onto structured representations. These representations should be usable for statistical investigations and for improving the accessibility of large quantities of legal material, for instance as data structures in summarization and information retrieval applications. Furthermore, although initially intended for judgments pertaining to criminal law, the prototype should be readily portable to other legal domains.
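As a rough illustration of what a "structured representation" of a judicial decision might look like, the sketch below defines a small record type and a simple statistical query over it. The class name `DecisionRecord`, its fields, and the sample values are all hypothetical; they are assumptions made for illustration and are not taken from the SALOMON prototype.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class DecisionRecord:
    """Hypothetical structured view of one judicial decision."""
    court: str
    date: str      # ISO date string, e.g. "1995-03-14"
    offence: str   # category assigned to the decision
    verdict: str   # outcome extracted from the text


def verdicts_by_offence(records):
    """Count (offence, verdict) pairs across a collection of records."""
    return Counter((r.offence, r.verdict) for r in records)


# Invented sample data, used only to show the query.
sample = [
    DecisionRecord("Court A", "1995-03-14", "theft", "convicted"),
    DecisionRecord("Court A", "1995-04-02", "theft", "acquitted"),
    DecisionRecord("Court B", "1995-05-20", "theft", "convicted"),
]
stats = verdicts_by_offence(sample)
```

Once decisions are held in a uniform record like this, the same data can feed both statistical counts and retrieval indexes, which is the dual use the paragraph above describes.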
The major challenges of the project involve
Three requirements severely constrain the possible approaches: precise categorization, with high recall and precision; robustness, given the volume of text to be handled; and portability to new domains. Keyword search and pattern matching techniques may be fast and cost-effective, but they are unlikely to achieve the required accuracy on their own. Complex systems based on natural language understanding techniques, on the other hand, are still much too slow, too restricted, and not robust enough for practical use. SALOMON has therefore adopted a hybrid approach: it starts from the analysis of "coarse" text features, such as stylized phrases and word occurrence frequencies, and introduces more ambitious techniques only to the extent that these are necessary to overcome the weaknesses of the shallow methods.
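The staged idea described above can be sketched as a small cascade: try stylized phrases first, fall back to word occurrence frequencies, and defer to a deeper analysis only when both shallow stages are inconclusive. This is a minimal sketch under stated assumptions, not the SALOMON implementation: the labels, cue phrases, keyword sets, the `min_hits` threshold, and the function `classify_passage` are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical cue phrases and keyword sets, invented for illustration.
CUE_PHRASES = {
    "verdict": re.compile(r"\bon these grounds\b", re.IGNORECASE),
    "facts": re.compile(r"\bthe accused is charged with\b", re.IGNORECASE),
}
KEYWORDS = {
    "verdict": {"sentences", "acquits", "convicts", "fine"},
    "facts": {"charged", "stolen", "victim", "offence"},
}


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def classify_passage(text, min_hits=2):
    """Return (label, stage) for a passage using a shallow-first cascade."""
    # Stage 1: stylized phrases are cheap to test and highly indicative.
    for label, pattern in CUE_PHRASES.items():
        if pattern.search(text):
            return label, "cue_phrase"

    # Stage 2: score each label by how often its keywords occur.
    counts = Counter(tokenize(text))
    scores = {
        label: sum(counts[word] for word in words)
        for label, words in KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    if scores[best] >= min_hits:
        return best, "word_frequency"

    # Stage 3: shallow evidence is too weak; hand over to a costlier method.
    return None, "needs_deeper_analysis"
```

For example, `classify_passage("On these grounds, the court sentences the accused.")` is resolved at the cue-phrase stage, while a passage with no cue phrase and too few keyword hits is flagged for deeper analysis. Keeping the expensive stage last means it runs only on the passages the shallow methods cannot settle.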
Further information is available at the Salomon web site at ICRI.
Information Provider: Centrum voor Computerlinguïstiek
(C) Copyright 1996, CCL.
All Rights Reserved.