CCL Centre for Computational Linguistics K.U.Leuven

CLIN 17 - Program

An Efficient Approach of Clause Boundary Identification in Unrestricted English Texts

Srikanth Kesavarapu, Rajesh Kumar Magham

Project Manager in MNC

Identifying sentence structure is essential for many NLP tasks, such as machine translation, parallel-text alignment, text-to-speech synthesis, and discourse processing. In this paper, we address the clause-splitting problem and implement a multilingual method using two machine learning techniques: the k-nearest neighbor algorithm on the SUSANNE Corpus and a Bayesian classification algorithm on the Penn Treebank corpus. After the machine learning phase, we were able to greatly improve accuracy and could also classify clauses using simple language-specific rules. We also reviewed other existing approaches to clause recognition. Finally, we found that the Bayesian approach is efficient in terms of accuracy and less demanding in terms of both computation and human effort.
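To make the idea of Bayesian clause-boundary classification concrete, the following is a minimal sketch and not the authors' implementation. It assumes a categorical naive Bayes model with add-one smoothing, and it assumes that each token is described by the part-of-speech tags of the previous, current, and next token. The abstract specifies neither the model variant nor the feature set. The class `NaiveBayes`, the variable `TRAIN`, and the handful of training rows are hypothetical and invented purely for illustration; they are not drawn from the Penn Treebank or the SUSANNE Corpus.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Categorical naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.label_counts = Counter()
        # feature_counts[label][position][value] -> count
        self.feature_counts = defaultdict(lambda: defaultdict(Counter))
        # distinct values observed at each feature position
        self.values = defaultdict(set)

    def fit(self, rows, labels):
        for feats, label in zip(rows, labels):
            self.label_counts[label] += 1
            for pos, value in enumerate(feats):
                self.feature_counts[label][pos][value] += 1
                self.values[pos].add(value)

    def predict(self, feats):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            # log prior plus the sum of smoothed log likelihoods
            score = math.log(n / total)
            for pos, value in enumerate(feats):
                count = self.feature_counts[label][pos][value]
                vocab = len(self.values[pos]) + 1  # one extra slot for unseen values
                score += math.log((count + 1) / (n + vocab))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data: (previous POS, current POS, next POS) -> does a
# clause start at the current token? Invented for illustration only.
TRAIN = [
    (("NN", "WDT", "VBD"), True),   # e.g. "... man who saw ..."
    (("NN", "IN", "PRP"), True),    # e.g. "... left because he ..."
    ((",", "CC", "PRP"), True),     # e.g. "..., and she ..."
    (("DT", "NN", "VBD"), False),
    (("VBD", "DT", "NN"), False),
    (("PRP", "VBD", "DT"), False),
    (("JJ", "NN", "IN"), False),
]

model = NaiveBayes()
model.fit([feats for feats, _ in TRAIN], [label for _, label in TRAIN])

print(model.predict(("NN", "WDT", "VBZ")))  # True: a relative pronoun after a noun
print(model.predict(("VBD", "DT", "NN")))   # False: a determiner inside a verb phrase
```

A k-nearest neighbor variant could reuse the same feature tuples and replace the probability model with a vote among the training rows that share the most tag positions with the query.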

  

   
K.U.Leuven - CWIS  Copyright © Katholieke Universiteit Leuven | Comments on the content: Vincent Vandeghinste
Production: Vincent Vandeghinste | Last modified: 20 November 2006 | Disclaimer
URL: http://www.ccl.kuleuven.be/CLIN17/sha2.php