An Efficient Approach of Clause Boundary Identification in Unrestricted English Texts
Srikanth Kesavarapu, Rajesh Kumar Magham
Project Manager in MNC
Identifying the structure of a sentence is essential for many NLP tasks, such as machine translation, parallel text alignment, text-to-speech synthesis and discourse processing. In this paper, we address the clause splitting problem and implement a multilingual method using two machine learning techniques: the k-nearest neighbor (k-NN) algorithm on the SUSANNE corpus and Bayesian classification on the Penn Treebank corpus. After the machine learning phase, we were able to improve accuracy substantially and to classify clauses using simple language-specific rules. We also review other existing approaches to clause recognition. Finally, we find that the Bayesian approach performs well in terms of accuracy and is less demanding in terms of both computation and human effort.
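The two classifiers named above can be illustrated with a small sketch. It assumes an encoding that the abstract does not specify: each token position is labelled "S" if a clause starts there and "O" otherwise, and is described only by its own POS tag and the tags of its two neighbors. Naive Bayes with add-one smoothing stands in for "Bayesian classification", and k-NN uses k = 3 with an overlap (Hamming) distance. The helper names (features, to_examples, NaiveBayes, knn_predict), the toy Penn-style tagged sentences and all parameter choices are hypothetical illustrations, not the authors' actual features, data or settings.

```python
import math
from collections import Counter, defaultdict

# Hypothetical encoding (not from the paper): label "S" = a clause starts at
# this token, "O" = it does not; features are the previous, current and next POS tags.
def features(tags, i):
    prev_tag = tags[i - 1] if i > 0 else "<S>"
    next_tag = tags[i + 1] if i + 1 < len(tags) else "</S>"
    return (("prev", prev_tag), ("cur", tags[i]), ("next", next_tag))

def to_examples(tagged_sentences):
    """Flatten (tags, labels) sentence pairs into per-token training examples."""
    X, y = [], []
    for tags, labels in tagged_sentences:
        for i, label in enumerate(labels):
            X.append(features(tags, i))
            y.append(label)
    return X, y

class NaiveBayes:
    """Naive Bayes over discrete features with add-one (Laplace) smoothing."""

    def fit(self, X, y):
        self.class_counts = Counter(y)
        self.feat_counts = defaultdict(Counter)  # label -> feature -> count
        self.vocab = set()
        for feats, label in zip(X, y):
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, feats):
        total = sum(self.class_counts.values())

        def log_score(label):
            counts = self.feat_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log prior + sum of smoothed log likelihoods of each feature
            return math.log(self.class_counts[label] / total) + sum(
                math.log((counts[f] + 1) / denom) for f in feats)

        return max(self.class_counts, key=log_score)

def knn_predict(X, y, feats, k=3):
    """k-NN vote; distance = number of feature slots that differ."""
    nearest = sorted(
        (sum(a != b for a, b in zip(x, feats)), label) for x, label in zip(X, y)
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Toy training data (illustrative only).
TRAIN = [
    # "The man who left was tired"
    (["DT", "NN", "WP", "VBD", "VBD", "JJ"], ["S", "O", "S", "O", "O", "O"]),
    # "She said that he slept"
    (["PRP", "VBD", "IN", "PRP", "VBD"], ["S", "O", "S", "O", "O"]),
    # "I know the book which you read"
    (["PRP", "VBP", "DT", "NN", "WDT", "PRP", "VBD"],
     ["S", "O", "O", "O", "S", "O", "O"]),
]

X, y = to_examples(TRAIN)
nb = NaiveBayes().fit(X, y)

# Unseen sentence: "He thinks that she left"
test_tags = ["PRP", "VBZ", "IN", "PRP", "VBD"]
nb_labels = [nb.predict(features(test_tags, i)) for i in range(len(test_tags))]
knn_labels = [knn_predict(X, y, features(test_tags, i)) for i in range(len(test_tags))]
print(nb_labels)   # ['S', 'O', 'S', 'O', 'O']
print(knn_labels)  # ['S', 'O', 'S', 'O', 'O']
```

On this toy input both classifiers mark "He" and "that" as clause starts. The contrast the abstract draws is visible even here: naive Bayes reduces training to a table of counts and scores each token in time proportional to its feature count, whereas k-NN must compare every query against all stored training examples.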