Comparing Improved Language Models for Sentence Retrieval in Question Answering
Andreas Merkel, Dietrich Klakow
Saarland University, Spoken Language Systems, 66041 Saarbrücken
A retrieval system is an essential part of a question answering
(QA) framework. It reduces the number of documents to be considered
when searching for an answer. For further refinement, the documents
are split into smaller chunks to deal with topic variability in larger
documents. In our case, we divided the documents into single
sentences. A language-model-based approach was then used to re-rank
the sentence collection.
For this purpose, we developed a new language model toolkit. It
implements all standard language modeling techniques and is more
flexible than other tools in terms of backing-off strategies, model
combinations and design of the retrieval vocabulary. With this toolkit,
we ran re-ranking experiments with standard language-model-based
smoothing methods such as Jelinek-Mercer linear interpolation, Bayesian
smoothing with Dirichlet priors and absolute discounting, as well as
some new, improved models. We also experimented with query expansion
depending on the type of a query. On a TREC corpus, we demonstrate that
our proposed approaches outperform the standard methods in terms of
mean reciprocal rank (MRR) by 25%.
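
As a rough illustration of the three standard smoothing methods and the MRR metric named above, the following minimal sketch re-ranks tokenized sentences by query likelihood. It uses the textbook forms of the smoothing formulas and is not the toolkit described here. All function names, the parameter defaults (lam, mu, delta) and the add-one background estimate are assumptions made for this example, and sentences are assumed to be non-empty token lists.

```python
import math
from collections import Counter


def jelinek_mercer(count, length, unique, p_bg, lam=0.5):
    # Linear interpolation of the sentence model with the background model.
    return (1.0 - lam) * count / length + lam * p_bg


def dirichlet(count, length, unique, p_bg, mu=100.0):
    # Bayesian smoothing with a Dirichlet prior of strength mu.
    return (count + mu * p_bg) / (length + mu)


def absolute_discounting(count, length, unique, p_bg, delta=0.7):
    # Subtract a constant from every seen count and give the freed
    # probability mass (delta * unique / length) to the background model.
    return max(count - delta, 0.0) / length + delta * unique / length * p_bg


def rerank(query, sentences, smoother):
    """Sort non-empty tokenized sentences by smoothed query log-likelihood."""
    background = Counter(w for s in sentences for w in s)
    total = sum(background.values())
    vocab = len(background)

    def log_likelihood(sentence):
        counts = Counter(sentence)
        logp = 0.0
        for w in query:
            # Assumption: add-one style background estimate, so that
            # unseen query words never produce log(0).
            p_bg = (background[w] + 1) / (total + vocab)
            logp += math.log(smoother(counts[w], len(sentence), len(counts), p_bg))
        return logp

    return sorted(sentences, key=log_likelihood, reverse=True)


def mean_reciprocal_rank(first_correct_ranks):
    """Each entry is the 1-based rank of the first correct sentence for one
    question, or None if no correct sentence was retrieved."""
    reciprocal = [1.0 / r if r else 0.0 for r in first_correct_ranks]
    return sum(reciprocal) / len(reciprocal)
```

Calling, for example, rerank(query_tokens, sentence_token_lists, dirichlet) returns the sentences ordered from most to least likely to have generated the query. mean_reciprocal_rank then summarizes, over a set of questions, how high the first correct sentence appears.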