Unsupervised Approach for Selecting Sentences in Query-based Summarization

Yllias Chali, Shafiq R. Joty

When a standard document search engine returns a ranked list of relevant documents, the user's search task is usually not over. The user still has to read through each document in full to judge its relevance and to find the precise piece of information sought. Query-relevant summarization tries to lift this burden from the end user by providing more condensed and direct access to relevant information. It is the task of synthesizing a fluent, well-organized summary of the document collection that answers the user's questions. We extracted several types of features (lexical, lexical semantic, statistical, and cosine similarity) for each sentence in the document collection in order to measure its relevance to the user query. We experimented with two well-known unsupervised statistical machine learning techniques, the K-means and EM algorithms, and evaluated their performance. For each of these summary generation methods, we show the effects of the different kinds of features.
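
As a rough illustration of the pipeline the abstract outlines (score each sentence against the query, then cluster the sentences without labels), the following is a minimal sketch and not the authors' implementation. It assumes only two simplified features (bag-of-words cosine similarity and query word overlap), stands in for the richer lexical and lexical semantic features mentioned above, and assumes K-means with exactly two clusters, where the cluster whose centroid has the higher feature values is treated as query-relevant. All function names are hypothetical.

```python
import math
from collections import Counter

PUNCT = ".,;:!?()\"'"

def tokenize(text):
    # Lowercase whitespace tokens with surrounding punctuation stripped.
    tokens = (w.strip(PUNCT).lower() for w in text.split())
    return [t for t in tokens if t]

def cosine_similarity(a, b):
    # Cosine of the angle between raw term-count vectors of two texts.
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

def word_overlap(sentence, query):
    # Fraction of distinct query words that also occur in the sentence.
    q, s = set(tokenize(query)), set(tokenize(sentence))
    return len(q & s) / len(q) if q else 0.0

def features(sentence, query):
    # Assumed, simplified feature vector for one sentence.
    return [cosine_similarity(sentence, query), word_overlap(sentence, query)]

def squared_distance(p, c):
    return sum((x - y) ** 2 for x, y in zip(p, c))

def kmeans_two_clusters(points, iterations=20):
    # Deterministic initialization: the points with the smallest and
    # largest feature sums serve as the two starting centroids.
    centroids = [list(min(points, key=sum)), list(max(points, key=sum))]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [0 if squared_distance(p, centroids[0])
                  <= squared_distance(p, centroids[1]) else 1
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        for k in (0, 1):
            members = [p for p, l in zip(points, labels) if l == k]
            if members:
                centroids[k] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels, centroids

def select_sentences(sentences, query):
    # Keep the sentences in the cluster with the higher-valued centroid.
    points = [features(s, query) for s in sentences]
    labels, centroids = kmeans_two_clusters(points)
    relevant = 0 if sum(centroids[0]) >= sum(centroids[1]) else 1
    return [s for s, l in zip(sentences, labels) if l == relevant]

if __name__ == "__main__":
    docs = [
        "K-means clusters sentences by query relevance.",
        "The weather was sunny yesterday.",
        "Query relevance is scored per sentence.",
        "Bananas are yellow.",
    ]
    for sentence in select_sentences(docs, "query relevance of sentences"):
        print(sentence)
```

Replacing the K-means step with EM over a mixture model would keep the same feature vectors and change only how sentences are assigned to the relevant and non-relevant groups.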

Subjects: 13. Natural Language Processing; 12. Machine Learning and Discovery

Submitted: Feb 7, 2008

This page is copyrighted by AAAI. All rights reserved.