Michele Banko, Eric Brill, Susan Dumais, and Jimmy Lin
The design of the AskMSR question answering system is motivated by recent observations in natural language processing that, for many applications, significant improvements in accuracy can be attained simply by increasing the amount of data used for learning (e.g., Banko and Brill, 2001). By taking advantage of the vast amount of online text available via the World Wide Web, rather than relying heavily on linguistically intensive techniques, we developed a simple but effective question answering system. Many groups working on question answering use a variety of linguistic resources (part-of-speech tagging, parsing, named entity extraction, WordNet, etc.). We chose instead to focus on the tremendous resource that the web provides simply as a gigantic data repository. The web, which is home to billions of pages of electronic text, is orders of magnitude larger than the TREC QA document collection, which consists of fewer than 1 million documents.