Bruce W. Porter and James C. Lester
Empirically evaluating explanation generators poses a notoriously difficult problem. To address this problem, we constructed KNIGHT, a robust explanation generator that dynamically constructs natural language explanations about scientific phenomena. We then undertook the most extensive and rigorous empirical evaluation ever conducted on an explanation generator. First, KNIGHT constructed explanations on randomly chosen topics from the Biology Knowledge Base, an immense structure that contains more than 180,000 facts. We then enlisted the services of a panel of domain experts to produce explanations on these same topics. Finally, we submitted all of these explanations to a second panel of domain experts, who graded the explanations on an A-F scale. KNIGHT scored within "half a grade" of the domain experts, and its performance exceeded that of one of them.