AAAI Publications, Twenty-Fifth International FLAIRS Conference

Arabic Cross-Document NLP for the Hadith and Biography Literature
Fadi Zaraket, Jad Makhlouta

Last modified: 2012-05-16

Abstract


Cross-document integration and reconciliation of extracted information has recently become of interest to researchers in Arabic natural language processing. Given a set of documents A, we use Arabic morphological analysis, finite state machines, and graph transformations to extract named entities Na and relations Ra, which form the nodes and edges of a graph G = (Na, Ra). We use the same techniques to extract entities Nb and relations Rb from a separate set of documents B. We use G to disambiguate Nb and Rb, and we integrate the resulting entities back into G by annotating the nodes and edges of G with elements from Nb. We apply this approach iteratively. Our results show a significant increase in accuracy, from 41% to 93%, after applying this cross-document NLP methodology to hadith and biography documents.
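
The disambiguate-and-annotate step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (build_graph, resolve, integrate), the token-overlap matching rule, the neighbour-count score, and the sample names are all assumptions made for illustration, and the morphological analysis and finite state machines from the abstract are omitted.

```python
def build_graph(relations):
    """Build G = (Na, Ra) from (source, target) pairs extracted from set A."""
    nodes, edges = set(), set()
    for src, dst in relations:
        nodes.update((src, dst))
        edges.add((src, dst))
    return nodes, edges


def resolve(mention, context, nodes, edges):
    """Map an ambiguous mention from set B to a node of G, or None.

    Toy rule (an assumption): a node is a candidate if it contains every
    token of the mention; the candidate with the most graph neighbours
    appearing in the mention's context wins.
    """
    tokens = set(mention.split())
    # Sort so that ties are broken deterministically (alphabetically).
    candidates = sorted(n for n in nodes if tokens <= set(n.split()))

    def score(node):
        neighbours = {d for s, d in edges if s == node}
        neighbours |= {s for s, d in edges if d == node}
        return len(neighbours & context)

    return max(candidates, key=score, default=None)


def integrate(annotations, node, fact):
    """Annotate a node of G with information taken from set B."""
    annotations.setdefault(node, []).append(fact)


# Made-up relations standing in for what was extracted from set A.
relations_a = [("Ahmad ibn Ali", "Zayd"), ("Ali ibn Ahmad", "Umar")]
nodes, edges = build_graph(relations_a)

# A mention "Ali" from set B that co-occurs with "Zayd".
annotations = {}
match = resolve("Ali", {"Zayd"}, nodes, edges)
if match is not None:
    integrate(annotations, match, "biography entry mentioning Zayd")
```

In this toy run the mention "Ali" matches two nodes, and the shared neighbour "Zayd" selects one of them. Repeating the resolve and integrate steps as G accumulates annotations corresponds to the iterative application mentioned in the abstract.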
