Arabic Cross-Document NLP for the Hadith and Biography Literature
Last modified: 2012-05-16
Abstract
Cross-document integration and reconciliation of extracted information have recently attracted interest from researchers in Arabic natural language processing. Given a set of documents $A$, we use Arabic morphological analysis, finite state machines, and graph transformations to extract named entities $N_a$ and relations $R_a$, expressed as edges in a graph $G = (N_a, R_a)$. We use the same techniques to extract entities $N_b$ and relations $R_b$ from a separate set of documents $B$. We use $G$ to disambiguate $N_b$ and $R_b$, and we integrate the resulting entities back into $G$ by annotating the nodes and edges in $G$ with elements from $N_b$. We apply this approach iteratively. Our results show a significant increase in accuracy, from 41% to 93%, after applying this cross-document NLP methodology to hadith and biography documents.
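The iterative disambiguation step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`build_graph`, `resolve_mentions`), the neighbour-overlap scoring rule, and the toy data are all assumptions made for illustration, and the extraction stage (morphological analysis and finite state machines) is assumed to have already produced the entities and relations.

```python
from collections import defaultdict


def build_graph(relations):
    """Build an undirected adjacency map from (entity, entity) pairs."""
    graph = defaultdict(set)
    for a, b in relations:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def resolve_mentions(graph, surface, relations_b, candidates, max_rounds=10):
    """Resolve mentions from B to nodes of G, then annotate those nodes.

    Hypothetical scoring rule: a candidate node scores one point for each
    of its neighbours in G that is also the resolved identity of a mention
    related to the ambiguous mention in B.
    """
    neighbours_b = build_graph(relations_b)
    # Mentions whose name has exactly one candidate are resolved up front.
    resolved = {
        mention: candidates[name][0]
        for mention, name in surface.items()
        if len(candidates.get(name, [])) == 1
    }
    for _ in range(max_rounds):
        progress = False
        for mention, name in surface.items():
            if mention in resolved:
                continue
            # Context: G nodes already assigned to related mentions in B.
            context = {
                resolved[n] for n in neighbours_b[mention] if n in resolved
            }
            scored = [
                (len(graph[c] & context), c) for c in candidates.get(name, [])
            ]
            score, best = max(scored, default=(0, None))
            if score > 0:
                resolved[mention] = best
                progress = True
        if not progress:  # stop once a full pass resolves nothing new
            break
    # Integrate back into G by annotating nodes with the mentions from B.
    annotations = defaultdict(list)
    for mention, node in resolved.items():
        annotations[node].append(mention)
    return resolved, annotations


# Toy data (illustrative only): G comes from document set A.
G = build_graph([
    ("Malik_ibn_Anas", "Nafi"),
    ("Nafi", "Ibn_Umar"),
    ("Malik_ibn_Dinar", "Hasan_al_Basri"),
])
# Document set B mentions an ambiguous "Malik" related to "Nafi".
surface = {"m1": "Malik", "m2": "Nafi"}
relations_b = [("m1", "m2")]
candidates = {
    "Malik": ["Malik_ibn_Anas", "Malik_ibn_Dinar"],
    "Nafi": ["Nafi"],
}

resolved, annotations = resolve_mentions(G, surface, relations_b, candidates)
print(resolved["m1"])
```

In this toy run the unambiguous mention of Nafi is resolved first, and its identity then supplies the context needed to choose between the two candidates for "Malik". Each pass can therefore unlock further resolutions, which is the motivation for applying the approach iteratively.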