AAAI Publications, Twenty-Fifth International FLAIRS Conference

A Pruning Based Approach for Scalable Entity Coreference
Dezhao Song, Jeff Heflin

Last modified: 2012-05-16


Entity coreference is the process of deciding which identifiers (e.g., person names, locations, ontology instances) refer to the same real-world entity. In the Semantic Web, entity coreference can be used to detect equivalence relationships between heterogeneous Semantic Web datasets and to explicitly link coreferent ontology instances via the owl:sameAs property. Because of the large scale of Semantic Web data today, we propose two pruning techniques for scalably detecting owl:sameAs links between ontology instances by comparing the similarity of their context graphs. First, a sampling-based technique estimates the potential contribution of each RDF node in the context graph and prunes insignificant context. Second, a utility function is defined to reduce the cost of performing such estimations. We evaluate our pruning techniques on three Semantic Web instance categories and show that they enable the entity coreference system to run 10 to 35 times faster than it does without them, while still maintaining comparably good F1-scores.


Semantic Web; Entity Coreference; Linked Data; Pruning; Scalability
