Wikipedia Cultural Diversity Dataset: A Complete Cartography for 300 Language Editions

Marc Miquel-Ribé; David Laniado

doi:10.1609/icwsm.v13i01.3260

Authors

Marc Miquel-Ribé Universitat Pompeu Fabra, Catalunya
David Laniado Eurecat, Centre Tecnològic de Catalunya

DOI:

https://doi.org/10.1609/icwsm.v13i01.3260

Abstract

In this paper we present the Wikipedia Cultural Diversity dataset. For each existing Wikipedia language edition, the dataset contains a classification of the articles that represent its associated cultural context, i.e., all concepts and entities related to the language and to the territories where it is spoken. We describe the methodology we employed to classify articles and the rich set of features we defined to feed the classifier, which are released as part of the dataset. We present several purposes for which we envision the use of this dataset, including detecting, measuring, and countering content gaps in the Wikipedia project, and encouraging cross-cultural research in the field of digital humanities.

Published

2019-07-06

How to Cite

Miquel-Ribé, M., & Laniado, D. (2019). Wikipedia Cultural Diversity Dataset: A Complete Cartography for 300 Language Editions. Proceedings of the International AAAI Conference on Web and Social Media, 13(01), 620-629. https://doi.org/10.1609/icwsm.v13i01.3260

Issue

Vol. 13 (2019): Thirteenth International AAAI Conference on Web and Social Media

Section

Dataset Papers
