Conceptual Clustering in Structured Databases: A Practical Approach

Authors

A. Ketterlin

P. Gançarski

J. J. Korczak

LSIIT

Université Louis Pasteur

France

Abstract:

KDD deals with readily available data, found in all scientific and applied domains. However, some domains with comprehensive and conclusive data have severe data security problems. To exclude the risk of reidentifying individual cases, e.g. persons or companies, access to these data is strictly restricted, and KDD applications are often not allowed at all. In this paper, we discuss data privacy issues based on our experience with several applications of the discovery system Explora and other data analysis approaches. First, some example applications are presented, referring to a simple classification organized along two dimensions that are important for the privacy discussion. Then we treat the reidentification risk and discuss anonymization methods to overcome these problems; aggregation and synthetization methods are discussed in more detail. There is a tradeoff between reducing the reidentification risk and preserving the statistical content of the data. For some of the main KDD patterns, we analyse how far the statistical content of anonymized data is still sufficient. In principle, KDD needs aggregate events. Since the event space of a dataset is very large, statically precomputing all possible events is often not viable. We propose an architectural solution: a modular KDD system with a separate data server that also handles data security requirements and ensures that only dynamically aggregated data leave the server to be analysed by the discovery modules of the KDD system. Finally, some other data privacy aspects are addressed.
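The central architectural idea of the abstract, a data server that keeps the individual records to itself and lets only dynamically computed aggregates leave, can be illustrated with a minimal Python sketch. The class name `AggregatingDataServer`, the method `count_by`, and the minimum-cell-size suppression rule (`min_cell_size`) are hypothetical illustrations chosen here; they are not the interface of Explora or of the system proposed in the paper.

```python
from collections import Counter
from typing import Dict, Iterable, List, Mapping, Optional, Tuple


class AggregatingDataServer:
    """Hypothetical data server: keeps individual records private and
    answers only aggregate count queries computed on demand."""

    def __init__(self, min_cell_size: int = 3) -> None:
        # Assumed disclosure-control rule (not from the paper): any
        # aggregate cell describing fewer records than this is suppressed.
        self.min_cell_size = min_cell_size
        self._records: List[Dict[str, object]] = []

    def load(self, records: Iterable[Mapping[str, object]]) -> None:
        """Store raw records inside the server; they are never returned."""
        self._records.extend(dict(r) for r in records)

    def count_by(
        self,
        attributes: Tuple[str, ...],
        where: Optional[Mapping[str, object]] = None,
    ) -> Dict[Tuple[object, ...], int]:
        """Aggregate dynamically: count each value combination of `attributes`
        among records matching `where`, dropping cells that are too small."""
        conditions = where or {}
        selected = (
            r for r in self._records
            if all(r.get(k) == v for k, v in conditions.items())
        )
        counts = Counter(tuple(r.get(a) for a in attributes) for r in selected)
        # Only aggregated, sufficiently large cells leave the server.
        return {cell: n for cell, n in counts.items() if n >= self.min_cell_size}


if __name__ == "__main__":
    server = AggregatingDataServer(min_cell_size=3)
    server.load([
        {"region": "north", "sector": "retail"},
        {"region": "north", "sector": "retail"},
        {"region": "north", "sector": "industry"},
        {"region": "north", "sector": "retail"},
        {"region": "south", "sector": "retail"},
        {"region": "south", "sector": "industry"},
    ])
    print(server.count_by(("region",)))           # {('north',): 4}
    print(server.count_by(("region", "sector")))  # {('north', 'retail'): 3}
```

Aggregates are computed per query rather than precomputed, in line with the observation that the event space is too large for static precomputation. Suppressing small cells is only a simple safeguard: on its own it does not stop an analyst from inferring a small cell by differencing overlapping queries.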
