Data Quality Issues In Toxicological Knowledge Discovery

Christoph Helma, Eva Gottmann, Stefan Kramer, and Bernhard Pfahringer

Every SAR technique for toxicity prediction relies on the exact estimation and representation of chemical and toxicological properties. We will present potential sources of errors associated with the utilization of large, noncongeneric datasets and complex toxicological endpoints (e.g. carcinogenicity). According to experience, we have identified the major problems in the areas of compound identification, descriptor calculation and toxicity data. Generally, we consider the chemical data as more reliable than the results from toxicity experiments. As it is impossible to tackle the data quality problem on a case-by-case basis for a large number of compounds, we will propose some possibilities for routine quality control of large datasets.

