Christoph Helma, Eva Gottmann, Stefan Kramer, and Bernhard Pfahringer
Every SAR technique for toxicity prediction relies on the accurate estimation and representation of chemical and toxicological properties. We present potential sources of error associated with the use of large, noncongeneric datasets and complex toxicological endpoints (e.g. carcinogenicity). Based on our experience, we have identified the major problems in three areas: compound identification, descriptor calculation and toxicity data. In general, we consider the chemical data more reliable than the results of toxicity experiments. As it is impossible to tackle the data quality problem on a case-by-case basis for a large number of compounds, we propose some possibilities for routine quality control of large datasets.
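The abstract does not say which routine checks are proposed. The sketch below is only an illustration of what automated quality control for two of the named problem areas could look like. It assumes records are hypothetical `(cas, source, call)` tuples, where `call` is a toxicity classification such as "positive" or "negative". For compound identification it applies the standard CAS Registry Number check-digit test. For toxicity data it flags compounds whose sources disagree. The function names and record layout are assumptions, not the authors' method.

```python
import re
from collections import defaultdict

# CAS Registry Number format: 2-7 digits, 2 digits, 1 check digit.
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def cas_is_valid(cas):
    """Return True if `cas` is well formed and its check digit is correct."""
    match = CAS_PATTERN.match(cas.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    check = int(match.group(3))
    # Weight digits 1, 2, 3, ... starting from the rightmost digit of the body.
    total = sum(i * int(d) for i, d in enumerate(reversed(body), start=1))
    return total % 10 == check


def conflicting_calls(records):
    """Group (cas, source, call) records by CAS number and return the
    compounds whose sources disagree, mapped to their sorted calls."""
    calls = defaultdict(set)
    for cas, _source, call in records:
        calls[cas.strip()].add(call)
    return {cas: sorted(c) for cas, c in calls.items() if len(c) > 1}


def quality_report(records):
    """Hypothetical routine check: list malformed or mistyped CAS numbers,
    then look for inconsistent calls among the remaining records."""
    invalid = sorted({cas for cas, _s, _c in records if not cas_is_valid(cas)})
    conflicts = conflicting_calls([r for r in records if cas_is_valid(r[0])])
    return {"invalid_cas": invalid, "conflicts": conflicts}


if __name__ == "__main__":
    # Illustrative records only; sources and calls are made up.
    records = [
        ("50-00-0", "source_a", "positive"),
        ("50-00-0", "source_b", "negative"),
        ("7732-18-5", "source_a", "negative"),
        ("7732-18-4", "source_b", "negative"),  # wrong check digit
    ]
    print(quality_report(records))
```

A check-digit test only catches typing errors in the identifier. It cannot detect a valid CAS number attached to the wrong structure, so in practice it would complement, not replace, structure-based cross-checks.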