Brian A. Stone, Dennis Bahler
We train artificial neural networks to predict the results of long-term rodent bioassays using data collected by the National Toxicology Program (NTP). The data set consists of Salmonella mutagenicity assay results; subchronic pathology data; information on route, strain, and sex/species; physical-chemical parameters; and structural alerts for 744 individual experiments. First, an automated method was devised to reduce the set of over 2800 possible attributes of these experiments to the 74 attributes that can be shown to be most relevant to this prediction task. Second, using these attributes, a neural network model was trained that has a cross-validated accuracy on unseen data of 89.23%. Third, a list of 22 human-readable M-of-N rules was extracted that explains the knowledge learned by the trained network. Furthermore, the cross-validated accuracy of the rule set is within 2.5% of that of the full network model. These results contribute to the ongoing process of evaluating and interpreting the data collected from chemical toxicity studies.
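For readers unfamiliar with the rule format, an M-of-N rule fires when at least M of its N listed conditions hold. The following is a minimal sketch of how such a rule could be evaluated. The class `MOfNRule`, the attribute names, and the example rule are all hypothetical illustrations; they are not drawn from the NTP data set or from the 22 extracted rules.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Sequence

# A record maps attribute names to values (hypothetical binary attributes here).
Record = Mapping[str, int]


@dataclass(frozen=True)
class MOfNRule:
    """Fires when at least `m` of the N conditions are satisfied."""
    m: int
    conditions: Sequence[Callable[[Record], bool]]
    label: str

    def fires(self, record: Record) -> bool:
        # Count how many of the N conditions hold for this record.
        satisfied = sum(1 for cond in self.conditions if cond(record))
        return satisfied >= self.m


# Hypothetical 2-of-3 rule; attribute names are invented for illustration only.
rule = MOfNRule(
    m=2,
    conditions=(
        lambda r: r["salmonella_positive"] == 1,
        lambda r: r["structural_alert"] == 1,
        lambda r: r["subchronic_liver_lesion"] == 1,
    ),
    label="positive",
)

example = {
    "salmonella_positive": 1,
    "structural_alert": 0,
    "subchronic_liver_lesion": 1,
}

if rule.fires(example):
    print(f"Rule fires: predict {rule.label}")
else:
    print("Rule does not fire")
```

In this sketch two of the three conditions hold for `example`, so the 2-of-3 rule fires. A rule of this form can be read and checked by hand, which is what makes M-of-N rule sets easier to inspect than the network weights they summarize.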