N. B. Sussman, O. T. Macina, H. G. Claycamp, S. G. Grant, H. S. Rosenkranz
We propose an approach that incorporates resampling into the model-building process: the parent chemical database is resampled to form N random databases, from which N independent random models are generated. The multiple random sampling allows us to treat the parent database as an empirical distribution of the chemical population, which helps to overcome, to some degree, the representation bias in the parent database. Model-building researchers will recognize the similarity of this approach to the bootstrap. In fact, it is the bootstrap methodology, applied not only to estimate the distribution of the prediction accuracy (or error) but also to evaluate the consistency of random models developed from a given database. The idea of employing the bootstrap for this purpose was presented by Efron and Gong in a demonstration dealing with a similar prediction problem, a situation they deemed to be "hopelessly beyond traditional theoretical solutions" (Efron and Gong, 1983).
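The resampling scheme described above can be illustrated with a minimal sketch. It makes several assumptions that are not taken from the paper: the nearest-centroid classifier is only a toy stand-in for whatever modeling method is actually used, each resample is drawn with replacement at the size of the parent database, accuracy is scored on the entries left out of each resample, and the function names (`fit_model`, `predict`, `bootstrap_models`, `consistency`) and the one-descriptor data set are invented for illustration.

```python
import random
import statistics

def fit_model(sample):
    """Toy stand-in for a model: the mean descriptor value of each class."""
    overall = statistics.fmean([x for x, _ in sample])
    centroids = {}
    for label in (0, 1):
        values = [x for x, y in sample if y == label]
        # If a class is missing from the resample, fall back to the overall mean.
        centroids[label] = statistics.fmean(values) if values else overall
    return centroids

def predict(model, x):
    """Assign the class whose centroid lies closest to x."""
    return min(model, key=lambda label: abs(x - model[label]))

def bootstrap_models(parent, n_models, seed=0):
    """Draw n_models resamples (with replacement) and fit one model to each."""
    rng = random.Random(seed)
    n = len(parent)
    models, accuracies = [], []
    for _ in range(n_models):
        drawn = [rng.randrange(n) for _ in range(n)]
        drawn_set = set(drawn)
        model = fit_model([parent[i] for i in drawn])
        models.append(model)
        # Entries never drawn serve as an out-of-sample test set (an assumption).
        held_out = [parent[i] for i in range(n) if i not in drawn_set]
        if held_out:
            hits = [predict(model, x) == y for x, y in held_out]
            accuracies.append(sum(hits) / len(hits))
    return models, accuracies

def consistency(models, parent):
    """Per entry, the fraction of models that agree with the majority prediction."""
    scores = []
    for x, _ in parent:
        votes = [predict(m, x) for m in models]
        majority = max(set(votes), key=votes.count)
        scores.append(votes.count(majority) / len(votes))
    return scores

# Invented parent database: (descriptor value, activity label) pairs.
parent = [(0.1 * i, 0) for i in range(10)] + [(5 + 0.1 * i, 1) for i in range(10)]

models, accuracies = bootstrap_models(parent, n_models=200)
agreement = consistency(models, parent)

print("models built:", len(models))
print("mean held-out accuracy:", statistics.fmean(accuracies))
print("lowest per-entry agreement:", min(agreement))
```

The list `accuracies` approximates the distribution of prediction accuracy across the N random models, while `agreement` gives one possible measure of how consistently those models classify each entry of the parent database. Other consistency measures could be substituted without changing the resampling loop.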