David G. Stork and Chuck P. Lam
We describe the Open Mind Initiative, a framework for building intelligent systems collaboratively over the internet, and focus on one of its simpler component projects, Open Mind Animals. The Initiative extends traditional open source development methods by allowing non-expert netizens to contribute informal data over the internet. Because such data is used to train classifiers or guide automatic inference systems, it is important that only accurate and consistent data be accepted. We identify a number of possible sources of poor data in Animals (several of which are generic and apply to a range of open data collection projects) and implement a system of software modules that automatically and semi-automatically prevent poor data from being accepted. Our system, tested on a controlled laboratory intranet, filters faulty data through a variety of mechanisms, and the data it accepts yields accurate decision tree classifiers. Our reusable modules can be employed in our planned large-scale internet deployment of Animals and in other Open Mind projects.
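The abstract does not say which filtering mechanisms the system uses. As one illustration of what an automatic consistency filter could look like, the sketch below accepts a contributed answer only when enough distinct contributors agree on it. The function name `filter_by_agreement`, the tuple layout of a submission, and the thresholds `min_votes` and `min_agreement` are all assumptions made for this example; they are not taken from the Open Mind modules.

```python
from collections import Counter, defaultdict


def filter_by_agreement(submissions, min_votes=3, min_agreement=0.8):
    """Keep an (animal, question) answer only if contributors largely agree.

    `submissions` is an iterable of (contributor, animal, question, answer)
    tuples. This layout is hypothetical, chosen only for illustration.
    """
    # Record one answer per contributor per (animal, question) pair, so a
    # single contributor cannot inflate the vote by resubmitting.
    answers = defaultdict(dict)
    for contributor, animal, question, answer in submissions:
        answers[(animal, question)][contributor] = answer

    accepted = {}
    for key, by_contributor in answers.items():
        counts = Counter(by_contributor.values())
        total = sum(counts.values())
        top_answer, top_count = counts.most_common(1)[0]
        # Require both enough independent votes and a strong majority.
        if total >= min_votes and top_count / total >= min_agreement:
            accepted[key] = top_answer
    return accepted


if __name__ == "__main__":
    demo = [
        ("u1", "cat", "has fur?", True),
        ("u2", "cat", "has fur?", True),
        ("u3", "cat", "has fur?", True),
        ("u4", "cat", "lays eggs?", True),
        ("u5", "cat", "lays eggs?", False),
        ("u6", "cat", "lays eggs?", False),
    ]
    print(filter_by_agreement(demo))
```

In this sketch the unanimous answer is accepted, while the disputed one is withheld rather than resolved by simple majority. A semi-automatic system could route such withheld items to a human reviewer.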