Proofs for the optimality of classification in real-world machine learning situations are constructed. The validity of each proof requires reasoning about the probability of certain subsets of feature vectors. It is shown that linear discriminants classify by making the least demanding assumptions on the values of these probabilities. This enables measuring the confidence of classification by linear discriminants. We demonstrate experimentally that when linear discriminants make decisions with high confidence, their performance on real-world data improves significantly, to the point where they beat the best known nonlinear techniques on large portions of the data.