On Testing of Uniform Samplers
Recent years have seen an unprecedented adoption of artificial intelligence in a wide variety of applications, ranging from medical diagnosis and the automobile industry to security and aircraft collision avoidance. Probabilistic reasoning is a key component of such modern artificial intelligence systems, and sampling techniques form the core of state-of-the-art probabilistic reasoning systems.
The divide between sampling techniques that have strong theoretical guarantees but fail to scale, and scalable techniques with weak or no theoretical guarantees, mirrors a gap in software engineering: classical program synthesis techniques scale poorly, yet billions of programs are routinely used by practitioners. In software engineering, one bridge between the two extremes has been program testing. In contrast to testing deterministic programs, where a single trace is sufficient to prove the existence of a bug, in the case of samplers a single sample is typically not sufficient to prove non-conformity of the sampler to the desired distribution. This raises the question of whether it is possible to design a testing methodology that checks whether a sampler under test generates samples close to a given distribution.
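To make the contrast with deterministic testing concrete, here is a minimal sketch of the classical collision-based uniformity test (a standard distribution-testing technique, not Barbarik) on a toy 16-element support. The sampler functions, the 4-bit support and the `slack` threshold are illustrative assumptions. No single draw can expose the biased sampler, because every value it returns also lies in the support of the uniform distribution; only statistics over many draws can.

```python
import random
from collections import Counter


def collision_rate(samples):
    """Unbiased estimate of sum_x p(x)^2: the fraction of sample pairs that coincide."""
    m = len(samples)
    if m < 2:
        # A single sample forms no pairs, so it carries no evidence about uniformity.
        raise ValueError("need at least two samples")
    counts = Counter(samples)
    colliding = sum(c * (c - 1) // 2 for c in counts.values())
    return colliding / (m * (m - 1) // 2)


def looks_uniform(samples, support_size, slack=0.5):
    """Accept when the collision rate is near 1/N, its minimum (attained only by uniform)."""
    return collision_rate(samples) <= (1 + slack) / support_size


N = 16  # toy support: all 4-bit strings, encoded as integers 0..15
rng = random.Random(0)  # fixed seed so the demo is reproducible


def uniform_sampler():
    return rng.randrange(N)


def biased_sampler():
    # Hypothetical buggy sampler: returns 0 half the time, else a uniform value.
    return 0 if rng.random() < 0.5 else rng.randrange(N)


m = 2000
uniform_samples = [uniform_sampler() for _ in range(m)]
biased_samples = [biased_sampler() for _ in range(m)]
print(looks_uniform(uniform_samples, N), looks_uniform(biased_samples, N))
```

With `slack=0.5` the threshold is 1.5/N. The uniform sampler's estimate concentrates near 1/N = 0.0625, while the biased sampler's concentrates near 0.297, so the first check should accept and the second reject.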
The primary contribution of this paper is an affirmative answer to the above question when the given distribution is the uniform distribution: we design, to the best of our knowledge, the first algorithmic framework, Barbarik, to test whether the distribution generated by a sampler is ε-close to or η-far from the uniform distribution. In contrast to classical distribution-testing techniques, which require a number of samples exponential or sub-exponential in n for a sampler whose support can be represented by n bits, Barbarik requires only O(1/(η−ε)^4) samples. We present a prototype implementation of Barbarik and use it to test three state-of-the-art uniform samplers over supports defined by combinatorial constraints. Barbarik provides a certificate of uniformity for one sampler and demonstrates non-uniformity for the other two.
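To put the sample bound in perspective, the sketch below compares orders of magnitude only, with all hidden constants dropped: roughly √N = 2^(n/2) samples for a classical tester that only observes samples (a standard lower bound for uniformity testing, here also ignoring its dependence on the distance parameter) against 1/(η−ε)^4 for Barbarik. The values ε = 0.6 and η = 0.9 are illustrative assumptions.

```python
import math


def classical_order(n_bits):
    """sqrt(N) for a support of N = 2**n_bits elements (distance term and constants dropped)."""
    return math.sqrt(2 ** n_bits)


def barbarik_order(eps, eta):
    """1/(eta - eps)**4, the stated bound with its hidden constant dropped."""
    return 1 / (eta - eps) ** 4


EPS, ETA = 0.6, 0.9  # illustrative tolerance parameters (assumed, not from the text)
for n in (10, 50, 100):
    print(f"n={n:3d}  classical ~ {classical_order(n):.3g}  Barbarik ~ {barbarik_order(EPS, ETA):.3g}")
```

The first column grows exponentially with n, while the second does not depend on n at all.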
Erratum: This research is supported in part by the National Research Foundation Singapore under its AI Singapore Programme (Award Number: [AISG-RP-2018-005])