Hypothesis testing in machine learning – for instance, to establish whether the performance of two algorithms is significantly different, or to assess the (in)dependence between (mixed-type) variables – is usually performed using null hypothesis significance tests (NHST). Yet the NHST methodology has well-known drawbacks. For instance, the claimed statistical significance does not necessarily imply practical significance. Moreover, NHST cannot verify the null hypothesis and thus cannot recognize equivalent classifiers. Most important, NHST does not answer the researcher's question: what is the probability of the null and of the alternative hypothesis, given the observed data? Bayesian hypothesis tests overcome these problems: they compute the posterior probability of the null and of the alternative hypothesis. This makes it possible to detect equivalent classifiers and to claim statistical significances that have a practical impact. We developed Bayesian counterparts of the tests most commonly adopted in machine learning, such as the correlated t-test and the signed-rank test, tests to detect (in)dependence, and many others.
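As a minimal sketch of the idea, the snippet below implements a Bayesian correlated t-test for comparing two classifiers on the cross-validation scores of a single data set: the posterior of the mean score difference is a Student-t distribution whose scale is inflated by the correlation between folds, and a region of practical equivalence (ROPE) turns the posterior into three probabilities (practically worse / equivalent / better). The function name, the default ROPE width, and the example scores are illustrative choices, not part of any published API:

```python
import numpy as np
from scipy import stats


def bayesian_correlated_ttest(x, y, rho, rope=0.01):
    """Posterior probabilities that classifier A is practically worse than,
    equivalent to (within the ROPE), or better than classifier B.

    x, y : per-fold cross-validation scores of the two classifiers
           (same folds for both).
    rho  : correlation induced by overlapping training sets; a common
           heuristic is n_test / (n_test + n_train), i.e. 1/k for k-fold CV.
    rope : half-width of the region of practical equivalence on the
           score-difference scale (illustrative default).
    """
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    n = len(d)
    mean = d.mean()
    var = d.var(ddof=1)
    # Posterior of the mean difference: Student t with n-1 degrees of
    # freedom, located at the sample mean, with the scale corrected
    # for the correlation among folds.
    scale = np.sqrt((1.0 / n + rho / (1.0 - rho)) * var)
    post = stats.t(df=n - 1, loc=mean, scale=scale)
    p_worse = post.cdf(-rope)
    p_rope = post.cdf(rope) - post.cdf(-rope)
    p_better = 1.0 - post.cdf(rope)
    return p_worse, p_rope, p_better


# Illustrative accuracies from 10-fold CV (rho = 1/10):
acc_a = [0.80, 0.82, 0.81, 0.79, 0.83, 0.80, 0.82, 0.81, 0.78, 0.84]
acc_b = [0.78, 0.79, 0.80, 0.77, 0.80, 0.79, 0.81, 0.78, 0.77, 0.80]
p_worse, p_rope, p_better = bayesian_correlated_ttest(acc_a, acc_b, rho=0.1)
```

Unlike an NHST p-value, the three probabilities answer the question directly: how probable is it that the two classifiers are practically equivalent, and how probable that one dominates the other.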

See also our tutorial at ECML: Comparing competing algorithms: Bayesian versus frequentist hypothesis testing.

- Bayesian Kernelised Test of (In)dependence with Mixed-type Variables. In: Proceedings of the 8th IEEE International Conference on Data Science and Advanced Analytics (DSAA), Porto, Portugal, 2021.
- Bayesian Dependence Tests for Continuous, Binary and Mixed Continuous-Binary Variables. In: Entropy, vol. 18, no. 9, pp. 1–24, 2016.
- Time for a change: a tutorial for comparing multiple classifiers through Bayesian analysis. In: Journal of Machine Learning Research, vol. 18, no. 77, pp. 1–36, 2017.
- Statistical comparison of classifiers through Bayesian hierarchical modelling. In: Machine Learning, pp. 1–21, 2017, ISSN: 1573-0565.
- Joint Analysis of Multiple Algorithms and Performance Measures. In: New Generation Computing, vol. 35, no. 1, pp. 69–86, 2016, ISSN: 1882-7055.
- A Bayesian approach for comparing cross-validated algorithms on multiple data sets. In: Machine Learning, vol. 100, no. 2, pp. 285–304, 2015.
- A Bayesian nonparametric procedure for comparing algorithms. In: Proceedings of the 32nd International Conference on Machine Learning (ICML 2015), pp. 1–9, 2015.
- A Bayesian Wilcoxon signed-rank test based on the Dirichlet process. In: Proceedings of the 31st International Conference on Machine Learning (ICML 2014), pp. 1–9, 2014.