Computational Complexity and the Nature of Quantum Mechanics

Alessio Benavoli, Alessandro Facchini, Marco Zaffalon

In a previous paper, we derived the axioms of QM from the same *rationality principles* that underlie the subjective foundation of probability. We were able to show how QM is similar to classical probability, but we could not fully grasp the differences between the two theories. Why does entanglement exist? What is entanglement?

To model these differences, von Neumann changed the logic of events, while Dirac and Feynman introduced negative probabilities. But why?

In this new work, we believe we have finally answered that question, and the answer is simple to understand. QM is a theory of computational rationality based on a different notion of non-negativity (that is, of what it means for a real-valued function to be non-negative). The reason is purely computational: this different notion of non-negativity allows the inferences in the theory to be computed in polynomial time, whereas, in the same setting, inference in classical probability is NP-hard. In other words, we have proven that the only physics axiom in QM is *computational tractability*. All the weirdness (different logic of events, negative probabilities, and entanglement) is a simple consequence of that. Moreover, we show that entanglement is a characteristic of computational rationality, and we give an example of entanglement outside QM.

There is more: QM is actually a theory of *imprecise probability*, because we show that, when QM is compatible with classical probability, the density matrix is a truncated moment matrix.

The linked paper is really simple to read and understand: you do not need to know QM.

A longer version can be found here http://arxiv.org/abs/1902.03513

Enjoy it!

Indeed, a wide range of other frameworks, including—but by no means limited to—interval probabilities, sets of probability measures, non-additive set functions, non-linear expectations, partial preference orderings, game-theoretic probability and choice functions, offer powerful robust alternatives and extensions to the probabilistic one. These frameworks are capable of dealing with model uncertainty and indecision and are commonly referred to as imprecise probability models. For the past twenty years, their theoretical development and practical application have been the focus of the biennial ISIPTAs.

What we love about these conferences is their friendly and co-operative style, the strong emphasis on in-depth discussion, the true openness to new ideas and—last but not least—the willingness to share ideas in a way that others can build on them. We hope that you too will both enjoy and contribute to this unique atmosphere. If you are new to the community, and you are not sure whether your work fits the scope of the conference, do not hesitate to contact the PC Board about this.

The upcoming eleventh edition takes ISIPTA back to its roots: twenty years after its first edition, the International Symposium on Imprecise Probabilities: Theories and Applications returns to the medieval city centre of Ghent. What have we learned since then? Which questions still remain open? And what problems should we tackle next? You are hereby warmly invited to join us in shaping the future of imprecise probabilities. Make sure to bring a research state of mind.

Topics of interest include but are not limited to the following:

IP and (modal/epistemic/dependence/probabilistic/possibilistic…) logic

IP and game/decision theory

IP and formal epistemology/deductive sciences

IP and StarAI

IP and coalgebra

applications of logic and formal languages to IP

applications of IP to logic

logical, algebraic, categorical foundations of IP

Submission Guidelines

Submission Period: Mar 1st, 2019 – Jun 1st, 2019

All submitted papers under this call will undergo the standard review process of the journal.

All papers should be submitted via the IJAR website http://www.evise.com/evise/jrnl/IJA by choosing the Special Issue “VSI: Imprecise probabilities, logic and rationality”.

All online submissions should follow the “Guide for Authors” of the journal.

Guest Editors:

Prof. Dr. Alessio Benavoli

Alessio@idsia.ch

Dr. Alessandro Facchini

Alessandro.facchini@idsia.ch

Dr. Fabio Zanasi

f.zanasi@ucl.ac.uk

It works with continuous, discrete and mixed variables.

Here you can find some additional info, setup instructions and 4 examples (notebooks):

https://github.com/PyRational/PyRational/blob/master/notebooks/index.ipynb

The notebooks (and related examples) are very simple; their purpose (at the moment) is only to highlight the functionality of the library.

I welcome contributions; after all, it is an open-source project.

You can contribute in several ways:

1. giving me some feedback as a user;

2. authoring new notebooks in which you write down your favorite models using PyRational;

3. extending the library by including other functionalities, models, etc.

**Baycomp** is a library for Bayesian comparison of classifiers.

Functions compare two classifiers on one or on multiple data sets. They compute three probabilities: the probability that the first classifier has higher scores than the second, the probability that the differences are within the region of practical equivalence (rope), and the probability that the second classifier has higher scores. We will refer to these probabilities as `p_left`, `p_rope` and `p_right`. If the argument `rope` is omitted (or set to zero), functions return only `p_left` and `p_right`.

The region of practical equivalence (rope) is specified by the caller and should correspond to what is “equivalent” in practice; for instance, classification accuracies that differ by less than 0.5 may be called equivalent.

Similarly, whether higher scores are better or worse depends upon the type of the score.

The library can also plot the posterior distributions.
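As an aside, the meaning of the three probabilities can be sketched in a few lines of NumPy. This is only an illustration of what the numbers mean, not baycomp's actual sampler; `rope_probs` is a hypothetical helper and the samples below are synthetic:

```python
import numpy as np

# Illustrative sketch (not baycomp's implementation): given Monte Carlo
# samples from the posterior of the score difference, the three
# probabilities are the fractions of samples falling below, inside,
# and above the interval [-rope, rope]. Which side corresponds to which
# classifier depends on the sign convention used for the differences.
def rope_probs(diff_samples, rope=0.0):
    d = np.asarray(diff_samples)
    below = np.mean(d < -rope)
    above = np.mean(d > rope)
    within = 1.0 - below - above
    return below, within, above

rng = np.random.default_rng(0)
samples = rng.normal(loc=0.3, scale=1.0, size=100_000)  # fake posterior samples
below, within, above = rope_probs(samples, rope=0.1)
print(below, within, above)  # the three fractions sum to one
```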

The library can be used in three ways.

- Two shortcut functions can be used for comparison on a single and on multiple data sets. If `nbc` and `j48` contain lists of average classification accuracies of the naive Bayesian classifier and of J48 on a collection of data sets, we can call

```
>>> two_on_multiple(nbc, j48, rope=1)
(0.23124, 0.00666, 0.7621)
```

(Actual results may differ due to Monte Carlo sampling.)

With some additional arguments, the function can also plot the posterior distribution from which these probabilities came.

- Tests are packed into test classes. The above call is equivalent to

```
>>> SignedRankTest.probs(nbc, j48, rope=1)
(0.23124, 0.00666, 0.7621)
```

and to get a plot, we call

```
>>> SignedRankTest.plot(nbc, j48, rope=1, names=("nbc", "j48"))
```

To switch to another test, use another class:

```
>>> SignTest.probs(nbc, j48, rope=1)
(0.26508, 0.13274, 0.60218)
```

- Finally, we can construct and query sampled posterior distributions.

```
>>> posterior = SignedRankTest(nbc, j48, rope=0.5)
>>> posterior.probs()
(0.23124, 0.00666, 0.7621)
>>> posterior.plot(names=("nbc", "j48"))
```

Install from PyPI:

```
pip install baycomp
```

User documentation is available at https://baycomp.readthedocs.io/.

A detailed description of the implemented methods is available in Time for a Change: a Tutorial for Comparing Multiple Classifiers Through Bayesian Analysis, Alessio Benavoli, Giorgio Corani, Janez Demšar, Marco Zaffalon. Journal of Machine Learning Research, 18 (2017) 1-36.

There is another and more elegant way to derive this inequality:

$$Cov(X,Y)^2\leq Var(X)Var(Y)$$

To do that, we introduce again our favorite subject, Alice. Let us summarize the problem. Assume that there are two real variables $X,Y$ and that Alice only knows their first moments $\mu_x=E(X),\mu_y=E(Y)$ and second moments $E(X^2),E(Y^2)$ (in other words, she only knows their means and variances, since $Var(Z)=E(Z^2)-E(Z)^2=\sigma_z^2$).

Alice wants to compute $Cov(X,Y)$.

Since Alice does not know the joint probability distribution $P(X,Y)$ of $X,Y$ (she only knows the first two moments), she cannot compute $Cov(X,Y)$. However, she can compute bounds for $Cov(X,Y)$, or in other words, she can aim to solve the following problem

$$

\begin{array}{l}

~\max_{P} E[(X-\mu_x)(Y-\mu_y) ]=\int (X-\mu_x)(Y-\mu_y) dP(X,Y)\\

E[X]=\int X dP(X,Y)=\mu_x\\

E[Y]=\int Y dP(X,Y)=\mu_y\\

E[X^2]=\int X^2 dP(X,Y)=\mu_x^2+\sigma_x^2\\

E[Y^2]=\int Y^2 dP(X,Y)=\mu_y^2+\sigma_y^2\\

\end{array}

$$

This means she aims to find the maximum value of the expectation of $(X-\mu_x)(Y-\mu_y) $ among all the probability distributions that are compatible with her beliefs on $X,Y$ (the knowledge of the means and variances). She can similarly compute the minimum. This is the essence of Imprecise Probability.

To compute that, we first observe that

$$

\begin{bmatrix}

X-\mu_x\\

Y-\mu_y

\end{bmatrix}\begin{bmatrix}

X-\mu_x & Y-\mu_y\\

\end{bmatrix}=

\begin{bmatrix}

(X-\mu_x)^2 & (X-\mu_x)(Y-\mu_y)\\

(Y-\mu_y)(X-\mu_x) & (Y-\mu_y)^2\\

\end{bmatrix}\geq 0

$$

where $\geq$ means that the matrix is positive semi-definite. The expectation operator preserves this property (the expectation of a positive semi-definite matrix-valued function is positive semi-definite) and, therefore,

$$

E\left(

\begin{bmatrix}

(X-\mu_x)^2 & (X-\mu_x)(Y-\mu_y)\\

(Y-\mu_y)(X-\mu_x) & (Y-\mu_y)^2\\

\end{bmatrix}\right)\geq 0

$$

but, because of linearity of expectation, we have that

$$

\begin{aligned}

&E\left(\begin{bmatrix}

(X-\mu_x)^2 & (X-\mu_x)(Y-\mu_y)\\

(Y-\mu_y)(X-\mu_x) & (Y-\mu_y)^2\\

\end{bmatrix}\right)\\ &~\\

&=

\begin{bmatrix}

E[(X-\mu_x)^2] & E[(X-\mu_x)(Y-\mu_y)]\\

E[(Y-\mu_y)(X-\mu_x)] & E[(Y-\mu_y)^2]\\

\end{bmatrix}\geq 0.

\end{aligned}

$$

Since this $2\times 2$ matrix is positive semi-definite, its determinant is non-negative, that is,

$$

E[(X-\mu_x)(Y-\mu_y)]^2\leq E[(X-\mu_x)^2] E[(Y-\mu_y)^2]

$$

which is exactly

$$Cov(X,Y)^2\leq Var(X)Var(Y).$$
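The same positive semi-definiteness argument can be checked numerically; a minimal NumPy sketch with synthetic data:

```python
import numpy as np

# Sanity check of Cov(X,Y)^2 <= Var(X) Var(Y): the sample covariance
# matrix is positive semi-definite, so its determinant
# Var(X) Var(Y) - Cov(X,Y)^2 is non-negative -- exactly the inequality.
rng = np.random.default_rng(42)
x = rng.normal(size=10_000)
y = 0.7 * x + rng.normal(size=10_000)   # y correlated with x

C = np.cov(x, y)                        # 2x2 sample covariance matrix
cov_xy = C[0, 1]
var_x, var_y = C[0, 0], C[1, 1]

print(cov_xy**2 <= var_x * var_y)       # True
print(np.linalg.det(C) >= 0)            # True: non-negative determinant
```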

Quantum mechanics: The Bayesian theory generalized to the space of Hermitian matrices. In: Physical Review A, 94, pp. 042106, 2016.

I had a question from the audience about whether/how we can derive the Heisenberg inequality as a consequence of our subjective (gambling) formulation of QM. This is not complicated, since the Heisenberg inequality is just the QM version of the covariance inequality, which states that for any two random variables $X$ and $Y$

$$Cov(X,Y)^2\leq Var(X)Var(Y)$$

Therefore, before deriving Heisenberg uncertainty, I will first show how to derive the above inequality from a Bayesian (imprecise probability) perspective.

To explain the inequality from a subjective point of view, we introduce our favorite subject, Alice. Let us assume that there are two real variables $X,Y$ and that Alice only knows their first moments $\mu_x=E(X),\mu_y=E(Y)$ and second moments $E(X^2),E(Y^2)$ (in other words, she only knows their means and variances, since $Var(Z)=E(Z^2)-E(Z)^2=\sigma_z^2$).

Assume Alice wants to compute $Cov(X,Y)$.

Since Alice does not know the joint probability distribution $P(X,Y)$ of $X,Y$ (she only knows the first two moments), she cannot compute $Cov(X,Y)$. However, she can compute bounds for $Cov(X,Y)$, or in other words, she can aim to solve the following problem

$$

\begin{array}{l}

~\max_{P} \int (X-\mu_x)(Y-\mu_y) dP(X,Y)\\

\int X dP(X,Y)=\mu_x\\

\int Y dP(X,Y)=\mu_y\\

\int X^2 dP(X,Y)=\mu_x^2+\sigma_x^2\\

\int Y^2 dP(X,Y)=\mu_y^2+\sigma_y^2\\

\end{array}

$$

This means she aims to find the maximum value of the expectation of $(X-\mu_x)(Y-\mu_y) $ among all the probability distributions that are compatible with her beliefs on $X,Y$ (the knowledge of the means and variances). She can similarly compute the minimum. This is the essence of Imprecise Probability.

To compute these bounds she first rewrites the above problem as

\begin{equation}

\label{eq:1}

\begin{array}{l}

\text{opt}_{P} C=\int (X-\mu_x)^2 +a^2(Y-\mu_y)^2 -2a (X-\mu_x)(Y-\mu_y) dP(X,Y)\\

\int X dP(X,Y)=\mu_x\\

\int Y dP(X,Y)=\mu_y\\

\int X^2 dP(X,Y)=\mu_x^2+\sigma_x^2\\

\int Y^2 dP(X,Y)=\mu_y^2+\sigma_y^2\\

\end{array}

\end{equation}

where $a$ is some scalar. Note that, since $\int (X-\mu_x)^2dP(X,Y)$ and $\int (Y-\mu_y)^2dP(X,Y)$ are known (they are respectively equal to $\sigma_x^2$ and $\sigma_y^2$), adding these terms does not change the optimization problem: the $P$ that achieves the optimum is the same, because they are just additive constants. If $a$ is positive, the factor $-2a$ multiplying $(X-\mu_x)(Y-\mu_y)$ is negative, so maximizing the original objective corresponds to taking $\text{opt}=\min$ here (and, for negative $a$, to $\text{opt}=\max$).

Now observe that

$$

C=(X-\mu_x)^2 +a^2(Y-\mu_y)^2 -2a (X-\mu_x)(Y-\mu_y) =[X-\mu_x-a(Y-\mu_y)]^2

$$

and, therefore, we can conclude that $\int (X-\mu_x)^2 +a^2(Y-\mu_y)^2 -2a (X-\mu_x)(Y-\mu_y) dP(X,Y)$ is always non-negative for every $a,~P$.

She has to solve the above constrained optimization problem; for the moment, let us forget the constraints. Assuming we can take $a$ as a function of $P$, the unconstrained minimum over $a$ can be obtained by computing the derivative of the objective function w.r.t. $a$ and solving

$$

\frac{d}{da}C=\int 2a(Y-\mu_y)^2 -2 (X-\mu_x)(Y-\mu_y) dP(X,Y)=0

$$

whose solution is $a=\frac{E[(X-\mu_x)(Y-\mu_y)]}{E[(Y-\mu_y)^2]}=\frac{Cov(X,Y)}{\sigma_y^2}$. Since the second derivative of $C$ is non-negative, this is a minimum.

If we choose $a$ in this way then we have that

\begin{equation}

\begin{array}{rcl}

0&\leq& \int (X-\mu_x)^2 + \Big(\frac{Cov(X,Y)}{\sigma_y^2}\Big)^2(Y-\mu_y)^2 -2\Big(\frac{Cov(X,Y)}{\sigma_y^2}\Big) (X-\mu_x)(Y-\mu_y) dP(X,Y)\\

&=&\sigma_x^2+\frac{Cov(X,Y)^2}{\sigma_y^2}-2\frac{Cov(X,Y)^2}{\sigma_y^2}

\end{array}

\end{equation}

Since we allowed $a$ to depend on $P$, we cannot find a better minimum.

Hence, we can derive that

$$

Cov(X,Y)^2\leq \sigma_x^2\sigma_y^2

$$

Note that, to obtain the above inequality, we have used a $P(X,Y)$ that satisfies $\int (X-\mu_x)^2 dP(X,Y)=\sigma_x^2$ and $\int (Y-\mu_y)^2 dP(X,Y)=\sigma_y^2$ and that, therefore, satisfies the constraints. This ends the proof.

Here you can find the link to my Keynote talk:

The motivations for the axioms are not always clear, and even to experts the basic axioms of QM often appear counter-intuitive. In a recent paper [1], we have shown that:

- It is possible to derive quantum mechanics from a single principle of self-consistency or, in other words, that QM laws of Nature are logically consistent;
- QM is just the Bayesian theory generalised to the complex Hilbert space.

To obtain these results we have generalised the theory of desirable gambles (TDG) to complex numbers.

TDG was originally introduced by Williams, and later reconsidered by Walley, to justify in a subjective way a very general form of probability theory.

[**Theory of desirable gambles**]

In classical subjective, or Bayesian, probability, there is a well-established way to check whether the probability assignments of a certain subject, whom we call Alice, about the result of an uncertain experiment are valid, in the sense that they are self-consistent. The idea is to use these probability assignments to define odds—the inverses of probabilities—about the results of the experiment (e.g., Head or Tail in the case of a coin toss); and then show that there is no way to make Alice a sure loser in the related betting system, that is, to make her lose money no matter the outcome of the experiment. Historically this is also referred to as the impossibility of making a Dutch book, or as the assessments being coherent; Alice in these conditions is regarded as a rational subject. De Finetti [3] showed that Kolmogorov’s probability axioms can be derived by imposing the principle of coherence alone on a subject’s odds about an uncertain experiment.

Williams and Walley [8, 7] have later shown that it is possible to justify probability in a simpler and more elegant way. Their approach is also more general than de Finetti’s, because coherence is defined purely as logical consistency, without any explicit reference to probability (which is also what allows coherence to be generalised to other domains, such as quantum mechanics); the idea is to work in the dual space of gambles. To understand this framework, we consider an experiment whose outcome ω belongs to a certain space of possibilities Ω (e.g., Head or Tail). We can model Alice’s beliefs about ω by asking her whether *she accepts engaging in certain risky transactions*, called **gambles**, whose outcome depends on the actual outcome of the experiment. Mathematically, a gamble is a bounded real-valued function on Ω, g : Ω → ℝ, which is interpreted as an uncertain reward in a linear utility scale. If Alice accepts a gamble g, she commits herself to receive g(ω) utiles (euros) if the outcome of the experiment eventually happens to be the event ω ∈ Ω. Since g(ω) can be negative, Alice can also lose utiles. Therefore Alice’s acceptability of a gamble depends on her knowledge about the experiment.

The set of gambles that Alice accepts—let us denote it by K—is called her set of desirable gambles. We say that a gamble g is positive if g ≠ 0 and g(ω) ≥ 0 for each ω ∈ Ω. We say that g is negative if g ≠ 0 and g(ω) ≤ 0 for each ω ∈ Ω. K is said to be coherent when it satisfies the following minimal requirements:

- **D1**: Any positive gamble g must be desirable for Alice (g ∈ K), given that it may increase Alice’s capital without ever decreasing it (accepting partial gain).
- **D2**: Any negative gamble g must not be desirable for Alice (g ∉ K), given that it may only decrease Alice’s capital without ever increasing it (avoiding partial loss).
- **D3**: If Alice finds g and h to be desirable (g, h ∈ K), then also λg + νh must be desirable for her (λg + νh ∈ K), for any 0 < λ, ν ∈ ℝ (linearity of the utility scale).
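To make the Dutch-book idea behind D1–D3 concrete, here is a minimal sketch, assuming gambles on a finite Ω are encoded as payoff vectors; `is_sure_loss` is an illustrative helper, not part of any library:

```python
import numpy as np

# Gambles on Omega = {Head, Tail} encoded as payoff vectors in utiles.
# Illustrative helper: does a positive combination of accepted gambles
# lose money on every outcome (a sure loss, which D2 + D3 forbid)?
def is_sure_loss(gambles, weights):
    combo = sum(w * np.asarray(g, float) for w, g in zip(weights, gambles))
    return bool(np.all(combo < 0))

g1 = np.array([ 1.0, -1.0])   # bet on Head at even odds
g2 = np.array([-1.0,  1.0])   # bet on Tail at even odds

# Accepting both at even odds nets lam - nu on Head and nu - lam on Tail:
# the two payoffs cannot both be negative, so Alice cannot be Dutch-booked.
print(is_sure_loss([g1, g2], [1.0, 1.0]))    # False

# By contrast, accepting a negative gamble (violating D2) is a sure loss:
g3 = np.array([-1.0, -2.0])
print(is_sure_loss([g3], [1.0]))             # True
```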

In spite of their simple character, these axioms alone define a very general theory of probability. *De Finetti’s (Bayesian) theory is the particular case obtained by additionally imposing some regularity (continuity) requirement and especially completeness, that is, the idea that a subject should always be capable of comparing options* [7, 8]. In this case, probability is derived from K via (mathematical) **duality**.

[**QM**]

In [1] we have extended desirability to QM. To introduce this extension, we first have to define what a gamble in a quantum experiment is and how its payoff is computed. To this end, we consider an experiment relative to an n-dimensional quantum system and two subjects: the gambler (Alice) and the bookmaker. The **n-dimensional quantum system** is prepared by the bookmaker in some quantum state. We assume that Alice has her personal knowledge about the experiment (possibly no knowledge at all).

1. The bookmaker announces that he will measure the quantum system along its n orthogonal directions, so that the outcome of the measurement is an element of Ω = {ω_{1},…,ω_{n}}, with ω_{i} denoting the elementary event “detection along i”. Mathematically, it means that the quantum system is measured along its eigenvectors, i.e., the projectors Π_{1}^{*},…,Π_{n}^{*}, and ω_{i} is the event “indicated” by the i-th projector.
2. Before the experiment, Alice declares the set of gambles she is willing to accept. Mathematically, a gamble G on this experiment is an n×n Hermitian matrix in ℂ; the space of all Hermitian n×n matrices is denoted by ℂ_{h}^{n×n}.
3. By accepting a gamble G, Alice commits herself to receive γ_{i} ∈ ℝ utiles if the outcome of the experiment eventually happens to be ω_{i}. The value γ_{i} is defined from G and Π^{*} as follows: Π_{i}^{*}GΠ_{i}^{*} = γ_{i}Π_{i}^{*} for i = 1,…,n. It is a real number since G is Hermitian.
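Point 3 can be verified numerically; a minimal sketch with NumPy, where the Hermitian gamble G is randomly generated for illustration only:

```python
import numpy as np

# Payoff of a quantum gamble: for a rank-one projector Pi_i = v_i v_i^dag
# built from a unit vector v_i, the identity Pi_i G Pi_i = gamma_i Pi_i
# holds with gamma_i = v_i^dag G v_i, which is real because G is Hermitian.
rng = np.random.default_rng(1)
A = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
G = (A + A.conj().T) / 2                    # a Hermitian gamble (n = 2)

_, V = np.linalg.eigh(G)                    # an orthonormal basis
for i in range(2):
    v = V[:, i]
    Pi = np.outer(v, v.conj())              # projector Pi_i
    gamma = (v.conj() @ G @ v).real         # payoff gamma_i
    assert np.allclose(Pi @ G @ Pi, gamma * Pi)
    print(f"gamma_{i} = {gamma:.4f}")
```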

The subset of all positive semi-definite and non-zero (PSDNZ) matrices in ℂ_{h}^{n×n} constitutes the set of positive gambles, whereas the set of negative gambles is similarly given by all gambles G ∈ ℂ_{h}^{n×n} such that G ⪇ 0 (i.e., −G is PSDNZ). Alice examines the gambles in ℂ_{h}^{n×n} and comes up with the subset K of the gambles that she finds desirable. Alice’s rationality is then characterised by simply applying the previous coherence requirements to the space of Hermitian matrices:

- **S1**: Any PSDNZ matrix (positive gamble) G must be desirable for Alice (G ∈ K), given that it may increase Alice’s utiles without ever decreasing them (accepting partial gain).
- **S2**: Any G ⪇ 0 (negative gamble) must not be desirable for Alice (G ∉ K), given that it may only decrease Alice’s utiles without ever increasing them (avoiding partial loss).
- **S3**: If Alice finds G and H desirable (G, H ∈ K), then also λG + νH must be desirable for her (λG + νH ∈ K), for any 0 < λ, ν ∈ ℝ (linearity of the utility scale).

From a geometric point of view, a coherent set of desirable gambles K is a convex cone, without its apex, that contains all PSDNZ matrices (and is thus disjoint from the set of all matrices such that G ⪇ 0). We may also assume that K satisfies the following additional property:

- **S4**: If G ∈ K then either G ≰ 0 or G − εI ∈ K for some strictly positive real number ε (openness).

This property is not necessary for rationality, but it is technically convenient, as it precisely isolates the kind of models we use in QM (as well as in classical probability) [1]. The openness condition (S4) has a gambling interpretation too: it means that we only consider gambles that are strictly desirable for Alice; these are the gambles from which Alice expects to gain something, even an epsilon of utiles. For this reason, K is called a set of strictly desirable gambles (SDG) in this case.

An SDG is said to be maximal if there is no larger SDG containing it. In [1, Theorem IV.4], we have shown that maximal SDGs and density matrices are in one-to-one correspondence. The mapping between them is obtained through the standard inner product in ℂ_{h}^{n×n}, i.e., G⋅R = Tr(GR), via a representation theorem [1, Theorem IV.4].

This result has several consequences. First, it provides a gambling interpretation of the first axiom of QM on density operators. Second, it shows that density operators are coherent, since the dual of ρ is a valid SDG. This also implies that QM is self-consistent: a gambler who uses QM to place bets on a quantum experiment cannot be made a partial (and, thus, sure) loser. Third, the first axiom of QM on ℂ_{h}^{n×n} is structurally and formally equivalent to Kolmogorov’s first and second axioms about probabilities on ℝ^{n} [1, Sec. 2]. In fact, they can both be derived via duality from a coherent set of desirable gambles on ℂ_{h}^{n×n} and, respectively, ℝ^{n}. In [1] we have also derived Born’s rule and the other three axioms of QM as **a consequence of rational gambling on a quantum experiment, and shown that measurement, partial tracing and tensor product are equivalent to the probabilistic notions of Bayes’ rule, marginalisation and independence.** Finally, as an additional consequence of the aforementioned representation result, in [2] we have shown that a subject who uses dispersion-free probabilities to accept gambles on a quantum experiment can always be made a sure loser: she loses utiles no matter the outcome of the experiment. We say that dispersion-free probabilities are incoherent, which means that they are logically inconsistent with the axioms of QM. Moreover, we have proved a stronger version of Gleason’s theorem that holds in any finite dimension (hence even for n = 2), through a much simpler proof; it states that all coherent probability assignments in QM must be obtained as the trace of the product of a projector and a density operator.
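As a small numerical companion to this last statement, a sketch with a hand-picked qubit density matrix: the probability assignments p_i = Tr(Π_i ρ) are non-negative and sum to one.

```python
import numpy as np

# Born's rule as a dual pairing: for a density matrix rho (Hermitian,
# unit trace, positive semi-definite) and the projectors Pi_i of a
# measurement basis, the probabilities are p_i = Tr(Pi_i rho).
rho = np.array([[0.75, 0.25 - 0.10j],
                [0.25 + 0.10j, 0.25]])             # a valid qubit state
assert np.allclose(rho, rho.conj().T)              # Hermitian
assert np.isclose(np.trace(rho).real, 1.0)         # unit trace
assert np.all(np.linalg.eigvalsh(rho) >= -1e-12)   # positive semi-definite

probs = [np.trace(np.outer(v, v.conj()) @ rho).real
         for v in np.eye(2)]                # computational-basis projectors
print(probs)                                # non-negative, sums to one
```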

A list of relevant bibliographic references, as well as a comparison between our approach and similar approaches, like QBism [4] and Pitowsky’s quantum gambles [6], can be found in [1].

[1] Alessio Benavoli, Alessandro Facchini & Marco Zaffalon (2016): Quantum mechanics: The Bayesian theory generalized to the space of Hermitian matrices. Phys. Rev. A 94, p. 042106. Available at https://arxiv.org/pdf/1605.08177.pdf.

[2] Alessio Benavoli, Alessandro Facchini & Marco Zaffalon (2017): A Gleason-Type Theorem for Any Dimension Based on a Gambling Formulation of Quantum Mechanics. Foundations of Physics, pp. 1–12, doi:10.1007/s10701-017-0097-0. Available at https://arxiv.org/pdf/1606.03615.pdf.

[3] B. de Finetti (1937): La prévision: ses lois logiques, ses sources subjectives. Annales de l’Institut Henri Poincaré 7, pp. 1–68. English translation in [5].

[4] Christopher A. Fuchs & Ruediger Schack (2013): Quantum-Bayesian coherence. Reviews of Modern Physics 85(4), p. 1693.

[5] H. E. Kyburg Jr. & H. E. Smokler, editors (1964): Studies in Subjective Probability. Wiley, New York. Second edition (with new material) 1980.

[6] Itamar Pitowsky (2003): Betting on the outcomes of measurements: a Bayesian theory of quantum probability. Studies in History and Philosophy of Science Part B: Studies in History and Philosophy of Modern Physics 34(3), pp. 395–414.

[7] P. Walley (1991): Statistical Reasoning with Imprecise Probabilities. Chapman and Hall, New York.

[8] P. M. Williams (1975): Notes on conditional previsions. Technical Report, School of Mathematical and Physical Science, University of Sussex, UK.

In particular, the Bayesian correlated t-test makes inference about the mean difference of accuracy between two classifiers in the $i$-th dataset ($\mu_i$) by exploiting three pieces of information: the sample mean ($\bar{x}_i$), the variability of the data (sample standard deviation $\hat{\sigma}_i$) and the correlation due to the overlapping training set ($\rho$). This test can only be applied to a single dataset.

There is no direct NHST able to extend the above statistical comparison to multiple datasets, i.e., one that takes as input the $m$ runs of the $k$-fold cross-validation results for each dataset and returns as output a statistical decision about which classifier is better across all the datasets.

The usual NHST procedure that is employed for performing such analysis has two steps:

(1) compute the mean *difference* of accuracy for each dataset, $\bar{x}_i$;

(2) perform an NHST to establish whether the two classifiers have different performance, based on these mean differences of accuracy.

This discards two pieces of information: the correlation $\rho$ and sample standard deviation $\hat{\sigma}_i$ in each dataset.

The standard deviation is informative about the accuracy of $\bar{x}_i$ as an estimator of $\mu_i$.

The standard deviation can largely vary across data sets, as a result of each data set having its own size and complexity.

The aim of this section is to present an extension of the Bayesian correlated t-test that is able to make inference on multiple datasets and, at the same time, to account for all the available information (mean, standard deviation and correlation).

In Bayesian estimation, this can be obtained by defining a hierarchical model. Hierarchical models are among the most powerful and flexible tools in Bayesian analysis.

The code, this notebook and the dataset can be downloaded from our github repository.

Module `hierarchical` in `bayesiantests` compares the performance of two classifiers that have been assessed by *m* runs of *k*-fold cross-validation on *q* datasets. It returns the probabilities that, based on the measured performance, one model is better than the other, or vice versa, or that they are within the region of practical equivalence.

This notebook demonstrates the use of the module.

In [1]:

```
import numpy as np
scores = np.loadtxt('Data/diffNbcHnb.csv', delimiter=',')
names = ("HNB", "NBC")
print(scores)
```

To analyse these data, we will use the function `hierarchical` in the module `bayesiantests`, which accepts the following arguments.

```
scores: a 2-d array of differences.
rope: the region of practical equivalence. We consider two classifiers equivalent if the difference in their
performance is smaller than rope.
rho: correlation due to cross-validation
names: the names of the two classifiers; if x is a vector of differences, positive values mean that the second
(right) model had a higher score.
```

The hierarchical function uses **STAN** through Python module **pystan**.

The function `hierarchical(scores, rope, rho, verbose, names=names)` computes the Bayesian hierarchical test and returns the probabilities that the difference (the score of the first classifier minus the score of the second) is negative, within the rope, or positive.

In [8]:

```
import bayesiantests as bt
rope=0.01 #we consider two classifiers equivalent when the difference of accuracy is less than 1%
rho=1/10 #we are performing 10 runs of 10-fold cross-validation; rho = 1/(number of folds)
pleft, prope, pright=bt.hierarchical(scores,rope,rho)
```

The first value (`left`) is the probability that the difference of accuracies is negative (and, therefore, in favor of HNB). The third value (`right`) is the probability that the differences of accuracies are positive (and, therefore, in favor of NBC). The second is the probability that the two classifiers are practically equivalent, i.e., that the difference lies within the rope.

In the above case, the HNB performs better than naive Bayes with a probability of 0.9965, and they are practically equivalent with a probability of 0.002. Therefore, we can conclude with high probability that HNB is better than NBC.

If we add the arguments `verbose` and `names`, the function also prints out the probabilities.

In [9]:

```
pl, pe, pr=bt.hierarchical(scores,rope,rho, verbose=True, names=names)
```

The posterior distribution can be plotted out:

- using the function `hierarchical_MC(scores,rope,rho, names=names)` we generate the samples of the posterior;
- using the function `plot_posterior(samples,names=('C1', 'C2'))` we then plot the posterior in the probability simplex.

In [10]:

```
%matplotlib inline
import matplotlib.pyplot as plt
samples=bt.hierarchical_MC(scores,rope,rho, names=names)
#plt.rcParams['figure.facecolor'] = 'black'
fig = bt.plot_posterior(samples,names)
plt.savefig('triangle_hierarchical.png',facecolor="black")
plt.show()
```

It can be seen that the posterior mass lies in the region in favor of HNB, confirming that this classifier is better than NBC. From the posterior we also get an idea of the magnitude of the uncertainty and of the “stability” of our inference.

The function `hierarchical` also allows testing the effect of the prior hyperparameters. We point to the last reference for a discussion of prior sensitivity.

```
@ARTICLE{bayesiantests2016,
  author = {{Benavoli}, A. and {Corani}, G. and {Demsar}, J. and {Zaffalon}, M.},
  title = "{Time for a change: a tutorial for comparing multiple classifiers through Bayesian analysis}",
  journal = {ArXiv e-prints},
  archivePrefix = "arXiv",
  eprint = {1606.04316},
  url = {https://arxiv.org/abs/1606.04316},
  year = 2016,
  month = jun
}
```

```
@article{corani2016unpub,
  title = {Statistical comparison of classifiers through Bayesian hierarchical modelling},
  author = {Corani, Giorgio and Benavoli, Alessio and Demsar, Janez and Mangili, Francesca and Zaffalon, Marco},
  url = {http://ipg.idsia.ch/preprints/corani2016b.pdf},
  year = {2016},
  date = {2016-01-01},
  institution = {technical report IDSIA},
  pubstate = {published},
  tppubtype = {article}
}
```