In a previous post we derived the Covariance Inequality from a Bayesian (Imprecise probability) perspective.

There is another, more elegant way to derive this inequality:

$$Cov(X,Y)^2\leq Var(X)Var(Y)$$

To do that, we introduce again our favorite subject, Alice. Let us summarize the problem again. Assume that there are two real random variables $X,Y$ and that Alice only knows their first moments $\mu_x=E(X),\mu_y=E(Y)$ and second moments $E(X^2),E(Y^2)$ (in other words, she only knows their means and variances, since $Var(Z)=E(Z^2)-E(Z)^2=\sigma_z^2$).

Alice wants to compute $Cov(X,Y)$.

Since Alice does not know the joint probability distribution $P(X,Y)$ of $X,Y$ (she only knows the first two moments), she cannot compute $Cov(X,Y)$. However, she can compute bounds for $Cov(X,Y)$, or in other words, she can aim to solve the following problem

$$

\begin{array}{l}

~\max_{P} E[(X-\mu_x)(Y-\mu_y)]=\int (X-\mu_x)(Y-\mu_y) dP(X,Y)\\

E[X]=\int X dP(X,Y)=\mu_x\\

E[Y]=\int Y dP(X,Y)=\mu_y\\

E[X^2]=\int X^2 dP(X,Y)=\mu_x^2+\sigma_x^2\\

E[Y^2]=\int Y^2 dP(X,Y)=\mu_y^2+\sigma_y^2\\

\end{array}

$$

This means she aims to find the maximum value of the expectation of $(X-\mu_x)(Y-\mu_y)$ over all probability distributions that are compatible with her beliefs about $X,Y$ (the known means and variances). She can similarly compute the minimum. This is the essence of Imprecise Probability.
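Alice's maximization problem turns out to have a simple extremal solution: a two-point distribution that aligns $X$ and $Y$ perfectly attains $Cov(X,Y)=\sigma_x\sigma_y$, so the upper bound is actually achieved. A small pure-Python sketch (the specific means and variances below are arbitrary choices for illustration, not values from the post):

```python
# Sketch: a two-point joint distribution that matches Alice's moment
# constraints and attains the upper bound Cov(X, Y) = sigma_x * sigma_y.
# The numbers (mu_x = 1, sigma_x = 2, ...) are arbitrary illustrative choices.

mu_x, sigma_x = 1.0, 2.0
mu_y, sigma_y = -3.0, 0.5

# Probability 1/2 on each of two perfectly aligned points.
points = [(mu_x + sigma_x, mu_y + sigma_y),
          (mu_x - sigma_x, mu_y - sigma_y)]
probs = [0.5, 0.5]

E = lambda f: sum(p * f(x, y) for p, (x, y) in zip(probs, points))

# This distribution satisfies all four moment constraints ...
assert abs(E(lambda x, y: x) - mu_x) < 1e-12
assert abs(E(lambda x, y: y) - mu_y) < 1e-12
assert abs(E(lambda x, y: x * x) - (mu_x**2 + sigma_x**2)) < 1e-12
assert abs(E(lambda x, y: y * y) - (mu_y**2 + sigma_y**2)) < 1e-12

# ... and attains the maximum of the objective.
cov = E(lambda x, y: (x - mu_x) * (y - mu_y))
print(cov, sigma_x * sigma_y)  # prints: 1.0 1.0
```

The minimum, $-\sigma_x\sigma_y$, is attained the same way by anti-aligning the two points.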

To compute that, we first observe that

$$

\begin{bmatrix}

X-\mu_x\\

Y-\mu_y

\end{bmatrix}\begin{bmatrix}

X-\mu_x & Y-\mu_y\\

\end{bmatrix}=

\begin{bmatrix}

(X-\mu_x)^2 & (X-\mu_x)(Y-\mu_y)\\

(Y-\mu_y)(X-\mu_x) & (Y-\mu_y)^2\\

\end{bmatrix}\geq 0

$$

where $\geq$ means that the matrix is positive semi-definite: it is the outer product $vv^\top$ of the vector $v=(X-\mu_x,Y-\mu_y)^\top$ with itself, so $a^\top vv^\top a=(v^\top a)^2\geq 0$ for every vector $a$. Since the expectation operator preserves positive semi-definiteness, we have

$$

E\left(

\begin{bmatrix}

(X-\mu_x)^2 & (X-\mu_x)(Y-\mu_y)\\

(Y-\mu_y)(X-\mu_x) & (Y-\mu_y)^2\\

\end{bmatrix}\right)\geq 0

$$

but, because of linearity of expectation, we have that

$$

\begin{aligned}

&E\left(\begin{bmatrix}

(X-\mu_x)^2 & (X-\mu_x)(Y-\mu_y)\\

(Y-\mu_y)(X-\mu_x) & (Y-\mu_y)^2\\

\end{bmatrix}\right)\\ &~\\

&=

\begin{bmatrix}

E[(X-\mu_x)^2] & E[(X-\mu_x)(Y-\mu_y)]\\

E[(Y-\mu_y)(X-\mu_x)] & E[(Y-\mu_y)^2]\\

\end{bmatrix}\geq 0.

\end{aligned}

$$
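The empirical analogue of this step is that averaging rank-one matrices $vv^\top$ over a sample keeps the result positive semi-definite. A quick pure-Python check (the Gaussian sample is my own choice for illustration):

```python
import random

random.seed(0)

# Each sample contributes a rank-one PSD matrix v v^T with
# v = (x - mu_x, y - mu_y); averaging them (the empirical analogue of
# taking expectations) keeps the result positive semi-definite.
# The Gaussian draws below are arbitrary illustrative data.
n = 10_000
mu_x, mu_y = 0.0, 0.0
samples = [(random.gauss(mu_x, 1.0), random.gauss(mu_y, 2.0)) for _ in range(n)]

# Average of the outer products = empirical second-moment matrix.
M = [[0.0, 0.0], [0.0, 0.0]]
for x, y in samples:
    v = (x - mu_x, y - mu_y)
    for i in range(2):
        for j in range(2):
            M[i][j] += v[i] * v[j] / n

# PSD check: the quadratic form a^T M a = (1/n) * sum_k (v_k . a)^2
# is non-negative in every direction a (up to float rounding).
for _ in range(100):
    a = (random.uniform(-1, 1), random.uniform(-1, 1))
    quad = sum(a[i] * M[i][j] * a[j] for i in range(2) for j in range(2))
    assert quad >= -1e-9
print("averaged outer-product matrix is PSD in all sampled directions")
```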

Since this symmetric matrix is positive semi-definite, its determinant is non-negative, that is

$$

E[(X-\mu_x)(Y-\mu_y)]^2\leq E[(X-\mu_x)^2] E[(Y-\mu_y)^2]

$$

which is exactly

$$Cov(X,Y)^2\leq Var(X)Var(Y).$$
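The same determinant condition holds verbatim for the empirical covariance matrix of any finite sample, which makes it easy to sanity-check numerically (the data below are made up for illustration):

```python
# Sanity check of Cov(X,Y)^2 <= Var(X) Var(Y) on an arbitrary finite
# sample (the data below are made up for illustration).
xs = [1.0, 2.0, 4.0, 7.0, 11.0]
ys = [0.5, 1.0, 3.5, 4.0, 9.0]

n = len(xs)
mu_x = sum(xs) / n
mu_y = sum(ys) / n
var_x = sum((x - mu_x) ** 2 for x in xs) / n
var_y = sum((y - mu_y) ** 2 for y in ys) / n
cov = sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / n

# Determinant of the empirical covariance matrix:
# Var(X) Var(Y) - Cov(X,Y)^2 >= 0.
det = var_x * var_y - cov ** 2
assert det >= 0
print(cov ** 2, "<=", var_x * var_y)
```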