In a previous post, we saw how to analyse polls for a single State using poll data from KTNV/Rasmussen.

Here we are going to see how to combine polls from different sources.

Let us again consider the Nevada polls.

Poll | Date | Sample | MoE (±pts) | Clinton (D) | Trump (R) | Johnson (L) | Spread
---|---|---|---|---|---|---|---
RCP Average | 7/7 – 8/5 | - | - | 43 | 40.7 | 6.3 | Clinton +2.3
CBS News/YouGov* | 8/2 – 8/5 | 993 LV | 4.6 | 43 | 41.0 | 4.0 | Clinton +2
KTNV/Rasmussen | 7/29 – 7/31 | 750 LV | 4.0 | 41 | 40.0 | 10.0 | Clinton +1
Monmouth | 7/7 – 7/10 | 408 LV | 4.9 | 45 | 41.0 | 5.0 | Clinton +4

Instead of averaging the polls as RCP (RealClearPolitics) does, we use Covariance Intersection. **Covariance intersection** is an algorithm for combining two or more estimates when the correlation between them is unknown.

Let us denote by $\hat{a}$ a vector of observations, e.g. $[43, 41, 16]$ from CBS News/YouGov (Clinton, Trump, and the remaining $100-43-41=16$ points for Johnson, other candidates and undecided voters), and by $\hat{b}$ another vector of observations, e.g. $[41, 40, 19]$ from KTNV/Rasmussen. $A$ denotes the uncertainty of poll $\hat{a}$, which we assume to be equal to $1/\text{sample size}$ (e.g., $1/993$ for CBS News/YouGov), so that larger polls are more reliable. Similarly, $B$ denotes the uncertainty of poll $\hat{b}$ (e.g., $1/750$ for KTNV/Rasmussen).
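As a small illustration of how the table maps onto these quantities, the sketch below builds $\hat{a}$ and $A$ for each poll. The names `polls`, `observation` and `uncertainty` are hypothetical helpers, and the third entry is computed as 100 − Clinton − Trump, which reproduces the vectors used in the text.

```python
# Hypothetical helpers mapping one row of the Nevada table to (a_hat, A).
# Assumption: the third entry is the share not going to Clinton or Trump,
# i.e. 100 - Clinton - Trump (Johnson, other candidates, undecided).
polls = {
    "CBS News/YouGov": (43, 41, 993),  # (Clinton, Trump, sample size)
    "KTNV/Rasmussen": (41, 40, 750),
    "Monmouth": (45, 41, 408),
}


def observation(clinton, trump):
    """Return the observation vector [Clinton, Trump, rest]."""
    return [clinton, trump, 100 - clinton - trump]


def uncertainty(sample_size):
    """Return A = 1 / sample size, the uncertainty assumed in the post."""
    return 1.0 / sample_size


for name, (clinton, trump, n) in polls.items():
    print(f"{name}: a_hat = {observation(clinton, trump)}, A = 1/{n} = {uncertainty(n):.6f}")
```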

Given a weight $\omega \in [0, 1]$, Covariance Intersection combines the two polls as follows:

$$
C^{-1} = \omega A^{-1} + (1-\omega) B^{-1},
$$

$$
\hat{c} = C\left(\omega A^{-1}\hat{a} + (1-\omega) B^{-1}\hat{b}\right).
$$
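A minimal Python sketch of the two-source formula, assuming scalar uncertainties $A = 1/n_a$ and $B = 1/n_b$ as in this post. The function name `covariance_intersection` is hypothetical; with matrix covariances the divisions would become matrix inverses.

```python
import numpy as np


def covariance_intersection(a_hat, A, b_hat, B, omega):
    """Fuse two estimates with Covariance Intersection.

    a_hat, b_hat: observation vectors; A, B: their (scalar) uncertainties,
    here 1 / sample size; omega: weight in [0, 1].
    Returns the fused estimate c_hat and its uncertainty C.
    """
    if not 0.0 <= omega <= 1.0:
        raise ValueError("omega must lie in [0, 1]")
    a_hat = np.asarray(a_hat, dtype=float)
    b_hat = np.asarray(b_hat, dtype=float)
    # C^{-1} = omega A^{-1} + (1 - omega) B^{-1}
    C = 1.0 / (omega / A + (1.0 - omega) / B)
    # c_hat = C (omega A^{-1} a_hat + (1 - omega) B^{-1} b_hat)
    c_hat = C * (omega / A * a_hat + (1.0 - omega) / B * b_hat)
    return c_hat, C


# CBS News/YouGov (n = 993) fused with KTNV/Rasmussen (n = 750), omega = 0.5
c_hat, C = covariance_intersection([43, 41, 16], 1 / 993, [41, 40, 19], 1 / 750, omega=0.5)
print(np.round(c_hat, 2), round(1 / C, 1))  # ≈ [42.14 40.57 17.29] 871.5
```

With $\omega = 1$ the result is just $\hat{a}$, and with $\omega = 0$ it is $\hat{b}$; intermediate values give a convex combination of the two polls in which each poll is also weighted by its sample size.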

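This formula can be extended to an arbitrary number of sources $\hat{a}_1,\dots,\hat{a}_n$ with uncertainties $A_1,\dots,A_n$ and non-negative weights summing to one: $C^{-1}=\sum_i \omega_i A_i^{-1}$ and $\hat{c}=C\sum_i \omega_i A_i^{-1}\hat{a}_i$. For instance, for the three polls in the previous table (CBS News/YouGov, KTNV/Rasmussen and Monmouth) with uniform weights ($\omega_1=\omega_2=\omega_3=1/3$), and recalling that $A_i^{-1}$ is the sample size, we get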

$$
C^{-1} = \omega_1 \cdot 993 + \omega_2 \cdot 750 + \omega_3 \cdot 408 = 717,
$$

$$
\hat{c} = C\left(\omega_1 \cdot 993\,[43, 41, 16] + \omega_2 \cdot 750\,[41, 40, 19] + \omega_3 \cdot 408\,[45, 41, 14]\right).
$$

The final result is

$$
C^{-1} = 717, \qquad \hat{c} = [42.68,\ 40.65,\ 16.67]
$$
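The $n$-source version can be sketched in a few lines of NumPy. The function `covariance_intersection_n` is hypothetical and, as in the post, assumes scalar uncertainties $A_i = 1/n_i$; it reproduces the numbers above.

```python
import numpy as np


def covariance_intersection_n(estimates, uncertainties, weights):
    """Covariance Intersection for n sources with scalar uncertainties.

    estimates: (n_sources, dim) observation vectors a_hat_i
    uncertainties: A_i for each source (here 1 / sample size)
    weights: omega_i, non-negative and summing to 1
    Returns (c_hat, C_inv).
    """
    estimates = np.asarray(estimates, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    weighted_info = weights / np.asarray(uncertainties, dtype=float)  # omega_i A_i^{-1}
    C_inv = weighted_info.sum()                                      # C^{-1}
    c_hat = weighted_info @ estimates / C_inv                        # C sum_i omega_i A_i^{-1} a_hat_i
    return c_hat, C_inv


polls = [[43, 41, 16], [41, 40, 19], [45, 41, 14]]  # CBS, KTNV, Monmouth
sample_sizes = [993, 750, 408]
c_hat, C_inv = covariance_intersection_n(polls, [1 / n for n in sample_sizes], [1 / 3] * 3)
print(round(C_inv, 2), np.round(c_hat, 2))  # ≈ 717.0 [42.68 40.65 16.67]
```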

It can be observed that with uniform weights ($\omega_1=\omega_2=\omega_3=1/3$) the weights cancel between $C$ and the sum, so the combined poll $\hat{c}$ reduces to the average of the input polls weighted by sample size, $\hat{c}=\sum_i n_i\hat{a}_i/\sum_i n_i$. However, other values of the weights can be chosen; see for instance here.
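To illustrate that the choice of weights matters, here is a sketch of one alternative that is common in the Covariance Intersection literature: picking the weights that minimise the fused uncertainty $C$ (the trace or determinant of $C$ in the matrix case). This criterion is an assumption for illustration, not necessarily the one in the reference above; the optimisation is a simple grid search, and `fuse` is a hypothetical helper.

```python
import numpy as np

polls = np.array([[43, 41, 16], [41, 40, 19], [45, 41, 14]], dtype=float)  # CBS, KTNV, Monmouth
sample_sizes = np.array([993, 750, 408], dtype=float)  # A_i^{-1} = sample size


def fuse(weights):
    """Hypothetical helper: CI fusion with scalar A_i = 1/n_i; returns (c_hat, C)."""
    weighted_info = np.asarray(weights, dtype=float) * sample_sizes  # omega_i A_i^{-1}
    C_inv = weighted_info.sum()
    return weighted_info @ polls / C_inv, 1.0 / C_inv


# Grid over the weight simplex (omega_1 + omega_2 + omega_3 = 1) in steps of 0.05.
steps = 20
grid = [(i / steps, j / steps, (steps - i - j) / steps)
        for i in range(steps + 1) for j in range(steps + 1 - i)]

# Assumed criterion: pick the weights that minimise the fused uncertainty C.
best = min(grid, key=lambda w: fuse(w)[1])
c_best, C_best = fuse(best)
c_uniform, C_uniform = fuse([1 / 3] * 3)
print("min-C weights:", best, np.round(c_best, 2), "C^-1 =", round(1 / C_best, 1))
print("uniform weights:", np.round(c_uniform, 2), "C^-1 =", round(1 / C_uniform, 1))
```

Because $C^{-1}$ is linear in the weights when the uncertainties are scalars, this criterion puts all the weight on the largest poll (CBS News/YouGov, $n=993$), so $\hat{c}$ is simply that poll, whereas uniform weights keep all three polls in the combination.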
