SIMPSON’S PARADOX

 

Simpson’s Paradox is a statistical phenomenon where an association between two variables in a population emerges, disappears or reverses when the population is divided into subpopulations. For instance, two variables may be positively associated in a population, but be independent or even negatively associated in all subpopulations. Cases exhibiting the paradox are unproblematic from the perspective of mathematics and probability theory, but nevertheless strike many people as surprising. Additionally, the paradox has implications for a range of areas that rely on probabilities, including decision theory, causal inference, and evolutionary biology. Finally, there are many instances of the paradox, including in epidemiology and in studies of discrimination, where understanding the paradox is essential for drawing the correct conclusions from the data.

 

For example, you and a friend each do problems on Brilliant, and your friend answers a higher proportion correctly than you on each of two days. Does that mean your friend has answered a higher proportion correctly than you when the two days are combined? Not necessarily!

 

  • On Saturday, you solved 7 out of 8 attempted problems, but your friend solved 2 out of 2. You solved more problems, but your friend pointed out that he was more accurate, since $\frac{7}{8} < \frac{2}{2}$. Fair enough.
  • On Sunday, you attempted only 2 problems and got 1 correct. Your friend got 5 out of 8 problems correct. Your friend gloated once again, since $\frac{1}{2} < \frac{5}{8}$.

However, the competition is about who solved problems more accurately over the whole weekend, not on individual days. Overall, you solved 8 out of 10 problems, whereas your friend solved 7 out of 10. Thus, despite your friend solving a higher proportion of problems on each day, you actually won the challenge by solving the higher proportion for the entire weekend! While your friend got furious, you calmly pointed him to this page: you had just shown an instance of Simpson's paradox.
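The arithmetic in this example can be checked with a short script. The following is a minimal Python sketch, not part of the original article; the names `you`, `friend`, `accuracy`, and `overall` are illustrative assumptions, and the numbers are the weekend scores from the example (7 of 8 and 1 of 2 for you, 2 of 2 and 5 of 8 for your friend).

```python
from fractions import Fraction

# (correct, attempted) per day, taken from the weekend example.
you = {"Saturday": (7, 8), "Sunday": (1, 2)}
friend = {"Saturday": (2, 2), "Sunday": (5, 8)}


def accuracy(correct, attempted):
    """Exact proportion of attempted problems answered correctly."""
    return Fraction(correct, attempted)


def overall(results):
    """Pool all days: total correct divided by total attempted."""
    total_correct = sum(c for c, _ in results.values())
    total_attempted = sum(a for _, a in results.values())
    return Fraction(total_correct, total_attempted)


# Your friend is more accurate on every individual day...
friend_wins_each_day = all(
    accuracy(*friend[day]) > accuracy(*you[day]) for day in you
)
# ...yet you are more accurate once the days are pooled.
you_win_overall = overall(you) > overall(friend)

for day in you:
    print(day, "you:", accuracy(*you[day]), "friend:", accuracy(*friend[day]))
print("Overall", "you:", overall(you), "friend:", overall(friend))
print("Friend wins each day:", friend_wins_each_day)
print("You win overall:", you_win_overall)
```

The reversal is possible because each overall proportion is an average of the daily proportions weighted by the number of problems attempted that day. Most of your attempts fall on Saturday, your strong day, while most of your friend's attempts fall on Sunday, his weaker day.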