The popularity of the average
Or why ‘personalised’ health companies sell everyone the same branded product
One of the quirks of population health research is that it often isn’t that interested in you. Nor is it that interested in me. Or any other specific person for that matter.
Instead, many studies focus on average effects. If we run a randomised controlled trial for a new drug, we are comparing the mean difference in a health outcome between the control and treatment group. If we track patients with a certain disease over time and calculate a 10% fatality risk in the over 80 age group, this represents the average risk across that population. It’s not a tailored estimate for a specific individual; depending on other as-yet-unknown health and genetic factors, some over 80s might have a risk much larger than 10%, while others might have a much smaller risk.
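To make this concrete, here is a quick simulation sketch. The numbers are invented for illustration – individual risks drawn from a skewed distribution whose mean is 10% – rather than taken from any real study:

```python
import numpy as np

rng = np.random.default_rng(42)

# Invented illustration: individual fatality risks drawn from a skewed
# distribution with mean 10%, standing in for as-yet-unknown health and
# genetic factors (not data from any real study)
individual_risks = rng.beta(a=2, b=18, size=10_000)  # mean = 2 / (2 + 18) = 0.10

print(f"Population average risk: {individual_risks.mean():.1%}")
print(f"5th percentile of individual risk: {np.percentile(individual_risks, 5):.1%}")
print(f"95th percentile of individual risk: {np.percentile(individual_risks, 95):.1%}")
```

The population average comes out at 10%, but individual risks in this toy population range from a couple of per cent to over 20%.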
Although populations are made up of individuals, predicting the health of a specific person is a very different problem to understanding what influences the average health of a population. In his classic 1985 paper ‘Sick individuals and sick populations’, Geoffrey Rose makes the point that even if we can’t say much about an individual, we can often discern patterns at the population level:
Within populations it has proved almost impossible to demonstrate any relation between an individual’s diet and his serum cholesterol level; and the same applies to the relation of individual diet to blood pressure and to overweight. But at the level of populations it is a different story: it has proved easy to show strong associations between population mean values for saturated fat intake versus serum cholesterol level and coronary heart disease incidence, sodium intake versus blood pressure, or energy intake versus overweight.
Why so average?
One of the reasons researchers focus on average outcomes is that they are often much easier to estimate using available statistical methods. Even if individual health trajectories vary a lot, a well-designed study – like a randomised controlled trial – should be able to detect whether there is on average a difference between two groups.
This distinction between individual outcomes and population averages can create challenges in interpretation. Last year, Sam Zhang and colleagues ran a study to understand how people perceive such estimates. They used violent video games as an example. Suppose you ran a study and observed the following (hypothetical) difference in average aggressiveness scores among 100 players of violent and non-violent games. The lines represent the standard error, i.e. a measure of how much uncertainty there is in our estimate of the mean:
At first glance, there seems to be a difference between the two, with a slightly larger value for players of violent games – albeit with quite a lot of uncertainty about the exact number. If we were to run a larger study, say with 800 people, we would expect this uncertainty to decrease. In other words, we could end up more confident that players of violent games on average have a higher aggressiveness score:
Understanding average differences can be useful, particularly if we want to understand what overall effect an intervention might have at the population level. (Assuming that we have sufficient evidence that we are indeed dealing with a cause-and-effect relationship.) But it’s not the full story.
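For readers who like to tinker, here is a minimal simulation in the spirit of the hypothetical example above. The score distributions – a true average difference of 0.2 points with an individual standard deviation of 1 – are assumptions of mine, not values from the Zhang et al. study:

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented score distributions: a small true average difference
# (0.2 points) buried in a lot of individual spread (sd = 1)
for n in (100, 800):  # n = players in each group
    violent = rng.normal(loc=5.2, scale=1.0, size=n)
    non_violent = rng.normal(loc=5.0, scale=1.0, size=n)
    diff = violent.mean() - non_violent.mean()
    # standard error of the difference between the two sample means
    se = np.sqrt(violent.var(ddof=1) / n + non_violent.var(ddof=1) / n)
    print(f"n={n}: mean difference = {diff:.2f}, standard error = {se:.2f}")
```

Going from 100 to 800 players per group shrinks the standard error by a factor of √8, almost three, even though the individual scores are just as spread out as before.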
Let’s talk about you
Averages are one thing, but what does this all mean for individuals? If someone plays a violent game, what will their aggressiveness score be? To answer this question, we don’t just need to know how confident we can be in the average difference. We need to understand how much variability there is among the individual-level outcomes.
The following graphic from the Zhang et al. paper illustrates the problem. Collecting more samples can give us more confidence in our estimate of the average difference between two groups (known as ‘inferential uncertainty’). But it doesn’t necessarily help us get better at predicting outcomes for individuals in those two groups (i.e. ‘outcome variability’).
There’s quite a lot going on in the figure below, so it’s worth taking a moment to get orientated. The left-hand column shows the uncertainty in average estimates, while the right-hand column shows the variability in individual-level outcomes. The top row shows a sample size of 100 and the bottom row a sample of 800.
The text along the edges summarises the situation: more data can decrease uncertainty in our estimates of the average difference, but does not decrease variability in individual-level outcomes. As a result, one specific person who plays violent games could have a much lower aggressiveness score than another person who plays non-violent games, even if on average we’d expect the relationship to be the other way around.
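To put numbers on this, we can extend the toy model from earlier. Under my assumed values (a true difference of 0.2 and an individual standard deviation of 1), the standard error shrinks as the sample grows, but the chance that two specific individuals buck the average trend does not budge:

```python
import numpy as np
from scipy.stats import norm

true_diff, sd = 0.2, 1.0  # same invented values as before

for n in (100, 800):
    se = sd * np.sqrt(2 / n)  # inferential uncertainty: shrinks with n
    print(f"n={n}: standard error = {se:.2f}, individual sd = {sd:.2f}")

# Chance that a random violent-game player scores LOWER than a random
# non-violent player. This depends only on the outcome variability,
# so it stays the same however large the study gets.
p_reversal = norm.cdf(-true_diff / (sd * np.sqrt(2)))
print(f"Probability of an individual-level reversal: {p_reversal:.0%}")
```

With these numbers, there is roughly a 44% chance that a randomly chosen violent-game player scores lower than a randomly chosen non-violent player, no matter how big the study.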
Unfortunately, the researchers found that readers would often mistake the shrinking error bars in estimates of the population average for an increased ability to predict individual outcomes. This happened even among some experts. As the researchers note:
the pervasive focus on inferential uncertainty in scientific data visualizations can mislead even experts about the size and importance of scientific findings, leaving them with the impression that effects are larger than they actually are.
The prevention paradox
Geoffrey Rose would encourage medical students learning epidemiology to ask the question: ‘Why did this patient get this disease at this time?’ Similarly, if we’re interested in individual outcomes, or personalised treatments, we need to narrow down the causes that drive the variability we see. In an ideal world, you’d be able to have high confidence that a particular combination of treatments will work for you, not merely that out of 100 people of roughly similar age and health background, a small proportion will see an improvement.
For a common disease, even a single-digit proportion can be equivalent to thousands of people with improved outcomes at the population level. But it still means there is only a small chance that you specifically will fall into this group. Rose called this the prevention paradox: ‘A preventive measure which brings much benefit to the population offers little to each participating individual’.
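A toy calculation shows the scale of the paradox. Suppose, purely for illustration, a preventive measure cuts a 10% risk to 9% across a population of one million:

```python
# Illustrative numbers only: a preventive measure cutting a 10% risk to 9%
population = 1_000_000
baseline_risk = 0.10
treated_risk = 0.09

absolute_risk_reduction = baseline_risk - treated_risk  # 1 percentage point
cases_prevented = population * absolute_risk_reduction
number_needed_to_treat = 1 / absolute_risk_reduction

print(f"Cases prevented across the population: {cases_prevented:,.0f}")
print(f"People treated for each one who benefits: {number_needed_to_treat:.0f}")
```

That is 10,000 prevented cases at the population level, yet a hundred people have to take the measure for each one who benefits – exactly Rose’s point.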
Moving beyond averages is a hard problem. Even if a highly personalised cause-and-effect relationship can be disentangled in theory – and hence better predictions made – doing so will generally require a lot of data. And for certain health problems where randomness dominates, it may not be possible even in theory.
Which brings us on to personalised nutrition and health. There is a growing focus on digital tools that provide tailored individual diets and plans. But I’ve increasingly noticed that these companies also market branded foods and supplements alongside those tools. It feels like a contradiction: on one hand, we have the premise that your health requires highly tailored inputs; on the other, there is the implicit suggestion that everyone will get the same benefit from the same product.
The reason, I suspect, is that a single product is easier, both from a statistical and a business point of view. But this approach still suffers from the confusion described above. Confidence in average population health benefits is communicated as confidence in individual-level consumer benefits. The irony is that the more a ‘personalised’ health company tries to get you interested in its one-size-fits-all food or supplement, the more it shows that it isn’t really that interested in your health at all.
Cover image source: Matemateca via WikiCommons