Imagine for a moment that you have been gifted 100 expensive wine bottles by 100 different people. But there’s a catch: you discover that one of the bottles has been poisoned, and you have no idea which one. Fortunately, you have a test that can detect poison very accurately, even among a large volume of liquid. Unfortunately, you have only 7 of these tests available. What should you do?
A simple answer would be to just throw away the wine. With 100 bottles and just 7 tests, surely there’s no way you’ll identify the poisoned one. Or is there?
Remember, the test works well even with large volumes of liquid. So what we need is a method to convert 100 pieces of information (i.e. the list of bottles) into a format that can be captured by 7 units of information (i.e. the 7 positive or negative tests) and still uniquely identify each of the original bottles.
Reshaping information
Let’s start with a simpler example to explore how this could work. Say we have 4 bottles and 2 tests. Now suppose we mix up two random batches of wine, by combining wine from various bottles. Because each test can be positive or negative, there are 4 possible combinations of test result that we could in theory see if we test the two batches:
Both negative
1st test positive and 2nd negative
1st negative and 2nd positive
Both positive
Given our scattergun mixing into two batches can give 4 possibilities for the test results, can we be a bit more systematic, to get each possibility to line up with one of the four bottles?
For example, we could add wine (or not) from each of the bottles into one of two containers as follows:
Container 1: Container 2:
1st bottle: NO NO
2nd bottle: NO YES
3rd bottle: YES NO
4th bottle: YES YES
In other words, we don’t put any wine from the 1st bottle in either container, we put wine from the 2nd bottle in container 1 but not 2, then vice versa for the 3rd bottle, and finally we put wine from the 4th bottle in both.
Because we’ve arranged things so that each possible test combination lines up with one specific bottle, the results of the tests will tell us which bottle is poisoned. If no tests are positive, it’s the 1st bottle; if both are positive it’s the 4th bottle etc.
You may have noticed (or remembered from seeing another version of this riddle) that the YES/NO decisions above can also be written as binary numbers (i.e. 00, 01, 10, 11). This gives us a natural way to ‘encode’ each wine bottle so it matches a specific set of test results. And because each test can have one of two possible results, with N tests, we can encode 2N different numbers (or bottles of wine).
So with 7 tests, we can encode up to 27=128 bottles of wine, by converting the number of the bottle into binary. For example, bottle number 25 would be converted to 0011001. Hence once we line up the 7 containers, we’d therefore add wine from the 25th bottle to container 3, 4 and 7 only.
Efficiency and epidemiology
0011001 is not a particularly neat way of expressing a number in everyday life. Using only two digits, 25 is much pithier. But as a number, 25 is written in base 10 (i.e. each position can take any digit from 0 to 9), whereas our tests can only return 2 possible results. So we need to convert our calculations to base 2 (i.e. binary) to make the best use of the resources we have available.
This approach isn’t to fictional wine bottles. When studying infectious diseases, we often have situations where we want to understand what’s happening, but don’t necessarily have the resources to test large numbers of people. ‘Pooled testing’ can therefore be a useful way to help testing capacity go further. For example, my collaborators at Institut Louis Malardé in French Polynesia used pooled testing for SARS-CoV-2 to identify which arriving traveller groups might be infected in 2020-21, with any positive pools followed up afterwards.
Such testing isn’t limited to humans. With infections like dengue increasing their range, including local transmission in Paris last year, a crucial question is how much virus might be circulating in mosquito populations. But like wine bottles, it’s not feasible to test large numbers of mosquitoes on the off chance of catching a positive one. So instead, researchers use techniques like xenomonitoring, capturing large numbers of mosquitoes then testing the samples in bulk to estimate how common the pathogen is.
Unfortunately, real-life data can be more complicated than doing a simple conversation into binary: tests are often imperfect, and we might have multiple human or mosquito groups infected. But fortunately, there is a growing number of statistical tools to help make sense of the results. By reshaping information, researchers can get better insights into emerging threats like dengue. And with it, a better sense of the risk that will come with a sunset glass of wine in future.
that's a really neat solution; my own was to do a binary search (split the bottles into two equally sized groups; test one pooled sample at random; if it's positive, that's the group, if negative, it's the other group; repeat) which will also ID the poison bottle out of 100 in 7 rounds providing you know there's one and one only.
But as far as I can see these solutions don't take into account key context. 1) opened bottles of wine will spoil and 2) all the wine was gifted so there is no material loss in losing most of them
If we test 7 random bottles of wine then worst case we don't find a poisoned bottle so have to throw 93 bottles away - but we do have 7 bottles of safe fancy wine to drink (quickly, admittedly, before they spoil)
Best case we find the poisoned one among the 7 and have 99 bottles of fancy wine to enjoy at our leisure
The downside to this is that in the first, most likely, case we don't identify the poisoner, but is that our problem? At least we have 7 more bottles of lovely wine than we had before all this started