The shape of information

How to extract insights with seemingly limited resources

Jun 17, 2024

Imagine for a moment that you have been gifted 100 expensive wine bottles by 100 different people. But there’s a catch: you discover that one of the bottles has been poisoned, and you have no idea which one. Fortunately, you have a test that can detect poison very accurately, even among a large volume of liquid. Unfortunately, you have only 7 of these tests available. What should you do?

A simple answer would be to just throw away the wine. With 100 bottles and just 7 tests, surely there’s no way you’ll identify the poisoned one. Or is there?

Remember, the test works well even with large volumes of liquid. So what we need is a method to convert 100 pieces of information (i.e. the list of bottles) into a format that can be captured by 7 units of information (i.e. the 7 positive or negative tests) and still uniquely identify each of the original bottles.

Reshaping information

Let’s start with a simpler example to explore how this could work. Say we have 4 bottles and 2 tests. Now suppose we mix up two random batches of wine, by combining wine from various bottles. Because each test can be positive or negative, there are 4 possible combinations of test result that we could in theory see if we test the two batches:

Both negative
1^st test positive and 2^nd negative
1^st negative and 2^nd positive
Both positive

Given our scattergun mixing into two batches can give 4 possibilities for the test results, can we be a bit more systematic, to get each possibility to line up with one of the four bottles?

For example, we could add wine (or not) from each of the bottles into one of two containers as follows:

Container 1: Container 2:

1^st bottle: NO NO

2^nd bottle: NO YES

3^rd bottle: YES NO

4^th bottle: YES YES

In other words, we don’t put any wine from the 1^st bottle in either container, we put wine from the 2^nd bottle in container 1 but not 2, then vice versa for the 3^rd bottle, and finally we put wine from the 4^th bottle in both.

Because we’ve arranged things so that each possible test combination lines up with one specific bottle, the results of the tests will tell us which bottle is poisoned. If no tests are positive, it’s the 1^st bottle; if both are positive it’s the 4^th bottle etc.

You may have noticed (or remembered from seeing another version of this riddle) that the YES/NO decisions above can also be written as binary numbers (i.e. 00, 01, 10, 11). This gives us a natural way to ‘encode’ each wine bottle so it matches a specific set of test results. And because each test can have one of two possible results, with N tests, we can encode 2^N different numbers (or bottles of wine).

So with 7 tests, we can encode up to 2⁷=128 bottles of wine, by converting the number of the bottle into binary. For example, bottle number 25 would be converted to 0011001. Hence once we line up the 7 containers, we’d therefore add wine from the 25^th bottle to container 3, 4 and 7 only.

Efficiency and epidemiology

0011001 is not a particularly neat way of expressing a number in everyday life. Using only two digits, 25 is much pithier. But as a number, 25 is written in base 10 (i.e. each position can take any digit from 0 to 9), whereas our tests can only return 2 possible results. So we need to convert our calculations to base 2 (i.e. binary) to make the best use of the resources we have available.

This approach isn’t to fictional wine bottles. When studying infectious diseases, we often have situations where we want to understand what’s happening, but don’t necessarily have the resources to test large numbers of people. ‘Pooled testing’ can therefore be a useful way to help testing capacity go further. For example, my collaborators at Institut Louis Malardé in French Polynesia used pooled testing for SARS-CoV-2 to identify which arriving traveller groups might be infected in 2020-21, with any positive pools followed up afterwards.

Such testing isn’t limited to humans. With infections like dengue increasing their range, including local transmission in Paris last year, a crucial question is how much virus might be circulating in mosquito populations. But like wine bottles, it’s not feasible to test large numbers of mosquitoes on the off chance of catching a positive one. So instead, researchers use techniques like xenomonitoring, capturing large numbers of mosquitoes then testing the samples in bulk to estimate how common the pathogen is.

Unfortunately, real-life data can be more complicated than doing a simple conversation into binary: tests are often imperfect, and we might have multiple human or mosquito groups infected. But fortunately, there is a growing number of statistical tools to help make sense of the results. By reshaping information, researchers can get better insights into emerging threats like dengue. And with it, a better sense of the risk that will come with a sunset glass of wine in future.

Alexander Harrowell

that's a really neat solution; my own was to do a binary search (split the bottles into two equally sized groups; test one pooled sample at random; if it's positive, that's the group, if negative, it's the other group; repeat) which will also ID the poison bottle out of 100 in 7 rounds providing you know there's one and one only.

Expand full comment

6 replies

Alan Morgan

Feb 12

Brilliant - simple but clever! Thanks for posting…

8 more comments...

Understanding the unseen

Discussion about this post