Imagine you were to gather 365 randomly chosen people in a room. What’s the probability at least one of them has a birthday today?
1%? 20%? 90%? Have a guess.
Next, here’s another, less celebratory, puzzle. Suppose 1 in every N people currently have flu in a community. If N random people attend a gathering, what’s the chance at least one of them has flu?
You might have noticed that the second question is basically a more general version of the first (for birthdays, we just have N=365). And this leads to a handy rule of thumb.
From nobody to somebody
If 1 in every N people have a disease (or another characteristic, like a birthday today), we can calculate the probability at least one person in a gathering of N people has this characteristic by first working out the probability nobody has it.
We’ve assumed the probability a specific person has the charateristic is 1/N, so the probabilty they don’t have the characteristic is 1–1/N.
So the probability that none of the N people at the gathering have it is this value multiplied together N times. Or equivalently (and more concisely) this value to the power of N:
So the probability at least one person in the group has the characteristic can be calculated as 1 minus the above probability:
Now you might be thinking ‘that’s quite a headache to calculate in my head, even if it can give me the birthday answer’ (i.e. around 63%).
But fortunately there’s a shortcut we can take.
Whenever a mathematical calculation produces a nice concise formula, it’s worth checking whether that formula appears anywhere else. In the case of the above, the formula also describes the exponential function as N gets very large:
So if x=-1, the probability at least one person in the group has the characteristic can be approximated by:
So regardless of the value of N, if 1 in every N people have a characteristic in a population, and N random people attend a gathering, there’s roughly a 63% chance at least one person at the gathering will have it. (This rule-of-thumb will be more approximate for smaller groups, e.g. if N=10, then the true value would be closer to 65%.)
But this isn’t just about birthday and disease puzzles. As well as making it easier to get intuition about problems on the fly, rules of thumb like the above can reveal deeper mathematical links we might not have spotted otherwise. And it turns out that this isn’t the only puzzle where exp(-1) crops up as a solution.
The googol problem
In 1960, mathematician Martin Gardner described the following puzzle in the February issue of Scientific American:
Ask someone to take as many slips of paper as he pleases, and on each slip write a different positive number. The numbers may range from small fractions of 1 to a number the size of a googol (1 followed by a hundred zeroes) or even larger. These slips are turned face down and shuffled over the top of a table. One at a time you turn the slips face up. The aim is to stop turning when you come to the number that you guess to be the largest of the series. You cannot go back and pick a previously turned slip. If you turn over all the slips, then of course you must pick the last one turned.
This puzzle would become more famously retold as the ‘secretary problem’ or the ‘fiancé(e) problem’, with cards substituted for candidates and numbers representing assessments of suitability. Suppose we interview a series of candidates one by one, with the condition that we have to make a decision whether to accept or reject after each interview, and if we reject a candidate, we can’t reconsider them later. How should we maximise our chances of selecting the best candidate?
The solution is harder to reach than in the examples above, but once we get there, the end result is remarkably simple. For the basic secretary problem, we can maximise our chances of selecting the best candidate by rejecting an initial proportion exp(-1) of applicants (i.e. the first 37% we meet), then pick the next candidate we see who is better than all previous ones.
This strategy works by balancing the risk of stopping too early or waiting too long. Even so, the overall probability it produces the best candidate also happens to be exp(-1), or about 37%. And this is true regardless of whether there are a dozens or thousands of candidates.
What’s more, if you’ve got 1000 candidates lined up and follow this strategy, then you’ll automatically reject the first 1000 x exp(-1) of them, which equates to 368 peole. Which, as we’ve seen already, means you’ll also have around a 63% chance of meeting at least one who has a birthday today in this group.
So remember to be nice to this hypothetical mathematical group.
If you’re interested in more handy rules of thumb, have a look at this previous post of mine:
Cover image: Sathish J via Flickr (CC BY-NC-ND 2.0)
Dear Adam, can we truly compare the way of calculating the probability of someone having the same birthday in the same room with someone one having a disease? A birthday is something objective but being in the room despite being ill is in itself another equation to add as it will depend of other factors... Sorry if I am missing something obvious and THANK YOU for your so comprehensible articles, a pleasure to read them.
This is cool but it seems the "Secretary Problem" solution here is very limited vs real world situations. Seems it relies on a few assumptions:
1) you are not looking to get as good of a candidate (as high of a number) as reasonably possible, you are trying to maximize your odds of getting the single highest number. In real life there's a benefit to ending up with the 2nd highest number over the lowest number, but here we make a sharp line between the absolute best and everything else
2) the distribution is strictly random. If we find the underlying distribution is say normal, then once we have enough experience to calculate a mean and std deviation, we should be able to recognize true outlier results and calculate the odds of finding something higher in the remaining observations