Test results provide essential information when dealing with an infection. Understanding how accurate a test is will help you plan the best course of action. The results may be surprising.
This article shows how what seems to be a highly accurate test may not give you the certainty you think it does. An explanation follows with a bit of reasoning and help from an 18th-century Presbyterian minister.
Staying safe should be everyone’s primary concern. Nothing here detracts from the need to take all possible precautions against coronavirus and other diseases.
99% Accurate Does Not Mean 99% Conclusive
I was recently discussing COVID-19 tests, specifically lateral flow tests, with a friend. He had read somewhere that these tests are 99% accurate. He assumed, then, that if he tested positive, the probability of infection was also 99%. This is almost certainly not the case. In fact, without more information, it’s impossible to know how likely you are to be infected!
Something called Bayes’ Theorem will help us explain what’s going on. Thomas Bayes was a Presbyterian minister, born in Hertfordshire in 1701. His theorem is still used frequently in data analysis, despite being more than 250 years old (it was published in 1763, after his death). The theorem relates to the conditional probability of events. Or, to put it a little differently: how likely is an event to happen, given that we have prior knowledge relating to that event?
A Little Mathematical Theory
Bayes’ theorem allows us to update our predictions by introducing new information. The formula, P(A|B) = P(B|A) × P(A) / P(B), can be a little daunting for people who are not mathematically minded.
If that’s you, please don’t worry. We’ll only be using simple arithmetic from this point onwards. P(A|B) is just shorthand for the probability of A given B. For example:
- P(Rain|Clouds) could mean the probability of it raining soon, given that dark storm clouds are gathering overhead.
- P(Infected|Positive test) could mean the probability of being infected given that your test result is positive.
In essence, Bayes’ theorem allows us to improve an existing prediction when new information comes to light. A great benefit of Bayes’ theorem is we can keep applying it as more information emerges.
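For the testing question that follows, the theorem can be written out as a rough sketch like the one below. The event labels are illustrative shorthand, and the denominator is expanded using the law of total probability, a step the article itself does not spell out.

```latex
\[
P(\text{Infected} \mid \text{Positive})
  = \frac{P(\text{Positive} \mid \text{Infected}) \, P(\text{Infected})}{P(\text{Positive})}
\]
\[
P(\text{Positive})
  = P(\text{Positive} \mid \text{Infected}) \, P(\text{Infected})
  + P(\text{Positive} \mid \text{Not infected}) \, P(\text{Not infected})
\]
```

Here P(Infected) is the share of people infected before any test is taken, P(Positive | Infected) is the chance an infected person tests positive, and P(Positive | Not infected) is the chance an uninfected person tests positive anyway.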
The Probability of Being Infected Given a Positive Test Result
Reports often state that 1 in 3 people with COVID-19 do not have any symptoms but can still infect others. This is a difficult number to estimate, and some reports publish a much higher percentage. A lateral flow test is a quick way to test for COVID-19 for people who have no symptoms and so no physical reason to believe they are infected.
I started by looking at how many cases there are in my area. I figured that I had a higher chance of getting infected by the people in my community than elsewhere. Anyone based in the UK can do the same by going to this BBC web page. The site reports 341 cases per 100,000 people in my neck of the woods. So, about 0.34% of people in my location are currently infected with COVID-19. Or, optimistically, 99,659 out of every 100,000 people are infection-free.
We now have all the information to calculate the probability of being infected having received a positive test result.
A Confusion Matrix Can Be… Confusing
To keep things simple, we will assume that negative and positive test results are 99% accurate. Several key measurements quantify the performance of tests of this nature. A confusion matrix (see below) summarises some of the critical measures.
For a test that is 99% accurate, we should correctly identify 99% of the 341 people who are infected, which is about 338. Equally, the test should correctly identify 99% of the 99,659 people who are not infected, which is about 98,662.
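The four cells of the confusion matrix can be worked out with a few lines of arithmetic. This is a minimal sketch, assuming the article's figures of 341 cases per 100,000 people and 99% for both kinds of result. The variable names are made up for illustration.

```python
# Figures from the article: 341 cases per 100,000 people, and a test that is
# assumed to be 99% accurate for both infected and uninfected people.
population = 100_000
infected = 341
sensitivity = 0.99  # share of infected people who test positive
specificity = 0.99  # share of uninfected people who test negative

not_infected = population - infected             # 99,659

true_positives = infected * sensitivity          # infected, test positive
false_negatives = infected - true_positives      # infected, test negative
true_negatives = not_infected * specificity      # uninfected, test negative
false_positives = not_infected - true_negatives  # uninfected, test positive

# Print the matrix with rows for actual status and columns for test result.
print(f"{'':>14}{'Test +':>10}{'Test -':>10}")
print(f"{'Infected':>14}{true_positives:>10,.0f}{false_negatives:>10,.0f}")
print(f"{'Not infected':>14}{false_positives:>10,.0f}{true_negatives:>10,.0f}")
```

The counts are kept as unrounded floats and only rounded when printed, so sums of the printed cells can differ by one from totals computed before rounding.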
With a little more effort, we can determine the two following statistics:
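- The probability of being infected after receiving a positive test result is approximately 25%. We calculate this by dividing the 338 correctly predicted positive results by 1,334, the total number of expected positive results. That total combines the true positives with the 1% of the 99,659 uninfected people who wrongly test positive, calculated before rounding.
- The probability of not being infected after receiving a negative test result is approximately 99.99%. We calculate this by dividing the 98,662 correctly predicted negative results by 98,666, the total number of predicted negative results. That total combines the true negatives with the 1% of the 341 infected people whom the test misses.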
So only 25% of people receiving a positive test result are infected in this scenario!
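The same two figures can be reached by applying Bayes' theorem directly to probabilities rather than to head counts. The sketch below assumes the article's inputs (341 per 100,000 infected, 99% sensitivity and specificity). The function name `predictive_values` is hypothetical.

```python
def predictive_values(prevalence, sensitivity, specificity):
    """Return (P(infected | positive), P(not infected | negative))."""
    # Overall chance of a positive result: true positives plus false positives.
    p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    p_negative = 1 - p_positive
    p_infected_given_positive = sensitivity * prevalence / p_positive
    p_clear_given_negative = specificity * (1 - prevalence) / p_negative
    return p_infected_given_positive, p_clear_given_negative


ppv, npv = predictive_values(prevalence=341 / 100_000,
                             sensitivity=0.99,
                             specificity=0.99)
print(f"P(infected | positive test)     = {ppv:.1%}")
print(f"P(not infected | negative test) = {npv:.3%}")
```

The first value comes out at roughly 25%, and the second is above 99.99%, in line with the figures quoted above.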
What’s the Impact of Getting It Wrong?
A false positive result (a positive test result for someone without the disease) can lead to unnecessary isolation. It can also create work and costs for contact tracing and additional tests.
A false negative result (a negative test result for someone who is infected) is potentially a much larger problem. People who believe they are not infected are more likely to pass the virus on to others, helping to propagate the disease.
Go here to try out your scenario in a Google spreadsheet. In our scenario there are far fewer false negatives than false positives, which is reassuring given that false negatives are the more serious error.
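The 25% probability of being infected, given a positive test result, seems surprisingly small. The reason is that most of the positive test results are false positives: so few people are infected that the 1% of uninfected people who wrongly test positive outnumber the infected people who correctly test positive.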
There are two critical measures called sensitivity and specificity that together indicate a test’s accuracy.
- A test’s sensitivity is the proportion of infected people it correctly identifies as positive.
- A test’s specificity is the proportion of non-infected people it correctly identifies as negative.
For this example, I simply used 99% for each of them as an overall indication of effectiveness. The spreadsheet allows you to see the impact of different values for sensitivity and specificity.
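For readers who would rather experiment in code than in a spreadsheet, here is a minimal sketch of the same idea. Only the 99%/99% pair comes from the article. The other sensitivity and specificity values are invented purely to show the effect, and the function name is hypothetical.

```python
def prob_infected_given_positive(prevalence, sensitivity, specificity):
    """Bayes' theorem for P(infected | positive test)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


prevalence = 341 / 100_000  # the article's local infection rate

# (sensitivity, specificity) pairs: the first is the article's assumption,
# the rest are made-up values for comparison.
scenarios = [(0.99, 0.99), (0.90, 0.99), (0.99, 0.999), (0.99, 0.95)]

results = {}
for sensitivity, specificity in scenarios:
    p = prob_infected_given_positive(prevalence, sensitivity, specificity)
    results[(sensitivity, specificity)] = p
    print(f"sensitivity={sensitivity:.1%}  specificity={specificity:.1%}"
          f"  ->  P(infected | positive) = {p:.1%}")
```

At a low infection rate like this one, the result is far more sensitive to specificity than to sensitivity, because specificity controls how many of the many uninfected people produce false positives.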
I mentioned chaining Bayes’ calculations together as more information comes to light. If someone takes a second test, we can use the probability of infection from the first test as the starting point for the second calculation. The spreadsheet also contains this calculation. More tests increase the likelihood of being correct, although it will never reach 100%. For the figures we are using, two positive tests mean you are more than 97% likely to be infected.
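Chaining the calculation can be sketched as feeding each result back in as the next starting probability. This sketch makes an assumption the article does not state: that repeated tests err independently of one another for a given person. If the same factor causes a false result both times, the second test adds less certainty than shown here. The function name `update_on_positive` is hypothetical.

```python
def update_on_positive(prior, sensitivity=0.99, specificity=0.99):
    """Probability of infection after one more positive test result."""
    true_pos = sensitivity * prior
    false_pos = (1 - specificity) * (1 - prior)
    return true_pos / (true_pos + false_pos)


prior = 341 / 100_000                           # before any test
after_first = update_on_positive(prior)         # after one positive test
after_second = update_on_positive(after_first)  # after a second positive test

print(f"Before testing:        {prior:.2%}")
print(f"After one positive:    {after_first:.1%}")
print(f"After two positives:   {after_second:.1%}")
```

With the article's figures, the probability rises from about 25% after one positive result to a little over 97% after two.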
This example is simple and does not consider many factors that may affect the transmission and contagion of disease. However, it does show the care we should take when interpreting statements such as ‘this test is 99% accurate’.
Bayes’ theorem is a powerful tool that belongs in the kitbag of every data scientist. It also helps us all to understand what a positive or negative test result means. Stay safe!