The Use and Abuse of Statistics

Sound policies rely on sound data, but statistics can be very misleading. Here are some of the most obvious problems.

The person or organisation providing the data may, deliberately or otherwise, have selected data which supports their interests or their point of view.

Aggregated statistics can look very different to the underlying figures. Vehicle accident statistics, for instance, generally include young and accident-prone drivers, as well as injuries to pedestrians and cyclists. Indeed, I understand that a middle-aged car driver in good weather may well be just as safe, over most long journeys in the UK, as if he or she were flying, which is a very safe form of transport.
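A toy calculation (with invented numbers) shows how an aggregate figure can overstate the risk faced by one subgroup:

    # Hypothetical accident rates per billion miles for two driver groups
    rates = {"young drivers": 10.0, "middle-aged drivers": 1.0}   # accidents per bn miles
    miles = {"young drivers": 2e9, "middle-aged drivers": 8e9}    # miles driven per annum

    accidents = sum(rates[g] * miles[g] / 1e9 for g in rates)     # 28 accidents
    aggregate = accidents / (sum(miles.values()) / 1e9)
    print(f"aggregate rate: {aggregate:.1f} accidents per bn miles")
    # Prints 2.8 -- nearly three times the risk a middle-aged driver actually faces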

Small sample sizes can produce very misleading results. Try to identify or organise meta-analyses which aggregate the results of numerous smaller studies.

Those who do not respond to surveys may have very different views to those who do. Imagine a 30 to 10 split "in favour" in responses to a questionnaire. Does this mean that "75% believe that ..."? Not if the response rate was only 40% and almost all of the silent 60% thought otherwise, as the sketch below shows. Matthew Syed has reported one such example.
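The arithmetic is worth spelling out (illustrative numbers only):

    # Non-response bias: a '75% majority' that may be nothing of the sort
    surveyed = 100                      # questionnaires sent out
    in_favour, against = 30, 10         # replies received: a 30 to 10 split
    respondents = in_favour + against   # 40, i.e. a 40% response rate

    headline = in_favour / respondents  # 0.75 -- "75% in favour"
    # But if almost all of the 60 non-respondents were in fact against:
    worst_case = in_favour / surveyed   # 0.30
    print(f"headline: {headline:.0%}; true support could be as low as {worst_case:.0%}")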

Those responding to questionnaires may not tell the truth, whether inadvertently or deliberately.

Different organisations will record data in different ways. The classic example is in France where, if an elderly person is found dead without evidence of health problems, it is acceptable to attribute the death to ‘old age’, thus reducing the apparent incidence of heart attacks. But crime statistics can be similarly unreliable, as can many others.

Education and other regulators know that it is quite wrong to try to compare 'apples and pears'. A 'First' from one university - or a 'First' in Engineering - may well be harder to achieve than a 'First' from another institution, or in a different subject. And that is before you start allowing for grade drift as universities compete to attract students.

Equally, students and patients may not be the best judges of the quality of their teachers or doctors respectively. Professors who perform in an entertaining way, and doctors with great bedside manners, may be far from the best in their profession. Remember Harold Shipman ...!

The fact that there have been no incidents does not mean that something is safe. It is possible, for instance, that fewer children are now killed on our roads not because the roads are inherently safer than they were decades ago, but because they are so dangerous that many children are not allowed near them.

Death and injury rates can look very different when presented as a number (e.g. number of children killed in an incident) rather than as a proportion of the exposed population per annum.
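A back-of-the-envelope comparison (with invented numbers) makes the point:

    # The same death count implies very different risks once exposure is counted
    groups = {
        "cyclists": (100, 1_000_000),    # (deaths per annum, exposed population)
        "drivers":  (100, 30_000_000),
    }
    for name, (deaths, exposed) in groups.items():
        print(f"{name}: {deaths} deaths = 1 in {exposed // deaths:,} per annum")
    # Identical headlines ('100 killed'), but a 30-fold difference in risk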

A report of deaths caused by, for example, air pollution might include a high proportion of those whose death was already imminent, rather than deaths from amongst an otherwise healthy population.

Survival rates can be very misleading. Screening for cancer, for instance, often appears to generate a high survival rate (over 5 years, say) compared with the survival rate of those whose cancers are detected only when symptoms become obvious. But this can simply be because diagnosis happens earlier, so patients appear to live longer even if treatment is ineffective ('lead-time bias'). Or it can be because the tests also pick up slow-growing cancers that would never have caused harm ('overdiagnosis').
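Lead-time bias is easy to demonstrate with a toy simulation, assuming (purely for illustration) that screening detects each cancer three years earlier but treatment does not change anyone's date of death:

    import random

    # Lead-time bias: earlier diagnosis 'improves' 5-year survival even
    # though, in this toy model, nobody's date of death changes.
    random.seed(1)
    years_to_death = [random.uniform(1, 10) for _ in range(10_000)]
    # ... measured from the date of symptomatic diagnosis

    def five_year_survival(times):
        return sum(t > 5 for t in times) / len(times)

    symptomatic = years_to_death                    # diagnosed at symptoms
    screened = [t + 3 for t in years_to_death]      # diagnosed 3 years earlier

    print(f"symptomatic: {five_year_survival(symptomatic):.0%}")   # ~56%
    print(f"screened:    {five_year_survival(screened):.0%}")      # ~89%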

So what is to be done? If possible, you should design your own questionnaires and data-gathering exercises with the help of professional statisticians. To the extent that this is not possible, you must treat all data with a heavy dose of cynicism, bearing in mind all the issues listed above.

But do not be tempted, when faced with a hostile press or a one-sided lobby, to assemble your own dodgy statistics – or dodgy science – to fight them off. The inevitable result would be that those with whom you are trying to communicate would see you as prejudiced and/or adversarial, and you might then fail to pay sufficient attention to perfectly reasonable arguments from ‘the other side’.

Nutrition Research

Several of the above problems bedevil scientific 'advice' about diet. New Scientist magazine's Clare Wilson published an interesting article on this subject in July 2019. Here is an extract:

The big problem with these “observational” studies is that eating certain foods tends to go hand in hand with other behaviours that affect health. People who eat what is generally seen as an unhealthy diet – with more fast food, for instance – tend to have lower incomes and unhealthy lifestyles in other ways, such as smoking and taking less exercise. Conversely, eating supposed health foods correlates with higher incomes, with all the benefits they bring. These other behaviours are known as confounders, because in observational studies they can lead us astray. For example, even if blueberries don’t affect heart attack rates, those who eat more of them will have fewer heart attacks, simply because eating blueberries is a badge of middle-class prosperity.  Researchers use statistical techniques to try to remove the distorting effects of confounders. But no one knows for certain which confounders to include, and picking different ones can change results.

To show just how conclusions can vary based on choice of confounders, Chirag Patel at Harvard Medical School examined the effects of taking a vitamin E supplement. He used a massive data set from a respected US study called the National Health and Nutrition Examination Survey. Depending on which mix of 13 possible confounders are used, taking this vitamin can apparently either reduce death rates, have no effect at all or even raise deaths. Patel says this shows researchers can get any result they want out of their data, by plugging into their analysis tools whatever confounders give an outcome that fits their favoured diet, be it low-fat or low-carbohydrate, vegetarian or Mediterranean. ...

Another source of error is known as publication bias: studies that show interesting results are more likely to get published than those that don’t. So if two studies look at red meat and cancer, for instance, and only one shows a link, that one is more likely to be published. This bias happens at nearly every stage of the long process from the initial research to publication in a scientific journal and ultimately to news stories, if journalists like me write about it. “What you see published in the nightly news is the end result of a system where everyone is incentivised to come up with a positive result.”
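The confounding problem described in this extract is easy to reproduce. In the toy simulation below (entirely made-up data), eating blueberries has no effect whatsoever on heart attacks, yet the naive comparison shows a large 'benefit', simply because blueberry eating and heart health both track income. Comparing like with like within income bands makes the illusion largely disappear:

    import numpy as np

    # Confounding demo: blueberries have NO effect, but income drives both
    # blueberry eating and heart-attack risk in this simulated population.
    rng = np.random.default_rng(0)
    n = 100_000
    income = rng.normal(size=n)                  # the confounder
    blueberries = income + rng.normal(size=n)    # richer people eat more
    risk = 1 / (1 + np.exp(2 + income))          # depends on income only
    heart_attack = rng.random(n) < risk

    eaters = blueberries > np.median(blueberries)
    print(f"naive: {heart_attack[eaters].mean():.1%} (eaters) vs "
          f"{heart_attack[~eaters].mean():.1%} (non-eaters)")

    # Adjust for the confounder by comparing within income bands
    cuts = np.quantile(income, np.linspace(0, 1, 11)[1:-1])
    bands = np.digitize(income, cuts)
    diffs = [heart_attack[eaters & (bands == b)].mean()
             - heart_attack[~eaters & (bands == b)].mean() for b in range(10)]
    print(f"within income bands, average difference: {np.mean(diffs):+.2%}")

As Patel's exercise suggests, the apparent effect depends heavily on which confounders the analyst chooses to adjust for.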


Martin Stanley
