In the last article, I viewed statistics
as a system of formal logic, much like grammatical syntax in linguistics.
That is, one can talk nonsense that is still grammatically correct. In statistics,
too, with a computer and a statistical program, one can calculate every
conceivable statistic to support one's view on a particular subject.
The calculation may be correct, but the assumptions and interpretation may
well be incorrect. Indeed, the use of statistics has become the abuse of
statistics, and has given it a bad name.
Many years ago, while I was still
in Vietnam, one of the common quotes about statistics at the time was from
Lenin, that "statistics is the ear and the eyes of the party".
Later, having resettled in Australia, I often heard the quote from a
British parliamentarian, B. Disraeli, that "there are three kinds
of lies: lies, damned lies, and statistics". What a remarkable contrast
between the doctrines of the two worlds! In communist countries, statistics
is viewed as a vital tool of control for tyrants, while in the capitalist
world, it is often viewed as a way of manipulating information.
However, there are elements of truth
in the latter quote. Indeed, the amount of statistical information that
is disseminated to the public, for one reason or another, is sometimes beyond
comprehension, and what part of it is "good" statistics and what
part of it is "bad" statistics is anybody's guess. Certainly,
not all of it can be accepted uncritically. In newspapers, the use of
statistics has already been replaced by overuse and abuse. Journalists
and advertisers are sometimes naive in their claims. Bounteous eulogies
are bestowed upon their products in a mounting crescendo of praise only
to be followed by the assurance that their products have no existence in
fact. Consider the statement "Three times better on PANADINE".
Better than what? Does one get three times better results from PANADINE
than with the best competing brand, or than with the average of the other brands,
or, indeed, than with the poorest quality alternative? Even if we could sort
out these questions, how are we to measure betterment? What is the unit
of measurement? Or perhaps the patient is supposed to be three times
better now than he/she was before he/she took the drug. But the patient was
not well to begin with, and he/she therefore cannot be three times better.
It is impossible to compare negative and positive quantities in a positive
An advertisement may also quote
irrelevant statistical data in support of its headline. For instance, the
headline of a tyre advertisement might read "651 more accidents last year". The accuracy
of the data may be undoubted and its existence adds its own air of respectability,
but it has no direct reference to the advertised durability of tyres. It
proves only that there have been more accidents; the condition of the tyres
of the vehicles involved may have played no part in the increase. There may
be more accidents simply because there were more cars on the road. To argue,
from the evidence available, that the increase in the number of accidents
was in some way related to the use of worn tyres is as absurd as claiming
that, since there are more new cars in use, there must also therefore be
more new tyres in use, and that, as accidents have increased in proportion,
then the increase in the number of accidents is the result of an increased
number of new tyres!
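
The point about more cars on the road can be made concrete with a small
calculation. The sketch below is illustrative only: apart from the 651 extra
accidents quoted in the headline, every figure (the baseline accident count and
both vehicle counts) is a hypothetical assumption, as is the function name
accidents_per_1000_vehicles. It shows how a rise in the number of accidents
can coexist with a fall in the accident rate.

```python
def accidents_per_1000_vehicles(accidents, vehicles):
    """Return the accident rate per 1,000 vehicles on the road."""
    if vehicles <= 0:
        raise ValueError("vehicles must be positive")
    return 1000 * accidents / vehicles


# Hypothetical figures, chosen only for illustration.
previous_accidents = 10_000
previous_vehicles = 500_000

# The headline figure: 651 more accidents than the year before.
current_accidents = previous_accidents + 651
# Assumed growth in the number of vehicles (hypothetical).
current_vehicles = 550_000

previous_rate = accidents_per_1000_vehicles(previous_accidents, previous_vehicles)
current_rate = accidents_per_1000_vehicles(current_accidents, current_vehicles)

print(f"previous year: {previous_rate:.2f} accidents per 1,000 vehicles")
print(f"current year:  {current_rate:.2f} accidents per 1,000 vehicles")

if current_rate < previous_rate:
    print("More accidents in total, but a lower rate per vehicle.")
else:
    print("More accidents in total, and no lower rate per vehicle.")
```

Under these assumed numbers the count goes up while the rate per vehicle goes
down, which is why the raw headline figure says nothing about tyres.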
Consider the following familiar statements:
1. Two out of three dentists, responding
to a survey, said that they would recommend PLANET gum to their patients
who chew gum.
2. Heart disease is responsible
for 40% of deaths in the world; therefore, we should spend more money on
research into heart disease.
3. A survey of a random sample of
2000 drivers has recently been completed. Of the drivers under 30, 55%
had had a car accident in the past year, whereas only 40% of the older
drivers had been involved in an accident in that time. Clearly, therefore,
young drivers are worse drivers.
4. You should cross at a pedestrian
crossing - it is five times safer.
5. Four out of five housewives use
6. The phone-in survey demonstrated
solid support for the government's position: 62% were in favour, 38% against.
Although these statements may
appeal to the average reader, a more serious reader would immediately question
their validity. For example, it would be reasonable to ask in (1) why
one in three dentists does not recommend PLANET; in (2) at what age deaths from
heart disease occur, because this is perhaps more relevant to the expenditure of money. In (3),
one possible explanation is that young drivers are likely to spend more time
on the road, or drive worse cars, and hence it is perhaps not fair to conclude
that they are the worse drivers. In (4) the obvious question is "safer
than what?". The representativeness of the sample can be questioned in (5), i.e.,
which five housewives? Statement (6) is often heard on TV, although it is possible
that ardent supporters could make many calls and hence inflate the results.
I have heard the following quote, which, I believe, is still appropriate:
"statistics is like a bikini bathing suit: what it reveals is interesting,
but what it conceals is vital".
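
The objection to statement (6) can be checked with simple arithmetic. The
sketch below is hypothetical throughout: the numbers of callers, the number of
calls each makes, and the function name phone_in_result are assumptions made
for illustration, not data from any real survey. It shows how a minority
could produce a headline figure of about 62% simply by calling twice.

```python
def phone_in_result(supporters, opponents,
                    calls_per_supporter=1, calls_per_opponent=1):
    """Return the percentage of calls in favour.

    A phone-in survey counts calls, not people, so callers who ring
    more than once are counted more than once.
    """
    calls_for = supporters * calls_per_supporter
    calls_against = opponents * calls_per_opponent
    total_calls = calls_for + calls_against
    if total_calls == 0:
        raise ValueError("no calls recorded")
    return 100 * calls_for / total_calls


# Hypothetical audience: 450 people in favour, 550 against.
true_share = phone_in_result(450, 550)

# Same audience, but each supporter calls twice (an assumption).
inflated_share = phone_in_result(450, 550, calls_per_supporter=2)

print(f"share of people in favour: {true_share:.0f}%")
print(f"share of calls in favour:  {inflated_share:.0f}%")
```

With these assumed figures, 45% of the people are in favour, yet about 62% of
the calls are.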
The abuse of statistics does not
just occur in the popular media; it is also prevalent in the scientific media.
In medicine, we often hear of wonderful discoveries of new treatments announced
by respectable professors and medical experts. But many of these "discoveries"
turn out to be phony. The pressure to publish and the hunger for publicity
have given rise to the abuse of statistics in medical research. In almost
every medical research paper published today, one can easily find mistakes.
This is shocking, but strangely, the malpractice continues in the scientific
press. Indeed, many of these professors or medical experts do not really
know how to analyse the data. In fact, many of them simply manipulate
the data in the way they want so as to create a sensation. It is thus
not surprising that sometimes entirely erroneous conclusions are based
on unsound data or dodgy methods.
Next time you read an article
in a newspaper or even in a scholarly journal, ask yourself: have
the authors interpreted the data properly? Do the authors have expertise
in the analysis and dissemination of data? If neither of those questions can be
answered satisfactorily, you have every reason to doubt the validity of the article.
The computer can be blamed for this
tragedy of knowledge. Why? There is no doubt that the advent of electronic
computers has revolutionised statistical practice. The computer has replaced
the pencil in solving complex equations in data analysis. Many people believe
that by arming themselves with a computer and some statistical programs,
they can do any statistical calculation. Statistics, to them, can be reduced
to the computer keyboard.
But statistics is not just a collection
of theorems or formulae; it is a style of thinking. Computing is also a
style of thinking. I, therefore, do not believe that statistics can be
reduced to button pushing on a computer keyboard and still retain its
style of thinking. I do not believe that the computer can, one day, replace
our thinking. This view may challenge those of you in this audience who are experts
in artificial intelligence, but I welcome the discussion.
See you next time.
T V Nguyen,
For discussion on
this column, join firstname.lastname@example.org
1996 by VACETS and Tuan V. Nguyen