VACETS Regular Technical Column
The VACETS Technical Column is contributed by various members, especially those of the VACETS Technical Affairs Committee. Articles are posted regularly on the [email protected] forum. Please send questions, comments and suggestions to [email protected]

October 8, 1996

The Abuse Of Statistics

In the last article, I described statistics as a system of formal logic, much like grammatical syntax in linguistics. That is, one can talk nonsense that is nevertheless grammatically correct. In statistics, too, with a computer and a statistical program, one can calculate every conceivable statistic to support one's view on a particular subject. The calculation may be correct, but the assumptions and the interpretation may well be wrong. Indeed, the use of statistics has too often become the abuse of statistics, and this has given the discipline a bad name.

Many years ago, while I was still in Vietnam, one of the common quotes about statistics was from Lenin, that "statistics is the ear and the eyes of the party". Later, having resettled in Australia, I often heard the quote attributed to a British parliamentarian, B. Disraeli, that "there are three kinds of lies: lies, damned lies, and statistics". What a remarkable contrast between the two worlds of doctrine! In communist countries, statistics is viewed as a vital tool of control for tyrants, while in the capitalist world it is often viewed as a way of manipulating information.

However, there are elements of truth in the latter quote. Indeed, the amount of statistical information that is disseminated to the public, for one reason or another, is sometimes beyond comprehension, and which part of it is "good" statistics and which part is "bad" statistics is anybody's guess. Certainly, not all of it can be accepted uncritically. In newspapers, the use of statistics has already been replaced by overuse and abuse. Journalists and advertisers are sometimes naive in their claims. Bounteous eulogies are bestowed upon their products in a mounting crescendo of praise, only to be followed by the assurance that their products have no existence in fact.

Consider the statement "Three times better than on PANADINE". Better than what? Does one get three times better results from PANADINE compared with the best competing brand, or with the average of the other brands, or, indeed, with the poorest quality alternative? Even if we could sort out these questions, how are we to measure betterment? What is the unit of measurement? Or perhaps the patient is supposed to be three times better now than he/she was before taking the drug. But the patient was not well to begin with, and therefore cannot be three times better. It is impossible to compare negative and positive quantities in a positive ratio.

An advertisement may also quote irrelevant statistical data in support of its headline. For instance, the headline might read "651 more accidents last year". The accuracy of the data may be undoubted and its presence adds an air of respectability, but it has no direct bearing on the advertised durability of tyres. It proves only that there have been more accidents; the condition of the tyres of the vehicles involved may have played no part in the increase. There may have been more accidents simply because there were more cars on the road. To argue, from the evidence available, that the increase in the number of accidents was in some way related to the use of worn tyres is as absurd as claiming that, since there are more new cars in use, there must also be more new tyres in use, and that, as accidents have increased in proportion, the increase in the number of accidents is the result of an increased number of new tyres!
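The point about more cars on the road can be illustrated with a small calculation. The sketch below is only an illustration: apart from the increase of 651 taken from the headline, every figure is hypothetical, and the function name accident_rate is made up for this example. It shows that a raw count can rise while the rate per vehicle falls.

```python
def accident_rate(accidents: int, vehicles: int) -> float:
    """Return accidents per 1,000 registered vehicles."""
    return 1000 * accidents / vehicles


# Hypothetical figures: only the difference of 651 accidents
# comes from the headline quoted in the text.
previous = {"accidents": 10_000, "vehicles": 500_000}
latest = {"accidents": 10_651, "vehicles": 540_000}

rate_before = accident_rate(**previous)
rate_after = accident_rate(**latest)

print("Extra accidents:", latest["accidents"] - previous["accidents"])
print(f"Rate before: {rate_before:.2f} per 1,000 vehicles")
print(f"Rate after:  {rate_after:.2f} per 1,000 vehicles")
```

With these assumed numbers the headline is literally true, yet driving became slightly safer per vehicle. Without the denominator, the count alone says nothing about tyres or anything else.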

Consider the following familiar statements from newspapers:

1. Two out of three dentists, responding to a survey, said that they would recommend PLANET gum to their patients who chew gum.

2. Heart disease is responsible for 40% of deaths in the world; therefore, we should spend more money on research into heart disease.

3. A survey of a random sample of 2000 drivers has recently been completed. Of the drivers under 30, 55% had had a car accident in the past year, whereas only 40% of the older drivers had been involved in an accident in that time. Clearly, therefore, young drivers are worse drivers.

4. You should cross at a pedestrian crossing - it is five times safer.

5. Four out of five housewives use LAX.

6. The phone-in survey demonstrated solid support for the government's position: 62% were in favour, 38% against.

These statements may appeal to the average reader, but a more serious reader would immediately question their validity. For example, it would be reasonable to ask in (1) why one dentist in three does not recommend PLANET, and in (2) at what age heart failure occurs, because this is perhaps more relevant to the expenditure of money. In (3), one possible explanation is that young drivers are likely to spend more time on the road, or to drive worse cars, and hence it is perhaps not fair to conclude that they are worse drivers. In (4) the obvious question is "safer than what?". In (5) the representativeness of the sample can be questioned, i.e., which five housewives? Statements like (6) are often heard on TV, although it is possible that ardent supporters made many calls and hence inflated the results. I have heard the following quote, which I believe is still appropriate: "statistics is like a bikini bathing suit: what it reveals is interesting, but what it conceals is vital".
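The concern about statement (6) can be made concrete with a toy tally. The sketch below is an illustration under invented assumptions: the caller identifiers, the number of repeat callers and the number of calls each makes are all hypothetical, chosen only so that the raw result matches the 62% quoted above. It compares the share of calls in favour with the share of distinct callers in favour.

```python
from collections import Counter


def share_in_favour(votes):
    """Return the fraction of votes equal to 'for'."""
    return Counter(votes)["for"] / len(votes)


# Hypothetical call log of (caller_id, vote) pairs.
calls = []
for caller in ("A", "B", "C"):           # three ardent supporters
    calls += [(caller, "for")] * 15      # each phones in 15 times
calls += [(f"F{i}", "for") for i in range(17)]      # single calls in favour
calls += [(f"N{i}", "against") for i in range(38)]  # single calls against

raw = share_in_favour([vote for _, vote in calls])

# Count each caller once; dict() keeps one entry per caller_id.
one_vote_each = dict(calls)
deduplicated = share_in_favour(list(one_vote_each.values()))

print(f"Share of calls in favour:   {raw:.0%}")
print(f"Share of callers in favour: {deduplicated:.0%}")
```

In this made-up log, 62 of the 100 calls are in favour, but only 20 of the 58 distinct callers are. A real phone-in survey rarely reports how many distinct people called, which is exactly what the headline figure conceals.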

The abuse of statistics does not occur only in the popular media; it is also prevalent in the scientific literature. In medicine, we often hear of wonderful discoveries of new treatments announced by respectable professors and medical experts. But many of these "discoveries" turn out to be phony. The pressure to publish and the hunger for publicity have given rise to the abuse of statistics in medical research. In almost every medical research paper published today, one can easily find mistakes. This is shocking, but strangely, the malpractice continues in the scientific press. Indeed, many of these professors or medical experts do not really know how to analyse the data. In fact, many of them simply manipulate the data in whatever way they want so as to create a sensation. It is thus not surprising that entirely erroneous conclusions are sometimes based on unsound data or dodgy methods.

Next time you read an article in a newspaper, or even in a scholarly journal, ask yourself: have the authors interpreted the data properly? Do the authors have expertise in the analysis and dissemination of data? If these questions cannot be answered satisfactorily, you have every reason to doubt the validity of the article.

The computer can be blamed for this tragedy of knowledge. Why? There is no doubt that the advent of electronic computers has revolutionised statistical practice. The computer has replaced the pencil in solving the complex equations of data analysis. Many people believe that, armed with a computer and some statistical programs, they can do any statistical calculation. Statistics, to them, can be reduced to the computer keyboard.

But statistics is not just a collection of theorems or formulae; it is a style of thinking. Computing is also a style of thinking. I therefore do not believe that statistics can be reduced to button pushing on a computer keyboard and still retain its style of thinking. Nor do I believe that computers can, one day, replace our thinking. This view may challenge those of you in this audience who are experts in artificial intelligence, but I welcome the discussion.

See you next time.

T V Nguyen, Ph.D.
[email protected]

For discussion on this column, join [email protected]

Copyright © 1996 by VACETS and Tuan V. Nguyen
