VACETS Regular Technical Column

The VACETS Technical Column is contributed by various members, especially those of the VACETS Technical Affairs Committee. Articles are posted regularly on the [email protected] forum. Please send questions, comments and suggestions to [email protected]

September 23, 1996

Statistics As A Formal System

Even a man of slow intellect who is trained and exercised in arithmetic,
if he gets nothing else from it,
will at least improve and become sharper than before



These days, we are no strangers to statistics. We hear the number of fatalities on the road, the changes of public opinion for or against the government, how many housewives now use the latest Channel 5 products. Politicians, newspaper editors, advertisers and people with an axe to grind all throw statistics at our heads. If we are to evaluate such statistics properly, we require an understanding of statistical reasoning and methods.

As one writer said, "statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write". In this series, I will attempt to introduce you to this interesting topic. The series is the result of one man's effort; it is thus hardly perfect, and mistakes are inevitable. I would welcome any comments or suggestions that may help me improve it.

Statistics as a formal system

I feel a little anxious about writing on statistics for this audience, because I know that more than one of you had a distaste for this discipline in your university days. I certainly did. More than twenty years ago, as a first-year statistics student in Vietnam, I felt dizzy with strange probabilistic concepts, even though I was mathematically trained, and I thought of quitting the subject. I was intimidated not by the arithmetic, but by the appalling description of statistical ideas and concepts in many textbooks at the time (and even now in Western countries).

However, one day, I happened to read a few lines from a book written by Bertrand Russell which said, in part: "there was a footpath leading across fields to New Southgate, and I used to go there alone to watch the sunset and contemplate suicide. I did not, however, commit suicide, because I wished to learn more about mathematics". I decided to learn more about mathematics and statistics. Now, after fifteen years of involvement in teaching and research, I look back and have no regrets about the career path that I have chosen.

If we think seriously about mathematics, we find that it is the basis for everything we do nowadays. It is not surprising that mathematics is regarded as the prince of science. In most sciences, particularly biological science, the pace of "discoveries" can be bewildering. Indeed, these so-called discoveries often turn out to be untrue. This is not surprising given that in biological science, what one researcher has established another undoes. In mathematics, by contrast, each generation adds a new story to the old structure. Mathematical ideas are timeless and may be described as "eternal truth".

You are now probably wondering whether you have been reading the wrong article, since so far I have talked about mathematics, not really statistics. The rationale for my preamble is that statistics is usually defined as a branch of applied mathematics, which is in turn a discipline of modern mathematics. In practice, modern mathematics is also one of the principal tools of statistics. So a few words of mathematical introduction are required. In this series, I will introduce to you some of the modern statistical concepts and ideas that have become hallmarks of modern science. I will begin with a discussion of an elementary question: what is statistics?

Many textbooks define statistics as an applied mathematical discipline concerned with the collection, analysis and interpretation of data. However, I think statistics is a composite domain, containing at least two distinctly different intellectual activities: (1) the acquisition, logical organisation and numerical presentation of data, and (2) the analysis of the data to arrive at decisions about degrees of variation, interrelation, and difference. The first type of activity may be called "descriptive statistics"; it produces the collections of data that appear in financial charts, birth rates, death rates, population censuses, etc. The second type of activity may be referred to as "inferential statistics"; it is responsible for calculations such as the t-test, confidence intervals, the chi-square test, linear regression, analysis of variance, etc. The first activity requires no particular scholastic training in statistics and can be performed by any intelligent person, while the second activity requires formal statistical and mathematical training.

Statistical activities closely resemble the task of science, which is to gather natural knowledge, to arrange that knowledge coherently and to comprehend the patterns or theories discerned therein. Statistics is thus a science. It is also a formal system. Examples of formal systems include logic, grammar and mathematics. These are concerned with the form, not the content, of statements. We may write "the Vietnamese always have blond hair". Grammatically, this is correct, but in substance, it is nonsense.

The study of grammar does not protect one from writing nonsense. Similarly, logic is the study of the formal properties of propositions, and its rules tell us which conclusions are validly deduced from them, but logic does not ensure that the conclusions are true. Consider, for example,

Premise 1: All men are creatures of habit;

Premise 2: All creatures of habit are fools;

Conclusion: All men are fools.

The conclusion is validly deduced from the two premises, but it is not necessarily true. Its truth is guaranteed only if both premises are true. It should also be noted that if a premise is false, it does not follow that the conclusion is also false. Men can be fools for other reasons: love, for example! The science of inferential statistics may be considered in the same way. For example, the average of the set of values 1, 1, 2, 3, 2, 1, 11 is 3, which is statistically correct but meaningless as a representative value, since it entirely conceals the abnormal value of 11. In statistics, a number of axioms and postulates are stated, and conclusions are deduced from them by the rules of the mathematical game. This system, like logic, may be used as a model only if the premises or assumptions it makes are warranted by the nature of the phenomena it purports to describe. I will return to this important point in the next few articles.
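The averaging example above can be checked in a few lines. What follows is only a minimal sketch in Python using the standard `statistics` module; the helper name `summarize` and the comparison with the median are illustrative assumptions and are not part of the original argument.

```python
from statistics import mean, median

# The set of values from the example in the text.
values = [1, 1, 2, 3, 2, 1, 11]


def summarize(data):
    """Return (mean, median) for a list of numbers (hypothetical helper)."""
    return mean(data), median(data)


avg, mid = summarize(values)
print(f"mean = {avg}, median = {mid}")

# Leaving out the abnormal value shows how strongly it pulls the mean upward.
without_outlier = [v for v in values if v != 11]
print(f"mean without 11 = {mean(without_outlier):.2f}")
```

The arithmetic is correct in every case; whether the resulting number is a sensible summary depends on the data, which is the point of the analogy with logic.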

But statistics does not deal only with scientific theories; it is also concerned with more practical issues in real life. In the old days, medicine was taught as a black-and-white discipline - there was no room for error, the doctor was always right. However, with the advance of knowledge and information, modern medicine has finally realised that doctors can indeed make errors in diagnosis and clinical judgement. Sir G. Pickering, a prominent British medical researcher, implicitly acknowledged this by noting that "doctors want to help patients, but the extent to which they can help obviously depends on the doctor's knowledge. But knowledge is a matter of probability. Diagnosis is a matter of probability, and in judging treatment, doctors have to base their judgment on knowledge of probability". A new drug is unlikely to treat 100% of patients successfully. The reality of the world is harsh and unyielding, and must be dealt with on its own terms.

There is no way to eliminate completely the risk of being wrong. Our real problem is not how to eliminate it, but how to live with it intelligently. In Vietnamese, we have a wonderful saying: "Mưu sự tại nhân, thành sự tại thiên" (man proposes, heaven disposes). In real life, things do not always work out the way we hypothesized or planned. The main reasons for this are likely to be that (i) our hypothesis is incorrect and/or (ii) we do not have enough evidence to reject or accept the hypothesis. The former is a hypothetical idea that can be redefined; the latter is a fact that cannot be changed but can be dealt with in probabilistic terms. It is not surprising that statistics has now become an important, if not essential, tool in quality control, medical research and any experimental research.

I have heard prominent academics in the US say privately that the distinction in science between the East and the West is that the latter knows how to do statistics in research, while the former does not. Although the comment is rather arrogant and tasteless, there is an element of truth in it. I have personally reviewed many scientific papers and research grants from researchers in Eastern Europe and Japan, and found that while their experimental work is fine, their treatment of data is absolutely ridiculous. There is no choice but to reject such pieces of research for publication. That perhaps explains why most published research comes from US and other Western scientists.

Vietnamese students are traditionally competent mathematicians, yet to my knowledge, very few specialise in statistics. This has its roots in the way statistics is taught there. The teaching of statistics in Vietnam was, and still is, dominated by the first type of activity (descriptive statistics), which is not in line with the rest of the world, notably the developed countries, which are more concerned with inferential statistics. In fact, most universities in Vietnam do not have a department of statistics, and most statisticians there are pure mathematicians, not applied statisticians. Moreover, statistics, while taught at universities, has not yet found its way into application in industry and research. There are, however, encouraging signs in Hanoi and Saigon, where academic books on statistics have been translated and used in teaching.

Some years ago, I read a book containing the following advice, which I would like to quote here: "If you are young, I would suggest you to learn statistics as soon as you can. Do not dismiss it through ignorance or because it calls for thought. Do not pass into eternity without having examined these techniques and thought about the possibility of application in your field of work, because very likely you will find it an excellent substitute for your lack of experience in some directions. If you are older and already crowned with the laurels of success, see to it that those under your wing who look to you for advice are encouraged to look into this subject. In this way, you will show that your arteries are not yet hardened and you will be able to reap the benefits without doing overmuch work yourself. Whoever you are, if your work calls for the interpretation of data, you may be able to do without statistics, but you will not be able to do so well." I strongly believe that this advice is still appropriate for our brothers and sisters.

Tuan V. Nguyen, Ph.D.
Bone and Mineral Research Division
Garvan Institute of Medical Research
384 Victoria St Sydney 2010 Australia
Phone: +612 295 8246
Fax: +612 295 8241
t[email protected]

For discussion on this column, join [email protected]

Copyright © 1996 by VACETS and T V Nguyen
