### Names, damned names, and statistics

When I tell people that I'm a statistician, the usual response is a blank stare. Explaining that I work with statistics only makes matters worse. Those who have been exposed to statistics at university often blurt out "I had to take a statistics course -- and I hated it!" What they remember of the course is mostly that it was boring and there were a lot of formulas. Those who have had no formal exposure to statistics seem to think it might have to do with collecting and tabulating figures, like sports statistics or national economic figures. This isn't completely off the mark, but by itself it's a poor description of what statisticians do.

And about half the time, mention of "statistics" elicits the helpful response: "There are lies, damned lies, and statistics!" (Popularized by Mark Twain, who attributed it to Benjamin Disraeli, although its true origin is uncertain.) Something of a variant on this is the claim that "You can prove anything with statistics!"

Clearly there are several issues at play here, including minimal public knowledge of what the field of statistics is about, poorly taught statistics courses, and prejudices about empirical reasoning. Statisticians must accept a good part of the blame for each of these. (John Nelder writes [1] that "Almost nobody knows what statisticians do, and we in turn have been remarkably ineffective in explaining to non-statisticians what we are good at.") But part of the problem is the word "statistics" and its difficult-to-pronounce-and-spell sibling "statistician".

A statistic is a function of a set of observations, for example the total, the average value, the maximum value, or what have you. Governments have always wanted to keep track of information about the state (like births and deaths, imports and exports, agricultural production, etc.), which is where the word statistic comes from.

"Statistics" means more than one statistic, but confusingly it also refers to the study of how to draw conclusions from observations. A more formal term for this is inductive inference, to be contrasted with deductive inference. Deductive inference (or simply deduction) is classical logic: when the premises are true and the argument is valid, the conclusion must be true. If all swans are white, and Tom is a swan, then Tom is white.

Inductive inference (or induction) is not so simple. Suppose we observe 100 swans and they are all white. We might conclude that all swans are white. But this conclusion might be incorrect. (Apparently there are black swans, by the way.)

Uncertainty is inevitable: in political polling, for example, the stock phrase is that the results are accurate to within plus or minus 3%, 19 times out of 20. It arises from the variability that we find everywhere: political opinions vary, height and weight differ, some people are more susceptible to certain diseases than others (perhaps due to differences in genetics, among other things). When we try to measure something accurately several times, we get slightly different answers. This is sometimes called measurement error, or noise, but in a sense it's just another source of variability.

Probability theory lets us describe variability. For example, if we toss a fair coin 4 times, the probability of getting 4 heads is one sixteenth. But statistical inference uses probability theory to deal with the inverse problem: if we toss a coin 4 times and it comes up heads each time, can we conclude that it's not a fair coin?
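For readers who like to see the arithmetic, here is a minimal Python sketch of the two numbers mentioned above. The function names are made up for illustration. The poll's sample size of 1000 and the use of the usual normal approximation (z = 1.96 for "19 times out of 20") are my assumptions, not details of any particular poll.

```python
from fractions import Fraction
from math import comb, sqrt


def prob_heads(k, n, p=Fraction(1, 2)):
    """Binomial probability of exactly k heads in n independent tosses."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion from a simple random sample.

    Uses the normal approximation; p=0.5 is the worst case (widest margin).
    """
    return z * sqrt(p * (1 - p) / n)


# Forward problem: a fair coin tossed 4 times gives 4 heads with probability 1/16.
print(prob_heads(4, 4))  # 1/16

# Polling: a hypothetical sample of 1000 respondents gives roughly +/- 3%.
print(round(margin_of_error(1000), 3))  # 0.031
```

Note that the first number also hints at the inverse problem: a fair coin produces four heads in a row about 6% of the time, so four tosses alone are weak evidence against fairness.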

Given that statistics has such broad relevance, it's a shame that it has been saddled with such a poor name. If "a rose by any other name would smell as sweet", I'm hoping that statistics by another name will smell sweeter!

Bill Cleveland suggests the name data science. John Nelder suggests "statistical science" [1]. And a friend of mine suggests, tongue-in-cheek, that statisticians could be called "noise-busters".

Of the above suggestions, my preference would be "statistical science", so that a statistician would be a "statistical scientist". But maybe there's a better name out there somewhere ...

[1] Nelder J.A. From statistics to statistical science. The Statistician. Vol. 48, No. 2 (1999), 257-269.

## 5 Comments:

As a description of what statistics is (rather than a new name), Keith O'Rourke suggests "risk management of learning from observations".

Keith also says that "the polling example would perhaps be better with the added wrinkle of 'those with cell phones are more likely to get contacted and likely differ in how they would vote ...'". Indeed, this is just the sort of complication that makes our work so challenging (and fun)!

My father recently announced, "I used to believe in statistics, but I don't anymore." His attitude is based on the canard you quoted: "You can prove anything with statistics." Instead of considering the source and credibility of statistical data, Dad has chosen to chuck the whole thing. Or at least claim that he has. Later discussions were salted with "data" derived from his favorite right-wing sources, but I am such a good son that I resisted throwing his own quotes back at him (but I am weakening!).

I'm fond of yelling whenever Peter Mansbridge ends an item with the statement that 67% of Canadians support blah-blah 19 times out of 20, "You mean 67% of those who own phones!" But I guess Peter is not too concerned about people who don't own phones.

I suspect the eventual name change will be as abrupt as the change from natural philosopher to scientist. In the meantime, we're saddled with statistician and a host of variations and specialties: data miner, quality engineer, biostatistician, survey researcher, decision scientist, psychometrician, etc. At UT San Antonio we're actually trying to downplay the distinction between statisticians, applied mathematicians, economists, and financial analysts, lumping them together as quantitative scientists. (This is a trick to recruit high-schoolers who are good in math, but lack clear career goals.) Maybe we'll just call ourselves quants.

"Quants" sounds a bit like something out of Brave New World!

When did the change from natural philosopher to scientist happen? I remember reading somewhere that the word scientist is actually quite new.

Shakespeare claimed that "a rose by any other name would smell as sweet". But I think that names matter. When discussions of names come up, people often try to dismiss the issue saying "It's just semantics." It has just occurred to me that a good answer might be "Yes, it is a question of semantics -- the study of meanings -- and meanings matter."

Inevitably there's some overlap between different quantitative disciplines, and ultimately I suspect the boundaries are fuzzy. But I still think that it's useful to distinguish different fields.

Incidentally, it's interesting that we have -ology words (e.g. physiology) and -ics words (e.g. physics). And the history of those particular examples is interesting too (compare "physicist" and "physician").
