Last year, Stephen T. Ziliak and Deirdre N. McCloskey published The Cult of Statistical Significance: How the Standard Error Costs Us Jobs, Justice, and Lives. I finished it a while ago now (during my trip to Alaska, in fact; a lot of time available on those planes), and I want to discuss it here.
Before checking out my amateur opinions, however, it may be wise to look at some of the reviews Ziliak has kindly gathered on his homepage. I, for example, found the one from Science (by Theodore Porter) interesting. Only read the review in the Journal of Economic Literature (by Saul Hymans) if you don’t plan to read the book itself. You may settle for its closing paragraph:
Despite my firm belief that most applied econometricians would benefit from adopting the methodological position presented by Ziliak and McCloskey and that economics as science would be improved significantly thereby, I can’t close without something of a rebuke. As often happens when someone is pushing what the mainstream considers an extreme or fringe position, the arguments become narrowly and harshly focused. This comes through too often in Ziliak and McCloskey. In its particularly narrow perspective, their treatment of the professional accomplishments of a number of exceptionally gifted economists is simply unjustified. Included among such economists are Gary Becker, Trygve Haavelmo, Harold Hotelling, Lawrence Klein, and Paul Samuelson. It is especially unfortunate, for example, that Ziliak and McCloskey misrepresent the significance of Haavelmo’s pathbreaking article of 1944 and never even mention his major contribution of 1947, a piece which Ziliak and McCloskey should find quite simpatico [p. 503].
The ‘review’ in the Journal of Economic Methodology (by Tom Engsted) is a full-length article discussing a debate surrounding the Ziliak & McCloskey book.
So, I hope you’re fed up already: Here are my (largely unqualified) opinions on and comments to this book. As (almost) always, first things first: The Cult of Statistical Significance carries some important messages (despite its terrible title; maybe they were inspired by the dreadful Andrew Keen). The Cult tells us that statistical significance is not the same thing as substantive significance, that statistical significance is often misused and appears to be misunderstood by many a scientist (particularly economists), and that only by attending to quantitative, scientific magnitude and judgement will sciences like medicine (yes, medicine), economics, and other statistically confused fields be able to move ‘into the age of science and humanity’ (p. 251; all page references are to the paperback edition). Alright, that last claim may be debatable (and the quote is slightly out of context); notwithstanding, Ziliak and McCloskey certainly feel that way, want their readers to feel that way, and bring a lot of good arguments to the table.
A quick, non-technical update on statistical significance is found here, by the way. From the first paragraph:
In normal English, “significant” means important, while in Statistics “significant” means probably true (not due to chance). A research finding may be true without being important. When statisticians say a result is “highly significant” they [should] mean it is very probably true.
Everyone who understands statistical significance understands that substantive significance is something else and the more important of the two. The disagreement would be over whether statistical significance is misused, misunderstood, or both. Perhaps, then, Ziliak and McCloskey’s crown argument is their study of the use of statistical significance in the American Economic Review during the 1980s. (To those unaware: publication in the AER is among the most prestigious and important things an economist can achieve, particularly in terms of their career.) Their findings, discussed in chapter 6 (pp. 74-78) and published in the Journal of Economic Literature in 1996, are discouraging. The best economists (that is, those publishing in the AER in the 1980s) misuse statistical significance to a large degree.
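Their central point is easy to demonstrate with a toy calculation (my own illustration, not from the book): with a large enough sample, an economically negligible effect becomes ‘highly significant’. The sketch below, assuming a simple two-sided z-test with known standard deviation, shows the same tiny effect flipping from ‘insignificant’ to ‘significant’ purely because n grows.

```python
import math

def z_test_p(effect, sd, n):
    """Two-sided p-value for a z-test of the null hypothesis 'mean == 0'."""
    z = effect / (sd / math.sqrt(n))
    # Two-sided tail probability under the standard normal distribution
    return math.erfc(abs(z) / math.sqrt(2))

# The same tiny effect (a 0.02-unit mean difference, sd = 1), two sample sizes:
small_sample = z_test_p(0.02, 1.0, 100)        # p ≈ 0.84: "insignificant"
huge_sample  = z_test_p(0.02, 1.0, 1_000_000)  # p astronomically small: "highly significant"

print(f"n = 100:       p = {small_sample:.2f}")
print(f"n = 1,000,000: p = {huge_sample:.1e}")
```

The magnitude (0.02 units) never changes; only the sample size drives the p-value. That is exactly why a significance test cannot substitute for a judgement about whether an effect is substantively large.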
Next, Ziliak and McCloskey do something odd. Faced with arguments from colleagues that best practice had improved since the 1980s, perhaps partly because of their 1996 article, they go ahead and do another study of the practice in the AER in the 1990s:
We are very willing to believe that since the 1980s our colleagues have stopped making an elementary error and especially that we have changed their minds. But being readers of typical economics articles since that first decade of personal computers [the 1980s] we seriously doubted that fishing for significance had much abated. […] And so in a second article, published in 2004, we [reapplied our study from 1996] to all the full-length empirical articles of the next decade of the AER, the 1990s [p. 79].
There’s a logical flaw here. If Ziliak and McCloskey caused the change with their 1996 article, it won’t show up in a study of the 1990s! And probably not in a study of the first decade of the new millennium either; changes take a while, often a generation or so (just ask Thomas Kuhn, rephrased by Paul A. Samuelson in 1999: ‘Science advances funeral by funeral’). And indeed, Ziliak and McCloskey (or McCloskey and Ziliak, as the reference will show) don’t find much of an improvement in the 1990s compared to the 1980s.
Of course, Ziliak and McCloskey have ideas on why scientific practice in the AER has not changed:
Significance unfortunately is a useful means toward personal ends in the advance of science – status and widely distributed publications, a big laboratory, a staff of research assistants, a reduction in teaching load, a better salary, the finer wines of Bordeaux. Precision, knowledge, and control. In a narrow and cynical sense statistical significance is the way to achieve these. Design experiment. Then calculate statistical significance. Publish articles showing “significant” results. Enjoy promotion.
But it is not science, and it will not last [p. 32].
They may be right. But one cannot forget the position of the AER among economists. Because of its career-generating potential, economists are likely to mimic it, both methodologically and rhetorically.
Still, Ziliak and McCloskey’s empirical evidence of the poor econometric practice in the AER is striking and convincing. I am now skeptical of empirical economists. And not only economists: Ziliak and McCloskey summarize evidence of bad practice in psychology, medicine, ecology, and several other fields. One gets the impression that only the ‘hard’ sciences got it right (and to think I used to believe medicine was a hard science).
In Ziliak and McCloskey’s view, one man is responsible for most of the statistical mess in economics and other fields: R. A. Fisher, whose Statistical Methods for Research Workers (1925), which went through no less than 14 editions, laid the foundations for much of the later statistical practice in many applied fields. Ziliak and McCloskey attack Fisher hard, in an almost distasteful way (they even accuse him of ‘outright, scientific fraud’ in an endnote somewhere, I’m sure, but I wasn’t able to find it again), and they devote large parts of the second half of the book to attacking Fisher in various ways and contrasting him with (their hero, it appears) William Sealy Gosset, better known as ‘Student’ (as in Student’s t). Maybe even more absurd is the blame put on the philosophical trends of the time:
One reason for the success of the Fisherian program against more logical alternatives […] is that the Fisherian program emerged just as neopositivism and then falsificationism emerged in the philosophy of science. It would have fallen flat in philosophically more subtle times, such as those of Mill’s System of Logic Ratiocinative and Inductive (1843) or Imre Lakatos’s Proofs and Refutations (1976). No serious philosopher nowadays is a positivist, no serious philosopher of science a simple falsificationist. But the philosophical atmosphere of 1922-62 was perfect for the fastening of Fisher’s grip on the [statistically confused] sciences [p. 149].
Toward the end, after the very interesting chapter 23 (pp. 238-244), Ziliak and McCloskey almost get out of hand. On pages 249-250, they propose a “Statement on the proprieties of Substantive Significance” which they want editors, administrators, and scientists to sign. The language of their ‘statement’ is, however, too involved for it to hold up as a standalone statement. I don’t see the purpose of a statement that isn’t self-contained and otherwise just repeats a message that has been pounded on throughout the book. And they keep pounding it, increasing their volume:
The textbooks are wrong. The teaching is wrong. The seminar you just attended is wrong. The most prestigious journal in your scientific field is wrong [p. 250].
I had high expectations for this book, particularly because McCloskey’s name was on the cover. I was slightly disappointed, however. On close examination, I discovered that McCloskey’s name is put last and that they’ve ignored the alphabetical order of names: Ziliak is the main author. In some places it is obvious that Ziliak doesn’t have McCloskey’s Economical Writing under his skin (neither do I, of course, but I would expect McCloskey to). This is a minor issue, certainly, but I looked forward to some persuasive, well-written, and witty prose of the kind McCloskey promotes in her Economical Writing; most of the time, it didn’t happen.
UPDATE: I have a hundred things to say about this book, but I cannot say them all at once. My review above is bad, I know, and I apologize. I confused the important things I wanted to say with the unimportant ones. I may get back to the book in later posts, but for now, I suggest readers instead read my discussion of the debate between Spanos and Ziliak & McCloskey. It presumably gives a better idea of what’s important in the book and makes for more enjoyable and interesting reading.
And, for the record, let me again point out that I agree with Ziliak & McCloskey on their main point in The Cult, as stated in their reply to Spanos:
Statistical significance is neither necessary nor sufficient for substantive scientific significance.