Dear Professor Høj,
I was struck by a recent paper published in Environmental
Research Letters with John Cook, a University of Queensland employee, as
the lead author. The paper purports to estimate the degree of agreement in the
literature on climate change. Consensus is not an argument, of course, but my
attention was drawn to the fact that the headline conclusion had no confidence interval,
that the main validity test was informal, and that the sample contained a very
large number of irrelevant papers while simultaneously omitting many relevant
papers.
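For concreteness, supplying the missing interval is trivial: a standard Wilson score interval for a consensus share takes a few lines to compute. The counts in the sketch below are placeholders of my own choosing, not the paper's data.

```python
# A minimal sketch of the missing confidence interval: a 95% Wilson
# score interval for a consensus share. The counts are illustrative
# placeholders, not the paper's actual data.
from math import sqrt

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return centre - half, centre + half

low, high = wilson_interval(successes=970, n=1000)  # hypothetical counts
print(f"rate: 97.0%, 95% CI: [{low:.3f}, {high:.3f}]")
```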
My
interest piqued, I wrote to Mr Cook asking for the underlying data and received
13% of the data by return email. I immediately requested the remainder, but to
no avail.
I found that the consensus rate in the data differs from that reported in the
paper. Further research showed that, contrary to what is said in the paper, the
main validity test in fact invalidates the data. And the sample of papers does
not represent the literature. That is, the main finding of the paper is
incorrect, invalid and unrepresentative.
I asked Mr Cook again for the data so as to find a coherent explanation of what
is wrong with the paper. When that was unsuccessful, even after a plea to
Professor Ove Hoegh-Guldberg, the director of Mr Cook's workplace, I contacted
Professor Max Lu, deputy vice-chancellor for research, and Professor Daniel
Professor Max Lu, deputy vice-chancellor for research, and Professor Daniel
Kammen, journal editor. Professors Lu and Kammen succeeded in convincing Mr
Cook to release first another 2% and later another 28% of the data.
I also asked for the survey protocol but, in violation of all codes of practice,
none seems to exist. The paper and data do hint at what was really done. There is no
trace of a pre-test. Rating training was done during the first part of the
survey, rather than prior to the survey. The survey instrument was altered
during the survey, and abstracts were added. Scales were modified after the
survey was completed. All this introduced inhomogeneities into the data that
cannot be controlled for as they are undocumented.
The later data release reveals that what the paper describes as measurement
error (in either direction) is in fact measurement bias (in one particular
direction). Furthermore, there is drift in measurement over time. This makes
an even greater nonsense of the paper.
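The distinction is easy to check. Given paired ratings with time stamps, the mean signed difference between ratings separates bias from mere error, and its trend over time reveals drift. The sketch below uses simulated data; the magnitudes are assumptions of mine, not the paper's.

```python
# A sketch of how bias and drift differ from mere error, assuming
# paired ratings in rating order. The data are simulated for
# illustration; none of the numbers come from the paper.
import numpy as np

rng = np.random.default_rng(0)
order = np.arange(200)                            # rating order (assumed)
diff = rng.normal(0.3, 1.0, 200) + 0.002 * order  # signed differences

print("mean signed difference:", round(diff.mean(), 3))  # != 0 => bias
slope = np.polyfit(order, diff, 1)[0]                    # trend => drift
print("trend per item:", round(slope, 5))                # != 0 => drift
```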
I went back to Professor Lu once again, asking for the remaining 57% of the data.
In particular, I asked for rater IDs and time stamps. Both may help to understand
what went wrong.
Only 24 people took the
survey. Of those, 12 quickly dropped out, so that the survey essentially relied
on just 12 people. The results would be substantially different if only one of
the 12 were biased in one way or the other. The paper does not report any test
for rater bias, an astonishing oversight by authors and referees. If rater IDs
are released, these tests can be done.
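Such a test is routine. Once rater IDs are available, a chi-squared test of whether rating profiles differ across raters takes a few lines; the counts below are hypothetical.

```python
# A sketch of a rater-bias test, feasible once rater IDs are released:
# a chi-squared test of homogeneity of rating profiles across raters.
# All counts are hypothetical.
from scipy.stats import chi2_contingency

# rows: raters; columns: counts of (endorse, neutral, reject) ratings
table = [
    [310, 150, 40],  # rater A (hypothetical)
    [295, 160, 45],  # rater B
    [410,  60, 30],  # rater C, a noticeably different profile
]
chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2={chi2:.1f}, dof={dof}, p={p:.4g}")  # small p => rater effects
```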
Because so few took the
survey, these few answered on average more than 4,000 questions. The paper is
silent on the average time taken to answer these questions and, more
importantly, on the minimum time. Experience shows that interviewees find it
difficult to stay focused if a questionnaire is overly long. The questionnaire
used in this paper may have set a record for length, yet neither the authors
nor the referees thought it worthwhile to test for rater fatigue. If time
stamps are released, these tests can be done.
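Again, the test is elementary. With time stamps, one can check whether disagreement rises with an item's position in a rating session; the data below are simulated for illustration only.

```python
# A sketch of a rater-fatigue test, feasible once time stamps are
# released: does disagreement rise with position in a rating session?
# The data are simulated; the fatigue effect is an assumption.
import numpy as np

rng = np.random.default_rng(1)
position = np.arange(500)                # item order within a session
p_disagree = 0.10 + 0.0002 * position    # assumed fatigue effect
disagree = rng.random(500) < p_disagree  # 1 = raters disagreed

slope = np.polyfit(position, disagree.astype(float), 1)[0]
print(f"disagreement trend per item: {slope:.6f}")  # positive => fatigue
```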
Mr Cook, backed by Professors Hoegh-Guldberg and Lu, has flatly refused to
release these data, arguing that a data release would violate confidentiality.
This reasoning is bogus.
I don’t think
confidentiality is relevant. The paper presents the survey as a survey of
published abstracts, rather than as a survey of the raters. If these raters are
indeed neutral and competent, as claimed by the paper, then tying ratings to
raters would not reflect on the raters in any way.
If, on the other hand,
this was a survey of the raters’ beliefs and skills, rather than a survey of
the abstracts they rated, then Mr Cook is correct that their identity should
remain confidential. But this undermines the entire paper: it is no longer a
survey of the literature, but rather a survey of Mr Cook and his friends.
If need be, the association
of ratings to raters can readily be kept secret by means of a standard
confidentiality agreement. I have repeatedly stated that I am willing to sign
an agreement that I would not reveal the identity of the raters and that I
would not pass on the confidential data to a third party either on purpose or
by negligence.
I first contacted Mr
Cook on 31 May 2013, requesting data that should have been ready when the paper
was submitted for peer review on 18 January 2013. His foot-dragging, condoned
by senior university officials, does not reflect well on the University of
Queensland’s attitude towards replication and openness. His refusal to release
all data may indicate that more could be wrong with the paper.
Therefore, I hereby
request, once again, that you release rater IDs and time stamps.
Yours sincerely,
Richard Tol