I thought I'd give ERL a chance to redeem itself. It didn't.
Article under review for Environmental Research Letters
Quantifying the consensus on anthropogenic global warming in the
literature: A re-analysis - Professor Dr Richard S J Tol
ID: ERL/482132/COM
REFEREE'S REPORT
============================
General Comments
In this Comment letter, the author presents a thorough
exploration of many facets of the recent Cook et al. ERL paper. Replication,
re-analysis, and investigation of claims made in the literature are some of the
hallmarks of science, and thus I welcomed the idea of this submission.
After reading it a number of times, however, I believe that the paper does not
contribute enough novelty to the discussion to move the field forward and has a
number of critical flaws, and thus should not be accepted, per ERL's stated
guidelines.
Rather than contribute to the discussion, the paper instead
seems oriented toward casting doubt on the Cook paper, which is not appropriate
for a peer-reviewed venue, and it has a number of important flaws. I outline the
largest of these below, followed by specific comments.
1) Claims not supported by data/results. Many of the
claims in the abstract and conclusion are not supported by the author's
analyses. Much of the analysis explores methodological choices made by the Cook et al paper, and
often finds differences when a different database, search term, or subset of
papers is analyzed, but the larger point is completely missed –
that both the Cook authors and this paper's author make assumptions about
representativeness, the appropriateness of search terms, and the appropriateness
of the different fields included in the calculations. These are, in fact, assumptions. Thus,
it is impossible to claim that the Cook dataset is “unrepresentative” of any
larger population, as the other scenarios investigated by the author are just a
different set of assumptions – not necessarily better or “more true”, and in
some cases less likely to yield a good sample. Regarding the later calculations
of consensus, the author finds percentages largely similar to those of the Cook
paper and also seems to ignore the self-rated abstract consensus rate,
presenting evidence, in fact, that the Cook paper's main conclusions are
quite robust – the opposite of what the author claims.
2) A vast amount of relevant literature is ignored. In order
to contribute to the discussion, the paper should consider other relevant
literature concerning the quantification of consensus, almost none of which is
cited in this paper.
The Cook paper is a larger version of the 2004 Oreskes Science
paper, which found a very high consensus rate and has stood the test of time.
Furthermore, if the Cook conclusions were wildly off, they would disagree with
the literature on direct polling of scientists. The Cook conclusions do not
disagree, however, and are almost exactly in line with Doran &
Kendall-Zimmerman 2009, Rosenberg et al 2010, and Anderegg et al 2010. These
papers present consensus rates between 94% and 98%, completely in line with
the Cook findings and even with those presented by the author.
3) Language is overly polemical and unprofessional in
some areas. At times in the introduction and conclusion, the language used is
charged and combative, not appropriate for a peer-reviewed article, and reads
more like a blog post.
This does not serve the paper well and reflects poorly on the
author.
4) Other Cook paper findings ignored. This paper does not
mention or discuss the author self-ratings presented in the Cook et al paper
whatsoever. These self-ratings are, in fact, among the strongest data
presented in that paper and almost exactly mirror the ratings reported by
the Cook author team.
Specific Comments
P1L15-19: Almost every single one of these abstract
claims is an oversimplification, and often a misleading characterization, of
what is reported in the analyses of this comment.
P4L1-3: This analysis has no bearing on the Cook paper’s
findings.
Oversampling the top 50 papers will have very little to no effect on a
total population sample of >12,000 papers.
P4L7-9: Just because a database has more papers does not
mean that it is a better representation of the “true” population of
relevant papers. In fact, Figure S4 shows quite clearly that Scopus includes a
large number of disciplines that are certainly not relevant to the attribution
of climate change. It's quite clear that one would, in fact, want to sample
earth and planetary sciences quite extensively and not sample “Medicine,”
“Material Science,” “Biochemistry/genetics,” or “Business Management and
Accounting”,
which are more heavily sampled in the Scopus database.
Thus, a strong case could be made that this comparison and analysis is quite
flawed, as the Web of Science search would seem to be much more relevant to the
question at hand, dropping a large number of disciplines irrelevant to this
question. The Web of Science sample is therefore not over-sampling or
under-sampling the true population; the analyses presented here indicate, in
fact, that it is a better representation of the true population of relevant papers.
P4L10-15: This is a good point, and worth making.
P4L16-17: This must be supported with citations from the
literature.
Otherwise, the statement is not justified. One could easily argue
that younger and more obscure journals are much more prone to bypassing or
having flawed peer-review processes – as evidenced by the recent explosion of
pseudo-journals – leading to the publication of flawed and incorrect papers,
and thus that the Web of Science bar could be more informative.
P4L21: Unrepresentative of what? This is a fairly broad
and bold statement that is entirely unjustified by the data presented. Their
sample is simply a sample, subject to the various constraints and meta-data of
each database, which the author has explored and pointed out. Pointing out
these constraints, and their potential to lead to different conclusions, is
valuable. Blanket normative statements like this are not. In fact, it
should be acknowledged that several of the figures in this section actually
give one *more* confidence in the appropriateness of Web of Science, rather
than less.
P5: These are useful analyses to include. The section
gives no evidence of bias in one direction or another, however, and the title
is not appropriate. In large part, these figures suggest that the research was
done by human beings, and humans can get fatigued in such a large
and daunting research project. Fatigue can lead to lower quality data, but data
are rarely perfectly transcribed and rarely meet all assumptions, especially
when humans are categorizing and qualitatively assessing literature, and there
is no indication that this
biased the findings. In fact, all indications, such as
the self-ratings by papers' authors, suggest that the analysis was robust and
the findings not influenced by fatigue or low data quality.
P6L9-10: A sample size of 2-3 is not enough to say anything
meaningful.
P6L11-12: Again, a completely meaningless and biased
sample size. The Cook authors in fact present self-ratings by papers' authors
and arrive at a 97.2% consensus. Given that 1,189 authors responded, the seven
who disagree with their papers' ratings (roughly 0.6%) are essentially trivial
and cherry-picked.
P7L9-14: This is largely true. In fact, if one accepts
this logic, it very much discounts much of the analysis presented earlier in
this comment using Scopus, as well as several of the supplemental figures (e.g.
the 25 top-cited papers, the 50 top-published authors, etc.).
P7: These tables show quite clearly that the major
conclusions of the Cook paper do in fact stand (consensus rates of 95-99%),
that consensus is even higher among explicit endorsements, and that this aligns
with the self-rated abstract findings presented in Cook, which are omitted from
the discussion in this comment.
P8L11-17: This is perhaps useful to point out, but not as
useful as the author presents. Science works by going after new findings and
exploring around the edges. Because the fingerprint of human impact on the
climate is well established and the overwhelming evidence is clearly laid out
in recent IPCC reports, the focus of research naturally shifts to other
developing areas (such as how much the climate will warm, based on paleoclimate studies).
P9L3-7: These statements are assumptions and
value-judgments by the author.
The author disagrees with how Cook et al. performed some of
the analyses – choice of database, search terms, etc. – but just because the
author disagrees with those choices does not mean that the sample was
unrepresentative.
P9L12: No. In fact, the analysis in this paper shows quite
clearly that all of the major points in Cook et al. do largely stand.
Furthermore, this comment omits other data presented by Cook et al., such as
the author self-rated consensus.
P9L8-17: This section is not supported by the data
presented, and it is neither professional nor appropriate for a peer-reviewed
publication. Furthermore, the aspersions of secrecy and of holding data back
seem largely unjustified, as a quick Google search reveals that much of the
data is available online (http://www.skepticalscience.com/tcp.php?t=home),
including interactive ways to replicate the research. This is far more open
and transparent than the vast majority of scientific papers. In fact,
given how much of the paper's findings were replicated and checked in the
analyses here, I would say the author has no grounds to cast aspersions of
data-hiding and secrecy.
Table 3: Why is the self-rated consensus not reported in this
table as an additional column? Its omission is glaring.
Article under review for Environmental Research Letters
Quantifying the consensus on anthropogenic global warming in the literature: A re-analysis - Professor Dr Richard S J Tol
ID: ERL/482132/COM
BOARD MEMBER'S REPORT
============================
The paper is a comment on Cook et al. (2013). The author
has in essence the following main criticisms of the paper.
1. Based on the (unsupported) claim that “Nowadays,
climate change means global climate change”, Tol suggests the search term
“climate change” would have been more appropriate for the survey, instead of
“global climate change”. While there is always a choice to be made about which
search term one uses in such a survey, I think either choice would have been
valid, and the pros and cons of each are debatable. Had Cook et al. used “climate
change”, they could have been criticised for casting too wide a net and
finding a host of papers dealing with local and regional issues.
The key issue is that a publication should document which terms were actually
used, which Cook et al. did. Other authors are of course free to publish
their own survey findings using other search terms – which Tol does not do
here, despite calling his manuscript “a reanalysis” of the consensus. He does
not present his own analysis of the consensus but merely criticises how Cook
et al. conducted their survey.
2. Tol argues that using a different publication database
(Scopus) instead of Web of Science might have given different results.
That may or may not be true – but it is again a point that can only be addressed
if another group of scientists actually performs a similar study based on some
other publication database. Cook et al. cannot be faulted for using Web of
Science; that is a perfectly valid choice. I generally prefer Web of Science
over Scopus in my own literature searches because I find Scopus includes
sources of dubious quality which do not always conform to what I would call
peer-reviewed journal articles.
Had Cook et al. used Scopus, they could have been
criticised for that choice.
3. Tol discusses problems that can arise in the rating
process, e.g. rater fatigue. That certainly is a potential problem – human
raters are not objective algorithms, and in this kind of rating exercise a
certain subjective element in making the decisions is as inevitable as it is
obvious. Tol presents no evidence that this is a large problem that would
significantly alter the results, though; to the contrary, the numbers he
presents suggest it is a small problem that would not significantly alter the
conclusion of an overwhelming consensus. Thus I think this rating subjectivity
is a caveat to mention when discussing and interpreting the results, but no
cause for a formal Comment. It remains unclear why Tol headed this section
“data errors”; the issue of subjective classification is not one of “data
errors”.
4. Tol makes an interesting point about how the consensus
rate depends on subject, and argues that policy papers should have been rated as
“neutral” rather than as endorsing anthropogenic warming. Cook et al. rated those
as implicit endorsements of anthropogenic warming, since in its absence CO2
emissions reductions don't make sense. I would agree with Cook et al. here. A
more valid question is whether the views of a paper on
climate policy are interesting – perhaps not; perhaps one should only be
interested in the views of natural scientists on this question. This is a
matter of opinion, though, and certainly not one of “classification errors”, as
Tol heads this section.
5. The final paragraph on “Trends” makes the same point
again – if one reclassified all adaptation and mitigation papers then the
results would be different. But I don’t think these papers were
incorrectly classified; it is merely a matter of opinion whether one would want
to include the authors of these papers in the expert community that one surveys, or
whether one finds their views to be irrelevant, as Tol apparently does.
Having a different opinion on this point is by itself not a reason to publish a
formal Comment.
In the final paragraph Tol writes: “There is no doubt in my mind that the literature on
climate change overwhelmingly supports the hypothesis that climate change is
caused by humans. I have very little reason to doubt that the consensus is
indeed correct.”
Indeed, Tol provides no reason to question the main
conclusions of Cook et al. He merely provides his opinions on where he would
have conducted this survey differently and, in his view, better – and he is
free to do just that. But he has not identified serious methodological flaws
in Cook et al. that would justify the publication of a Comment.
Seventh draft. Main changes: inclusion of paper ratings; further tests of patterns in data.
Update on data (22 July): John Cook has now been asked (once, around July 7) by the director of the Global Change Institute, University of Queensland, and (three times, first around June 20) by the editor of Environmental Research Letters to release all of his data. I have asked him five times now (first on May 31). Cook has released only a little more data: the author ratings. The actual data confirm what is in the paper: paper ratings and abstract ratings strongly disagree with each other.
Sixth draft
Rejection letter by ERL:
Article under review for Environmental Research Letters
Comment on: "Quantifying the consensus on anthropogenic global warming in the literature" - Professor Dr Richard S J Tol
ID: ERL/477057/COM
BOARD MEMBER'S REPORT
============================
The comment raises a number of issues with the recent study by Cook et al. It is written in a rather opinionated style, seen e.g. in the entire introductory section making political points, in off-hand remarks like labelling Skeptical Science a “polemic blog”, and in sweeping generalisations like the claim that the paper “may strengthen the belief that all is not well in climate research”.
It reads more like a blog post than a scientific comment.
The specification for ERL comments is:
“A Comment in Environmental Research Letters should make a real
contribution to the development of the subject, raising important issues about errors, controversial points or misleading results in work published in the journal recently.”
I do not think this manuscript satisfies those criteria. It is in a large part an opinion piece, in other parts it suggests better ways of analysing the published literature (e.g. using a larger database rather than just Web of Science). These are all valid points for the further discussion following the publication of a paper – colleagues will have different opinions on interpreting the results or on how this could have been done better, and it is perfectly valid to express these opinions and to go ahead and actually do the research better in order to advance the field.
I do not see that the submission has identified any clear errors in the Cook et al. paper that would call its conclusions into question – in fact, the author agrees that the consensus documented by Cook et al. exists. The author offers much speculation (e.g. about raters perhaps getting tired), which has no place in the scientific literature, and minor corrections – e.g. that the endorsement level should be 97.6% rather than 98% if only explicit endorsements are counted. He spends much time on the issue of implicit endorsements, about which one can of course have different opinions, but the issue is clearly stated in the Cook et al. paper, so this does not call for a published comment. He also offers an alternative interpretation of the trends – which is fine; it is always possible to interpret data differently.
All these things are valid issues for the usual discourse that exists in many informal avenues like conferences or blogs, but they do not constitute material for a formal comment.
The editor-in-chief has an interesting blog post on the paper.
As submitted to Environmental Research Letters; data
Fourth draft, editing and references
Third draft, with proper tests for internal consistency
Second draft, with more validity tests and Scopus v Web of Science explained.
First draft of comment on Cook et al. paper in ERL 2013.