1. I welcome the inquiry by the Select Committee into the Intergovernmental Panel on Climate Change. The focus of the inquiry is on Working Group I of the IPCC and its Fifth Assessment Report, neither of which is in my core areas of experience and expertise. I was a contributing author to IPCC WG1 AR3; I was a lead author in a few reports of WG2 and WG3; I am currently a convening lead author for WG2 AR5. I will therefore address only a few of the issues raised by the Select Committee.

    ·  How effective is AR5 and the summary for policymakers in conveying what is meant by uncertainty in scientific terms? Would a focus on risk rather than uncertainty be useful?

    The agreed distinction between risk and uncertainty goes back to Knight (1921): risk is characterized by known probabilities (the throw of a die), uncertainty by unknown probabilities. Climate change is better described by uncertainty than by risk. In other arenas the IPCC has tried to redefine widely accepted concepts (e.g., vulnerability), which has led to endless, fruitless discussions on semantics. It would be regrettable if the IPCC were to repeat this mistake with regard to risk and uncertainty.

    ·  Do the AR5 Physical Science Basis report’s conclusions strengthen or weaken the economic case for action to prevent dangerous climate change?

    IPCC WG1 AR5 is silent on this matter. The IPCC cannot make a case for action without violating its mandate; and if anything, such a case would follow from an assessment of the material in the reports of all three working groups. The IPCC cannot assess whether climate change is dangerous or not, because danger is a value-laden concept that, per Arrow (1951), cannot be defined for a society.

    ·  What implications do the IPCC’s conclusions in the AR5 Physical Science Basis report have for policy making both nationally and internationally?

    None. IPCC WG1 AR5 has added little to AR4 that would shift the established positions on climate policy, either nationally or internationally.

    ·  Is the IPCC process an effective mechanism for assessing scientific knowledge? Or has it focussed on providing a justification for political commitment?

    Neither. The IPCC process assesses scientific knowledge according to a political time-scale. That implies that parts of the literature are assessed too frequently while other parts of the literature are not assessed frequently enough. Instead of a mega-report every 6-7 years, it would be better to have an IPCC Journal with frequent updates where the literature moves fast and infrequent updates where little new is written.

    Political positions are driven by power relations and the views of the electorate. The typical voter does not read the IPCC reports, but only casts a glance at what some journalist made of the IPCC press release.

    The IPCC reports do justify the existence of a large bureaucracy which, judging from the lack of progress in reducing greenhouse gas emissions and vulnerability to climate change, seems primarily occupied with maintaining and expanding said bureaucracy.

    ·  Is the rate at which the UK Government intends to cut CO2 emissions appropriate in light of the findings of the IPCC AR5 Physical Science Basis report?

    Per Weitzman (1974), the UK should set an appropriate trajectory for a carbon price, rather than for greenhouse gas emissions. If the UK chooses to persist in its mistake of emissions targets, it should inform that decision with an assessment of the reports of all three working groups, and particularly WG3.

    ·  What relevance do the IPCC’s conclusions have in respect of the review of the fourth Carbon Budget?
    None. At a stretch, IPCC WG1 AR5 may have something to say about a long-term global carbon budget. However, a decade of British emissions is very small relative to a century of global emissions.

    The UK could be a leader in international climate policy if it would demonstrate that greenhouse gas emissions can be cut substantially without causing economic pain. Current UK climate policy shows the opposite: Climate policy can cause real hardship without making a dent in emissions.


  2. The graphs below show the 500-abstract rolling mean, standard deviation, skewness, and first-order autocorrelation for the initial ratings of the Consensus Project. I bootstrapped the data 10,000 times* to find the mean and the 95% confidence interval. The empirical statistics should lie above the upper bound in 2.5% of all cases, and fall below the lower bound in another 2.5%.
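    For readers who want to reproduce this kind of check, a minimal sketch is given below. It is not the code used for the figures: it assumes the initial ratings are available as a one-dimensional numeric array, the function and variable names are invented, and it runs far fewer bootstrap replicates than the 10,000 quoted above so that the example finishes quickly.

```python
import numpy as np
from scipy.stats import skew

def rolling_stats(x, window=500):
    """Rolling mean, standard deviation, skewness and lag-1 autocorrelation."""
    n = len(x) - window + 1
    out = np.empty((n, 4))
    for i in range(n):
        w = x[i:i + window]
        out[i] = [w.mean(), w.std(ddof=1), skew(w), np.corrcoef(w[:-1], w[1:])[0, 1]]
    return out

def bootstrap_bands(x, window=500, reps=10_000, alpha=0.05, seed=0):
    """Resample the ratings with replacement; return pointwise 95% bands per statistic."""
    rng = np.random.default_rng(seed)
    sims = np.empty((reps, len(x) - window + 1, 4))
    for r in range(reps):
        sims[r] = rolling_stats(rng.choice(x, size=len(x), replace=True), window)
    return np.quantile(sims, alpha / 2, axis=0), np.quantile(sims, 1 - alpha / 2, axis=0)

# Made-up ratings on the 1-7 scale, standing in for the initial Consensus Project ratings.
ratings = np.random.default_rng(1).integers(1, 8, size=1_000).astype(float)
empirical = rolling_stats(ratings)
lower, upper = bootstrap_bands(ratings, reps=100)   # only 100 replicates, to keep the demo fast
print("share above upper bound:", (empirical > upper).mean(axis=0))
print("share below lower bound:", (empirical < lower).mean(axis=0))
```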

    UPDATE1: 100-abstract rolling statistics added. UPDATE2: 50-abstract rolling statistics added. UPDATE3: Cluster results added.

    500-abstract window
    The mean exceeds the upper bound in 13.0% of cases and falls below the lower bound in 10.7%. The standard deviation exceeds the upper bound in 7.6% of cases and falls below the lower bound in 12.2%. The skewness exceeds the upper bound in 3.3% of cases and falls below the lower bound in 8.0%. The autocorrelation exceeds the upper bound in 25.5% of cases and falls below the lower bound in 0.0%.



    100-abstract window
    The mean exceeds the upper bound in 6.9% of cases and falls below the lower bound in 7.8%. The standard deviation exceeds the upper bound in 4.7% of cases and falls below the lower bound in 5.6%. The skewness exceeds the upper bound in 2.6% of cases and falls below the lower bound in 6.4%. The autocorrelation exceeds the upper bound in 7.6% of cases and falls below the lower bound in 1.4%.


    50-abstract window
    The mean exceeds the upper bound in 4.9% of cases and falls below the lower bound in 6.4%. The standard deviation exceeds the upper bound in 4.0% of cases and falls below the lower bound in 5.0%. The skewness exceeds the upper bound in 2.7% of cases and falls below the lower bound in 5.3%. The autocorrelation exceeds the upper bound in 6.1% of cases and falls below the lower bound in 1.5%.
    Clustering
    Rolling statistics are not intuitive to everyone. I therefore computed three additional statistics: the minimum, maximum and average distance between identical ratings. I again bootstrapped the data 10,000 times to compute the 90% confidence interval. The results are shown below. Because these are single statistics, we cannot compare the empirical exceedance frequencies with the theoretical ones. The graphs show that the sixes are placed further apart than you would expect by chance, while the sevens are too close together.

    [Figures: average, minimum and maximum distance between identical ratings, with bootstrapped 90% confidence intervals]
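    The distance statistics themselves take only a few lines. The sketch below is mine, not the original code: it assumes the initial ratings sit in a one-dimensional array in rating order, it uses a permutation null (shuffling the ratings) rather than the resampling with replacement described above, and the function and variable names are invented for illustration.

```python
import numpy as np

def gap_stats(ratings, value):
    """Minimum, mean and maximum distance between consecutive occurrences of `value`."""
    positions = np.flatnonzero(np.asarray(ratings) == value)
    gaps = np.diff(positions)
    return gaps.min(), gaps.mean(), gaps.max()

def gap_band(ratings, value, reps=10_000, alpha=0.10, seed=0):
    """90% interval for the gap statistics when the same ratings appear in random order."""
    rng = np.random.default_rng(seed)
    sims = np.array([gap_stats(rng.permutation(ratings), value) for _ in range(reps)])
    return np.quantile(sims, [alpha / 2, 1 - alpha / 2], axis=0)

# Illustration with made-up ratings on the 1-7 scale; in the real data one would pass
# the initial abstract ratings in the order they appear in the file.
ratings = np.random.default_rng(2).integers(1, 8, size=5_000)
print("observed (min, mean, max):", gap_stats(ratings, 7))
print("90% band under random order:\n", gap_band(ratings, 7, reps=500))
```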


    Thanks to Brandon Shollenberger for pointing out that I initially had somehow messed up the data.

    *I know, I know. I could have invoked stationarity and saved my computer lots of runs. Brute force is easier to explain to the non-initiated, though.

  3. Dear Professor Høj,

    I was struck by a recent paper published in Environmental Research Letters with John Cook, a University of Queensland employee, as the lead author. The paper purports to estimate the degree of agreement in the literature on climate change. Consensus is not an argument, of course, but my attention was drawn to the fact that the headline conclusion had no confidence interval, that the main validity test was informal, and that the sample contained a very large number of irrelevant papers while simultaneously omitting many relevant papers.

    My interest piqued, I wrote to Mr Cook asking for the underlying data and received 13% of the data by return email. I immediately requested the remainder, but to no avail.

    I found that the consensus rate in the data differs from that reported in the paper. Further research showed that, contrary to what is said in the paper, the main validity test in fact invalidates the data. And the sample of papers does not represent the literature. That is, the main finding of the paper is incorrect, invalid and unrepresentative.

    Furthermore, the data showed patterns that cannot be explained by either the data gathering process as described in the paper or by chance. This is documented at https://docs.google.com/file/d/0Bz17rNCpfuDNRllTUWlzb0ZJSm8/edit?usp=sharing

    I asked Mr Cook again for the data so as to find a coherent explanation of what is wrong with the paper. As that was unsuccessful, even after a plea to Professor Ove Hoegh-Guldberg, the director of Mr Cook’s workplace, I contacted Professor Max Lu, deputy vice-chancellor for research, and Professor Daniel Kammen, journal editor. Professors Lu and Kammen succeeded in convincing Mr Cook to release first another 2% and later another 28% of the data.

    I also asked for the survey protocol but, in violation of all codes of practice, none seems to exist. The paper and data do hint at what was really done. There is no trace of a pre-test. Rater training was done during the first part of the survey, rather than prior to the survey. The survey instrument was altered during the survey, and abstracts were added. Scales were modified after the survey was completed. All this introduced inhomogeneities into the data that cannot be controlled for, as they are undocumented.

    The later data release reveals that what the paper describes as measurement error (in either direction) is in fact measurement bias (in one particular direction). Furthermore, there is drift in measurement over time. This makes a greater nonsense of the paper.


    I went back to Professor Lu once again, asking for the remaining 57% of the data. Particularly, I asked for rater IDs and time stamps. Both may help to understand what went wrong.

    Only 24 people took the survey. Of those, 12 quickly dropped out, so that the survey essentially relied on just 12 people. The results would be substantially different if only one of the 12 were biased in one way or the other. The paper does not report any test for rater bias, an astonishing oversight by authors and referees. If rater IDs are released, these tests can be done.

    Because so few took the survey, these few answered on average more than 4,000 questions. The paper is silent on the average time taken to answer these questions and, more importantly, on the minimum time. Experience shows that interviewees find it difficult to stay focused if a questionnaire is overly long. The questionnaire used in this paper may have set a record for length, yet neither the authors nor the referees thought it worthwhile to test for rater fatigue. If time stamps are released, these tests can be done.

    Mr Cook, backed by Professors Hoegh-Guldberg and Lu, has flatly refused to release these data, arguing that a data release would violate confidentiality. This reasoning is bogus.

    I don’t think confidentiality is relevant. The paper presents the survey as a survey of published abstracts, rather than as a survey of the raters. If these raters are indeed neutral and competent, as claimed by the paper, then tying ratings to raters would not reflect on the raters in any way.

    If, on the other hand, this was a survey of the raters’ beliefs and skills, rather than a survey of the abstracts they rated, then Mr Cook is correct that their identity should remain confidential. But this undermines the entire paper: It is no longer a survey of the literature, but rather a survey of Mr Cook and his friends.

    If need be, the association of ratings to raters can readily be kept secret by means of a standard confidentiality agreement. I have repeatedly stated that I am willing to sign an agreement that I would not reveal the identity of the raters and that I would not pass on the confidential data to a third party either on purpose or by negligence.

    I first contacted Mr Cook on 31 May 2013, requesting data that should have been ready when the paper was submitted for peer review on 18 January 2013. His foot-dragging, condoned by senior university officials, does not reflect well on the University of Queensland’s attitude towards replication and openness. His refusal to release all data may indicate that more could be wrong with the paper.

    Therefore, I hereby request, once again, that you release rater IDs and time stamps.

    Yours sincerely,




    Richard Tol

  4. According to Cook et al., abstracts were presented in random order to the raters. The figure below shows the distribution of the year of publication (older papers at the bottom, newer ones at the top) in sequence of rating, for blocks of 1000 ratings (early ratings to the left, later ratings to the right). The pattern is indeed uniform, except for a lot of recent papers at the end of the process.
    The figure below shows the distribution of the distance between the first and second rating (distance between ratings on the horizontal axis, in bins of 500; number of rating distances on the vertical axis). The distribution is as expected, except for a bunch of abstracts that were rated in close succession. This is consistent with the figure above.

    UPDATE: The figure below shows, in bins of 500, when the first rating (red), the second rating (blue) and the third rating (green) took place. Fourth and fifth ratings make up the difference (except for the final bin; there are 26848 ratings). Clearly, the ratings took place in sequence. This has implications for the results, as the population of raters changed over time, and the ratings were subject to discussion and reinterpretation in the beginning of the rating process.
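    For those who want to replicate the distance plot, the sketch below shows one way to tabulate the gap between the first and second rating of each abstract. It assumes a list of paper IDs in the order the ratings were entered; the data layout and the names used are my assumptions, not those of the released files.

```python
import numpy as np
from collections import defaultdict

def first_second_gaps(paper_ids):
    """Distance, in number of ratings, between the first and second rating of each abstract."""
    positions = defaultdict(list)
    for i, pid in enumerate(paper_ids):
        positions[pid].append(i)
    return np.array([p[1] - p[0] for p in positions.values() if len(p) >= 2])

# Synthetic example: 1,000 abstracts, each rated twice, presented in random order.
rng = np.random.default_rng(3)
order = rng.permutation(np.repeat(np.arange(1_000), 2))
gaps = first_second_gaps(order)
counts, edges = np.histogram(gaps, bins=np.arange(0, gaps.max() + 500, 500))  # bins of 500
print(counts)
```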


  5. According to Cook et al., each abstract was assessed by at least 2 and at most 3 raters. In fact, 33 abstracts were seen by only one rater, 167 by four raters, and 5 by five. If the initial ratings disagreed, as they did in 33% of cases, abstracts were revisited by the original raters. In 15.9% of cases, this led to agreement. In 17.1% of cases, a third rater broke the tie.

    A reported error rate of 33%, with 2 ratings and 7 categories, implies that 18.5% of ratings were incorrect. 0.6% of abstracts received two identical but wrong ratings. 2.9% of ratings are still wrong after reconciliation. 3.2% of ratings are wrong after re-rating. In total, 6.7% of reported data are in error.
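    The step from a 33% disagreement rate to 18.5% incorrect ratings follows if one assumes that each rater errs independently and that an erroneous rating lands uniformly on one of the six wrong categories, so that two raters agree with probability (1-p)² + p²/6. Solving 1 - (1-p)² - p²/6 = 0.33 gives p of about 18.5%, and p²/6 gives the roughly 0.6% of abstracts with two identical but wrong ratings. That error model is my reading of the calculation; a quick numerical check:

```python
from scipy.optimize import brentq

# Two raters, 7 categories. Each rating is wrong with probability p; an error is
# assumed to fall uniformly on one of the 6 wrong categories, so a pair of raters
# disagrees with probability 1 - (1 - p)**2 - p**2 / 6.
def disagree(p):
    return 1 - (1 - p) ** 2 - p ** 2 / 6

p = brentq(lambda q: disagree(q) - 0.33, 0.0, 1.0)
print(round(p, 3))             # ~0.185: share of individual ratings that are incorrect
print(round(p ** 2 / 6, 4))    # ~0.0057: both raters wrong in the same way
```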

    The figure shows that the corrections to the ratings through reconciliation and re-rating were, on balance, towards rejection of the hypothesis of human-made climate change. In other words, these were not errors in either direction, but rather biases in one direction.

    Assuming that the 6.7% of erroneous data should be corrected in the same way, the consensus rate falls from 98% to 91%.



  6. Mr Cook has released more data. Unfortunately, a lot of data is still missing. Particularly, rater IDs and time stamps are not available. This means that we cannot test for systematic differences between raters. It also means that we cannot compute average and minimum rating times.

    Dan Kammen, the editor of Environmental Research Letters, has explicitly endorsed Cook's refusal to release all data, against the journal's policy. No word yet from the University of Queensland or the Institute of Physics.

    Three new data sets were released. Ratings 4a and b are now available. This reveals that they had another look at 1000 papers, re-rated 5, and scaled this up to 40 for the entire sample. That's fine.

    Paper IDs were released too.

    Most importantly, the ratings are now there. The data are in the order of rating. Each record has the paper ID, the rater's rating, the rater's topic, the final rating and the final topic. The data are organized such that I cannot do much with them with the software on my laptop. A full analysis will have to wait. A number of things are striking, however (and explain why the data resist analysis). The paper says that each abstract was rated at least twice and at most thrice. In fact, some abstracts were rated only once. Other abstracts were rated four or five times, which implies that there are discrepancies between the supposedly final ratings.

    For the final 3196 ratings (some 1500 abstracts), there are no original ratings at all -- only final ratings, with discrepancies.

    In my previous comment, I examined the data -- ordered by year and title -- for inexplicable patterns. Some have argued that ordering by year and title induces the patterns found. I am not convinced this is true. Be that as it may, if we take the paper at its word, abstracts were presented in random order to the raters. Assuming that that is true (I have yet to check), characteristics of the abstracts cannot induce a pattern in the newly released data. Even if there is a trend towards greater endorsement over time (a claim that the paper makes but fails to support), that trend would be destroyed by randomization. If there is autocorrelation induced by title (a rather bizarre suggestion by some commentators), that too would be removed by randomization.

    So, there should be no pattern in the new data. The next three graphs show, in the top panel, the 50-abstract rolling mean rating (in blue), the 500-abstract rolling mean rating (in red) and the trend (in black); in the middle panel, the standard deviation; and in the bottom panel, the skewness. Is that a pattern I see?
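    The rolling statistics in these graphs are straightforward to reproduce once the released file is read in. The sketch below is a stand-in, assuming the first-round ratings are available as a numeric series in rating order; the column and variable names are invented, and the trend is a simple least-squares line rather than whatever was drawn in the figures.

```python
import numpy as np
import pandas as pd

# Stand-in for the first-round rating of each abstract, in the order of rating.
first_ratings = pd.Series(np.random.default_rng(4).integers(1, 8, size=12_000).astype(float))

roll_50 = first_ratings.rolling(50).agg(["mean", "std", "skew"])
roll_500 = first_ratings.rolling(500).agg(["mean", "std", "skew"])

# Under random presentation of abstracts, the trend in rating order should be flat.
t = np.arange(len(first_ratings))
slope, intercept = np.polyfit(t, first_ratings, 1)
print(f"trend: {slope:.2e} rating points per rating")
print(roll_500.dropna().head())
```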





  7. Google graced me with computing power stronger than an 80286, so I recovered all data from the Tol Poll. See the initial discussion.

    There were 2603 valid answers and 12531 answers from bots. We now know there were two bots, one green and one brown.

    Update of results
    Some people have better name recognition than others.

    Some people are more popular than others.
    Some people are popular with some, but less so with others.
    Some people are popular across the climate policy spectrum, others are not.
    New results
    The order in which questions are asked is always an issue. On August 3 & 4, Judy and Tamsin swapped position. On August 4, Dana and Steve swapped position.

    This had a statistically significant effect on the opinions about Judy, Steve and Tamsin, but not on those about Dana, Gavin and Joe. Tamsin shows the strongest effect: people think differently of her when she comes after Dana than when she comes after James.
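    For completeness, the kind of order-effect test alluded to above can be sketched as follows. The post does not say which test was used; a Mann-Whitney U test on the 1-5 ratings received before and after the swap is one reasonable choice for ordinal data, and the arrays below are placeholders for the real responses.

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Placeholders for the 1-5 ratings one person received before and after the
# question-order swap on 3-4 August.
rng = np.random.default_rng(5)
before = rng.integers(1, 6, size=400)
after = rng.integers(1, 6, size=300)

stat, pvalue = mannwhitneyu(before, after, alternative="two-sided")
print(f"U = {stat:.0f}, p = {pvalue:.3f}")
```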




    All valid data, and aggregate data for the bots, can be found here.





  8. The Tol Poll is a direct result of the series of op-eds in the Guardian on the relationship between the environmental movement and environmental science organized by Alice Bell, and particularly Tamsin Edwards’ call for experts to talk about their area of expertise only. In the ensuing discussion, many noted just how nasty the climate debate has become, and Chairman Al, the Climate Chimp, suggested a poll on nastiness.

    So I did, as a joke. Putting together an internet poll is trivial. (Designing a good poll is a lot of work.)

    The poll itself is simple. Rate 12 people who are prominent in the British climate debate online, on a scale of 1 to 5 where 1 stands for “very nasty” and 5 stands for “very friendly”. There is a bonus question that places the respondent in the political spectrum, rating themselves on a scale of 1 to 5 where 1 stands for “very worried about the impacts of climate policy” and 5 for “very worried about the impacts of climate change”. (Some people argued these are different things, which of course they are, but I was not after identifying the agony aunts who worry about everything.)

    The expected result: Some people are either loved or loathed, depending on the (in)congruence of their political position and that of the respondent, whereas other people are accepted by both sides of the debate.

    The best I hoped for were some giggles, and perhaps a data set that could be used for a class in forensic statistics (as the framing of the poll invites dishonest answers).

    I had not counted on Anthony Watts pushing the poll. I had not counted on someone writing a bot to flood the poll with fake results pushing a particular position, and someone else writing a bot to support the opposite position. Or maybe it was the same bot, as its author realized people saw through the ruse. The software I used, Google Docs, is not really suited for handling this amount of data.

    As a courtesy to all those who took the time to fill out the poll and who discussed it (in grave, jocular or puzzled tones), here are some of the results.



  9. I thought I'd give ERL a chance to redeem itself. It didn't.

    Article under review for Environmental Research Letters
    Quantifying the consensus on anthropogenic global warming in the literature: A re-analysis - Professor Dr Richard S J Tol
    ID: ERL/482132/COM

    REFEREE'S REPORT

    ============================

    General Comments

    In this Comment letter, the author presents a thorough exploration of many facets of the recent Cook et al. ERL paper. Replication, re-analysis, and investigation of claims made in the literature are some of the hallmarks of science, and thus I welcomed the idea of this submission. After reading it a number of times, however, I believe that the paper does not contribute enough novelty to the discussion to move the field forward, has a number of critical flaws, and thus should not be accepted, per the ERL stated guidelines.

    Rather than contribute to the discussion, the paper instead seems oriented at casting doubt on the Cook paper, which is not appropriate to a peer-reviewed venue, and has a number of important flaws. I outline the largest of these below and then specific comments/feedback below that.



    1) Claims not supported by data/results. Many of the claims in the abstract and conclusion are not supported by the author’s analyses. Much of the analyses explore methods choices made by the Cook et al paper, and often find differences when a different database or search term or subset of papers is analyzed, but the larger point is completely missed – that both the Cook authors and this paper’s author make assumptions about representativeness, appropriateness of search terms, appropriateness of different fields in calculations made. These are, in fact, assumptions. Thus, it is impossible to claim that the Cook dataset is “unrepresentative” of any larger population, as the other scenarios investigated by the author are just a different (and not necessarily better or “more true”, even in some cases less likely to be a good sample) set of assumptions. Regarding later calculations of consensus, the author finds largely similar percentages to that of the Cook paper and also seems to ignore the self-rated abstract consensus rate, presenting evidence in fact that the Cook paper’s main conclusions do seem to be quite robust, which is the opposite of what is claimed by the author.



    2) A vast degree of relevant literature ignored. In order to contribute to the discussion, the paper should consider other relevant literature concerning the quantification of consensus, almost none of which is cited in this paper.

    The Cook paper is a larger version of the 2004 Oreskes Science paper, which found incredibly high consensus rates and has stood the test of time. Furthermore, if the Cook conclusions were wildly off, they would disagree with the literature of direct polling of scientists. The Cook conclusions do not disagree, however, and are almost exactly in line with Doran & Kendall-Zimmerman 2009, Rosenberg et al 2010, and Anderegg et al 2010. These papers present consensus rates between 94% and 98%, which is completely in line with the Cook findings and even those presented by the author.


    3) Language is overly polemical and not professional in some areas. At times in the introduction and conclusion, the language used is charged, combative, not appropriate of a peer-reviewed article and reads more like a blog post.

    This does not serve the paper well and reflects poorly on the author.


    4) Other Cook paper findings ignored. This paper does not mention or discuss the author self-ratings presented in the Cook et al paper whatsoever. These self-ratings, in fact, are among the strongest set of data presented in the paper and almost exactly mirror the reported ratings from the Cook author team.



    Specific Comments

    P1L15-19: Almost every single one of these abstract claims is an oversimplification and often a misleading characterization of what is reported in the analyses of this comment.



    P4L1-3: This analysis has no bearing on the Cook paper’s findings. Oversampling the top 50 papers will have very little to no effect on a total population sample of >12,000 papers.



    P4L7-9: Just because a database has more papers, does not mean that it is a better representation of the “true” population of relevant papers. In fact, Figure S4 shows quite clearly that Scopus includes a large amount of disciplines that are certainly not relevant to the attribution of climate change. It’s quite clear that one would, in fact, want to sample earth and planetary sciences quite extensively and not sample “Medicine,” “Material Science,” “Biochemistry/genetics,” or “Business Management and Accounting”, which are more heavily sampled in the Scopus database. Thus, a strong case could be made that this comparison and analysis is quite flawed, as the Web of Science search would seem to be much more relevant to the question at hand, dropping out a large number of irrelevant disciplines to this question. The Web of Science sample is therefore not over-sampling or undersampling the true population; the analyses presented here indicate in fact that it’s a better representation of the true population of relevant papers.



    P4L10-15: This is a good point that should be pointed out.



    P4L16-17: This must be supported with citations in the literature. Otherwise, the statement is not justified. One could easily argue that younger and more obscure journals are also much more highly prone to bypassing or having flawed peer-review processes, as evidenced by the recent explosion of pseudo-journals in recent years, leading to publishing flawed and incorrect papers, and thus the Web of Science bar could be more informative.



    P4L21: Unrepresentative of what? This is a fairly broad and bold statement that is entirely unjustified by the data presented. Their sample is simply a sample, subject to the various constraints and meta-data of each database, which the author has explored and pointed out. Pointing out these constraints and the potential for them to lead to different conclusions is valuable. Blanket broad and normative statements like this are not. In fact, it should be acknowledged that several of the figures in this section actually give one *more* confidence in the appropriateness of Web of Science, rather than less.



    P5: These are useful analyses to include. The section gives no evidence of bias in one direction or another, however, and the title is not appropriate. In large part, these figures suggest mostly that it was human beings doing the research and thus humans can get fatigued in such a large and daunting research project. This can lead to lower quality data, but rarely are data perfectly transcribed and meet all assumptions, especially when humans are categorizing and qualitatively assessing literature, and there’s no indication that this biased the findings. In fact, all indications, such as the self-rating by paper’s authors, suggest that the analysis was robust and the findings not influenced by fatigue or low data quality.



    P6L9-10: Sample size of 2-3 is not enough to say anything meaningful.



    P6L11-12: Again, a completely meaningless and biased sample size. The Cook authors in fact present self-rating by paper authors and arrive at 97.2% consensus by author self-ratings. Given that their response was 1,189 authors, this indicates that seven who disagree with their paper’s ratings is essentially trivial and cherry-picked.



    P7L9-14: This is largely true. In fact, if one accepts this logic, it very much discounts much of the analysis presented earlier in this comment using Scopus and several of the supplemental figures (e.g. 25 top-cited papers, 50 top-published authors, etc).



    P7: These tables show quite clearly that in fact the major conclusions of the Cook paper do stand (consensus rates of 95-99%), that consensus is even higher in explicit endorsements, and this aligns with the self-rated abstract findings presented in Cook, which are omitted from the discussion in this comment.



    P8L11-17: This is perhaps useful to point out, but not as useful as the author presents. Science works by going after new findings and exploring around the edges. Because the fingerprint of human impact on the climate is well-established and the overwhelming evidence is clearly laid out in recent IPCC reports, then of course the focus of research shifts to other developing areas (such as how much it will warm based on paleoclimate studies).



    P9L3-7: These statements are assumptions and value-judgments by the author. The author disagrees with how Cook et al. performed some of the analyses in choice of database, search terms, etc., but just because the author disagrees with the choices does not mean that the sample was unrepresentative.



    P9L12: No. In fact, the analysis in this paper shows quite clearly that all of the major points in Cook et al do largely stand. Furthermore, this comment omits other data presented by Cook et al. such as the author self-rated consensus.



    P9L8-17: This section is not supported by the data presented and is also not professional and appropriate for a peer-reviewed publication. Furthermore, aspersions of secrecy and holding data back seem largely unjustified, as a quick google search reveals that much of the data is available online (http://www.skepticalscience.com/tcp.php?t=home), including interactive ways to replicate their research. This is far more open and transparent than the vast majority of scientific papers published. In fact, given how much of the paper’s findings were replicated and checked in the analyses here, I would say the author has no grounds to cast aspersions of data-hiding and secrecy.



    Table 3: Why is self-rated consensus not reported in this table as an additional column? Its omission is glaring.





    Article under review for Environmental Research Letters
    Quantifying the consensus on anthropogenic global warming in the literature: A re-analysis - Professor Dr Richard S J Tol
    ID: ERL/482132/COM

    BOARD MEMBER'S REPORT

    ============================

    The paper is a comment on Cook et al. (2013). The author has in essence the following main criticisms of the paper.



    1. Based on the (unsupported) claim that “Nowadays, climate change means global climate change”, Tol suggests the search term “climate change” would have been more appropriate for the survey, instead of “global climate change”. While there is always a choice to be made which search term one uses in such a survey, I think either choice would have been a valid one and the pros and cons of each are debatable. Had Cook et al. used “climate change”, they could have been criticised for casting a net that is too large and could have found a host of papers dealing with local and regional issues. The key issue is that any publication documents which terms were actually used, which was the case in Cook et al. Other authors are of course free to publish their own survey findings using other search terms – which Tol does not do here, despite calling his manuscript “a reanalysis” of the consensus. He does not present his own analysis of the consensus but merely criticises how Cook et al. conducted their survey.



    2. Tol argues that using a different publication data base (Scopus) instead of Web of Science might have given different results. That may or may not be true – but is again a point that can only be addressed if another group of scientists actually performs a similar study based on some other publication data base. Cook et al. cannot be faulted for using Web of Science; that is a perfectly valid choice. I generally prefer Web of Science over Scopus in my own literature searches because I find Scopus includes sources of dubious quality which do not always conform to what I would call peer-reviewed journal articles. Had Cook et al. used Scopus, they could have been criticised for that choice.



    3. Tol discusses problems that can arise in the rating process, e.g. rater fatigue. That certainly is a potential problem – human raters are not objective algorithms, and in this kind of rating exercise a certain subjective element in taking the decisions is as inevitable as it is obvious. Tol presents no evidence that this is a large problem that would significantly alter the results, though, to the contrary – the numbers he presents suggest it is a small problem that would not significantly alter the conclusion of an overwhelming consensus. Thus I think this rating subjectivity is a caveat to mention when discussing and interpreting the results, but no cause for a formal Comment. It remains unclear why Tol headed this section “data errors”; the issue of subjective classification is not one of “data errors”.



    4. Tol makes an interesting point about how the consensus rate depends on subject and argues that policy papers should have been rated as “neutral”, rather than endorsing anthropogenic warming. Cook et al. rated those as implicit endorsement of anthropogenic warming, since in its absence CO2 emissions reductions don’t make sense. I would agree with Cook et al. here. A more valid question is whether the views of a paper on climate policy are interesting – perhaps not, perhaps one should only be interested in the views of natural scientists on this question. This is a matter of opinion, though, and certainly not one of “classification errors” as Tol heads this section.



    5. The final paragraph on “Trends” makes the same point again – if one reclassified all adaptation and mitigation papers then the results would be different. But I don’t think these papers were incorrectly classified; it is merely a matter of opinion whether one would want to include the authors of these papers in the expert community that one surveys, or whether one finds their views to be irrelevant, as Tol apparently does. Having a different opinion on this point is by itself not a reason to publish a formal Comment.



    In the final paragraph Tol writes: “There is no doubt in my mind that the literature on climate change overwhelmingly supports the hypothesis that climate change is caused by humans. I have very little reason to doubt that the consensus is indeed correct.”



    Indeed Tol provides no reason to question the main conclusions of Cook et al. He merely provides his opinions on where he would have conducted this survey differently and in his view better – and he is free to do just that. But he has not identified serious methodological flaws in Cook et al. that would justify the publication of a Comment.

     
    Seventh draft. Main changes: Inclusion of paper ratings; further tests of patterns in data.

    Update on data (22 July): John Cook has now been asked (once, around July 7) by the director of the Global Change Institute, University of Queensland, and (three times, first around June 20) by the editor of Environmental Research Letters to release all of his data. I have now asked him five times (first on May 31). Cook has released only a little bit more data: the author ratings. The actual data confirm what is in the paper: paper ratings and abstract ratings strongly disagree with each other.

    Sixth draft

    Rejection letter by ERL:
    Article under review for Environmental Research Letters
    Comment on: "Quantifying the consensus on anthropogenic global warming in the literature" - Professor Dr Richard S J Tol
    ID: ERL/477057/COM
    BOARD MEMBER'S REPORT
    ============================
    The comment raises a number of issues with the recent study by Cook et al. It is written in a rather opinionated style, seen e.g. in the entire introductory section making political points, and in off-hand remarks like labelling Skeptical Science a “polemic blog” or in sweeping generalisations like the paper “may strengthen the belief that all is not well in climate research”.
    It reads more like a blog post than a scientific comment.

    The specification for ERL comments is:
    “A Comment in Environmental Research Letters should make a real
    contribution to the development of the subject, raising important issues about errors, controversial points or misleading results in work published in the journal recently.”

    I do not think this manuscript satisfies those criteria. It is in a large part an opinion piece, in other parts it suggests better ways of analysing the published literature (e.g. using a larger database rather than just Web of Science). These are all valid points for the further discussion following the publication of a paper – colleagues will have different opinions on interpreting the results or on how this could have been done better, and it is perfectly valid to express these opinions and to go ahead and actually do the research better in order to advance the field.

    I do not see that the submission has identified any clear errors in the Cook et al. paper that would call its conclusions into question – in fact he agrees that the consensus documented by Cook et al. exists. The author offers much speculation (e.g. about raters perhaps getting tired) which has no place in the scientific literature, he offers minor corrections – e.g. that the endorsement level should not be 98% but 97.6% if only explicit endorsements are counted. He spends much time on the issue of implicit endorsements, about which one can of course have different opinions, but the issue is clearly stated in the Cook et al. paper so this does not call for a published comment on the paper. He also offers an alternative interpretation of the trends – which is fine, it is always possible to interpret data differently.

    All these things are valid issues for the usual discourse that exists in many informal avenues like conferences or blogs, but they do not constitute material for a formal comment.

    The editor-in-chief has an interesting blog on the paper.

    As submitted to Environmental Research Letters; data

    Fourth draft, editing and references

    Third draft, with proper tests for internal consistency

    Second draft, with more validity tests and Scopus v Web of Science explained.

    First draft of comment on Cook et al. paper in ERL 2013.


  10. There have been recent claims that the market value of fossil fuel companies is grossly overstated and about to collapse (here, here).  One report went as far as concluding that this would cause a major economic crisis (here). Is there a carbon bubble about to blow? Bubbles are being blown, for sure, but rather by the researchers making up fantastical claims (here). See also this curious we-know-he's-wrong-but-we-don't-want-to-offend-Lord-Brentford piece in the Economist.

    Let us consider the causal chain step by step.
    (1)    Climate policy is about to end the use of fossil fuels, making fossil fuel reserves worthless.
    (2)    Fossil fuel reserves are a major determinant of the value of fossil fuel companies.
    (3)    The stock market value of fossil fuel companies is a major determinant of the business cycle.

    The first hypothesis is readily dismantled. Europe, the self-proclaimed leader in climate policy, has seen a collapse of the price of carbon dioxide emission permits. Attempts to reform the EU Emissions Trading System have been blocked by politicians who think that cheap energy is more important right now. Japan is likely to abandon its emissions target after this summer’s elections. The Obama administration has ceased to try and find bipartisan support for climate legislation. China is the only ray of hope to those who wish for a stringent climate policy. There are persistent signs that China will regulate emissions before the decade is over, but no sign that regulations will be particularly stringent.

    None of this is particularly relevant, however. The claim is that there is a carbon bubble. That is, the “market” puts an unjustifiably high value on fossil fuel reserves. Put differently, traders believe that climate policy will continue to lack ambition for the foreseeable future, but politicians are secretly plotting to implement a stringent climate policy soon.

    As soon as the “market” expects that new regulation will seriously devalue an asset, its price drops. Bubbles only arise if the “market” is misinformed. The “market” is by no means infallible when it comes to pricing risk, but an expectation of “not much climate policy any time soon” strikes me as entirely realistic.

    Refuting the second hypothesis requires more specific knowledge. The market value of fossil fuel companies is based on a number of factors, chief among them the expected dividends over the next few years. The value of a company’s assets in the long term is heavily discounted, and particularly so in a business as uncertain as energy exploration and exploitation. Fossil fuel reserves and resources are often owned by states or state-owned companies rather than by public companies. The supermajors in oil and gas add value through their expertise in engineering, project management, and finance, which could be redeployed to other activities in energy and other areas.
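    To see how heavily long-dated reserves are discounted, consider a stylized present-value calculation (illustrative numbers only, not a valuation of any actual company): at a 10% discount rate, a unit of annual cash flow over the next ten years is worth roughly ten times as much today as the same cash flow received in years 30 to 60.

```python
# Present value of a constant annual cash flow at a 10% discount rate (illustrative).
rate = 0.10
cashflow = 1.0

pv_years_1_to_10 = sum(cashflow / (1 + rate) ** t for t in range(1, 11))
pv_years_30_to_60 = sum(cashflow / (1 + rate) ** t for t in range(30, 61))
print(round(pv_years_1_to_10, 2))    # ~6.14
print(round(pv_years_30_to_60, 2))   # ~0.60
```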

    In the unlikely case of unexpectedly stringent climate policy, sovereigns would be hit. This is one reason why climate policy will not accelerate much. It is not wise to cause unrest in Iraq, Iran, Russia, Saudi Arabia or Venezuela.

    The third hypothesis ignores basic facts. Fossil fuel companies are among the largest companies in the world, but their total market capitalization is small relative to the total stock market. Even if they were wiped out completely, the world economy would shrug its shoulders and move on. We have witnessed rapid falls in the stock market value of fossil fuel companies – of all companies as the oil price fell, or of particular companies as disaster struck – and we know from those episodes that the economic impact is limited.

    In sum, there is no carbon bubble. If there were a carbon bubble, it would not be about to burst. If it were to burst, the economic impact would be minimal.

