However, Steve McIntyre and Brandon Shollenberger protested that the second rating period was dominated by tie-break ratings. These are a particular subset of the data sample, so the results should be expected to differ. Indeed, if we plot the chi-squared statistic against time, testing the first/second/third ratings made on a particular day against all first/second/third ratings, nothing untoward appears in the later period of active rating.
First ratings:
Second ratings:
Third (tie-break) ratings:
That said, the tie-break ratings are not without blemish (apart from the fact that 7 out of 46 days are above the 99th percentile). Comparing the first ratings that were not challenged against those that were, I find that their distributions differ (chi2=80, p<0.01). Ditto for the second ratings (chi2=29, p<0.01). This is as it should be. However, comparing the unchallenged ratings to the tie-breaks, a large difference appears (chi2=393, p<0.01). That is, the tie-breaks (in the second period) moved away from the original ratings (in the first period). Indeed, in 74 cases, the third rating lies outside the bracket formed by the first and second ratings. And some abstracts were re-rated even though the first two ratings agreed.
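For readers who want to reproduce this kind of comparison, here is a minimal sketch in Python. It is not the code behind the numbers above. It assumes the ratings have already been tallied into counts per rating category, and the function names `chi2_homogeneity` and `outside_bracket` are made up for illustration. The first function computes the Pearson chi-squared statistic for two sets of category counts (say, unchallenged ratings versus tie-breaks); the second checks whether a third rating falls outside the range spanned by the first two.

```python
def chi2_homogeneity(counts_a, counts_b):
    """Pearson chi-squared statistic and degrees of freedom for a 2 x k table.

    counts_a and counts_b are counts per rating category for two groups
    (hypothetical inputs; the category layout is an assumption).
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("both groups need the same number of categories")
    total_a, total_b = sum(counts_a), sum(counts_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("each group needs at least one rating")
    grand = total_a + total_b

    stat = 0.0
    used = 0  # categories that contain at least one rating
    for a, b in zip(counts_a, counts_b):
        column = a + b
        if column == 0:
            continue  # an empty category carries no information
        used += 1
        for observed, row_total in ((a, total_a), (b, total_b)):
            expected = row_total * column / grand
            stat += (observed - expected) ** 2 / expected
    return stat, used - 1


def outside_bracket(first, second, third):
    """True if the third rating lies outside the range of the first two."""
    return third < min(first, second) or third > max(first, second)
```

The statistic is then compared against the chi-squared distribution with the returned degrees of freedom; with SciPy installed, `scipy.stats.chi2.sf(stat, dof)` gives the p-value. Note that when the first two ratings agree, `outside_bracket` flags any third rating that differs from them.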
In particular, the tie-break ratings counted 44% and 25% fewer rejections of the hypothesis of anthropogenic warming than the first and second ratings, respectively; but 8% fewer and 16% more endorsements than the first and second ratings, respectively.
Recall that the tie-break ratings took place after the raters had had the opportunity to look at their results.