Thank you for sending the link to this blog. I am pleased to learn that ONS is now trying to do more about communicating uncertainty, and I have the following suggestions for improving current practice.
Clicking on the link to Labour Market Statistics leads directly to the latest set of Key Points, where there is no indication at all of any uncertainty in the figures presented. For example, the statement "There were 2.21 million unemployed people for January to March 2014, 133,000 fewer than for October to December 2013 and 309,000 fewer than a year earlier." gives the impression that these are facts rather than uncertain estimates. To ensure that readers are aware of the uncertainty in these figures from the outset, it may be better to give a brief indication of it, without disturbing the flow and brevity of the statement, by adding a ± figure in brackets, thus:
There were 2.2 (±0.1) million unemployed people for January to March 2014, 130,000 (±90,000) fewer than for October to December 2013 and 310,000 (±110,000) fewer than a year earlier.
[Note: I couldn't find the actual confidence interval for the annual change, so I guessed.]
I have rounded the estimates to two significant figures, and the confidence intervals to correspond, in order to emphasise that ONS shouldn't publish data to a precision greater than is justified by their accuracy (see comment 4 below). I think that the ± ("give or take") convention is one that most readers would understand, and it is standard practice for scientific estimates, where knowledge of uncertainty is essential. A brief footnote could guide readers to a fuller description of the treatment of uncertainty, for example: "The ± figures indicate the possible extent to which the associated estimates may deviate from the true value. See [section/paragraph/note #] for information on the uncertainty inherent in these estimates." Although this approach is appropriate only for estimates whose error distributions are approximately symmetrical, that condition holds when sample sizes are sufficiently large, which should be the case for most official statistics.
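The rounding and ± presentation suggested above could be automated along the following lines. This is only a minimal sketch: the function name `format_with_margin` is hypothetical, and the margins passed in are the ones quoted in this comment (including the guessed one for the annual change), not official ONS confidence intervals.

```python
import math

def format_with_margin(estimate, margin, sig=2):
    """Round an estimate to `sig` significant figures and round its
    +/- margin to the same decimal place, e.g. 133000 -> '130,000 (±90,000)'."""
    if estimate == 0:
        return f"0 (±{margin:,})"
    # Position of the leading digit, e.g. 5 for 133,000 and 0 for 2.21
    exponent = math.floor(math.log10(abs(estimate)))
    decimals = sig - 1 - exponent  # negative means rounding to tens, hundreds, ...
    est = round(estimate, decimals)
    mar = round(margin, decimals)
    if decimals > 0:
        return f"{est:,.{decimals}f} (±{mar:,.{decimals}f})"
    return f"{int(est):,} (±{int(mar):,})"

# Figures from the Key Points statement; margins as suggested in this comment
level = format_with_margin(2.21, 0.1)               # millions of people
quarterly_change = format_with_margin(133000, 90000)
annual_change = format_with_margin(309000, 110000)  # margin is a guess

print(f"There were {level} million unemployed people, "
      f"{quarterly_change} fewer than the previous quarter and "
      f"{annual_change} fewer than a year earlier.")
```

Rounding the margin to the same decimal place as the estimate keeps the two figures visually consistent, which is the point of the suggestion: the reader never sees more digits than the uncertainty can support.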
Although colour coding according to uncertainty has its merits, the method used by the Welsh Government is potentially misleading. I was aware that the most brightly highlighted numbers were the least reliable only because I read the introductory description of the colour coding (who does that? I only read it because I was looking for it). A reader going straight to the table might erroneously think that these numbers are the most important. At the very least, every such table should have a footnote showing a brief key to the colour coding. In addition, I think it would be more helpful and more intuitively informative to vary the weight and shade of the numbers instead, thus: bold for accurate, normal for reasonably accurate, dark grey for acceptable and light grey for unreliable. These shadings are more in line with the credence that should be put on the figures.
[On a technical point, I don't think that the coefficient of variation (CV) used by the Welsh Government is a suitable measure of uncertainty for proportions. Under the scheme used, an estimated proportion of 5% with a standard error of 1.5% would be deemed unreliable but an estimated proportion of 95% with the same standard error would be deemed accurate, despite the fact that the two estimates are equivalent, because 95% = 100% - 5%. I think that standard error would be a more suitable measure of uncertainty in this context.]
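The asymmetry in the CV can be seen with a quick calculation. The sketch below uses only the two figures from the example above (5% and 95%, each with a standard error of 1.5%); the function name is illustrative, and no particular reliability thresholds are assumed.

```python
def coefficient_of_variation(proportion, standard_error):
    """CV = standard error divided by the estimate."""
    return standard_error / proportion

se = 0.015  # standard error of 1.5 percentage points

cv_small = coefficient_of_variation(0.05, se)  # 0.30, i.e. 30%
cv_large = coefficient_of_variation(0.95, se)  # about 0.016, i.e. 1.6%

print(f"CV for  5%: {cv_small:.1%}")
print(f"CV for 95%: {cv_large:.1%}")
# The two estimates describe the same split of the population (5% in one
# group, 95% in the other), yet their CVs differ by a factor of 19, while
# the standard error is 1.5 percentage points in both cases.
```

Any classification based on CV therefore depends on which of the two complementary categories happens to be reported, whereas one based on standard error does not.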
I am not keen on the dancing bar charts of Oli Hawkins and the New York Times. First, the rapid movements between the estimates and their upper and lower confidence limits make it very difficult to relate these limits to their associated estimates. Second, they implicitly give the same credence to the upper and lower limits as to the estimates when, in fact, the limits are much less likely than the central estimates to be close to the truth. One way of presenting the "cloud of uncertainty" is to replace bar charts and line charts by line charts with rectangular markers whose heights are determined by the confidence limits and whose colour is darkest in the centre and fades out towards the top and bottom, so as to give an impression of the error distribution. Basic graphics software allows colour gradients of this kind for bar charts but does not allow the parameter control that would be required to implement this suggestion, although there may be more advanced graphics software that does. One option might be to create a rectangular image with the required colour gradient and use it as a special marker, adjusting its height and width to suit the requirements of the data. This would require a lot of manual adjustment, but it may be possible to program it.
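As a rough indication of how the shading might be programmed, the sketch below splits a confidence interval into horizontal bands and gives each band an opacity that follows the shape of the error distribution. It assumes a normal error distribution and a symmetric 95% interval; the function name and the number of bands are arbitrary choices for illustration, and the drawing itself would be left to whatever charting software is used.

```python
import math

def shading_bands(estimate, lower, upper, n_bands=9, z_limit=1.96):
    """Split [lower, upper] into horizontal bands and give each band an
    opacity proportional to an assumed normal error density."""
    # Assumption: the limits are a symmetric 95% interval, so the
    # half-width equals z_limit standard errors.
    se = (upper - lower) / (2 * z_limit)
    band_height = (upper - lower) / n_bands
    bands = []
    for i in range(n_bands):
        bottom = lower + i * band_height
        top = bottom + band_height
        z = ((bottom + top) / 2 - estimate) / se
        opacity = math.exp(-0.5 * z * z)  # 1.0 at the estimate, fading outwards
        bands.append((bottom, top, opacity))
    return bands

# Example: 2.21 million unemployed, taking ±0.1 million as the 95% limits
bands = shading_bands(2.21, 2.11, 2.31)
for bottom, top, opacity in bands:
    print(f"{bottom:.3f} to {top:.3f}: opacity {opacity:.2f}")
```

Each band would then be drawn as a thin rectangle of the same colour at the stated opacity, so that the marker is darkest at the central estimate and palest at the confidence limits.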
After reading, many times, eminent statisticians in respectable journals using the word "precision" to mean "accuracy", I have long since given up expecting statistical publications to use the word "precision" in its proper sense. However, since I used the statement "ONS shouldn't publish data to a precision greater than is justified by their accuracy" in comment 1 above, it behoves me to explain what I mean. "Precision" means the number of significant digits used to present a figure. "Accuracy" means how close a figure is to the true value it estimates. If a figure is accurate, for example, to only two significant digits but is presented with a precision of three or more significant digits, the trailing digits do not provide any meaningful information. To avoid overburdening readers with uninformative data, ONS should drop any such meaningless digits (for decimal digits) or set them to zero (for trailing digits of whole numbers). The only exceptions should be for uniform formatting of tables. In practice, two significant digits are as much as readers usually need or can absorb and are often as much as can be justified by the accuracy of the data (see comment 1, for example).
Hosted by rss.org.uk