RPICPI User Group

To foster cooperation and the exchange of information between ONS statisticians concerned with the RPI and CPI, the Consumer Prices Advisory Committee (CPAC) and those using the data. Please see 'Aims of the RPI/CPI User Group' in the library for further details.

1.  Outlier treatment and the Median of Price relatives

I have recently posted a copy of chapter 1 of the ILO CPI manual in the library.

Paragraph 1.136 raises the issue of outlier treatment for the Jevons formula to reduce downward bias.  Others have advocated outlier treatment for the Carli formula to reduce upward bias.  In reality, almost any formula used should be subject to outlier treatment.

I suggest excluding all monthly price relatives greater than 1.5 or less than 0.67.  This would remove the effects of short-term half-price offers, which are not really part of the price trend but simply increase the variance (as well as creating bias).
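
As a rough illustration of the rule above, the sketch below drops price relatives outside [0.67, 1.5] and then applies the Carli (arithmetic mean) and Jevons (geometric mean) formulae to what remains. The sample relatives and the function names are made up for illustration only; they are not ONS data or ONS code.

```python
import math

LOWER, UPPER = 0.67, 1.5  # bounds suggested in the post


def trim_fixed(relatives, lower=LOWER, upper=UPPER):
    """Keep only the price relatives inside the fixed bounds."""
    return [r for r in relatives if lower <= r <= upper]


def carli(relatives):
    """Arithmetic mean of price relatives."""
    return sum(relatives) / len(relatives)


def jevons(relatives):
    """Geometric mean of price relatives."""
    return math.exp(sum(math.log(r) for r in relatives) / len(relatives))


# Hypothetical monthly price relatives: one half-price offer starting (0.50)
# and one ending (2.00) among otherwise small movements.
relatives = [1.00, 1.02, 0.98, 1.01, 0.50, 1.00, 2.00, 1.03]
kept = trim_fixed(relatives)

print("kept:", kept)
print("Carli  before/after:", carli(relatives), carli(kept))
print("Jevons before/after:", jevons(relatives), jevons(kept))
```

In this made-up sample the Carli moves much more than the Jevons when the two offer-related relatives are dropped, because 0.50 and 2.00 cancel in a geometric mean but not in an arithmetic one.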

One formula where this would not be necessary would be the median of price relatives.  This formula is very robust to oddities in the data yet it has received virtually no attention in the literature or from ONS, although ONS uses the median in the ASHE survey of earnings.

I would like to see outlier treatment and the Median of Price Relatives form part of ONS' research programme before any more changes are proposed to indices and their formulae.

Of course we can forget about satisfying axioms under both of these approaches.  They are not neat enough mathematically for that, but that is a reasonable price to pay for more accurate indices.


2.  RE:Outlier treatment and the Median of Price relatives

Gareth, I have just added a document (summarising my attempts to look at the outlier issue) to the library, which may be of interest for this discussion.

I used the data on price relatives (supplied by the ONS during the recent consultation) to calculate various indices (Carli, Jevons, CSWD) and then compared them to outlier-robust measures of central tendency. This is obviously limited (few products and a short time period) but may still be interesting.

Regarding your point on the median: statistically, I can see why the median is of interest, but I think in practice it may have problems. If menu costs are important, firms will change prices infrequently and in (larger) discrete steps. A different view of the psychology of pricing suggests retailers give a small number of large discounts (to attract customers). Both perspectives suggest the median may miss some important aspect of the prices that people face.


3.  RE:Outlier treatment and the Median of Price relatives


This is a very useful piece of work.

I can see the problems of using the median if retailers are using some form of tactical price increase approach.  For example if, on a rotating basis, they increase the prices of one quarter of all products by 10% each month, then after 4 months everything has increased by 10%, but the median will show no change.
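
The rotating-increase example can be checked with a small simulation. This is only a sketch: it assumes 8 hypothetical products, each starting at a price of 1.0, with one quarter of them rising by 10% in each of 4 months, and it chains the monthly median and the monthly Jevons (geometric mean) of price relatives.

```python
import math
from statistics import median

n_products = 8                 # hypothetical basket size (a multiple of 4)
prices = [1.0] * n_products
median_index = 1.0             # chained median of monthly price relatives
jevons_index = 1.0             # chained geometric mean, for comparison

for month in range(4):
    # Products whose position falls in this month's quarter rise by 10%.
    new_prices = [p * 1.10 if i % 4 == month else p
                  for i, p in enumerate(prices)]
    relatives = [new / old for new, old in zip(new_prices, prices)]

    median_index *= median(relatives)
    jevons_index *= math.exp(sum(math.log(r) for r in relatives) / n_products)
    prices = new_prices

print("final prices:", prices)          # every price has risen by 10%
print("median index:", median_index)    # stays at 1.0
print("Jevons index:", jevons_index)    # approximately 1.10
```

In every month three quarters of the relatives equal 1, so the monthly median is 1 and the chained median never moves, even though all prices end up 10% higher.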

This would also affect the mid-hinge and trimean methods to a lesser degree, which leaves the winsorisation and trimming approaches.

Personally, I favour the trimming approach, but only if it is based on pre-determined bounds for the price relatives rather than quantiles of the distribution of price relatives.

For example, if there are 20 price relatives, with 3 outliers at one end of the distribution and none at the other, use of the 5th and 95th percentiles will remove only one of the 3 outliers at one end and one non-outlier at the other end, because 5% of 20 observations is a single value in each tail.
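
A small sketch of this example, using 20 made-up price relatives with 3 high outliers. Percentile trimming is read here, as a simplifying assumption, as dropping the lowest 5% and highest 5% of observations (one value at each end); the fixed bounds are the 0.67 and 1.5 suggested earlier in the thread.

```python
# 20 hypothetical price relatives: 17 ordinary values and 3 high outliers.
relatives = sorted([
    0.97, 0.98, 0.98, 0.99, 0.99, 1.00, 1.00, 1.00, 1.00, 1.01,
    1.01, 1.01, 1.02, 1.02, 1.03, 1.03, 1.04, 1.80, 2.00, 2.50,
])

# Percentile-style trimming: cut 5% of the observations from each tail.
k = len(relatives) * 5 // 100          # 1 observation per tail when n = 20
percentile_trimmed = relatives[k:len(relatives) - k]

# Fixed-bound trimming: drop anything outside pre-determined limits.
LOWER, UPPER = 0.67, 1.5
fixed_trimmed = [r for r in relatives if LOWER <= r <= UPPER]

print("percentile trimming drops:",
      [r for r in relatives if r not in percentile_trimmed])  # 0.97 and 2.50
print("fixed bounds drop:",
      [r for r in relatives if r not in fixed_trimmed])       # 1.80, 2.00, 2.50
```

The percentile rule leaves two of the three outliers (1.80 and 2.00) in the sample and discards a perfectly ordinary 0.97, whereas the fixed bounds remove exactly the three outliers.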

It would be interesting to see the results of this using the upper and lower bounds I suggested.  Also, I think that after trimming, one should revert to the original formula (e.g. Geometric mean for the Jevons formula).



4.  RE:Outlier treatment and the Median of Price relatives

I am glad that Gareth Jones wants the RPI CPI Group to study this issue. If I am not mistaken, the ONS already uses the Tukey method to identify outliers. This is well described in the UNECE-ONS manual A Practical Guide to Calculating Consumer Price Indices. Unfortunately one of the drawbacks of the Tukey method is that it requires quite large price samples so I am not sure what, if anything, the ONS does in the case of smaller samples.

The Practical Guide also discusses one variant of the quartile method for outlier detection, employing the Hidiroglou-Berthelot transformation, named for two methodologists who work for Statistics Canada. It is used, unless things have changed, for outlier detection in the Danish CPI. At my insistence, it was also used to calculate outliers for the traveller accommodation CPI when it was redesigned based on a random sample. While the production system spit out a lot of outlier hotel price relatives, these were never properly analyzed by any methodologist. I complained about this sad state of affairs to the director of Consumer Prices Division, Richard Evans, but as was his wont, he did nothing.

Since then, the thinking among methodologists at StatCan has changed, and when they were asked to recommend an outlier detection procedure for the CPI, they recommended the use of the quartile method with a log transformation of the data. As the Hidiroglou-Berthelot transformation approximates a log transformation, the difference is not huge.
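
To make the comparison concrete, here is a minimal sketch of a quartile-based outlier check applied after either transformation. Several things here are assumptions rather than a description of any agency's production method: the Hidiroglou-Berthelot transformation is shown in its basic centring form only (without the size or importance term of the full method), the fences are taken as the median minus or plus a multiple of the distance to each quartile, the multiplier c=3.0 is an arbitrary illustrative choice, and the price relatives are invented.

```python
import math
from statistics import median, quantiles


def hb_transform(relatives):
    """Basic Hidiroglou-Berthelot centring of price relatives on their median."""
    m = median(relatives)
    return [1 - m / r if r < m else r / m - 1 for r in relatives]


def log_transform(relatives):
    """Log of each price relative, centred on the median relative."""
    m = median(relatives)
    return [math.log(r / m) for r in relatives]


def quartile_flags(values, c=3.0):
    """Flag values outside [Q2 - c*(Q2 - Q1), Q2 + c*(Q3 - Q2)].

    The multiplier c is a hypothetical tuning constant.
    """
    q1, q2, q3 = quantiles(values, n=4)
    lower = q2 - c * (q2 - q1)
    upper = q2 + c * (q3 - q2)
    return [not (lower <= v <= upper) for v in values]


# Hypothetical price relatives with one large fall and one large rise.
relatives = [0.95, 0.98, 1.00, 1.00, 1.01, 1.02, 1.03, 1.05, 0.50, 2.00, 1.04]

hb_flagged = [r for r, f in
              zip(relatives, quartile_flags(hb_transform(relatives))) if f]
log_flagged = [r for r, f in
               zip(relatives, quartile_flags(log_transform(relatives))) if f]

print("flagged after HB transformation: ", hb_flagged)
print("flagged after log transformation:", log_flagged)
```

For small price changes the two transformations give almost identical values, so with this sample both versions flag the same two relatives; they diverge only in how far out they place the extreme values.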

This was supposed to be introduced in the new CPI production system but as far as I know, it never happened. Mr. Evans claimed that it didn't look like a go because of small samples. When I asked him if he thought our price samples were any smaller than those in the Danish CPI, he remained mute, as is also his wont.

A really sophisticated version of the quartile method could use a log transformation of the price data for one series and a different transformation for another, and I believe that StatCan's business survey methodologists had the prototype of a system that would do this when I worked there. It is likely much better developed now.

As you can see, my bias is towards the quartile method, which is about all I have ever worked with, in one variant or another, for about the last 30 years. I don't know as much about the Tukey method, which may be quite appropriate for larger samples, or other methods of interest.

Anyway, it is a fascinating subject, and I hope that Gareth is successful in getting the Group to look more into it. Unfortunately, it is really more a question for the methodologists in the group, rather than economists like me.

5.  RE:Outlier treatment and the Median of Price relatives

Andrew's email prompted me to contact the ONS to clarify the situation. The ONS does indeed use outlier techniques (the Tukey algorithm) to remove extreme values.  They would have applied these techniques to the data supplied as part of the consultation. So my analysis will have been a double outlier correction / analysis. Hence the calculated figures could well be throwing away too much information, though I personally still find it interesting to see how much the calculated inflation rates are driven by the tails of the numbers that survive ONS trimming.

Gareth - I am not a fan of arbitrary rules (such as cutting price relatives below 0.67 and above 1.5) as I think you risk dropping too many true data points. Setting the rules can also imply strong beliefs on the data (in this case no sales or post-sales price reversion in the index).  For that reason I tend to favour statistical adjustments based on looking at the tails of the distribution.

Nevertheless, I did test the approach Gareth suggested (dropping price relatives below 0.67 or above 1.5). Generally it seemed to produce lower estimates of inflation (than the Jevons as currently published). This was particularly true for items with log-normal distributions of price relatives (and a long tail of highly positive price relatives).


6.  RE:Outlier treatment and the Median of Price relatives

Jonathan, Gareth's 0.67 and 1.5 limits may look arbitrary to you, but they are a lot more sensible than the kind of rules that the prices sector of Statistics Canada has traditionally worked with. At least Gareth recognizes that if the upper bound is (1+g) the lower bound has to be 1/(1+g), so that if a price change is not an outlier when a price is substantially discounted neither will it be one when it returns to normal. At StatCan they have traditionally used a lower bound of (1-g) when the upper bound is (1+g), e.g. 0.67 and 1.33. If a price goes from $3 to $2 it is barely in bounds but when it returns to normal this is a 50% increase, and in excess of the 33.3% upper bound. 
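
The arithmetic in the $3 to $2 example can be checked directly. The sketch below uses exact fractions so that values sitting right on a bound are not disturbed by floating-point rounding; the function names are made up for illustration, and bounds are treated as inclusive.

```python
from fractions import Fraction


def additive_bounds(g):
    """Bounds of the form (1 - g, 1 + g)."""
    return 1 - g, 1 + g


def reciprocal_bounds(g):
    """Bounds of the form (1 / (1 + g), 1 + g)."""
    return 1 / (1 + g), 1 + g


def in_bounds(relative, bounds):
    lower, upper = bounds
    return lower <= relative <= upper


discount = Fraction(2, 3)   # price falls from $3 to $2
recovery = Fraction(3, 2)   # price returns from $2 to $3

additive = additive_bounds(Fraction(1, 3))       # (2/3, 4/3)
reciprocal = reciprocal_bounds(Fraction(1, 2))   # (2/3, 3/2)

print("additive:  ", in_bounds(discount, additive),
      in_bounds(recovery, additive))      # True False
print("reciprocal:", in_bounds(discount, reciprocal),
      in_bounds(recovery, reciprocal))    # True True
```

With reciprocal bounds a relative r is in bounds exactly when 1/r is, so a discount and its reversal are always treated the same way; with additive bounds the discount is accepted while the return to the normal price is rejected.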

If your estimates were based on data already outlier-adjusted by the ONS, perhaps they can give you the original data before outlier adjustment. It would be a shame if they couldn't.  

I have attached a paper by Saad Rais describing the outlier method proposed for the redesign of the Canadian CPI based on the quartile method with log transformation of the price relatives. There have been subsequent papers on the same subject written by StatCan methodologists, but they only offer refinements of the method described here. (One interesting refinement is to test for the best transformation of the price relatives before applying the quartile method.)  Note that Mr. Rais has some critical remarks to make about the Tukey method currently used by ONS.

Thank you to you and Gareth for getting a discussion started on this very important issue.


7.  RE:Outlier treatment and the Median of Price relatives

I think there is still a degree of doubt about exactly what ONS does with outliers.  Last year I attended a user group meeting where I asked a question about this.  The reply I got from ONS was that outlier identification (the nature of which was not specified) was conducted in order to credibility check collected data.  However, if the value was confirmed as correct, this "confirmed outlier" would still be used in whatever formula was being used for the index.  This was not what I would call treatment of outliers.  I think ONS needs to specify in the technical manual, exactly what it does do.

Your work on this subject already shows that the treatment of outliers is as important as, if not more important than, the choice of formula, and the two issues cannot really be separated.  This underlines the need for a comprehensive study of outlier treatment across all sectors and all methodologies.

Outlier treatment is inherently subjective.  If one believes that an outlier is part of a representative sample and just increases the variance of estimates, there is no need for outlier treatment at all.  If however, one takes the view that the sample is not representative (probably true) then some degree of outlier treatment is necessary.  Too much treatment leads to bias one way, too little to bias the other way.


8.  RE:Outlier treatment and the Median of Price relatives

I actually did one further check of the data: I applied a Tukey outlier correction method to the data supplied by the ONS (to be clear, this is a different method to that described in the CPI manual). I am uncertain about the merits of this (an outlier correction of an outlier correction), but the resulting data had similar arithmetic and geometric means.

This tells me that the formula effect is, in practice, about the tails and how they are treated. It doesn't, though, say whether the tails are 'reasonable'. I am a bit wary of simply throwing extreme values away (they may reflect true costs which people face), but at the same time I am not sure that such extreme movements are truly like-for-like price variation.  How much is sampling variation (the price collector picking a slightly different product can produce a lot of variance in itself)? How much is down to sales (and, in that case, are the items of similar quality, etc.)?
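
The point about the tails can be illustrated with a toy calculation. The sketch below uses the textbook Tukey fences (1.5 times the interquartile range beyond the quartiles), which is an assumption and not necessarily the variant used by ONS or in the check described above, and the price relatives are invented. It compares the gap between the arithmetic and geometric means before and after the flagged values are dropped.

```python
import math
from statistics import quantiles


def arithmetic_mean(xs):
    return sum(xs) / len(xs)


def geometric_mean(xs):
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def tukey_keep(xs, k=1.5):
    """Keep values inside the textbook Tukey fences Q1 - k*IQR, Q3 + k*IQR."""
    q1, _, q3 = quantiles(xs, n=4)
    iqr = q3 - q1
    return [x for x in xs if q1 - k * iqr <= x <= q3 + k * iqr]


# Hypothetical price relatives with one extreme value in each tail.
relatives = [0.50, 0.96, 0.98, 0.99, 1.00, 1.00, 1.01, 1.02, 1.03, 1.05, 2.20]
kept = tukey_keep(relatives)

gap_before = arithmetic_mean(relatives) - geometric_mean(relatives)
gap_after = arithmetic_mean(kept) - geometric_mean(kept)

print("kept:", kept)
print("arithmetic minus geometric, before:", gap_before)
print("arithmetic minus geometric, after: ", gap_after)
```

In this toy sample nearly all of the difference between the two means comes from the two extreme relatives; once they are removed the arithmetic and geometric means almost coincide.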

I appreciate these are very tricky judgments, but I would hope these issues are touched upon in the ONS' ongoing work surrounding price collection methods.