Looking back over the past 40 years, including the 1980s when the multiattribute instruments were being developed, and across the smorgasbord of disease specific PROs, many of which claim to capture quality of life in a given disease state, there is little if any recognition of fundamental measurement and the limitations imposed by its axioms. The result was a foregone conclusion, noted at the time by measurement theorists: the various instruments would be restricted to ordinal scores because there was no recognition of the need to develop ratio measures in the first place. While some may claim the preference scores are actually interval scores, this is only because users place the ranked ordinal scores on a number line with interval properties. Any other number line that preserves the ranking, with varying distances between scores, would suffice equally well.
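The point about arbitrary number lines can be illustrated with a minimal sketch. The two groups, the four response categories, and the relabeling below are hypothetical and are not taken from any instrument; the only assumption is that the relabeling keeps the ranking of the categories intact.

```python
from statistics import mean

# Hypothetical ordinal responses (ranked categories 1..4) for two groups.
group_x = [1, 4]
group_y = [2, 2]

# Hypothetical relabeling: the ranking 1 < 2 < 3 < 4 is preserved,
# but the distances between adjacent categories are changed.
relabel = {1: 1, 2: 5, 3: 6, 4: 7}

def relabeled(scores):
    """Map each ordinal score onto the alternative number line."""
    return [relabel[s] for s in scores]

# Compare group means on the original and on the relabeled number line.
x_higher_before = mean(group_x) > mean(group_y)
x_higher_after = mean(relabeled(group_x)) > mean(relabeled(group_y))

print("X has the higher mean (original labels): ", x_higher_before)
print("X has the higher mean (relabeled scores):", x_higher_after)
```

In this sketch the comparison of means reverses although no respondent changed rank, which is why arithmetic on ordinal scores carries no information beyond the ordering itself.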
Driven in large part by the need to construct and accommodate the ubiquitous QALY, the focus was on preference scores, expressed as proportions in a range from zero to unity, that could be applied to time spent in the stages of a disease to create the adjusted quality of life equivalent. What was completely overlooked was the need for a single attribute, bounded ratio scale to define the preference score. Only with such a measure could a QALY be constructed with associated value claims. The notion of a single attribute to capture quality of life was subsumed in the construction of multiattribute preference scores, where clinician determined symptoms and response levels were proposed (and tested in patient populations) for each of the competing instruments. The various instruments were not compatible; indeed, even within the EQ-5D variants, the EQ-5D-3L and EQ-5D-5L produce different health states and different scores for ostensibly the same health state defined with more refined response levels. As multiattribute scores they, of course, lack dimensional homogeneity and construct validity, quite apart from the fact that attributes can only be combined if each has ratio measurement properties. This is critical: even an interval score cannot create QALYs, because without a true zero it cannot support multiplication. The QALY is, therefore, an impossible mathematical construct (hence I-QALY).
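A minimal sketch of why the absence of a true zero matters for the QALY calculation follows. The two profiles and the shift of 0.2 are hypothetical; the assumption is only that, on an interval scale, any transformation of the form a * score + b with a > 0 is admissible, so the location of zero is arbitrary.

```python
def qalys(score, years):
    """Time-weighted score: preference score multiplied by years in the state."""
    return score * years

def rescale(score, a=1.0, b=0.0):
    """Admissible transformation of an interval scale: a * score + b, a > 0."""
    if a <= 0:
        raise ValueError("a must be positive")
    return a * score + b

# Hypothetical profiles: (preference score, years spent in the state).
profile_a = (0.4, 10)
profile_b = (0.9, 4)

# QALY totals on the original scores.
original = (qalys(*profile_a), qalys(*profile_b))

# The same profiles after moving the arbitrary zero by 0.2.
shifted = (
    qalys(rescale(profile_a[0], b=-0.2), profile_a[1]),
    qalys(rescale(profile_b[0], b=-0.2), profile_b[1]),
)

a_preferred_original = original[0] > original[1]
a_preferred_shifted = shifted[0] > shifted[1]

print("A preferred on original scores:", a_preferred_original)
print("A preferred after shifting zero:", a_preferred_shifted)
```

Under these assumed numbers the preferred profile changes when the zero point moves, so a product of score and time only supports a value claim if the score has a true zero.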
The crowning disaster for these multiattribute instruments was that the algorithms used to create the scores, with community determined weights for health states, all failed the simplest test for a ratio measure: each created negative scores or, as was euphemistically stated, “states worse than death”. This resulted from the algorithms creating preference scores as decrements from unity; in trying to fit an algorithm to the data they inevitably overshot. This meant, quite conclusively, that no ratio claim could be made. Unfortunately, in an attempt to rescue the approximate imaginary information belief system, the Institute for Clinical and Economic Review (ICER) has claimed that health economists (or at least some) have confidence that the preference scores have ratio properties; a mystical transformation to save face for the academic groups that act as consultants for the ICER model evidence reports. ICER’s business case rests on the acceptance of its evidence reports and imaginary value claims.
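How scoring by decrements from unity can overshoot zero may be sketched as follows. The three attributes, their response levels, and the decrement values are entirely hypothetical and do not reproduce any published value set; the sketch only assumes an additive algorithm of the kind described above.

```python
# Hypothetical decrements per response level (level 0 = no problems)
# for three made-up attributes. These are NOT published weights.
DECREMENTS = {
    "mobility": [0.0, 0.15, 0.40],
    "pain": [0.0, 0.20, 0.45],
    "mood": [0.0, 0.10, 0.30],
}

def preference_score(state):
    """Score a health state as unity minus the sum of its decrements."""
    total = sum(DECREMENTS[attr][level] for attr, level in state.items())
    return 1.0 - total

best_state = {"mobility": 0, "pain": 0, "mood": 0}
worst_state = {"mobility": 2, "pain": 2, "mood": 2}

best = preference_score(best_state)
worst = preference_score(worst_state)

print("Best state score: ", best)
print("Worst state score:", worst)
print("Worst state is below zero:", worst < 0)
```

Once the summed decrements exceed unity the score falls below zero, and a scale that admits negative values cannot have the true zero that a ratio measure requires.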
Given the continued belief in the approximate information meme, that the ordinal preference scores are ratio measures in disguise, it is useful to recap briefly the standards of fundamental measurement. This should put to rest any ‘confidence’ in the mystical ordinal score with ratio properties.
Briefly, scales or levels of measurement used in statistical analyses are classified as nominal, ordinal, interval, or ratio. Each scale has one or more of the following properties: (i) identity, where each value has a unique meaning (nominal scale); (ii) magnitude, where values on the scale have an ordered relationship with each other but the distance between them is unknown (ordinal scale); (iii) invariance of comparison, where scale units are equal in an ordered relationship with an arbitrary zero (interval scale); and (iv) a true zero (or a universal constant), where no value on the scale can take negative scores (ratio scale). Nominal and ordinal scales support only nonparametric statistics. Interval scales can support addition and subtraction, while ratio scales support the additional operations of multiplication and division because they have a true zero. This zero-point characteristic means it is meaningful to say that one object is twice as long as another. Given these limitations, the only acceptable empirically evaluable value claims are those designed for single attributes with interval or ratio properties.
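The distinction between interval and ratio scales can be checked with a short sketch using two familiar examples that are not part of the argument above and are introduced here only for illustration: temperature in Celsius and Fahrenheit (interval, arbitrary zero) and length in meters and feet (ratio, true zero). The conversion factor for feet is approximate.

```python
def celsius_to_fahrenheit(c):
    """Change of units on an interval scale: both slope and zero point change."""
    return 9 / 5 * c + 32

def meters_to_feet(m):
    """Change of units on a ratio scale: only the unit changes (approx. factor)."""
    return m * 3.28084

# Interval scale: the 'twice as much' statement does not survive a change of units.
temp_ratio_c = 20.0 / 10.0
temp_ratio_f = celsius_to_fahrenheit(20.0) / celsius_to_fahrenheit(10.0)

# Interval scale: ratios of differences do survive.
diff_ratio_c = (30.0 - 10.0) / (20.0 - 10.0)
diff_ratio_f = (celsius_to_fahrenheit(30.0) - celsius_to_fahrenheit(10.0)) / (
    celsius_to_fahrenheit(20.0) - celsius_to_fahrenheit(10.0)
)

# Ratio scale: 'twice as long' holds in any unit.
length_ratio_m = 4.0 / 2.0
length_ratio_ft = meters_to_feet(4.0) / meters_to_feet(2.0)

print("Temperature ratio (C, F):  ", temp_ratio_c, temp_ratio_f)
print("Difference ratio (C, F):   ", diff_ratio_c, diff_ratio_f)
print("Length ratio (m, ft):      ", length_ratio_m, length_ratio_ft)
```

Only the length ratio is unchanged by the choice of unit, which is the sense in which multiplication and division require a true zero while addition and subtraction do not.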
Langley PC, McKenna SP. Measurement, modeling and QALYs [version 1; peer reviewed]. F1000Research. 2020;9:1048. https://doi.org/10.12688/f1000research.25039.1
Langley P. The Great I-QALY Disaster. InovPharm. 2020;11(3):No. 7. https://pubs.lib.umn.edu/index.php/innovations/article/view/3359/2517
Langley P. To Dream the Impossible Dream: The Commitment by the Institute for Clinical and Economic Review to Rewrite the Axioms of Fundamental Measurement for Hemophilia A and Bladder Cancer Value Claims. InovPharm. 2020;11(4):No. 22. https://pubs.lib.umn.edu/index.php/innovations/article/view/3585/2642
Langley P, McKenna S. Fundamental Measurement and Quality Adjusted Life Years. Value Health. 2021;24(3):461 [letter]. https://www.valueinhealthjournal.com/article/S1098-3015(20)34409-0/fulltext
McKenna S, Heaney A, Langley P. Fundamental Outcome Measurement: Selecting Patient Reported Outcome Instruments and Interpreting the Data they Produce. InovPharm. 2021;12(2):No. 17. https://pubs.lib.umn.edu/index.php/innovations/article/view/3911/2764