Questions and Answers
The Maimon Research distance education program has attracted, in a short while, considerable interest in how the key concepts of measurement and falsification can be transmitted to colleagues and students. Understandably, there is considerable pushback. As interactive education is a key element in escaping from a mindset that denies measurement standards and evaluable claims, this has resulted in a correspondence that looks set to grow exponentially.
To mitigate the burden of responding to inquiries, as we should, we have established a QUESTIONS AND ANSWERS section on the Maimon Research website (www.maimonresearch.com) to support ‘generic’ responses. Please feel free to send your questions and comments to Langleylapaloma @ gmail.com.
Let us know if this works for you. Click on the question for the answer:
QUESTION 2: Why are interval and ratio scales so essential?
QUESTION 3: Why do measures and claims have to be unidimensional?
QUESTION 4: Are the preferences from the time trade-off technique unidimensional? If not, why not?
QUESTION 7: Why does measurement require axioms?
QUESTION 8: What are the implications of the axioms of representational measurement for HTA?
ANSWER 1: The most difficult message to convey in academic life is that an entire area of instruction has rested on foundations that no longer withstand scrutiny. Yet that is precisely the position we now face with health technology assessment. For more than two decades, graduate programs across universities, not just ours, have taught a form of HTA that assumes its evaluative machinery—utility scores, QALYs, ordinal preference elicitation, and simulation modelling—constitutes a scientific framework. That assumption has been comfortable, convenient, and institutionally reinforced, but it has also been mistaken. The problem is not a matter of interpretation or philosophy. It is a matter of measurement. The field built an edifice of quantitative claims without first establishing whether the numbers it used possessed the properties required to support arithmetic operations. In consequence, HTA has been teaching students how to manipulate quantities that are not actually quantities.
The central failing is the neglect of representational measurement theory. None of the instruments routinely used in HTA (time trade-off utilities, standard gamble values, multiattribute utility scores) meet the axioms required for interval or ratio measurement. The scores are ordinal at best. Multiplying an ordinal preference value by time does not transform it into a quantity, yet this multiplication is the defining operation behind the QALY. The reference case model that is taught as the gold standard uses these non-measures as inputs and generates elaborate numerical outputs. The sophistication of the modelling has concealed a conceptual flaw: the numbers have no lawful arithmetic structure, so the outputs, however precise they appear, cannot sustain scientific claims. A discipline that mistakes numerical representation for measurement trains students to trust models that cannot be falsified, validated, or interpreted within the framework of normal science.
This is not a local problem. It is a systemic international failure that has persisted because successive generations of practitioners were never educated in measurement theory. They inherited tools, not principles. Graduate programs have understandably perpetuated what appeared to be current best practice, and academic incentives rewarded conformity rather than foundational critique. What has changed is not the underlying theory but our willingness to confront it. The last forty years of HTA are now recognisably an anomaly: a period during which the field operated outside the standards that govern all scientific disciplines.
Explaining this to students and to the wider academic community is not an admission of institutional fault but an act of academic responsibility. Correcting course is not optional. If our program claims to train students in scientific evaluation of health technologies, then it must teach them what scientific evaluation actually requires: unidimensional constructs, interval or ratio measurement where arithmetic is intended, and explicit protocols that produce empirically testable value claims. Anything less perpetuates error.
The goal is not to disown our history but to ensure that our instruction satisfies the same standards of rigor we expect from every other scientific field. If we do not address this gap now, our graduates will continue to practice a methodology that the scientific community will eventually judge as untenable. A curriculum grounded in measurement is not a radical shift. It is simply bringing HTA back into the domain of science.
ANSWER 2: Interval and ratio scales are essential because they are the only kinds of numerical representations that allow the arithmetic operations we routinely rely on to make scientific claims. Science is built on the idea that numbers correspond to differences or ratios in the world that behave lawfully. If the numbers do not have those properties, then the arithmetic we perform on them is meaningless, however familiar or comfortable it feels.
An interval scale ensures that equal numerical differences represent equal differences in the attribute being measured across the entire range. Temperature in Celsius is the standard example: the gap between 20° and 30° is the same magnitude as the gap between 80° and 90°. If a scale does not have constant intervals, subtraction and addition lose their interpretability. You cannot quantify change if the size of a “unit” varies depending on where you are on the scale. Most psychological and health-related scores suffer from exactly this problem: the person who moves from a score of 20 to 40 may not have changed by the same amount as someone who moves from 50 to 70. Treating such scores as if they support arithmetic confuses ranking with measurement.
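The point about a "unit" whose size varies can be illustrated with a minimal sketch. The scores and both rescalings below are hypothetical, chosen only for illustration. An affine rescaling (a change of unit and origin, which an interval scale permits) keeps equal differences equal. An order-preserving but non-linear rescaling (which is all an ordinal scale guarantees) does not.

```python
# Hypothetical scores, for illustration only.
scores = [20, 40, 50, 70]

def affine(x):
    # Change of unit and origin: admissible for an interval scale.
    return 2 * x + 5

def monotone(x):
    # Order-preserving but non-linear: admissible only for an ordinal scale.
    return x ** 2

def gaps(values):
    # Change from the 1st to the 2nd score, and from the 3rd to the 4th.
    return values[1] - values[0], values[3] - values[2]

raw_gaps = gaps(scores)
affine_gaps = gaps([affine(s) for s in scores])
monotone_gaps = gaps([monotone(s) for s in scores])

print(raw_gaps)       # (20, 20): the two changes look equal
print(affine_gaps)    # (40, 40): still equal after an affine rescaling
print(monotone_gaps)  # (1200, 2400): no longer equal, although the ranking is unchanged
```

If only the ranking of the scores is empirically established, the squared scores are as defensible as the raw ones, so the statement "both persons changed by the same amount" has no stable meaning.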
A ratio scale adds a further requirement: a true zero that represents the absence of the attribute being measured. This gives the scale meaningful ratios. If you say one patient experiences twice as many seizures as another, or that one drug reduces hospital days by 50%, you are implicitly assuming a ratio scale. Multiplication and division only make sense when zero is absolute and intervals are constant. Most physical measurements—length, mass, time—are ratio scales because the mathematics requires it. In HTA, any claim that involves relative change, proportional effects, cost per unit of effect, or efficiency demands ratios. If the dependent variable is not on a ratio scale, the entire analytic structure collapses.
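A short sketch, under illustrative assumptions, of why ratios need a true zero. The values are hypothetical; the only facts relied on are the standard Celsius-to-Fahrenheit conversion and 24 hours per day. On a scale with an arbitrary zero the "ratio" changes with the unit, while on a scale with a true zero it does not.

```python
def celsius_to_fahrenheit(c):
    # Change of unit AND origin: the zero point is arbitrary.
    return c * 9 / 5 + 32

def days_to_hours(d):
    # Change of unit only: zero days is zero hours.
    return d * 24

# Interval scale (arbitrary zero): the apparent ratio depends on the unit.
ratio_c = 20 / 10
ratio_f = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)

# Ratio scale (true zero): the ratio is the same in any unit.
ratio_days = 10 / 5
ratio_hours = days_to_hours(10) / days_to_hours(5)

print(ratio_c, ratio_f)         # the two values differ
print(ratio_days, ratio_hours)  # the two values agree
```

"Twice as many hospital days" survives a change of unit; "twice as hot" in Celsius does not.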
Without interval or ratio properties, numerical scores are simply ordered labels. They can tell you that one state is better than another, but they cannot tell you how much better or whether differences are comparable across the scale. Yet HTA routinely treats ordinal utilities as if they were interval, and then performs ratio operations on them. That is why the QALY is impossible in principle: it requires multiplying time by a value that lacks interval structure and lacks a zero that represents the absence of health. The result is not a measure but a number masquerading as one.
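The consequence for the QALY can be sketched with hypothetical numbers. The assumption, stated plainly, is the one argued above: if utility scores carry only ordinal information, then any order-preserving rescaling of them is equally admissible. The treatments, scores, and durations below are invented for illustration, and the square root is just one convenient order-preserving rescaling.

```python
import math

# Hypothetical inputs: (utility score, years) for two treatments.
treatments = {"A": (0.8, 5), "B": (0.5, 7)}

def qaly(utility, years):
    # The conventional QALY step: preference score multiplied by time.
    return utility * years

def rank(rescale):
    # Order treatments by QALYs after rescaling the utility scores.
    totals = {name: qaly(rescale(u), t) for name, (u, t) in treatments.items()}
    return sorted(totals, key=totals.get, reverse=True)

def identity(u):
    return u

# math.sqrt is strictly increasing on [0, 1] and leaves 0 and 1 fixed,
# so it preserves the ordering of the utility scores.
ranking_raw = rank(identity)
ranking_rescaled = rank(math.sqrt)

print(ranking_raw)       # ['A', 'B']
print(ranking_rescaled)  # ['B', 'A']
```

The ordering of the health states is identical in both cases, yet the treatment ranked first by QALYs changes. A conclusion that depends on an arbitrary choice among equally admissible numberings is not an empirical claim.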
Fundamentally, interval and ratio scales matter because they are the only foundations that allow falsifiable, testable, and comparable claims. Without them, HTA cannot produce evaluable evidence. It can only produce numerical storytelling.
ANSWER 3: In measurement theory and psychometrics, the concept of unidimensionality is fundamental because it ensures that a measure or claim reflects a single underlying construct. A unidimensional measure assumes that all items within it assess the same latent trait, such as intelligence, anxiety, or job satisfaction. This assumption is crucial because it allows researchers and practitioners to make meaningful and interpretable claims about the construct being measured. If a measure is multidimensional, meaning it captures more than one underlying factor, then any resulting score becomes ambiguous, as it cannot be confidently attributed to a single construct. Unidimensionality, therefore, provides the conceptual and statistical clarity needed to interpret scores accurately.
Unidimensionality also supports the validity and reliability of measurement. Validity refers to the degree to which evidence and theory support the interpretation of scores for their intended purpose. If a test claims to measure one construct but in fact reflects several, its validity is compromised because the score does not represent what it purports to measure. Similarly, reliability, or the consistency of measurement, depends on the assumption that responses are driven by a single source of variation rather than multiple competing influences. When items are unidimensional, their correlations with each other can be explained primarily by the shared construct, allowing for coherent internal consistency estimates, such as Cronbach’s alpha or omega coefficients.
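For readers unfamiliar with the coefficient, a minimal sketch of Cronbach’s alpha follows, using the standard formula alpha = k/(k-1) × (1 − sum of item variances / variance of total scores). The response data are hypothetical, and the function name is ours.

```python
from statistics import variance

def cronbach_alpha(items):
    """items: one list of scores per item, respondents in the same order."""
    k = len(items)
    # Total score per respondent across all items.
    totals = [sum(responses) for responses in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))

# Hypothetical responses: 3 items answered by 5 respondents.
items = [
    [2, 4, 3, 5, 1],
    [3, 5, 3, 4, 2],
    [2, 5, 4, 5, 1],
]
alpha = cronbach_alpha(items)
print(round(alpha, 3))
```

Note that alpha summarises how strongly items covary; it presupposes a single shared construct rather than testing for one, so a high value is not by itself evidence of unidimensionality.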
From a practical perspective, unidimensionality simplifies interpretation and comparison. For example, when educators or clinicians use test scores to make decisions, they need to know that a higher score reflects more of the same trait, not a mix of unrelated abilities or attitudes. In statistical modeling, such as factor analysis or item response theory (IRT), unidimensionality allows the estimation of item parameters and person scores on a single latent continuum, facilitating precise and interpretable measurement. Multidimensional data can still be analyzed, but only when explicitly modeled as such; otherwise, ignoring dimensionality can lead to biased estimates and misleading conclusions.
Ultimately, requiring measures and claims to be unidimensional aligns measurement with the logical principle of construct purity. Each claim about a variable presupposes that it represents one coherent concept. If a measure confounds multiple constructs, any claim derived from it becomes theoretically unstable. Unidimensionality thus preserves interpretive integrity, ensures statistical coherence, and supports the meaningful accumulation of knowledge within scientific and applied domains.
ANSWER 4: The preferences derived from the time trade-off (TTO) technique are not unidimensional when they are based on descriptions of health states, as they always are in health technology assessment (HTA). To claim that they are would be a fundamental misunderstanding of both the technique and the nature of health valuation itself. The TTO may produce a single index value, but that number does not reflect a single dimension of preference; it is a collapsed summary of a complex, multidimensional evaluative process.
In HTA applications, TTO is typically used to generate utility weights for multi-attribute instruments such as the EQ-5D or SF-6D. These instruments describe health states across several dimensions (mobility, self-care, pain, anxiety, and so forth), each of which captures a distinct component of health-related quality of life. When respondents are asked to trade time in these described states for time in full health, they are not judging one attribute in isolation. They are making an overall judgment that integrates perceptions of physical functioning, emotional well-being, pain, and other aspects of health. Therefore, even though the TTO outcome is a single score, it is the product of a multidimensional cognitive process. The apparent unidimensionality of the final number is an artifact of the method’s requirement to collapse rich, multidimensional experiences into a single utility scale for use in cost-utility analysis.
Moreover, TTO responses depend not only on the characteristics of the described health state but also on the respondent’s broader attitudes toward life duration, risk, and death. People vary in how they perceive time and how they weigh length of life against quality of life. Some may refuse to trade any time at all, regardless of health state severity, while others may trade extensively. These differences arise from psychological, cultural, and ethical perspectives, not from any single “health” dimension. Consequently, the TTO score reflects intertwined preferences about living conditions, survival, and personal identity, not a single underlying utility dimension.
In HTA practice, therefore, the TTO cannot be treated as a measure of unidimensional preference. It is an inherently multidimensional construct forced into a single index value for analytic convenience. The aggregation may serve economic modeling needs, but conceptually and empirically, it fails to represent health preference as unidimensional. To insist otherwise is to ignore both the multidimensional nature of health and the complex trade-offs that respondents actually make.
ANSWER 5: The time trade-off (TTO) technique is widely used in health economics to measure the value individuals assign to different health states. In a TTO exercise, respondents are asked to choose between living a certain length of time in less than full health and living a shorter period in full health. The point of indifference between these options provides a numerical index for the health state, typically scaled between 0 (death) and 1 (full health). These numerical values are then used to calculate quality-adjusted life years (QALYs). Despite their apparent precision, TTO values cannot legitimately support arithmetic operations because the preferences they represent are neither interval nor ratio in nature, and because the TTO scale itself is not unidimensional.
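For concreteness, the conventional calculation can be sketched as follows. This shows the arithmetic as it is practised for states regarded as better than death, not an endorsement of it; the function names and the respondent’s figures are hypothetical.

```python
def tto_value(years_full_health, years_in_state):
    # Conventional TTO index: the indifference point divided by the
    # time horizon offered in the described health state.
    if years_in_state <= 0:
        raise ValueError("years_in_state must be positive")
    if not 0 <= years_full_health <= years_in_state:
        raise ValueError("indifference point must lie within the horizon")
    return years_full_health / years_in_state

def qalys(value, years):
    # The conventional QALY step: index value multiplied by time.
    return value * years

# Hypothetical respondent: indifferent between 10 years in the described
# state and 7 years in full health.
value = tto_value(7, 10)
print(value)
print(qalys(value, 4))
```

The division and the multiplication are both trivially easy to compute. Whether the resulting numbers mean anything depends entirely on whether the index has the interval and ratio properties that these operations presuppose.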
The assumption behind using numbers for TTO results is that they represent points along a single, continuous line of utility where equal numerical differences reflect equal changes in preference. However, this assumption is flawed. Health-related quality of life is inherently multidimensional, involving physical, psychological, and social components that interact in complex ways. A TTO response is not determined by one simple dimension of “utility” but by a combination of factors such as perceptions of suffering, adaptation, longevity, and personal attitudes toward death and disability. Different respondents may trade time for health differently depending on which dimension of well-being they prioritize. Consequently, the TTO score does not correspond to a consistent underlying construct that can be measured along a single scale.
Because the TTO scale is not unidimensional, its numerical values lack the properties required of interval or ratio data. The differences between scores are not necessarily equivalent in meaning across the scale, and there is no true zero point. The “zero” associated with death does not represent the complete absence of value, as many people rate some health states as worse than death, generating negative scores. Without equal intervals or a true zero, TTO values cannot legitimately be added, subtracted, or multiplied. While in practice analysts treat them as if they were interval-scaled to enable QALY calculations, this is a pragmatic convention rather than a theoretically sound practice. The multidimensional and non-interval nature of TTO preferences means they cannot truly support arithmetic operations.
ANSWER 6: The conceptual foundation of the time trade-off (TTO) method makes a unidimensional score impossible. The TTO is based on health state descriptions that are inherently multisymptom and multidimensional, such as those used in the EQ-5D instrument. Each health state is defined by several qualitatively distinct attributes, including mobility, self-care, usual activities, pain or discomfort, and anxiety or depression. When respondents are asked to perform a TTO task, they are not evaluating a single underlying construct but instead making an overall judgment about a combination of different experiences and impairments. This means that even at the conceptual level, the idea that these judgments could be represented along one continuous, unidimensional scale of “utility” cannot be supported.
In practice, the TTO assumes that individuals’ preferences for different health states can be ordered and expressed numerically between 0 and 1, as if there were a single latent variable representing overall health-related quality of life. However, this assumption fails because health is not a single-dimension construct. It involves multiple domains that are not commensurable and may trade off against each other in non-linear, context-dependent ways. For example, the value a respondent assigns to a state of severe pain but full mobility cannot be assumed to relate in any consistent or additive way to the value they assign to a state of restricted mobility but no pain. These domains interact differently across individuals and situations, making it conceptually impossible to collapse them onto a single, interval-scaled dimension.
Statistical techniques such as factor analysis cannot resolve this issue. Factor analysis presupposes that the observed variables reflect one or more continuous latent traits that are linearly related to the data. In the case of TTO, however, the data are ordinal and preference-based, not continuous measures of a latent psychological construct. The overall TTO valuation represents a holistic judgment that blends multiple factors, rather than responses that can be decomposed into linear components. In addition, people’s preferences are often non-compensatory: a very poor outcome in one domain, such as extreme pain, may dominate all others and cannot be balanced by improvements elsewhere.
Consequently, even conceptually, the TTO cannot produce a truly unidimensional scale. The utility scores produced by the EQ-5D-3L algorithm, to which TTO weights are applied, are therefore best understood as pragmatic ordinal summaries of complex, multidimensional value judgments, rather than genuine measures of a single, continuous underlying construct.
ANSWER 7: Measurement requires axioms because it depends on an underlying logical and mathematical framework that allows quantities to be meaningfully compared, combined, and interpreted. In essence, measurement is not simply the act of assigning numbers to objects or phenomena; it is the process of establishing relationships between abstract numerical systems and empirical reality. To make such relationships consistent and objective, one must begin with a set of fundamental assumptions, or axioms, that define what measurement means and how it operates.
In mathematics and science, axioms provide the foundation upon which reasoning is built. They are statements accepted without proof, chosen because they appear self-evident or because they generate a coherent and useful system. In measurement theory, axioms formalize concepts like equality, order, and additivity. For instance, when we measure length, we assume that any two segments can be compared; that one can be longer, shorter, or equal in length to another. This assumption is not something that can be derived from experience alone; rather, it defines the very possibility of comparing quantities. Similarly, we assume that lengths can be added; that the total length of two segments placed end to end is the sum of their individual lengths. Without such axioms, even simple operations like addition or comparison would lose their meaning.
Axioms are also necessary to connect measurement with numbers. For a numerical representation of measurement to make sense, we must assume that numerical relationships mirror empirical ones. For example, if one object is twice as heavy as another, we expect the number assigned to its weight to be twice as large. This correspondence is established through axioms that ensure that empirical relations, such as “twice as long” or “equal to”, are preserved in the numerical structure. Representational measurement theory, developed by Krantz, Luce, Suppes, and Tversky, uses axioms to rigorously define when such mappings are possible and when numerical scales are meaningful.
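For the length example, the kind of mapping the axioms guarantee can be written compactly. The notation is our assumption, not taken from the text: A is the set of objects, ≿ is the empirical relation "at least as long as", ∘ is end-to-end concatenation, and φ assigns a positive real number to each object. A sketch of the representation conditions for extensive measurement:

```latex
\[
a \succsim b \iff \varphi(a) \ge \varphi(b),
\qquad
\varphi(a \circ b) = \varphi(a) + \varphi(b)
\qquad \text{for all } a, b \in A .
\]
% Uniqueness: any other representation differs only by a change of unit,
% which is what makes the resulting scale a ratio scale.
\[
\varphi' = \alpha \, \varphi , \qquad \alpha > 0 .
\]
```

The first condition says the numbers preserve order, the second that adding numbers mirrors physically combining objects. The axioms are the empirical conditions under which such a φ exists.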
Finally, axioms are what make measurement objective and communicable. By grounding measurement in a shared set of assumptions, scientists and mathematicians ensure that results are not arbitrary but consistent across observers and contexts. Without axioms, measurement would collapse into subjective estimation, lacking coherence or reproducibility. In this sense, axioms serve as the invisible scaffolding that allows measurement to bridge the gap between abstract mathematics and empirical reality.
ANSWER 8: The axioms of representational measurement show that only measures conforming to interval or ratio scale properties can legitimately support assessments of therapy impact in health technology assessment (HTA). These axioms, which define the structural relationships between empirical observations and their numerical representation, clarify that most health outcomes currently used in HTA do not meet the necessary conditions for meaningful quantification. Only two forms of measurement, linear ratio and Rasch-derived interval measures, satisfy the axiomatic requirements that allow real comparisons of therapeutic effects.
A linear ratio measure possesses a true zero and equal units, allowing statements of proportionality and meaningful numerical operations. Examples include physiological or biochemical quantities such as blood pressure, weight, or enzyme concentration, where the relationships between measured values correspond directly to empirical differences. These measures fully meet the axioms of ratio measurement: they preserve order, equality, and proportionality across the entire range of the attribute.
In contrast, many constructs central to HTA, such as health status or quality of life, are latent attributes that cannot be directly observed or measured on a ratio scale. The Rasch model addresses this by transforming ordinal responses, such as those obtained from questionnaires, into interval-level measures expressed in logits. This transformation is grounded in a probabilistic framework that satisfies the axioms of conjoint additivity and ensures that equal differences on the resulting scale correspond to equal differences in the underlying latent trait. While not ratio in nature, Rasch interval measures can be transformed to ratio scales when a meaningful reference point or origin is defined, thereby permitting proper assessment of therapy impact.
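A minimal sketch of the dichotomous Rasch model may help make the logit scale concrete. It assumes the simplest form of the model (one person location, one item location, a yes/no response); the locations used are hypothetical and the function names are ours.

```python
import math

def rasch_probability(theta, b):
    # Dichotomous Rasch model: probability of endorsing an item, given
    # person location theta and item location b (both in logits).
    return 1 / (1 + math.exp(-(theta - b)))

def log_odds(p):
    # Natural log of the odds of endorsement.
    return math.log(p / (1 - p))

# Hypothetical item location and person locations, in logits.
b = 0.5
for theta in (-1.0, 0.0, 1.0):
    p = rasch_probability(theta, b)
    # The log-odds recover theta - b, so a one-logit difference between
    # persons is the same size wherever it falls on the scale.
    print(theta, round(p, 3), round(log_odds(p), 3))
```

The probabilities are not equally spaced, but the log-odds are: each one-logit step in person location shifts the log-odds by exactly one. This is the sense in which the logit scale has equal units. It holds only insofar as the data actually fit the model, which must be checked empirically.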
The implications of this are decisive for HTA methodology. Most conventional instruments used to measure health outcomes, such as multi-attribute utility indices, generate ordinal data that violate the axioms of measurement theory. Ordinal numbers merely rank health states but do not represent equal intervals or proportional differences. Any arithmetic or comparison of therapy effects using such data lacks mathematical validity. Only when measures meet the axioms for interval or ratio scaling can observed changes be interpreted as real differences in patient outcomes.
Thus, representational measurement theory makes clear that legitimate evaluation of therapeutic impact in HTA requires adherence to rigorous measurement foundations; specifically, the use of linear ratio measures for observable attributes and Rasch interval measures for latent constructs that can be transformed to ratio form.
