True or False

It is apparent from responses to the LLM interrogation that contemporary health technology
assessment exhibits not merely an absence of awareness of representational measurement theory,
but a deeper failure to understand the elementary relationship between mathematics and
measurement. Across agencies, journals, and international organizations, statements that express
foundational axioms of measurement are weakly endorsed or rejected outright, while statements
that describe mathematical impossibilities are routinely accepted. This pattern cannot be
explained as methodological disagreement. It reflects a systematic loss of scale literacy.

To address this failure, each of the twenty-four statements used in the interrogation is presented
below with a clear determination of whether it is TRUE or FALSE, followed by an explanation
grounded in the axioms of representational measurement theory and the rules governing
permissible arithmetic. The purpose is not polemic, but clarification. These explanations make
explicit the constraints that must be satisfied before numbers can be treated as measures and
before arithmetic can be meaningfully applied.

Statement: “Interval measures lack a true zero.”
Classification: TRUE

An interval scale, by definition, does not possess a true zero. This is not a matter of convention or preference; it follows directly from the axioms of representational measurement theory. A true zero is not simply the lowest observed value on a scale. It is a point that represents the complete absence of the attribute being measured. For a scale to have a true zero, zero must be meaningful in an absolute sense, such that ratios between values on the scale are interpretable. Interval scales do not satisfy this requirement.

The defining property of an interval scale is that equal numerical differences represent equal differences in the underlying attribute. This permits addition and subtraction, but it does not permit multiplication or division in a meaningful way. The location of zero on an interval scale is arbitrary: it can be shifted without altering the empirical meaning of differences. Temperature measured in Celsius provides the canonical example. Zero degrees Celsius does not represent the absence of temperature; it represents an arbitrarily chosen point on the temperature continuum. The same physical state can be expressed as 0°C, 32°F, or 273.15 K, depending on the scale chosen. Because the zero point of an interval scale can be relocated by a positive affine transformation (x → ax + b, with a > 0) without changing the meaning of the measure, it cannot represent absence of the attribute.
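The Celsius and Fahrenheit case can be checked numerically. The following is a minimal sketch, with three temperatures chosen purely for illustration: a ratio of differences survives the change of scale, while a ratio of values does not.

```python
def c_to_f(celsius: float) -> float:
    """Convert Celsius to Fahrenheit: the positive affine map x -> 1.8*x + 32."""
    return 1.8 * celsius + 32.0

# Three illustrative temperatures in Celsius (values chosen for the example).
a_c, b_c, c_c = 10.0, 20.0, 40.0
a_f, b_f, c_f = c_to_f(a_c), c_to_f(b_c), c_to_f(c_c)

# Ratio of two differences: the same on both scales.
diff_ratio_c = (c_c - b_c) / (b_c - a_c)
diff_ratio_f = (c_f - b_f) / (b_f - a_f)

# Ratio of two values: depends on where zero happens to sit.
value_ratio_c = b_c / a_c
value_ratio_f = b_f / a_f

print("difference ratios:", diff_ratio_c, diff_ratio_f)
print("value ratios:     ", value_ratio_c, round(value_ratio_f, 2))
```

The two difference ratios agree, whereas "20 is twice 10" holds only in Celsius; in Fahrenheit the same two states give 68/50.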

This arbitrariness of the zero point is precisely what distinguishes interval scales from ratio scales. In a ratio scale, zero is fixed by the empirical structure of the attribute. Length, mass, and time have true zeros: zero length means no length, zero mass means no mass, zero time means no duration. Because zero represents absence, ratios are meaningful. An object that is two meters long is twice as long as an object that is one meter long. Such statements are meaningless on an interval scale, where the zero does not anchor the scale to absence.

The absence of a true zero has direct implications for permissible arithmetic. On an interval scale, statements such as “twice as much” or “half as much” are invalid, because multiplication and division depend on a meaningful zero. Only differences are interpretable. Saying that one temperature is 10 degrees higher than another is meaningful; saying it is twice as hot is not. This limitation is not a technical inconvenience; it is a categorical boundary that protects arithmetic from misrepresentation.

In the context of health technology assessment, this distinction is routinely ignored. Preference-based utility scores are often treated as if they were ratio measures, even when they at best satisfy interval properties, and often not even that. By treating interval-scale quantities as if they possessed true zeros, HTA practice enables illegitimate arithmetic operations, including multiplication by time and division by cost. Recognizing that interval measures lack a true zero is therefore not a pedantic observation. It is a foundational constraint. Once it is acknowledged, large parts of standard HTA arithmetic become impossible, not controversial.

Statement: “Measures must be unidimensional.”
Classification: TRUE

Unidimensionality is a necessary condition for measurement. A measure can represent only one attribute at a time. This requirement is not a stylistic preference or a simplifying assumption; it follows directly from the logic of representational measurement theory. For numbers to represent an empirical attribute in a meaningful way, there must be a single, well-defined dimension of variation that those numbers correspond to. If more than one attribute is involved, the numerical representation becomes ambiguous, and arithmetic loses interpretability.

The core purpose of measurement is to map variations in a single empirical attribute onto variations in numbers while preserving relevant relations. If multiple attributes are conflated, there is no longer a unique empirical structure being represented. A numerical difference could reflect a change in one attribute, another, or some mixture of both. In such cases, the numbers do not correspond to anything determinate in the real world. They become labels attached to heterogeneous bundles rather than measures of a single property.

This is why unidimensionality is foundational in all mature measurement sciences. Length, mass, time, temperature, and electric charge are each defined along a single dimension. When phenomena are multidimensional, science does not respond by inventing a single composite measure and pretending it is one thing. Instead, it measures each dimension separately. Velocity is not measured directly; it is derived from two unidimensional measures, distance and time, each with its own scale properties. The derivation is lawful precisely because the underlying measures are unidimensional and their scale types are known.

In psychometrics and the measurement of latent traits, unidimensionality is equally essential. Models such as the Rasch model exist precisely to test and enforce unidimensionality. Rasch measurement does not assume that a set of items measures a single trait; it evaluates whether the data conform to that requirement. Only if responses can be explained by variation along one latent dimension can a scale be constructed. Without unidimensionality, there is no latent trait to measure, only a collection of loosely related indicators.
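One property that the Rasch requirement implies can be sketched in a few lines. The code below uses the standard dichotomous Rasch form, in which the probability of a positive response depends only on the difference between person ability and item difficulty. The item difficulties and abilities are hypothetical. It illustrates invariant item comparison only; it is not a test of model fit.

```python
import math

def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a positive response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def log_odds(p: float) -> float:
    """Natural log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))

# Hypothetical item difficulties and person abilities, in logits.
easy_item, hard_item = -0.5, 1.0
abilities = [-2.0, 0.0, 1.5]

# For every person, the log-odds gap between the two items is the same:
# it equals the difference in item difficulties, whatever the ability.
gaps = [
    log_odds(rasch_probability(theta, easy_item))
    - log_odds(rasch_probability(theta, hard_item))
    for theta in abilities
]
print([round(g, 6) for g in gaps])
```

If observed data cannot be reconciled with this person-independent spacing of items, the claim that the items lie on one dimension is not supported.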

The consequences of ignoring unidimensionality are severe. Composite indices that combine multiple attributes into a single score cannot be interpreted as measures because changes in the score do not correspond to changes in any single attribute. Arithmetic performed on such composites is uninterpretable, regardless of how sophisticated the weighting scheme appears. The problem is not that the weights are debatable; it is that no weighting can rescue the loss of dimensional coherence.
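The ambiguity of a composite score can be made concrete. This is a sketch under stated assumptions: the two attributes, the weights, and the patient profiles are all hypothetical and are not taken from any real instrument.

```python
# Hypothetical weights for a two-attribute composite (not from a real instrument).
WEIGHTS = {"pain": 0.6, "mobility": 0.4}

def composite_score(profile: dict) -> float:
    """Weighted sum of attribute levels, each scored from 0 (worst) to 1 (best)."""
    return sum(WEIGHTS[name] * level for name, level in profile.items())

# Two clearly different profiles.
patient_a = {"pain": 0.5, "mobility": 1.0}
patient_b = {"pain": 1.0, "mobility": 0.25}

score_a = composite_score(patient_a)
score_b = composite_score(patient_b)
print(round(score_a, 3), round(score_b, 3))
```

Both profiles receive the same score even though they differ on every attribute, so the score alone cannot say which attribute a given number, or a given change in that number, refers to.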

In health technology assessment, this requirement is routinely violated. Constructs such as “health-related quality of life” are treated as if they were single attributes, even though they explicitly combine distinct dimensions such as mobility, pain, mood, and self-care. When such multidimensional constructs are collapsed into a single index, the result is not a measure of anything. It is a summary score. Treating that score as if it were a measure enables arithmetic that has no empirical meaning.

To insist that measures must be unidimensional is therefore not to deny the complexity of health or human experience. It is to insist that complexity be respected rather than obscured. Measurement requires discipline. Without unidimensionality, numbers do not measure; they merely summarize.

Statement: “Multiplication requires a ratio measure.”
Classification: TRUE

Multiplication is only meaningful when applied to quantities that possess ratio-scale properties. This is not a convention of statistics or an assumption of modeling practice; it is a logical requirement of arithmetic grounded in representational measurement theory. A ratio scale is defined by two essential properties: equal intervals and a true zero. Without both, multiplication and division cannot preserve empirical meaning.

The role of a true zero is decisive. A true zero represents the complete absence of the attribute being measured. It anchors the scale in the empirical world, making ratios interpretable. When a quantity has a true zero, statements such as “twice as much,” “half as much,” or “three times larger” have meaning because zero fixes the origin of the scale in a non-arbitrary way. Length, mass, time, and count all satisfy this condition. Zero length means no length, zero mass means no mass, and zero time means no duration. Because of this, multiplication on these quantities corresponds to real-world relations.

Interval scales do not satisfy this requirement. Although they permit addition and subtraction, their zero point is arbitrary and can be shifted without changing the meaning of differences. Because the origin is not fixed by the attribute itself, ratios are meaningless. Saying that 20 degrees is twice as hot as 10 degrees has no physical interpretation if temperature is measured on an interval scale such as Celsius or Fahrenheit. The numerical ratio does not correspond to any ratio in the underlying attribute. Multiplication in this context produces a number, but not a meaningful quantity.

This distinction is categorical, not gradual. A scale either has a true zero or it does not. If it does not, multiplication is forbidden. There is no mathematical workaround, no weighting scheme, and no modeling sophistication that can overcome this constraint. Performing multiplication on a non-ratio scale does not produce an approximate result; it produces nonsense. The arithmetic operation ceases to represent anything empirical.

In measurement science, derived quantities are constructed through multiplication only when the scale types of the components justify it. Velocity is distance divided by time, both ratio measures. Force is mass multiplied by acceleration, again ratio measures. These constructions are lawful because the operands satisfy the axioms required for multiplication. If one operand lacked ratio properties, the derived quantity would be uninterpretable.

In health technology assessment, this rule is systematically violated. Preference-based utility scores, which at best can claim interval properties and often not even that, are multiplied by time to generate QALYs. This multiplication is treated as if it were analogous to multiplying meters by meters or seconds by seconds. It is not. Because utilities lack a true zero, the product has no interpretable meaning. Time remains a ratio measure; utility does not. Multiplying a ratio measure by a non-ratio measure does not upgrade the latter. It contaminates the former.
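The consequence can be illustrated with a small sketch. It assumes, as argued above, that utilities have at most interval properties, so that moving the origin of the utility scale is an admissible transformation. The profiles and the size of the shift are hypothetical.

```python
def qaly(utility: float, years: float) -> float:
    """QALY as conventionally computed: utility multiplied by time."""
    return utility * years

def rescale(utility: float, scale: float, shift: float) -> float:
    """Positive affine rescaling, admissible if utility is only interval-scaled."""
    return scale * utility + shift

# Hypothetical profiles as (utility, years); values are illustrative only.
profile_a = (0.8, 5.0)
profile_b = (0.5, 10.0)

original = (qaly(*profile_a), qaly(*profile_b))

# Move the origin of the utility scale down by 0.4 (scale=1, shift=-0.4).
shifted = (
    qaly(rescale(profile_a[0], 1.0, -0.4), profile_a[1]),
    qaly(rescale(profile_b[0], 1.0, -0.4), profile_b[1]),
)

print("original: A =", original[0], " B =", original[1])
print("shifted:  A =", round(shifted[0], 3), " B =", round(shifted[1], 3))
```

Before the shift, profile B yields the larger product; after it, profile A does. Differences between utilities are untouched by the shift, yet the ranking of the products reverses, which is what it means for the product to depend on an arbitrary zero.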

The insistence that multiplication requires a ratio measure is therefore not pedantic. It is protective. It marks the boundary between arithmetic that represents reality and arithmetic that merely produces numbers. Once this boundary is crossed, the results cannot be defended as measures, no matter how widely they are used or how institutionally embedded they have become.

Statement: “Time trade-off preferences are unidimensional.”
Classification: FALSE

Time trade-off (TTO) preferences are not unidimensional, and treating them as such is a fundamental error. Unidimensionality requires that responses vary along a single underlying attribute, such that differences in observed values can be attributed solely to differences in that one dimension. TTO data fail this requirement because they simultaneously reflect multiple, conceptually distinct attributes that cannot be disentangled into a single latent dimension.

In a TTO task, respondents are asked to trade length of life against a described health state. The resulting preference value is therefore not a manifestation of a single attribute, but an amalgam of several. At a minimum, TTO responses reflect attitudes toward longevity, attitudes toward the quality of the described health state, attitudes toward death, risk perception, time preference, loss aversion, and task comprehension. None of these components is separable within the elicited value. A change in a TTO score cannot be uniquely attributed to a change in “health quality,” because it may equally reflect a change in willingness to sacrifice life years, fear of death, or discounting of future time.
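How one of these components enters the elicited number can be sketched with a deliberately simple model. The conventional TTO score for a state better than dead is x/t, where x is the number of years in full health judged equivalent to t years in the described state. The respondent below is hypothetical: they are assumed to hold a fixed valuation of the state (`quality`) and to discount future time exponentially at `discount_rate`. Both the model and the parameter values are assumptions made for illustration.

```python
import math

def tto_value(quality: float, years: float, discount_rate: float) -> float:
    """TTO score x/t for a hypothetical respondent with exponential discounting.

    Indifference is assumed when discounted time in full health (x years)
    equals quality times discounted time in the described state (t years).
    """
    if discount_rate == 0.0:
        return quality  # no time preference: x = quality * t
    # Solve 1 - exp(-r*x) = quality * (1 - exp(-r*t)) for x.
    rhs = quality * (1.0 - math.exp(-discount_rate * years))
    x = -math.log(1.0 - rhs) / discount_rate
    return x / years

# Same valuation of the health state, different time preference and horizon.
print(round(tto_value(0.5, 10.0, 0.00), 3))
print(round(tto_value(0.5, 10.0, 0.05), 3))
print(round(tto_value(0.5, 20.0, 0.05), 3))
```

In this sketch the valuation of the health state never changes, yet the TTO score moves with the discount rate and with the time horizon. The single number returned by the task does not reveal which of these inputs produced it.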

This multidimensionality is structural, not incidental. It arises from the very design of the TTO task. By construction, the respondent is asked to consider two qualitatively different attributes—time and health state—and to make a judgment that balances them. The output is therefore a trade-off function, not a measure of a single trait. Even if all respondents perfectly understood the task and responded consistently, the resulting numbers would still represent a compound preference relation rather than variation along one dimension.

Attempts to treat TTO values as unidimensional often rest on a category error: confusing a single numerical output with a single underlying attribute. Producing one number does not guarantee unidimensionality. A composite index can always be expressed as a scalar, but that scalar does not correspond to a single empirical dimension unless the contributing attributes are demonstrably aligned. In TTO, there is no empirical or theoretical basis for claiming such alignment. Indeed, empirical evidence routinely shows that TTO responses vary systematically with factors unrelated to health state severity, such as age, framing, time horizon, and cultural attitudes toward death.

From a measurement perspective, unidimensionality is not something that can be assumed; it must be tested. In latent trait measurement, models such as Rasch explicitly evaluate whether responses conform to a single latent dimension. TTO methods provide no such test. They simply assume that the elicited preference reflects a single quantity called “utility,” even though the task embeds multiple dimensions by design.

In health technology assessment, treating TTO preferences as unidimensional enables further illegitimate steps, including treating the resulting scores as interval or ratio measures and multiplying them by time. Once unidimensionality fails, these downstream operations lose any claim to meaning. The falsity of the statement that TTO preferences are unidimensional is therefore not a minor technical point. It exposes a foundational flaw: TTO outputs are not measures of a single attribute at all, but context-dependent preference constructions masquerading as quantities.

Statement: “Ratio measures can have negative values.”
Classification: FALSE

Ratio measures cannot have negative values, and this follows directly from the defining properties of a ratio scale. A ratio scale is characterized by two essential features: equal intervals and a true zero. The true zero is not a conventional reference point; it represents the complete absence of the attribute being measured. Because zero denotes absence, values on a ratio scale are bounded below by zero. Negative values are therefore conceptually and mathematically impossible.

The presence of a true zero is what distinguishes ratio scales from interval scales. On a ratio scale, zero is fixed by the empirical structure of the attribute itself. Zero length means no length, zero mass means no mass, zero duration means no time, and zero count means nothing is present. Once the absence of the attribute is reached, there is no further decrement possible. The scale cannot extend below zero without violating its empirical meaning. Negative length, negative mass, or negative time do not represent lesser amounts of the attribute; they represent conceptual contradictions.

This constraint is not relaxed by mathematical convenience or modeling practice. While mathematics allows negative numbers in the abstract, representational measurement theory restricts their use to situations where they correspond to meaningful empirical relations. For ratio scales, negative numbers do not correspond to any possible state of the attribute. As a result, they are excluded by definition. Allowing negative values would destroy the interpretability of ratios, because ratios rely on zero as a meaningful origin. If negative values were permitted, statements such as “twice as much” or “half as much” would lose coherence.

Interval scales, by contrast, can and often do include negative values precisely because they lack a true zero. Temperature measured in Celsius or Fahrenheit can take negative values because zero does not represent the absence of temperature; it represents an arbitrary point on the scale. Shifting the zero point does not change the meaning of differences, which is why interval scales permit both positive and negative numbers. This is exactly what ratio scales do not permit. Their zero point is fixed and non-arbitrary, which precludes negative values.

In health technology assessment, the confusion between interval and ratio scales is pervasive, and it leads directly to the mistaken acceptance of negative “ratio” values. Utility scores derived from preference elicitation methods are sometimes allowed to fall below zero and are then treated as if they were ratio measures. This is internally inconsistent. A quantity that admits negative values cannot, by definition, be a ratio measure, because the presence of negative values signals the absence of a true zero.

The claim that ratio measures can have negative values is therefore false in a categorical sense. It is not a matter of degree or approximation. A scale that permits negative values has already abandoned the defining property that makes ratio arithmetic meaningful. Recognizing this boundary is essential, because once negative values are admitted, any subsequent multiplication, division, or ratio comparison becomes uninterpretable.

Statement: “EQ-5D-3L preference algorithms create interval measures.”
Classification: FALSE

EQ-5D-3L preference algorithms do not create interval measures. They generate preference scores that at best preserve ordinal information and, in many cases, do not even satisfy the minimal requirements for interval scaling. Treating these outputs as interval measures is a category error that arises from confusing numerical estimation with measurement.

An interval scale requires more than numerical spacing. It requires that equal numerical differences correspond to equal differences in the underlying attribute across the entire scale. This property must be justified empirically, not assumed. Interval measures also require invariance: the meaning of a unit difference must be independent of the particular items, respondents, or estimation sample used. EQ-5D-3L preference algorithms satisfy none of these conditions.

The EQ-5D-3L descriptive system is explicitly multidimensional, combining mobility, self-care, usual activities, pain/discomfort, and anxiety/depression. Preference algorithms then assign weights to levels within each dimension based on population elicitation exercises, typically using time trade-off or related methods. The resulting index value is a weighted sum of responses across dimensions. Producing a single number does not create an interval measure. It merely produces a composite score. Without evidence that the weighted combination represents variation along a single underlying attribute with equal intervals, interval status cannot be claimed.

Crucially, the algorithms impose interval structure rather than discovering it. The spacing between health states is determined by regression coefficients estimated from preference data that themselves lack interval properties. The model outputs are therefore artifacts of the chosen functional form, anchoring rules, and sample characteristics. A different elicitation method, valuation protocol, or population produces a different set of coefficients and a different numerical scale. This dependence on estimation context violates the invariance requirement of interval measurement.
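The dependence on estimation context can be sketched as follows. The two coefficient sets are hypothetical and are not published EQ-5D-3L value sets. The example is also reduced to two of the five dimensions and to a purely additive form, which is an assumption made to keep it short.

```python
# Two HYPOTHETICAL sets of decrements (not published EQ-5D-3L value sets).
# Each maps a dimension to the decrements applied at levels 1, 2 and 3.
VALUE_SET_X = {
    "mobility": (0.0, 0.10, 0.40),
    "pain": (0.0, 0.15, 0.30),
}
VALUE_SET_Y = {
    "mobility": (0.0, 0.20, 0.30),
    "pain": (0.0, 0.05, 0.45),
}

def index_score(state: dict, value_set: dict) -> float:
    """Additive index: 1 minus the summed decrements for the state's levels."""
    return 1.0 - sum(value_set[dim][level - 1] for dim, level in state.items())

# Two health states described by their level on each dimension.
state_p = {"mobility": 3, "pain": 1}
state_q = {"mobility": 1, "pain": 3}

for name, value_set in (("X", VALUE_SET_X), ("Y", VALUE_SET_Y)):
    p = index_score(state_p, value_set)
    q = index_score(state_q, value_set)
    print(name, round(p, 2), round(q, 2), "P > Q" if p > q else "Q > P")
```

Under set X state Q scores higher than state P; under set Y the order reverses and the gap between them changes. Neither the ranking nor the spacing of the states is a property of the states themselves; both are supplied by the coefficients.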

The problem is compounded by the properties of the underlying preference data. Time trade-off responses do not generate interval-scale observations, as they do not satisfy unidimensionality or possess a true zero. Aggregating and modeling such data cannot magically upgrade their scale properties. Statistical estimation does not transform ordinal or mixed-attribute preferences into interval measures. It only fits numbers to responses.

Contrast this with legitimate interval measurement in the human sciences, which requires explicit testing of scale properties, often through models such as Rasch that enforce unidimensionality and invariant item calibration. EQ-5D-3L algorithms do not test whether equal differences in index scores correspond to equal differences in any empirical attribute. They assume it, because the downstream arithmetic of QALYs requires it.

In health technology assessment, labeling EQ-5D-3L preference scores as interval measures enables subtraction, averaging, and multiplication that would otherwise be impermissible. But this labeling is rhetorical, not scientific. Without demonstrated equal-interval properties and invariance, EQ-5D-3L preference algorithms do not create interval measures. They create numerically convenient indices, and the distinction matters because arithmetic legitimacy depends on it.

Statement: “The QALY is a ratio measure.”
Classification: FALSE

The QALY is not a ratio measure, and it fails the defining requirements of ratio measurement in multiple, independent ways. A ratio measure must possess equal intervals and a true zero, and it must represent variation along a single, well-defined attribute. The QALY satisfies none of these conditions. Its continued treatment as a ratio measure in health technology assessment reflects institutional convention rather than measurement logic.

The QALY is constructed by multiplying time, which is a genuine ratio measure, by a utility score intended to represent health-related quality of life. The scale properties of the product cannot exceed those of its weakest component. Even if time is ratio-scaled, the utility component is not. Preference-based utility scores lack a true zero, are not demonstrably interval-scaled, and do not represent a single unidimensional attribute. Multiplying time by such a score does not create a ratio measure; it contaminates a ratio quantity with a non-measure.

The absence of a true zero is decisive. For the QALY to be a ratio measure, zero QALYs would have to represent the complete absence of the attribute being measured. In practice, zero QALYs correspond to zero time, not zero “health.” The utility component does not have a true zero, as its zero point is arbitrarily defined through anchoring conventions such as “dead = 0.” These anchors are not empirically grounded absences of health, and they can vary across valuation protocols and populations. A quantity whose zero depends on convention rather than absence cannot support ratio interpretation.

The QALY also fails unidimensionality. It purports to represent a single quantity of “health,” but in fact conflates duration with a multidimensional, preference-weighted index of health states. Duration and health state quality are distinct attributes. Combining them does not yield a single underlying dimension; it yields a composite. Composite quantities cannot be ratio measures because changes in the composite do not correspond to changes in any single attribute.

The existence of negative QALYs further exposes the category error. In some valuation systems, utility scores are allowed to fall below zero, implying “states worse than dead.” When such values are multiplied by time, the result is a negative QALY. A ratio measure cannot take negative values, because zero represents absence of the attribute. The mere possibility of negative QALYs is sufficient to refute the claim that QALYs are ratio-scaled.

Finally, ratio measures permit meaningful ratio statements. If the QALY were a ratio measure, it would be meaningful to say that one intervention produces twice as much health as another. Such statements are routinely implied in cost-per-QALY calculations, but they are indefensible. Because the underlying utility differences are not equal-interval and lack a true zero, the ratios of QALYs do not correspond to ratios of any empirical attribute.

The claim that the QALY is a ratio measure is therefore false in a categorical sense. It is not approximately false or philosophically questionable; it is mathematically impossible. The QALY is a constructed index whose numerical properties are assumed for convenience. Treating it as a ratio measure enables arithmetic that has no empirical meaning, and once that is recognized, the central quantitative pillar of cost-effectiveness analysis collapses.

Statement: “Time is a ratio measure.”
Classification: TRUE

Time is a ratio measure because it satisfies all the defining requirements of ratio-scale measurement. It possesses equal intervals, a true zero, and invariant units, and it supports meaningful multiplication, division, and ratio comparisons. These properties are not matters of convention within health technology assessment; they are grounded in the empirical structure of time itself and have been recognized across the physical sciences for centuries.

The defining feature of a ratio scale is the existence of a true zero that represents the complete absence of the attribute being measured. For time, zero denotes no duration. A time interval of zero seconds means that no time has elapsed. This is not an arbitrary reference point that can be shifted without consequence. It is fixed by the empirical meaning of duration. There is no meaningful sense in which a negative duration exists in the empirical world. This fixed zero anchors the scale and allows ratios to be interpreted.

Time also has equal intervals. One second represents the same duration regardless of when it occurs or what is being timed. The difference between one and two seconds is empirically equivalent to the difference between ten and eleven seconds. This invariance of unit size is essential for arithmetic. It ensures that addition and subtraction correspond to the concatenation or removal of equal durations. Without equal intervals, even simple summation would be meaningless.

Because time has a true zero and equal intervals, multiplication and division are meaningful. It is coherent to say that one event lasted twice as long as another, or that an intervention extended life by three times the duration achieved by a comparator. Ratios of time correspond to ratios in the underlying attribute. This is exactly what ratio-scale measurement entails. Time therefore supports all standard arithmetic operations without violating measurement axioms.

The ratio-scale properties of time are preserved across different units of measurement. Seconds, minutes, hours, and years are related by constant multiplicative transformations. Changing units rescales the numbers but does not alter their ratios. An event that lasts two hours is twice as long as one that lasts one hour, just as an event that lasts 120 minutes is twice as long as one that lasts 60 minutes. This invariance under multiplicative transformation is a hallmark of ratio scales.
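The unit-change argument is easy to verify. The sketch below compares a genuine change of unit (hours to minutes) with a hypothetical transformation that also moves the origin. The second is included only for contrast and is not a legitimate change of time unit.

```python
def to_minutes(hours: float) -> float:
    """Change of unit: a purely multiplicative transformation."""
    return 60.0 * hours

def shifted_origin(hours: float) -> float:
    """Hypothetical contrast case: rescale AND move the origin by 30 minutes."""
    return 60.0 * hours + 30.0

def ratio_after(transform, first: float, second: float) -> float:
    """Ratio of two durations after applying a transformation to both."""
    return transform(second) / transform(first)

print(ratio_after(to_minutes, 1.0, 2.0))
print(round(ratio_after(shifted_origin, 1.0, 2.0), 3))
```

The change of unit leaves the ratio at 2; the shifted version does not. Ratios survive only those transformations that leave zero where it is.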

In health technology assessment, time is often invoked correctly as a ratio measure, particularly when discussing survival, duration of treatment, or length of follow-up. Problems arise not because time lacks ratio properties, but because time is combined with quantities that do not share those properties. Multiplying time by non-ratio measures does not confer ratio status on the product. The ratio nature of time is not contagious; it cannot rescue illegitimate arithmetic.

Affirming that time is a ratio measure is therefore uncontroversial and foundational. It underscores the asymmetry at the heart of HTA arithmetic: one component of the QALY construction is lawful, the other is not. Recognizing this distinction is essential, because it clarifies that the problem lies not with the use of time, but with what is done to it.

Statement: “Measurement precedes arithmetic.”
Classification: TRUE

Measurement must precede arithmetic because arithmetic operations are meaningful only when applied to quantities whose measurement properties are already established. This is a foundational principle of representational measurement theory and of mathematics as it is applied to the empirical world. Numbers do not acquire meaning simply by being manipulated; their meaning derives from the prior mapping between numbers and attributes. Without that mapping, arithmetic is not analysis but symbol manipulation detached from reality.

Measurement answers a logically prior question: what kind of thing is being represented? Before any arithmetic can be performed, it must be known whether the attribute admits ordering, equal intervals, or a true zero. These properties determine which operations are permissible. Ordinal scales permit ranking but not addition. Interval scales permit addition and subtraction but not multiplication. Ratio scales permit the full range of arithmetic operations. Arithmetic does not determine scale type; scale type determines arithmetic. Reversing this order destroys interpretability.
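The rule that scale type determines arithmetic can be written down as a lookup performed before any computation. The following is a minimal sketch of that ordering. The names `ScaleType`, `REQUIRED_SCALE`, and `is_permissible` are hypothetical, and the table encodes only the three scale types and the operations named in the paragraph above.

```python
from enum import IntEnum

class ScaleType(IntEnum):
    """Scale types ordered by the arithmetic they support."""
    ORDINAL = 1
    INTERVAL = 2
    RATIO = 3

# Minimum scale type each operation requires, following the paragraph above.
REQUIRED_SCALE = {
    "rank": ScaleType.ORDINAL,
    "add": ScaleType.INTERVAL,
    "subtract": ScaleType.INTERVAL,
    "multiply": ScaleType.RATIO,
    "divide": ScaleType.RATIO,
}

def is_permissible(operation: str, *operands: ScaleType) -> bool:
    """True only if every operand meets the scale type the operation requires."""
    required = REQUIRED_SCALE[operation]
    return all(scale >= required for scale in operands)

# Time (ratio) multiplied by a quantity that is at best interval-scaled.
print(is_permissible("multiply", ScaleType.RATIO, ScaleType.RATIO))     # True
print(is_permissible("multiply", ScaleType.RATIO, ScaleType.INTERVAL))  # False
print(is_permissible("subtract", ScaleType.INTERVAL, ScaleType.RATIO))  # True
```

The check runs on the declared scale types, not on the numbers. That is the sense in which measurement comes first: the question of whether an operation is allowed is settled before any value is computed.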

In all mature sciences, this ordering is taken for granted. Physicists do not multiply quantities until they have established what is being measured and on what scale. Engineers do not divide by variables whose dimensional properties are unknown. Units analysis, dimensional consistency, and scale properties are enforced before computation, not retrofitted afterward. Arithmetic is subordinate to measurement, not the other way around.

The error in health technology assessment is precisely the inversion of this logic. HTA begins with desired arithmetic outputs—cost-effectiveness ratios, aggregated QALYs, lifetime model results—and then assigns scale properties to inputs as needed to justify those operations. Utilities are treated as interval or ratio measures because the arithmetic requires them to be so, not because their measurement properties have been demonstrated. This is not measurement; it is rationalization.

Statistical estimation does not repair this inversion. Regression coefficients, preference weights, or model parameters do not create measurement properties. They merely produce numbers that conform to a chosen functional form. Without prior demonstration that the underlying variables possess the scale properties required for the intended arithmetic, the results remain uninterpretable. Arithmetic performed first and justified later cannot recover meaning.

The principle that measurement precedes arithmetic also explains why sensitivity analysis and uncertainty ranges cannot rescue invalid models. Varying inputs across plausible ranges does not address whether the operations themselves are lawful. One cannot test the robustness of an illegitimate multiplication. If the quantities are not measures, no amount of arithmetic refinement can make the results meaningful.

Affirming that measurement precedes arithmetic is therefore not methodological pedantry. It is the condition that separates science from numerology. Once this ordering is violated, numbers cease to represent attributes and become mere artifacts of calculation. Recognizing this principle forces a reckoning in HTA: many familiar calculations are not slightly flawed, but logically prohibited. Arithmetic can only follow measurement. When it leads instead, the result is not evidence, but illusion.