Distance Education
INTRODUCTION
For fifty years, health technology assessment (HTA) has practiced numerical storytelling by confusing numbers with measures. To function as a science, HTA must accept the axioms of representational measurement theory (RMT), a framework begun by Stevens (1946), who tied admissible arithmetic to scale type, and completed by Krantz, Luce, Suppes, and Tversky (1971) with their representation and uniqueness theorems. In parallel, Rasch (1960) supplied the probabilistic bridge for latent traits, and Wright (1977) showed how ordered responses can be transformed into a logit ruler with specific objectivity when the model fits. HTA could have adopted these foundations at any time; instead, its fixation on QALYs and the valuation of multiattribute health-state descriptions, contrary to the requirement of unidimensionality and the other axioms of RMT, guaranteed comprehensive measurement failure.
Health technology assessment’s choice to value health-state descriptions, an approach that persists, has ensured that therapy-impact claims built on utilities, QALYs, and reference-case models fail basic measurement standards. By bypassing the axioms that license arithmetic (unidimensionality, additivity, solvability, the Archimedean property, cancellation, and invariance), the field forfeits dimensional homogeneity and any interval or ratio meaning. The essential point is simple: only when an empirical system satisfies these axioms can observations be mapped to numbers that function as measures and support falsifiable claims.
Two programs are available; these will be followed by further programs on the implementation of protocols to support specific therapy-impact claims, covering both objective physical claims (e.g., resource utilization) and latent-trait claims (e.g., need fulfillment), and on the development of formulary submission guidelines for new therapies. Each program comprises five modules, each with supporting questions and answers. Significant input from colleagues has encouraged a modular format, so these materials can support graduate instruction, faculty seminars, and focused discussions on measurement failure and reform. Zoom seminars can be arranged on request. Participants are encouraged to download the material, which amounts to just over 100 pages for each program.
Dr Langley, the author of these programs, is an economist. He received his undergraduate training in the UK and his M.A. and Ph.D. in Canada. He has taught microeconomics, labor economics, and health economics in the UK, Canada, Australia, and the United States. His main interest is measurement theory, in particular the application of representational measurement to health-system claims of therapy impact. He holds an Adjunct Professor position in the College of Pharmacy, University of Minnesota, and is Director of Maimon Research (www.maimonresearch.com), a boutique consulting company in health technology assessment. He is based in Tucson, Arizona. Please direct communications regarding these two programs to langleylapaloma@gmail.com.
The link to registration and payment (US$65.00 per program) is provided at the end of each program description.
PROGRAM 1
NUMERICAL STORYTELLING: SYSTEMATIC MEASUREMENT FAILURE IN HEALTH TECHNOLOGY ASSESSMENT
HTA can be dismissed in a sentence: it confuses numbers with measures. In science, a string of numerals becomes a measure only when it preserves the empirical structure of an attribute and obeys the transformation rules set out by representational measurement theory. Those axioms (order, additivity, solvability/cancellation, invariance) are what license arithmetic. Without them, subtraction, averaging, ratios, and products are illegitimate. HTA’s main artifacts ignore this gate. Utilities derived from preference tasks lack interval meaning; multiplying them by time to make QALYs violates dimensional homogeneity; disease-specific totals are summed scores that have never earned equal units; cost composites bundle heterogeneous quantities. Rasch modeling shows how latent attributes can be measured lawfully, but HTA rarely demands it. The result is numerical storytelling dressed as evaluation: outputs that look precise yet have no admissible arithmetic. Until HTA requires evidence that its numbers are measures, its claims are not science but policy theater.
MODULE 1: WHY STEVENS? THE CONTEXT OF 1946
Before 1946, measurement outside physics lacked a clear warrant. Campbell addressed additivity
for manifest quantities; psychophysics mapped sensations; operationalism equated meaning with
procedure. None guaranteed that numerals preserved empirical structure or justified arithmetic,
particularly for latent traits. Stevens resolved this by linking scale types—nominal, ordinal,
interval, ratio—to permissible arithmetic and statistics. He did not provide a method for
constructing invariant rulers for latent attributes. That gap was later filled by the axiomatic work
in Foundations of Measurement and by Rasch modeling as an operational solution.
MODULE 2: AXIOMS OF REPRESENTATIONAL
MEASUREMENT THEORY
Between 1946 and 1971, measurement theory advanced from typology to formal axioms. Suppes
derived additivity from concatenation for extensive attributes. Luce and Tukey showed how
conjoint measurement yields additive representations without concatenation under conditions
such as cancellation and solvability. Krantz, Luce, Suppes, and Tversky unified these results,
proving representation and uniqueness theorems. In parallel, Rasch modeling provided a
probabilistic implementation for measuring latent traits when data conform to the model.
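As an illustration of the conjoint result summarized here (notation added for this description): for an ordering ≿ observed on pairs drawn from two factors A and X, an additive representation requires real-valued functions φ_A and φ_X such that
(a, x) ≿ (b, y) if and only if φ_A(a) + φ_X(x) ≥ φ_A(b) + φ_X(y),
with φ_A and φ_X unique up to positive linear transformations sharing a common unit. Conditions such as cancellation and solvability are what make this representation, and hence interval-level arithmetic, defensible.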
MODULE 3: SUSTAINED MEASUREMENT FAILURE:
TTO, EQ-5D-3L, AND PREFERENCE UTILITIES
Time trade-off entrenched measurement failure by valuing multiattribute health-state
descriptions. TTO outputs are regressed into preference algorithms to generate utilities,
producing numbers without satisfying unidimensionality, additivity, or invariance. Protocol
variation and country-specific tariffs further destroy invariance, while task artifacts generate
negative values. Because the axioms fail at inception, these utilities are non-measures and cannot
support arithmetic.
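A minimal illustration of the scoring convention at issue (illustrative numbers only): in a conventional TTO task for a state regarded as better than dead, a respondent indifferent between t years in the health state and x years in full health is assigned the value x/t, so indifference between 10 years in the state and 7 years in full health yields 0.7. Nothing in the task establishes that the step from 0.7 to 0.8 represents the same amount of anything as the step from 0.8 to 0.9, which is the interval property that later arithmetic assumes.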
MODULE 4: SUSTAINED MEASUREMENT
FAILURE—THE QALY AND THE REFERENCE CASE
The QALY multiplies time, a ratio measure, by utilities that are not measures. The resulting
construct lacks unidimensionality, equal units, invariance, and a true zero. Reference-case
modeling institutionalizes this error by mandating cost-per-QALY outputs and treating them as
evidence. Thresholds and sensitivity analyses add precision without meaning. The reference case
thus formalizes numerical storytelling while ignoring fundamental measurement requirements.
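A worked illustration of the arithmetic at issue (illustrative numbers only): the construction is QALYs = u × t, so a utility of 0.7 sustained for 10 years is reported as 7 QALYs. Time carries a unit (years); u carries no established unit, so the product has no defined dimension, and claims such as "7 QALYs is twice 3.5 QALYs" have no empirical warrant.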
MODULE 5: THE IDENTITY CRISIS OF
HTA—NOTHING WITHOUT THE REFERENCE CASE
HTA’s identity crisis arises because the reference case treats numbers as measures without
satisfying RMT axioms. Utilities are multiplied by time to form QALYs, violating scale
requirements. With a denominator that is not a measure, cost-per-QALY ratios lack a stable unit
and cannot be falsified. Checklists enforce format rather than measurement validity, leaving HTA
as ritual rather than science. Remove the reference case and HTA has little to offer if objective
knowledge is the goal.
PROGRAM 2
A NEW START IN MEASUREMENT FOR HEALTH
TECHNOLOGY ASSESSMENT
For fifty years, HTA has confused numbers with measures. Scientific practice requires adherence
to representational measurement theory, from Stevens’ linkage of arithmetic to scale type to the
representation and uniqueness theorems of Krantz, Luce, Suppes, and Tversky. Rasch provided
the operational bridge for latent traits, transforming ordered responses into an invariant logit
ruler. HTA instead fixated on QALYs and multiattribute valuations, guaranteeing persistent
failure. The remedy is non-negotiable: only linear ratio scales for manifest claims and Rasch
logit ratio scales for latent trait possession are admissible.
MODULE 1: THE DENIAL OF FALSIFICATION IN HTA
Falsification requires stable units, replicability, and clear disconfirmation conditions. RMT
supplies these prerequisites. HTA denies falsification because its core quantities are not
measures: ordinal utilities are treated as interval and multiplied by time into QALYs. Reference-
case models embed these non-measures, producing outputs that cannot be empirically refuted.
Without validated units, HTA abandons normal science.
MODULE 2: THE RASCH MODEL – LATENT TRAITS
AND ITEM SELECTION
Latent traits are measurable only if they yield invariant scales. Rasch specifies a single trait, tests
items against it, and maps responses via a logistic function of person location minus item
difficulty. Items must fit the model to ensure constant relative differences and invariance. Item
selection is critical: well-targeted items span the trait and cluster near 50% endorsement. Misfit
signals failure and functions as falsification.
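For reference, the dichotomous form of the model discussed in this module can be written as
P(X_ni = 1 | θ_n, b_i) = exp(θ_n - b_i) / (1 + exp(θ_n - b_i)),
where θ_n is the person's location and b_i the item's difficulty, both expressed in logits. When the data fit, comparisons between persons do not depend on which items are used, which is the specific objectivity noted in the introduction.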
MODULE 3: THE RASCH MODEL – THE UNIQUE
LOGIT RATIO SCALE
Rasch measurement operationalizes conjecture and refutation through fit statistics, local
independence, differential item functioning (DIF), and invariance testing. Surviving these tests
earns the status of “measure.”
The result is the unique Rasch logit ratio scale: additive logits defining trait possession and
multiplicative odds with a true zero, where differences translate to invariant odds ratios.
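A short derivation of the odds claim (notation as in the previous module): the odds of endorsement are P/(1 - P) = exp(θ_n - b_i), so for two persons A and B responding to any item that fits the model, the odds ratio is exp(θ_A - θ_B), independent of the item chosen. Equal logit differences therefore correspond to the same odds ratio everywhere on the scale, and the odds metric carries a true zero.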
MODULE 4: THE RASCH MODEL – POSSESSION AND
FALSIFICATION
The quantity of interest is possession of a single latent trait, measured in logits. When Rasch
assumptions hold, equal logit differences have equal meaning across persons and items.
Estimation yields person and item locations with standard errors governed by targeting. Group
comparisons and change analyses are conducted on the logit scale, with odds-ratio
interpretations. For subjective responses, Rasch is the only framework consistent with RMT.
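A worked illustration with assumed numbers (not drawn from any study): if a treatment group's mean person location is 1.2 logits (standard error 0.15) and a control group's is 0.5 logits (standard error 0.18), the difference is 0.7 logits with standard error sqrt(0.15^2 + 0.18^2) ≈ 0.23, and on any item that fits the model the treatment group's odds of endorsement are exp(0.7) ≈ 2.0 times the control group's.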
MODULE 5: THE RASCH MODEL – THE EXISTENTIAL
CRISIS FOR DISEASE-SPECIFIC INSTRUMENTS
Disease-specific instruments used in HTA rarely meet Rasch standards. Summed scores lack
unidimensionality, invariance, and additivity, so totals are not measures. Apparent change often
reflects instrument behavior rather than trait change. The Rasch model uniquely enforces
representational measurement for latent traits; instruments that fail its tests produce numerical
storytelling, not evidence.
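One way to see the problem, stated in the model's own terms: under the Rasch model the expected raw score is a nonlinear (ogive) function of the trait, E[R | θ] = Σ_i exp(θ - b_i) / (1 + exp(θ - b_i)), so a one-point raw-score gain near an instrument's floor or ceiling corresponds to a much larger change in logits than the same gain near the middle. Equal raw-score differences are therefore not equal amounts of the trait, and summed totals cannot serve as measures.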
