Book Review
Applying the Rasch Model:
Fundamental Measurement in the Human Sciences*
by Trevor G. Bond and Christine M. Fox (2002)
Mahwah, NJ: Lawrence Erlbaum Associates
When researchers in the physical sciences conduct experiments, they use measurement instruments that
are generally regarded as reliable, objective, and independent of the researcher, and in which the units of whatever is being
measured have equal intervals. To ascertain height, weight, time, or temperature, for example, they use instruments such as scales, clocks, and thermometers
that have been carefully calibrated over long periods of time. The authors of this book contend that such rigor is generally
lacking in the human sciences. In the first chapter, "Why measurement is fundamental," Bond and Fox begin by noting that the
definition of measurement in the human sciences differs from that of the physical sciences, and is based on Stevens' (1946)
definition of measurement (quoted in Michell, 1997) as the "assignment of numerals to objects or events according to a rule."
They criticize this definition because it has led psychometricians to regard the mere assignment of numerical values to a trait
as scientific measurement, without regard to the quality of the measurement instrument. For example, all too often ordinal data
derived from Likert scales are treated as if they were interval-level data during statistical analyses. Bond and Fox argue that
researchers are often so engrossed in their statistical analyses that they forget to pay adequate heed to the measurement process
itself (too much "psycho" and not enough "metrics" [p. 189]). This can lead to false conclusions. Bond and Fox argue that in
order to advance the human sciences, it is necessary to develop the same rigorous standards of measurement that are used in the
physical sciences.
The authors then introduce the theory of Rasch analysis, arguing that it is the best existing tool for meeting the rigorous measurement
requirements described above. In Rasch analysis, ordinal data, such as raw test scores or Likert-scale observations, are converted
into interval-scale data through a logarithmic (log-odds) transformation. This enables the direct comparison of test item difficulties and
test-taker abilities on a common measurement scale.
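As a point of reference, the dichotomous Rasch model (the formulas for the whole Rasch family are given in the book's appendices; the notation here is the conventional one rather than a quotation from the book) expresses the probability that person n answers item i correctly as a function of the difference between the person's ability B_n and the item's difficulty D_i, both in log-odds units (logits):

\[
P(X_{ni} = 1) = \frac{e^{B_n - D_i}}{1 + e^{B_n - D_i}},
\qquad
\ln\frac{P(X_{ni} = 1)}{P(X_{ni} = 0)} = B_n - D_i .
\]

Because the model reduces to a simple difference on a single logit scale, person measures and item calibrations share one interval metric and can be compared directly.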
The book is divided into thirteen chapters and two appendices, which give the mathematical formulas for the Rasch family of models
and provide a list of Rasch software programs, publications, websites, and professional organizations. The first three chapters
present the basic theory and principles of Rasch analysis. Introduced are such concepts as fundamental measurement, unidimensionality,
item/person invariance, conjoint additive measurement, and the invariant order of developmental stages, all of which are central to
Rasch theory. These chapters also describe how Rasch analysis addresses issues of construct validity and reliability
through statistics such as fit indices.
* This is a review of the original edition published in 2001. A revised edition is scheduled for publication in the spring of 2007.
Chapters 4 through 8 deal with practical applications of Rasch analysis, featuring different members of the Rasch family of models. Chapter 4
describes the basic dichotomous model, for tests which have yes/no or right/wrong answers. Chapter 5 discusses test equating, an operation
that can be performed to determine whether two tests are measuring the same construct. Chapters 6 through 8 then show how extensions of the
basic dichotomous model can be used to analyze polytomous data (i.e., data in which more than two response categories are present). Chapter 6 demonstrates
the rating scale model, commonly used with Likert scales, and Chapter 7 describes the partial credit model, which deals with test items that
can be marked partially correct, or tests which contain combinations of different item types with different numbers of steps. This model is
often used with oral interviews and essay tests. Chapter 8 introduces the many-facets model, which is used with performance-based tests,
such as essays, in which the facet of rater severity is added to the facets of item difficulty and person ability.
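As a rough guide to how these extensions relate to one another (again, the notation below is the standard presentation rather than a quotation from the book), the rating scale model adds a threshold term F_k for the step from response category k-1 to category k, the partial credit model lets those thresholds vary by item, and the many-facets model adds a rater severity term C_j to the same additive structure:

\[
\ln\frac{P_{nik}}{P_{ni(k-1)}} = B_n - D_i - F_k,
\qquad
\ln\frac{P_{nijk}}{P_{nij(k-1)}} = B_n - D_i - C_j - F_k ,
\]

where B_n is person ability and D_i is item difficulty. Each additional facet enters as one more additive term on the logit scale, which is what makes all of these models members of a single Rasch family.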
In these chapters, the basic descriptions are followed by specific examples of practical application, such as the BLOT (Bond's Logical Operations Test; Bond, 1976/1995) for the dichotomous model,
which tests adolescents' cognitive development according to the theories of Piaget, and the Computer Opinion Survey (Simonson, Maurer,
Montag-Torardi, & Whitaker, 1987) for the rating scale model, a Likert-scale measure of computer anxiety. In addition to the tables and figures
common in the output produced by Rasch software programs, Bond and Fox have also developed a dynamic visual version of the Rasch scale, in which
items are represented by circles, test-takers by squares, and misfit by the placement of misfitting items and persons in gray areas along the sides
of the scale. This graphic chart is particularly useful in understanding how Rasch output illustrates various aspects of the measurement scale for
persons and items and has since been incorporated by Linacre into the latest version of Winsteps (Linacre, 2006). Chapters 4 through 7 end with
examples of software commands from Quest and Winsteps, two of the most common Rasch analysis software programs.
The final chapters further investigate theoretical and practical aspects of Rasch
analysis. Chapter 9 explores the notion of stage-based development (in which the interval scale created by the model maintains equal measurement
units all along the scale, and moving up the scale indicates either more of the trait or construct under investigation or the crossing of thresholds to the
next level of development), while Chapter 10 presents various examples from the human sciences in which the Rasch model has been applied, including
such diverse areas as public health surveys, computer adaptive testing, and judged sports performances. Chapter 11 expands on Chapter 6 with a more
detailed discussion of problems with the construction, design, and analyses of rating scales. As in Chapter 6, this chapter also ends with a
specific example and software commands. Chapter 12 deals with model fit, and explains one of the central concepts of Rasch measurement that
differentiates it from Item Response Theory: the notion that the data should fit the model rather than that the model should fit the data.
The authors also show how fit statistics can be used to investigate individual cases of misfit. For example, if a highly capable student
unexpectedly gets a block of easy items wrong, that student will be flagged as misfitting the model; investigating that student's case might reveal,
say, that the student was absent during a period when that material was taught. Finally, Chapter 13 sums things up and reiterates the arguments
the authors have made in support of the application of Rasch measurement to the human sciences.
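To make this notion of misfit concrete, here is a minimal sketch, in Python and with hypothetical data, of the standard infit and outfit mean-square statistics for a single test-taker under the dichotomous model (the computation is the conventional one, not code from the book). Values near 1.0 indicate responses consistent with the model; values well above 1.0 flag surprising patterns such as the able-but-absent student described above.

import numpy as np

def rasch_probability(ability, difficulty):
    # Probability of a correct answer under the dichotomous Rasch model
    return 1.0 / (1.0 + np.exp(-(ability - difficulty)))

def person_fit(responses, ability, difficulties):
    # Infit and outfit mean-square fit statistics for one test-taker.
    # responses: array of 0/1 scored answers, one per item
    # ability, difficulties: Rasch measures in logits
    expected = rasch_probability(ability, difficulties)
    variance = expected * (1.0 - expected)            # model variance per item
    residual = responses - expected
    z_squared = residual ** 2 / variance              # squared standardized residuals
    outfit = z_squared.mean()                         # unweighted mean square
    infit = (residual ** 2).sum() / variance.sum()    # information-weighted mean square
    return infit, outfit

# Hypothetical case: an able test-taker (+2 logits) who misses the three easiest items
difficulties = np.array([-2.0, -1.5, -1.0, 0.0, 0.5, 1.0, 1.5, 2.0])
responses    = np.array([0, 0, 0, 1, 1, 1, 1, 1])
print(person_fit(responses, ability=2.0, difficulties=difficulties))  # both values far above 1.0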
Throughout the book, the authors emphasize the importance of basing a Rasch investigation on a substantive theory that underlies or informs the
research, and they stress that the theory-practice dialogue is an ongoing process: researchers must consider both what the test results tell them about the
underlying theory and what the theory tells them about the test items and test-takers under investigation.
Applying the Rasch Model is a readable and user-friendly introduction to Rasch analysis. Accessible to the mathematically challenged
and not requiring a technical background, it is a useful book for those who want to learn about the theory and practice of Rasch measurement, and
those who may know something about it but want to reinforce their knowledge.
- Reviewed by Ed Schaefer
Ochanomizu University
References
Adams, R. J., & Khoo, S. T. (1993). Quest: The interactive test analysis system [Computer software]. Camberwell, Victoria: Australian Council for Educational Research.
Bond, T. G. (1976/1995). BLOT: Bond's Logical Operations Test. Townsville, Queensland, Australia: James Cook University.
Linacre, J. M. (2006). WINSTEPS (Version 3.63.0) [Computer software]. Chicago, IL: MESA Press.
Michell, J. (1997). Quantitative science and the definition of measurement in psychology. British Journal of Psychology, 88(3), 355-383.
Simonson, M. R., Maurer, M., Montag-Torardi, M., & Whitaker, M. (1987). Development of a standardized test of computer literacy and a computer anxiety index. Journal of
Educational Computing Research, 3(2), 231-247.
Stevens, S. S. (1946). On the theory of scales of measurement. Science, 103, 677-680.