Book Review
Applying the Rasch Model:
Fundamental Measurement in the Human Sciences*
by Trevor G. Bond and Christine M. Fox (2002)
Mahwah, NJ: Lawrence Erlbaum Associates
When researchers in the physical sciences conduct experiments, they use measurement instruments that
are generally regarded as reliable, objective, and independent of the researcher, and in which the units of whatever is being
measured have equal intervals. To ascertain height, weight, time, or temperature, for example, they use instruments such as scales, clocks, and thermometers
that have been carefully calibrated over long periods of time. The authors of this book contend that such rigor is generally
lacking in the human sciences. In the first chapter, "Why measurement is fundamental," Bond and Fox begin by noting that the
definition of measurement in the human sciences differs from that of the physical sciences, and is based on Stevens' (1946)
definition of measurement (quoted in Michell, 1997) as the "assignment of numerals to objects or events according to a rule."
They criticize this definition because it has led psychometricians to regard the mere assignment of numerical values to a trait
as scientific measurement, without regard to the quality of the measurement instrument. For example, all too often ordinal data
derived from Likert scales are treated as if they were interval-level data during statistical analyses. Bond and Fox argue that
researchers are often so engrossed in their statistical analyses that they forget to pay adequate heed to the measurement process
itself (too much "psycho" and not enough "metrics" [p. 189]). This can lead to false conclusions. Bond and Fox argue that in
order to advance the human sciences, it is necessary to develop the same rigorous standards of measurement that are used in the
physical sciences.
The authors then introduce the theory of Rasch analysis, arguing that it is the best existing tool for meeting the rigorous measurement
requirements described above. In Rasch analysis, ordinal data, such as raw test scores or Likert-scale observations, are converted
into interval-scale data through a logarithmic (log-odds) transformation. This enables the direct comparison of test item difficulties and
test-taker abilities on a common measurement scale.
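As a point of reference, the dichotomous Rasch model (the formulas for the whole Rasch family are given in the book's appendices; the notation here is the conventional one rather than a quotation from the book) expresses the probability that person n answers item i correctly as a function of the difference between the person's ability B_n and the item's difficulty D_i, both in log-odds units (logits):

\[
P(X_{ni} = 1) = \frac{e^{B_n - D_i}}{1 + e^{B_n - D_i}},
\qquad
\ln\frac{P(X_{ni} = 1)}{P(X_{ni} = 0)} = B_n - D_i .
\]

Because the model reduces to a simple difference on a single logit scale, person measures and item calibrations share one interval metric and can be compared directly.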
The book is divided into thirteen chapters and two appendices, which give the mathematical formulas for the Rasch family of models
and provide a list of Rasch software programs, publications, websites, and professional organizations. The first three chapters
present the basic theory and principles of Rasch analysis. Introduced are such concepts as fundamental measurement, unidimensionality,
item/person invariance, conjoint additive measurement, and the invariant order of developmental stages, all of which are central to
Rasch theory. These chapters also describe how Rasch analysis addresses issues of construct validity and reliability
through statistics such as fit indices.
* This is a review of the original edition published in 2001. A revised edition is scheduled for publication in the spring of 2007.
Chapters 4 through 8 deal with practical applications of Rasch analysis, featuring different members of the Rasch family of models. Chapter 4
describes the basic dichotomous model, for tests which have yes/no or right/wrong answers. Chapter 5 discusses test equating, an operation
that can be performed to determine whether two tests are measuring the same construct. Chapters 6 through 8 then show how extensions of the
basic dichotomous model can be used to analyze polytomous data (i.e., data in which more than two response categories are present). Chapter 6 demonstrates
the rating scale model, commonly used with Likert scales, and Chapter 7 describes the partial credit model, which deals with test items that
can be marked partially correct, or tests which contain combinations of different item types with different numbers of steps. This model is
often used with oral interviews and essay tests. Chapter 8 introduces the many-facets model, which is used with performance-based tests,
such as essays, in which the facet of rater severity is added to the facets of item difficulty and person ability.
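As a rough guide to how these extensions relate to one another (again, the notation below is the standard presentation rather than a quotation from the book), the rating scale model adds a threshold term F_k for the step from response category k-1 to category k, the partial credit model lets those thresholds vary by item, and the many-facets model adds a rater severity term C_j to the same additive structure:

\[
\ln\frac{P_{nik}}{P_{ni(k-1)}} = B_n - D_i - F_k,
\qquad
\ln\frac{P_{nijk}}{P_{nij(k-1)}} = B_n - D_i - C_j - F_k ,
\]

where B_n is person ability and D_i is item difficulty. Each additional facet enters as one more additive term on the logit scale, which is what makes all of these models members of a single Rasch family.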
In these chapters, the basic descriptions are followed by specific examples of practical application, such as the BLOT (Bond's Logical Operations Test; Bond, 1976/1995) for the dichotomous model,
which tests adolescents' cognitive development according to the theories of Piaget, and the Computer Opinion Survey (Simonson, Maurer,
Montag-Torardi, & Whitaker, 1987) for the rating scale model, a Likert-scale measure of computer anxiety. In addition to the tables and figures
common in the output produced by Rasch software programs, Bond and Fox have also developed a dynamic visual version of the Rasch scale, in which
items are represented by circles, test-takers by squares, and misfit by the placement of misfitting items and persons in gray areas along the sides
of the scale. This graphic chart is particularly useful in understanding how Rasch output illustrates various aspects of the measurement scale for
persons and items and has since been incorporated by Linacre into the latest version of Winsteps (Linacre, 2006). Chapters 4 through 7 end with
examples of software commands from Quest and Winsteps, two of the most common Rasch analysis software programs.
The final chapters further investigate theoretical and practical aspects of Rasch
analysis. Chapter 9 explores the notion of stage-based development (in which the interval scale created by the model maintains equal measurement
units all along the scale, and moving up the scale indicates either more of the trait or construct under investigation or the crossing of thresholds to the
next level of development), while Chapter 10 presents various examples from the human sciences in which the Rasch model has been applied, including
such diverse areas as public health surveys, computer adaptive testing, and judged sports performances. Chapter 11 expands on Chapter 6 with a more
detailed discussion of problems with the construction, design, and analyses of rating scales. As in Chapter 6, this chapter also ends with a
specific example and software commands. Chapter 12 deals with model fit, and explains one of the central concepts of Rasch measurement that
differentiates it from Item Response Theory: the notion that the data should fit the model rather than that the model should fit the data.
The authors also show how fit statistics can be used to investigate individual cases of misfit. For example, if a highly capable student
unexpectedly gets a block of easy items wrong, that student will be flagged as misfitting the model; investigating that student's case might reveal,
say, that the student was absent during a period when that material was taught. Finally, Chapter 13 sums things up and reiterates the arguments
the authors have made in support of the application of Rasch measurement to the human sciences.
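To make this notion of misfit concrete, here is a minimal sketch, in Python and with hypothetical data, of the standard infit and outfit mean-square statistics for a single test-taker under the dichotomous model (the computation is the conventional one, not code from the book). Values near 1.0 indicate responses consistent with the model; values well above 1.0 flag surprising patterns such as the able-but-absent student described above.

import numpy as np

def rasch_probability(ability, difficulty):
    # Probability of a correct answer under the dichotomous Rasch model
    return 1.0 / (1.0 + np.exp(-(ability - difficulty)))

def person_fit(responses, ability, difficulties):
    # Infit and outfit mean-square fit statistics for one test-taker.
    # responses: array of 0/1 scored answers, one per item
    # ability, difficulties: Rasch measures in logits
    expected = rasch_probability(ability, difficulties)
    variance = expected * (1.0 - expected)            # model variance per item
    residual = responses - expected
    z_squared = residual ** 2 / variance              # squared standardized residuals
    outfit = z_squared.mean()                         # unweighted mean square
    infit = (residual ** 2).sum() / variance.sum()    # information-weighted mean square
    return infit, outfit

# Hypothetical case: an able test-taker (+2 logits) who misses the three easiest items
difficulties = np.array([-2.0, -1.5, -1.0, 0.0, 0.5, 1.0, 1.5, 2.0])
responses    = np.array([0, 0, 0, 1, 1, 1, 1, 1])
print(person_fit(responses, ability=2.0, difficulties=difficulties))  # both values far above 1.0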
Throughout the book, the authors emphasize the importance of basing a Rasch investigation on a substantive theory that underlies or informs the
research, and they stress that the theory-practice dialogue is an ongoing process: researchers must consider both what the test results tell them about the
underlying theory and what the theory tells them about the test items and test-takers under investigation.
Applying the Rasch Model is a readable and user-friendly introduction to Rasch analysis. Accessible to the mathematically challenged
and not requiring a technical background, it is a useful book for those who want to learn about the theory and practice of Rasch measurement, and
those who may know something about it but want to reinforce their knowledge.
- Reviewed by Ed Schaefer
Ochanomizu University
References
Adams, R. J., & Khoo, S. T. (1993). Quest: The interactive test analysis system [Computer software]. Camberwell, Victoria: Australian Council for Educational Research.
Bond, T. G. (1976/1995). BLOT: Bond's Logical Operations Test. Townsville, Queensland, Australia: James Cook University.
Linacre, J. M. (2006). WINSTEPS (Version 3.63.0) [Computer software]. Chicago, IL: MESA Press.
Michell, J. (1997). Quantitative science and the definition of measurement in psychology. British Journal of Psychology, 88(3), 355-383.
Simonson, M. R., Maurer, M., Montag-Torardi, M., & Whitaker, M. (1987). Development of a standardized test of computer literacy and a computer anxiety index. Journal of
Educational Computing Research, 3(2), 231-247.
Stevens, S. S. (1946). On the theory of scales of measurement. Science, 103, 677-680.