JALT Testing & Evaluation SIG Newsletter
Vol. 11 No. 1. Mar. 2007. (p. 21 - 25) [ISSN 1881-5537]

An Interview with Trevor G. Bond

by Edward Schaefer

Trevor G. Bond is head of the Department of Psychology, Counselling, and Learning Needs at the Hong Kong Institute of Education, and was formerly a senior faculty member of the School of Education at James Cook University in Australia. He has a Ph.D. in developmental psychology. Along with Christine Fox, he coauthored the popular introduction to Rasch measurement, Applying the Rasch Model: Fundamental Measurement in the Human Sciences in 2001. He has held workshops in a number of countries to promote the use of Rasch measurement among researchers in various fields. This interview was conducted by e-mail in January 2007.


Q: I wonder if we could start with some background information about yourself. I understand you started out as a physical education teacher. How did you get involved with psychological and educational measurement?

A: I was a primary school teacher first – year 3, in fact; then secondary PE. When I upgraded my teaching training qualifications, I completed a research Honours degree, specializing in developmental psychology and the work of Jean Piaget. Although my thesis was quantitatively based, I always felt unhappy with the analytical techniques that were available at the time. They did not capture the essence of the research and while I completed a quantitative / qualitative validation, I stuck with the principles of Spearman's rho – because my Piagetian data was ordinal data. But even the high correlation value could say nothing about the links between my test and Piaget's stages. I then tried Guttman scaling and ordering theory, but to no avail. Luckily, I was finally attracted to Rasch measurement by Geoff Masters and Mark Wilson who worked at ACER at the time. They sat in on my paper on an Ordering theory analysis of Piagetian thinking abilities and told me that Rasch was the answer to my prayers, and they were right. Researchers who know their data really well are the best audience for a Rasch analysis demonstration. When you have relationships between items and persons so clearly portrayed, it is a real eye-opener. I knew there was more to my data, but only Rasch analysis could reveal it. I still pull the same trick on colleagues – they are content working on any old data set – but do an analysis of their much-loved and much analysed data. Kaboom. They don't go back. Although Rasch measurement was always a means to an end to me, the unexpected success of our 2001 book has made my interest in Rasch measurement a focus in its own right. Many of the analyses in that book come from work on Piagetian data – but the lessons are there for any person interested in measurement to see.

[ p. 20 ]


Q: You have taught in several different countries, and are now based in Hong Kong. I'm not sure if Hong Kong is similar to Japan, but here the testing culture is often referred to as a "culture of secrecy" and it's difficult to perform any analyses of high stakes tests such as university entrance exams. How do you find the situation in Hong Kong?

A: In Hong Kong, university entrance is a direct consequence of performance on high-stakes public graduation exams which are done at high school, so they are centrally administered by the exams authority (HKEAA). I am pleased to say they are well versed in the techniques of Rasch measurement, so secondary analysis is unnecessary and irrelevant in this case. But I have heard of some interesting tales from Japan where quite inferior quantitative analytical techniques are used, and as a consequence, poor decisions are made which must adversely affect the life chances of some students. Rigour and transparency would be my model in this case. Of course, many licensing and examining authorities make unjustified decisions: indefensible cut-points; too many student categories; using norm-based rather than competency-based reference points, etc. Rasch measurement can be cruelly revealing of such inadequacies. Don't expect the decision-makers to thank you – not unless they have had their own misgivings and are then open to a better way.

Q: In a lecture at Temple University on your previous trip to Japan, you stated that you thought we could take responsibility for testing away from institutional authorities, and give it back to classroom teachers. Can you explain what you meant by this?

A: It seems to me that there are many teachers of English in Japan who have an earnest and informed interest in what is really required for Japanese speakers of English to learn so that they can be successful. I am not sure whether an external testing company has that knowledge or even an interest, but they take away a lot of financial resources from the educational sector in the fees they charge for the tests. It is easy to regard the approach of some big international testing organizations as part of the McDonaldization of educational assessment. That's okay if all you want is a Big Mac – just like the one you can buy anywhere. But more than just the relevant content and contextual knowledge, there is a growing capacity in Japan to develop these tests and to monitor their standards from a stringent psychometric viewpoint using Rasch analysis. It seems to me that testing in Japan would be better off in the hands of those who know what that testing should be for and why it should be used. That would also ensure that a lot of testing fees stay in Japan as well.

[ p. 21 ]

Q: In many classroom situations, teachers might say that objective measurement tools such as Rasch are not necessary. As teachers they know their students, and ordinary percentage scores are good enough; or if they want to do research, qualitative techniques are preferable. How would you respond to them?

A: There are many ways that a medical practitioner could take the temperature of a patient: hand-on-forehead, mercury thermometer in mouth, or high tech infra-red. You expect your medical practitioner to choose between the second and the third option. Why's that? Teachers need to be able to go further than mere 'sense impressions' of students' ability. They need to be able to measure abilities scientifically against their known professional standards in order to diagnose and treat (teach) effectively. The teaching / medical analogy is quite telling. Just make sure as teacher and patient, your preference is consistent. Nothing wrong with teacher / medico judgement per se – but most of us prefer a little evidence-based practice when it matters. In the daily practice of teaching and of medicine, it should always matter.

Q: In the preface to your 2001 book with Christine Fox, you state that "at the turn of the millennium, the human sciences, for those who are driven by quantitative research methods, are in a state of crisis." (p. viii) Can you explain what you mean by this?

A: It seems that after one hundred years of counting, of quantifying, most of us in the human sciences have just a bunch of numbers to show for it. Most of us don't yet have rigorously scientific measures for our key variables. Precise measurements of mass, temperature, time, length, angles and the like are the foundation of the physical sciences. Genuine interval-level measures are needed if the human sciences are to be regarded as really scientific. While our sophistication in quantitative analytical techniques has blossomed, we routinely perform those analyses on data which are, at best, ordinal – even though everyone knows that the techniques explicitly require interval level data. The assumption of interval nature is just routine and unquestioned – indeed those who question those presumptions are dismissed. Then, quite interestingly, we are not content to call this quantification; we insist on calling it measurement, thereby alluding to the cachet of scientific measurement and pretending to ourselves that we are really scientists.

Q: You also argue that objective measurement tools, such as the Rasch model, are not meant to replace statistical techniques but are a necessary prerequisite to them. Why is this important?

A: The answer follows from above. Rasch measurement provides the tools for the construction and control of measures; a sort of scientific quality control. The results are scales, questionnaires, tests, scoring rubrics which yield interval level measures – the sort of data explicitly required by many of our sophisticated data analysis techniques. The vast majority of these techniques require interval data; Rasch analysis is uniquely positioned to provide / fulfill that requirement.
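One way to see why raw percentage scores are not interval-level is the log-odds transformation behind the logit scale on which Rasch measures are reported. The sketch below is illustrative only: the function name `log_odds` and the sample percentages are assumptions made for this example, and the code is a simple transformation, not a Rasch estimation procedure.

```python
import math


def log_odds(proportion: float) -> float:
    """Convert a proportion correct (strictly between 0 and 1) to logits."""
    if not 0.0 < proportion < 1.0:
        raise ValueError("proportion must be strictly between 0 and 1")
    return math.log(proportion / (1.0 - proportion))


# Hypothetical raw scores, each ten percentage points apart.
scores = [0.50, 0.60, 0.70, 0.80, 0.90]
logits = [log_odds(p) for p in scores]

# Gaps between neighbouring scores, expressed in logits.
gaps = [b - a for a, b in zip(logits, logits[1:])]

for p, logit in zip(scores, logits):
    print(f"{p:.0%} correct -> {logit:+.2f} logits")
print("logit gaps:", [round(g, 2) for g in gaps])
```

Equal steps in percentage terms turn into unequal steps in logits, with the gaps widening toward the extreme scores. That is the sense in which raw scores are at best ordinal.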

[ p. 22 ]

Q: Many people have the (mis)conception that Rasch analysis is a special one parameter model case of Item Response Theory. Others raise their eyebrows at the notion that IRT and Rasch are related. Can you tell us the difference between IRT and Rasch analysis, and why you consider Rasch measurement superior?

A: Many of the data analytical steps in Rasch analysis and in IRT models derive from identical mathematical principles. Of course, these techniques are shared across what some call 'latent trait' models and others call 'logistic' models. So many see the Rasch model as a particular, over-simplified version of IRT models. Indeed, they say, it is too simple to be very useful in practical testing. That's not the issue for me. Rasch's model is an elegantly simple theorem about the necessary relationship that must exist between item and person performance in a testing situation before the property of interval level measurement may be asserted. To the extent that your data fulfill Rasch's requirements, then you have measurement 'good enough for government work' as Mike Linacre routinely quips. Rasch's theorem posits the standard; your task is to closely approximate it – the higher the stakes, the closer the approximation should be. No Pythagorean triangle actually exists, but close approximations of it have been used for all sorts of practical construction purposes. IRT models, on the other hand, do not provide such a standard. What is foremost is the data set – the analyst's job is to 'tweak' the parameters until the residuals are at their smallest. The data then can be explained or summarised by a particular model. Make your choice – do you want to describe the vagaries of your current data, or do you aim to measure something important about the human condition? If it's the latter, pick a strong measurement model and work toward fulfilling its simple, but stringent requirements until the data you collect produce measures good enough to support the decisions you need / want to make.
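For readers who want the formulas behind this contrast, the dichotomous Rasch model and the two-parameter logistic (2PL) IRT model are commonly written as below. This is a sketch in standard textbook notation and is not taken from the interview; the symbols (person ability \theta_n, item difficulty \delta_i or b_i, item discrimination a_i) are assumptions of that notation.

```latex
% Dichotomous Rasch model: person n with ability \theta_n
% meets item i with difficulty \delta_i (both in logits).
P(X_{ni} = 1 \mid \theta_n, \delta_i)
  = \frac{\exp(\theta_n - \delta_i)}{1 + \exp(\theta_n - \delta_i)}

% Equivalent log-odds form: the odds of success depend only on
% the difference between person ability and item difficulty.
\ln \frac{P(X_{ni} = 1)}{P(X_{ni} = 0)} = \theta_n - \delta_i

% 2PL IRT model, for contrast: an item discrimination a_i is
% estimated from the data for each item.
P(X_{ni} = 1 \mid \theta_n, a_i, b_i)
  = \frac{\exp\bigl(a_i(\theta_n - b_i)\bigr)}{1 + \exp\bigl(a_i(\theta_n - b_i)\bigr)}
```

Setting every a_i to 1 in the 2PL form gives back the Rasch expression. This algebraic link is why the Rasch model is often described as a one-parameter special case, even though the two approaches differ in whether the model or the data set takes priority.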

Q: How can a basic knowledge of Rasch measurement help researchers and practitioners in second language acquisition and applied linguistics?

A: It is interesting to see that since McNamara's book Measuring Second Language Performance came out in 1996, more people in second language and linguistic studies have become aware of the benefits of Rasch measurement. What else can model rater effects so parsimoniously? That model enables us to answer questions such as:

[ p. 23 ]

These are all questions that are fundamental to language practitioners and are best answered by Rasch measurement.

Q: A new edition of your book is set to come out soon. What changes are there from the previous edition?

A: In fact, I signed off on the proofs of this book just yesterday. We have attended to a bunch of niggling shortcomings of the first edition. We have made the practical work much more user-friendly and added a complete set of tutorials for readers to follow. I have since used those in a number of Rasch workshops; they were viewed very favourably by the participants. We opted for thermometry as a suitable analogy for measurement in the human sciences and have included new explanatory examples. The completely new Chapter 5 is devoted to the idea of measurement invariance as a fundamental principle of measurement, and as a consequence, a number of other chapters are reoriented to that central focus.

Q: Winsteps®, the software developed by Mike Linacre, is a popular package for Rasch analysis. It seems you and Christine Fox have developed a software program based on Winsteps® called Bond&FoxSteps. What led you to modify Professor Linacre's program? What differences are there between your program and his?

A: Bond&FoxSteps is a special version of Winsteps®, full-size but with a number of the pull down menu options grayed out. Mike has about one gazillion output options in Winsteps®. My experience in running Rasch workshops is that booting Winsteps® is like opening Aladdin's Cave: there are too many attractive features for beginners. They can easily get lost in the huge range of options. Our approach is to restrict the options to those we think best suit novices to Rasch measurement. The pdfs for the tutorials and the data files for the book are all pre-loaded into the software. A CD containing Bond&FoxSteps comes free with the 2nd edition and the book will be supported at www.bondandfox.com. But the credit goes to Mike Linacre. I shared the idea with him at one point; he asked me to specify exactly what I wanted. After a few iterations, Mike produced a learning tool that far exceeded my wildest expectations.

[ p. 24 ]

Q: The Rasch family of models includes the dichotomous model, the partial credit model, the rating scale model, and the many-facets model. Will there be any expansion of these models in the future?

A: In fact, the Rasch model for Poisson counts was elaborated by Wright and Stone in 1979. The models you mentioned – the ones we focus on in our book – are only a part of the family, even if one adopts the strictest interpretation of the Rasch requirements. The literature is already full of Rasch-like and Rasch-based models. Look at those implemented in ConQuest for the PISA analyses, and the edited volume on multivariate and mixture Rasch models by von Davier and Carstensen that has just come out. Future developments in Rasch measurement will make it unnecessary for ordinary folks to be able to run and interpret Winsteps® and the like – they will be plug and play. The developments will be user-friendly and undemanding (on users, not on measurement) in ways we can't imagine. In short, that is just the way that other computer software has evolved.

Works Cited

Bond, T. G. & Fox, C. M. (2007). Applying the Rasch model: Fundamental measurement in the human sciences (2nd Ed.). Mahwah, NJ: Lawrence Erlbaum.

Linacre, J. M. (2006, September). Winsteps (Version 3.61.2) [Computer Software]. Chicago: Winsteps.com.

Linacre, J. M. (2007). Bond&FoxSteps (Version 1.0) [Computer Software]. Chicago: Winsteps.com.

McNamara, T. F. (1996). Measuring second language performance. New York: Longman.

OECD Programme for International Student Assessment. (2000). PISA: Program for International Student Assessment. Paris, France: Author. Also retrieved from the World Wide Web at http://www.PISA.oecd.org/.

von Davier, M. & Carstensen, C. H. (Eds.) (2006). Multivariate and mixture distribution Rasch models: Extensions and applications (Statistics for Social and Behavioral Sciences). New York: Springer.

Wright, B. D., & Stone, M. H. (1979). Best Test Design. Chicago, IL: MESA Press.

Wu, M. L., Adams, R. J., & Wilson, M. R. (1998). ConQuest: Generalised item response modelling software [Computer software]. Camberwell, Victoria: Australian Council for Educational Research.

http://jalt.org/test/bon_sch.htm

[ p. 25 ]