Article appearing in Shiken 16.2 (Nov 2012) pp. 8-14.
Authors: Jeffrey Stewart1, Aaron Gibson2 & Luke Fryer3
1. Kyushu Sangyo University, Cardiff University
2. Kyushu Sangyo University
3. Kyushu Sangyo University
Abstract:
Unlike classical test theory (CTT), where estimates of reliability are assumed to apply to all members of a population, item response theory (IRT) provides a theoretical framework under which reliability can vary by test score. However, different IRT models can result in very different interpretations of reliability, as models that account for item quality (slopes) and the probability of a correct guess significantly alter estimates. This is illustrated by fitting a TOEIC Bridge practice test to 1-parameter (Rasch) and 3-parameter logistic models and comparing the results. Under the Bayesian Information Criterion (BIC), the 3-parameter model provided superior fit. The implications of this are discussed.
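To make the model contrast in the abstract concrete, the following is a minimal sketch (not code from the article) of the 3-parameter logistic item response function, P(θ) = c + (1 − c) / (1 + exp(−a(θ − b))), where a is the item slope (discrimination), b the difficulty, and c the lower asymptote (guessing parameter). The Rasch (1-parameter) model is the special case with a fixed at 1 and c at 0.

```python
import math

def irt_prob(theta, a=1.0, b=0.0, c=0.0):
    """Probability of a correct response under the 3PL model.

    theta: examinee ability
    a: item slope (discrimination); fixed at 1.0 under the Rasch model
    b: item difficulty
    c: pseudo-guessing parameter; fixed at 0.0 under the Rasch model
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Rasch case: an examinee at the item's difficulty has a 0.5 probability
print(irt_prob(theta=0.0, b=0.0))          # 0.5

# 3PL case: a nonzero guessing parameter raises the floor of the curve,
# so the same examinee's probability is 0.25 + 0.75 * 0.5 = 0.625
print(irt_prob(theta=0.0, b=0.0, c=0.25))  # 0.625
```

Because c raises the lower asymptote above zero, low-ability examinees retain a nontrivial chance of answering correctly, which is one reason the two models can yield different reliability estimates for the same test.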