Equating classroom pre and post tests under item response theory
Jeffrey Stewart and Aaron Gibson (Kyushu Sangyo University)
Abstract
The authors illustrate how classroom pre-tests can be used to gather information for an item bank from which to construct summative post-tests of appropriate levels and measurement properties, and detail methods for equating pre and post-test forms under item response theory in such a manner that resulting ability estimates between conditions are comparable.
Keywords: Item Response Theory, test equating, classroom assessment
Step 1: Create a variable map
A spread of difficulty is important when choosing common items between forms, but it is difficult to ensure this if item parameters have not been previously estimated. One way to compensate for this weakness of design is by test spiraling. Rather than giving different forms to different classes, shuffle tests so that roughly equal proportions of students take each form in each class. In this manner, a student may write a somewhat different test from his or her neighbors. Randomly distributing forms among the student population can help ensure that the groups that write each form are of comparable ability.

Logit Difference | Probability of Success | Logit Difference | Probability of Success |
5.0 | 99% | -5.0 | 1% |
4.6 | 99% | -4.6 | 1% |
4.0 | 98% | -4.0 | 2% |
3.0 | 95% | -3.0 | 5% |
2.2 | 90% | -2.2 | 10% |
2.0 | 88% | -2.0 | 12% |
1.4 | 80% | -1.4 | 20% |
1.1 | 75% | -1.1 | 25% |
1.0 | 73% | -1.0 | 27% |
0.8 | 70% | -0.8 | 30% |
0.5 | 62% | -0.5 | 38% |
0.4 | 60% | -0.4 | 40% |
0.2 | 55% | -0.2 | 45% |
0.1 | 52% | -0.1 | 48% |
0.0 | 50% | -0.0 | 50% |
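
The values in the table are consistent with the one-parameter (Rasch) logistic function, in which the probability of success depends only on the logit difference between person ability and item difficulty. The following is a minimal Python sketch under that assumption; the function name `probability_of_success` is ours and is used for illustration only.

```python
import math

def probability_of_success(logit_difference: float) -> float:
    """Probability of a correct response under the Rasch model,
    given (person ability - item difficulty) in logits."""
    return 1.0 / (1.0 + math.exp(-logit_difference))

# Reproduce a few rows of the table above.
for d in (2.2, 1.1, 0.0, -1.1, -2.2):
    print(f"{d:+.1f} logits -> {probability_of_success(d):.0%}")
```

A learner whose ability is 1.1 logits above an item's difficulty is therefore expected to answer it correctly about three times in four, and a learner 1.1 logits below it about one time in four.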
increases reliability (Linacre, 2010b), and this design will provide maximum test information for departure from pre-test benchmarks. A pitfall of this approach is that the test will produce less reliable ability estimates for learners who far exceed original levels. Once expectations for development have been established, tests can be made that provide maximum information at the level of ability that students are expected to arrive at after instruction.
$$
\text{Person ability} = \text{Mean item difficulty} + \sqrt{1 + \frac{(\text{S.D. of item difficulty})^{2}}{2.89}} \times \log_e\!\left(\frac{\text{right answer count}}{\text{wrong answer count}}\right)
$$
= Average Difficulty+SQRT(1+(S.D of Difficulty^2)/2.89)*LN(Score/(k-Score))
Where "Score" indicates the cell containing a student's raw score, and k is the total number of items on the test.
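
For readers who prefer a script to a spreadsheet, the same calculation can be sketched in Python. The function name `prox_ability` and the example values (a 40-item test with mean item difficulty 0.0 and S.D. 1.0 logits) are hypothetical. Because the logarithm is undefined when a student answers every item correctly or none correctly, this sketch rejects such scores rather than guessing at a correction.

```python
import math

def prox_ability(score: int, k: int,
                 mean_difficulty: float, sd_difficulty: float) -> float:
    """Estimate person ability in logits from a raw score.

    score: number of right answers; k: total number of items;
    mean_difficulty, sd_difficulty: mean and S.D. of item difficulties (logits).
    """
    if not 0 < score < k:
        raise ValueError("score must be between 1 and k - 1 (no zero or perfect scores)")
    expansion = math.sqrt(1.0 + sd_difficulty ** 2 / 2.89)
    return mean_difficulty + expansion * math.log(score / (k - score))

# Hypothetical example: 30 right answers on a 40-item test.
ability = prox_ability(score=30, k=40, mean_difficulty=0.0, sd_difficulty=1.0)
print(f"Estimated ability: {ability:.2f} logits")
```

A student who answers exactly half of the items correctly receives an estimate equal to the mean item difficulty, since the log-odds term is then zero.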
Acknowledgement
The authors extend special thanks to Dr. John Michael Linacre