Statistics Corner
Questions and answers about language testing statistics:

How can we calculate item statistics for weighted items?

James Dean Brown
University of Hawai'i at Manoa
Table 1. Weighted item scores

Student    Item 1   Item 2   Item 3   Item 4   Item 5   Total Scores
Kimi          1        3        5        1       3.0        100
Sachiko       1        3        5        1       2.8         89
Keiko         1        2        4        ½       2.1         85
Rieko         1        2        4        ½       1.7         80
Mitsue        1        3        3        ½       1.5         79
Hitoshi       1        2        3        ½       1.0         70
Hide          0        1        2        ½       0.9         64
Yoshi         0        1        2        0       0.7         50
Toshi         0        0        1        0       0.5         37
Hachiko       0        0        0        0       0.3         13
Table 2. Weighted item scores converted to proportion scores, with IF and ID

Student    Item 1   Item 2   Item 3   Item 4   Item 5   Total Scores
Kimi        1.00     1.00     1.00     1.00     1.00        100
Sachiko     1.00     1.00     1.00     1.00     0.93         89
Keiko       1.00     0.67     0.80     0.50     0.70         85
Rieko       1.00     0.67     0.80     0.50     0.57         80
Mitsue      1.00     1.00     0.60     0.50     0.50         79
Hitoshi     1.00     0.67     0.60     0.50     0.33         70
Hide        0.00     0.33     0.40     0.50     0.30         64
Yoshi       0.00     0.33     0.40     0.00     0.23         50
Toshi       0.00     0.00     0.20     0.00     0.17         37
Hachiko     0.00     0.00     0.00     0.00     0.10         13
IF          0.60     0.57     0.58     0.45     0.48
ID          1.00     0.78     0.73     0.83     0.71
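The conversion from Table 1 to Table 2 is mechanical: each weighted item score is divided by the maximum number of points possible for that item (1, 3, 5, 1, and 3, respectively). The sketch below is my own Python illustration of that step, not anything from the original column; the variable names and data layout are assumptions.

```python
# A minimal sketch of converting the weighted scores in Table 1 into the
# proportion scores in Table 2 by dividing each score by the item maximum.
weighted = {
    "Kimi":    [1, 3, 5, 1.0, 3.0],
    "Sachiko": [1, 3, 5, 1.0, 2.8],
    "Keiko":   [1, 2, 4, 0.5, 2.1],
    "Rieko":   [1, 2, 4, 0.5, 1.7],
    "Mitsue":  [1, 3, 3, 0.5, 1.5],
    "Hitoshi": [1, 2, 3, 0.5, 1.0],
    "Hide":    [0, 1, 2, 0.5, 0.9],
    "Yoshi":   [0, 1, 2, 0.0, 0.7],
    "Toshi":   [0, 0, 1, 0.0, 0.5],
    "Hachiko": [0, 0, 0, 0.0, 0.3],
}
item_max = [1, 3, 5, 1, 3]  # maximum possible points for Items 1-5

# Proportion score = weighted score / maximum possible score for that item.
proportions = {
    name: [score / m for score, m in zip(scores, item_max)]
    for name, scores in weighted.items()
}

# Printing the rounded values reproduces the body of Table 2.
for name, props in proportions.items():
    print(name, [round(p, 2) for p in props])
```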
Item facility (IF) for a weighted item can then be calculated as the average of the proportion scores in Table 2. For Item 2:

IF = (1.00 + 1.00 + .67 + .67 + 1.00 + .67 + .33 + .33 + .00 + .00) / 10 = .57
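For readers who prefer to script the calculation, here is a minimal sketch of my own (not part of the original column) that reproduces the IF figure from the Item 2 column of Table 2.

```python
# Item 2 proportion scores from Table 2, ordered from highest to lowest total score.
item2 = [1.00, 1.00, 0.67, 0.67, 1.00, 0.67, 0.33, 0.33, 0.00, 0.00]

# IF is simply the mean proportion score across all ten students.
if_item2 = sum(item2) / len(item2)
print(round(if_item2, 2))  # 0.57
```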
Then, the item discrimination statistic (ID) could be based on the average proportion score for the upper group minus the average for the lower group, where the upper group is defined as the top three students in Table 2 and the lower group as the bottom three. ID for Item 2 would be calculated as follows:

ID = IF(upper) - IF(lower) = (1.00 + 1.00 + .67) / 3 - (.33 + .00 + .00) / 3 = 2.67 / 3 - .33 / 3 = .89 - .11 = .78

Exactly the same principles could be applied to calculating the difference index (DI) and the B-index. Note that the resulting IF, ID, DI, and B values would be interpreted in much the same way they normally are.
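The ID figure for Item 2 can be checked the same way; again, this is only an illustrative sketch of mine under the same assumptions, not a procedure prescribed in the column.

```python
# Item 2 proportion scores from Table 2, ordered from highest to lowest total score.
item2 = [1.00, 1.00, 0.67, 0.67, 1.00, 0.67, 0.33, 0.33, 0.00, 0.00]

upper = item2[:3]    # top three students (Kimi, Sachiko, Keiko)
lower = item2[-3:]   # bottom three students (Yoshi, Toshi, Hachiko)

# ID = mean proportion score of the upper group minus that of the lower group.
id_item2 = sum(upper) / len(upper) - sum(lower) / len(lower)
print(round(id_item2, 2))  # 0.78
```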
1. For instance, item facility could be calculated as a simple average of the weighted scores shown in Table 1. In that case, the values would be reported and interpreted relative to the possible values on the scale: the average for Item 2 in Table 1 would be 17/10 = 1.7, which could then be compared with the three points possible for that item to determine whether or not it was difficult. However, with this method the interpretation would differ for each type of item weighting, which could prove confusing. An alternative strategy for calculating item discrimination for weighted items would be to use computer power to calculate whatever correlation coefficient is appropriate between the weighted item scores and the total scores on the test.
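As an illustration of that correlation strategy, the sketch below (my own, not from the column; the note does not specify a coefficient, so Pearson's r is an assumption) correlates the weighted Item 2 scores from Table 1 with the total test scores.

```python
# A sketch of the correlation approach described in the note: correlate the
# weighted scores on a single item with the total test scores.
from statistics import correlation  # available in Python 3.10+

item2_weighted = [3, 3, 2, 2, 3, 2, 1, 1, 0, 0]             # Item 2, Table 1
total_scores   = [100, 89, 85, 80, 79, 70, 64, 50, 37, 13]  # total test scores

r = correlation(item2_weighted, total_scores)  # Pearson's r
print(round(r, 2))  # roughly 0.92 for these data, so Item 2 discriminates well
```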