Article appearing in Shiken 22.2 (December 2018) pp. 14-19.
Author: James Dean Brown
University of Hawai'i at Manoa
Question:
For many types of tests, such as multiple-choice, true-false, and fill-in, we have item statistics that we can use to calculate reliability estimates like K-R20 and alpha. But for dictations, we only count up total scores. So, my question is this: (a) can we use K-R21, which is based on the mean, standard deviation, and number of items for the total scores, to calculate the reliability of a dictation, and (b) if so, how long should a dictation be in order to be reliable?
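For reference, the K-R21 estimate in part (a) needs only the three total-score statistics the question lists, whereas K-R20 needs a persons-by-items matrix of right/wrong scores. The sketch below computes both using the standard textbook formulas so the two can be compared on the same data. It assumes 0/1 (dichotomous) items and population variances (dividing by N). Treating each scored unit of a dictation (for example, each word) as one such item is an illustrative assumption, and the function names `kr20` and `kr21` are hypothetical. The tiny data set is invented purely to show the arithmetic.

```python
import statistics


def kr21(mean: float, sd: float, k: int) -> float:
    """K-R21 from the total-score mean, standard deviation, and number of items k.

    Assumes k dichotomous (0/1) items of roughly equal difficulty.
    """
    if k < 2:
        raise ValueError("K-R21 needs at least two items")
    variance = sd ** 2
    if variance == 0:
        raise ValueError("K-R21 is undefined when total scores do not vary")
    return (k / (k - 1)) * (1 - mean * (k - mean) / (k * variance))


def kr20(item_matrix: list[list[int]]) -> float:
    """K-R20 from a persons-by-items matrix of 0/1 scores (population variance)."""
    n = len(item_matrix)
    k = len(item_matrix[0])
    totals = [sum(row) for row in item_matrix]
    total_var = statistics.pvariance(totals)
    # Sum of p*q across items, where p is the proportion answering the item correctly.
    pq_sum = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_matrix) / n
        pq_sum += p * (1 - p)
    return (k / (k - 1)) * (1 - pq_sum / total_var)


# Invented example: 5 test takers by 4 items of unequal difficulty.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
totals = [sum(row) for row in responses]
print(round(kr20(responses), 3))  # 0.8
print(round(kr21(statistics.mean(totals), statistics.pstdev(totals), k=4), 3))  # 0.667
```

Because K-R21 treats every item as equally difficult, it comes out lower than K-R20 here, where the item difficulties range from 0.2 to 0.8.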
Answer:
This is the first of two columns that I will use to answer your questions. In the next column, I will discuss the relationship between dictation length and reliability. In this one, I will explore some problems and solutions for calculating the reliability of dictations. To do so, I will address four central questions:
- What data serve as the basis for the current column?
- What are some options for calculating reliability for dictations and what are the relationships among them?
- What else is important in interpreting these common reliability estimates?
- What does all this mean for calculating the reliability of dictation scores?