Calculating reliability of dictation tests: Does K-R21 work?

Article appearing in Shiken 22.2 (December 2018) pp. 14-19.

Author: James Dean Brown
University of Hawai'i at Manoa

Question:
For many tests, like multiple-choice, true-false, and fill-in, we have item statistics that we can use in calculating reliability statistics like K-R20 and alpha. But for dictations, we only count up total scores. So, my question is this: (a) can we use K-R21, based on the mean, standard deviation, and number of items for the total scores, to calculate the reliability of a dictation, and (b) if so, how long should a dictation be in order to be reliable?

Answer:
This is the first of two columns that I will use to answer your questions. In the next column, I will discuss the relationship between dictation length and reliability. In this one, I will explore some problems and solutions for calculating the reliability of dictations. To do so, I will address four central questions:

What data serve as the basis for the current column?
What are some options for calculating reliability for dictations and what are the relationships among them?
What else is important in interpreting these common reliability estimates?
What does all this mean for calculating the reliability of dictation scores?
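
For readers who want to see the arithmetic behind the question, here is a minimal sketch of the standard K-R21 formula, which needs only the number of items (k), the mean (M), and the standard deviation (S) of the total scores: K-R21 = (k / (k - 1)) * (1 - M(k - M) / (k * S^2)). The function name `kr21` and the example values are hypothetical and chosen only for illustration; they are not data from this column. Whether S should be the population or the sample standard deviation is also left as an assumption for the user to decide.

```python
def kr21(k: int, mean: float, sd: float) -> float:
    """Kuder-Richardson formula 21 from summary statistics.

    k    -- number of items (for a dictation, e.g. the number of scored words)
    mean -- mean of the total scores
    sd   -- standard deviation of the total scores
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    if sd <= 0:
        raise ValueError("sd must be positive")
    variance = sd ** 2
    return (k / (k - 1)) * (1 - (mean * (k - mean)) / (k * variance))


# Hypothetical illustration: a 50-item test with a mean of 30 and an SD of 8.
example = kr21(k=50, mean=30.0, sd=8.0)
print(round(example, 3))  # 0.829
```

K-R21 assumes that all items are equally difficult, which is one reason to ask whether it suits dictation scores at all.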
