JALT Testing & Evaluation SIG Newsletter
Vol. 2. No. 1. Oct. 1998. (p. 6 - 10)
Do Different C-tests Discriminate Proficiency Levels of EL2 learners? (cont'd.)
An analysis of the means on all tests for the non-returnees shows that the
highest mean was obtained on the STEP-Eiken and the lowest on C-test 2,
indicating that the former was the easiest and the narrative C-test the
most difficult. An ANOVA was conducted to test the statistical
significance of these differences, and the results were: F = 176.18 (2, 179), p <.00.
[p. 6]
A similar analysis was conducted on the mean scores obtained by the
returnees for the two types of C-tests and the STEP-Eiken. The results
indicated the highest mean scores for the STEP-Eiken and the lowest
for the first type of C-test, a pattern similar to that observed for
the non-returnees. These score differences were checked by an ANOVA
and found to be highly significant: F = 56.94 (2, 75), p <.00, as
summarized in Table 3.
Table 3. Results of the ANOVA for the scores of all subjects on all tests.
__________________________________________________________________________
Group          Source of variance   SS         df   MS        F
__________________________________________________________________________
Non-returnees  Between groups       3581.4     2    1790      233.3
               Within group         6677.13    57   76.75
               Total                10258.53
Returnees      Between groups       3416.5     2    17082.3   88.303
               Within group         1458.8     27   193.5
               Total                4875.3
__________________________________________________________________________
p < .001
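As a reading aid, the way the columns of a one-way ANOVA table relate to one another can be sketched in Python: each mean square (MS) is a sum of squares (SS) divided by its degrees of freedom (df), and F is the between-groups MS divided by the within-group MS. The function name `anova_summary` and the numbers in the example are illustrative assumptions chosen for easy arithmetic; they are not values from this study.

```python
def anova_summary(ss_between, df_between, ss_within, df_within):
    """Derive the total SS, both mean squares, and F for a one-way ANOVA table."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_total": ss_between + ss_within,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "f": ms_between / ms_within,
    }


# Hypothetical sums of squares and degrees of freedom (not from Table 3).
example = anova_summary(ss_between=200.0, df_between=2,
                        ss_within=540.0, df_within=27)
print(example["ms_between"], example["ms_within"], example["f"])  # 100.0 20.0 5.0
```

The same relationships can be used to check any row of a published ANOVA table by hand.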
A cursory glance at these tables shows that the returnee group
obtained consistently higher mean scores on both the C-test using
short segments from different texts and the C-test using only one
narrative passage. These differences show that both C-test types were
much easier for the returnees than for the non-returnees. To further
determine the extent to which C-tests of different types can
discriminate English proficiency levels among the students, t-tests
were conducted between the scores of the two groups for each C-test.
The results of the t-test analyses indicate that the C-test using different
short texts was significantly easier for the returnees than for the other
group: t = .86, df = 29, p = .00. In the same manner, the
narrative C-test proved to be much easier for the returnees
than for the non-returnees, and the difference was found to be
highly significant: t = 3.21, df = 59, p = .005. The returnees outperformed
the non-returnees on both C-test 1 and C-test 2. These results indicate that
the two C-test types used in this study can discriminate levels of English
proficiency among the Japanese university students who took part in it.
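For readers who want to see how a comparison of this kind is computed, here is a minimal Python sketch of a t-test for two independent groups. The article does not say which variant of the t-test was used, so the pooled-variance (Student) form is an assumption, and the scores below are invented for illustration only.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Return Student's t and its degrees of freedom for two independent samples."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df


# Hypothetical C-test scores, not data from this study.
returnees = [22, 24, 26, 28]
non_returnees = [16, 18, 20, 22]
t_value, df = pooled_t(returnees, non_returnees)
print(round(t_value, 2), df)  # 3.29 6
```

A positive t here means the first group scored higher on average; the p-value would then be read from a t distribution with the reported degrees of freedom.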
In addition, there is the question of which of these two C-test
types is superior in terms of reliability and concurrent validity.
To permit comparison among the reliability estimates of the different
tests used in this study, 'corrected reliabilities', that is, the
reliabilities that would be observed if all the tests had contained
100 items, were calculated for all the cloze tests and STEP-Eiken test
items (Gordon, 1989; Chapelle, 1990). Higher reliabilities were
observed for the C-test using several segments than for the narrative
type in both sample groups.
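The article does not name the formula behind these 'corrected reliabilities'. A common way to project a reliability to a different test length is the Spearman-Brown prophecy formula, which is assumed in the minimal Python sketch below. The function name and the example figures are hypothetical.

```python
def spearman_brown(reliability, n_items, target_items=100):
    """Project a reliability estimate to a test of target_items items."""
    k = target_items / n_items  # factor by which the test length changes
    return k * reliability / (1 + (k - 1) * reliability)


# Hypothetical: a 50-item test with an observed reliability of .80.
projected = spearman_brown(0.80, n_items=50)
print(round(projected, 3))  # 0.889
```

Putting every test on the same 100-item footing in this way keeps a longer test from looking more reliable simply because it has more items.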
[p. 7]
Criterion-related validity
To determine how well the C-tests relate to an outside criterion, the C-test
scores of both groups were correlated with their STEP-Eiken scores.
Moreover, since these correlations are based on tests with different numbers
of items, the correlations were corrected for attenuation (Jafarpur, 1995),
as shown in Table 4.
Table 4. Correlations between the C-test types and STEP-Eiken scores.
_________________________________________________________________________
Group           C-test 1 (different texts)   C-test 2 (narrative)
                and STEP                     and STEP
_________________________________________________________________________
Returnees       .58                          .29
Non-returnees   .51                          .26
_________________________________________________________________________
The table shows only moderate correlations, of at least .50
(Klein-Braley, 1984), between C-test 1 and STEP-Eiken test scores. C-test 2
correlated only weakly with the STEP-Eiken scores. The C-test that was
based on short texts was thus superior to the one based on a long narrative,
which runs counter to Mochizuki's (1994) claim that single narratives make
the best C-tests.
More importantly, the moderate correlations between the C-test drawn from
various texts and a single criterion suggest that it is possible for
C-tests to tap different language abilities of ESL learners (Jafarpur,
1995). Finally, careful selection of texts that are similar in interest
and readability level leads to the superiority of a C-test constructed
from several short passages over a C-test using only one text.
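The correction for attenuation mentioned in this section can be illustrated with a short Python sketch. It assumes the classical form of the correction, in which the observed correlation is divided by the square root of the product of the two tests' reliabilities; the article does not spell out the computation, and the figures below are hypothetical.

```python
from math import sqrt


def correct_for_attenuation(r_xy, rel_x, rel_y):
    """Estimate the correlation between two measures if both were perfectly reliable."""
    return r_xy / sqrt(rel_x * rel_y)


# Hypothetical observed correlation and reliabilities (not from Table 4).
corrected = correct_for_attenuation(0.45, rel_x=0.81, rel_y=0.64)
print(round(corrected, 3))  # 0.625
```

Because reliabilities are below 1, the corrected value is never smaller than the observed correlation.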
[p. 8]
Summary and conclusion
Three points can be made from this research:
(1) The C-test procedure does discriminate moderately between the levels of English
proficiency of the Japanese university students in this sample.
(2) The C-test using several short segments from different texts appears to be
superior to the one using only one long narrative text.
(3) The two C-tests differ in terms of their criterion-related validity.
The writer acknowledges that the sample and the number of tests
included in the study were small. It is quite possible that random
variation alone could account for the variability in the results of
the statistical analyses. Nevertheless, the results of this
investigation suggest that C-tests are able to differentiate
the ESL levels of the Japanese university students in this sample.
Furthermore, the C-test constructed from different
passages showed greater validity against a reference criterion than
the narrative-type C-test. Because of the far-reaching potential of
C-tests in empirical research and classroom testing, further research
on their application and effectiveness is warranted.
[p. 8]
References
Bormuth, J. R. (1967). Comparable cloze and multiple-choice comprehension test scores. Journal of Reading, 10, 291-299.
Brown, J. D. (1983). A closer look at cloze: Validity and reliability. In J. W. Oller, Jr. (Ed.), Issues in Language Testing (pp. 237-250). Rowley, MA: Newbury House.
Brown, J. D. (1988). Tailored cloze: Improved with classical item analysis and techniques. Language Testing, 5 (1), 19-31.
Brown, J. D. (1993). What are the characteristics of natural cloze tests? Language Testing, 10 (2), 93-116.
Carroll, J. B. (1987). Review of Klein-Braley and Raatz, C-Tests in der Praxis. Language Testing, 4, 99-106.
Chapelle, A. and Abraham, R. (1990). Cloze method: What difference does it make? Language Testing, 7 (2), 121-146.
Chapelle, C. (1994). Are C-tests valid measures for L2 vocabulary research? Second Language Research, 10 (2), 157-187.
Cohen, A. D., Segal, M., and Weiss, R. (1984). The C-tests in Hebrew. Language Testing, 1 (2), 221-225.
Darnell, D. K. (1970). Clozentropy: A procedure for testing English language proficiency of foreign students. Speech Monographs, 37, 36-46.
Dörnyei, Z. and Katona, L. (1992). Validation of C-tests among Hungarian EFL learners. Language Testing, 2 (1), 187-206.
Harris, D. & Palmer, L. (n.d.). A Comprehensive English language test for learners of English (CELT). New York: McGraw-Hill.
Henning, J. (1987). A guide to language testing: Development, evaluation, measurement. Cambridge, MA: Newbury House.
Ikeguchi, C. (Unpublished ms.). The four cloze types: To each its own. Tsukuba Women's University, Japan.
Jafarpur, A. (1995). Is C-testing superior to cloze? Language Testing, 12 (2), 194-215.
[p. 9]
Jonz, J. (1990). Another turn in the conversation: What does the cloze measure? TESOL Quarterly, 24 (1), 61-63.
Kimura, K. & Visgatis, B. (1996). High school English textbooks and college entrance examinations: A comparison of reading passage difficulty. JALT Journal, 18 (1), 81-95.
Kimura, Y. (1995). Investigating the English competence of students returned from overseas. In K. Kitao, et al., Culture and Communication. Kyoto: Yamaguchi Shoten.
Klare, G. R. (1984). Readability. In P. D. Pearson (Ed.), Handbook of Reading Research (pp. 681-738). New York: Longman.
Klein-Braley, C. (1985). A close-up on the C test: A study in the construct validation of authentic tests. Language Testing, 2 (1), 76-104.
Klein-Braley, C. & Raatz, U. (1984). A survey on the C test-1. Language Testing, 1 (2), 134-146.
McBeath, N. (1990). C-tests: Some words of caution. English Teaching Forum, 28, 45-46.
Mochizuki, A. (1994). C-tests: Four kinds of texts, their reliability and validity. JALT Journal, 16 (1), 41-54.
Negishi, M. (1987). The C-test: An integrative measure? IRLT Bulletin, 1, 3-26.
Oller, J. W. Jr. (1972). Scoring methods and difficulty levels for cloze tests of proficiency in English as a second language. Modern Language Journal, 56, 151-158.
Oller, J. W. Jr. (1983). Issues in Language Testing. Rowley, MA: Newbury House.
Raatz, U. (1985). Better theory for better tests? Language Testing, 2 (1), 60-75.
Raatz, U. & Klein-Braley, C. (1981). The C-test: A modification of the cloze procedure. In T. Culhane, C. Klein-Braley, & D. K. Stevenson (Eds.), Practice and problems in language testing. University of Essex. Paper 26. Colchester: University of Essex.
Taylor, W. L. (1953). Cloze procedure: A new tool for measuring readability. Journalism Quarterly, 30, 414-438.
Tschirner, E. (1996). Rethinking beginning FL instruction. Modern Language Journal, 80, 1-13.
A copy of the tests used in this study can be obtained from the author.