Application of the fusion model to while-listening performance tests
Vahid Aryadoust (National Institute of Education, Singapore)
"WLP tests represent the listening comprehension construct narrowly because they merely focus on pre-comprehension skills alongside the comprehension of details"
Given that WLP test takers' simultaneous exposure to oral and written inputs precludes note taking, it is likely that test takers who fall behind the stream of written/oral input miss some items not necessarily because of limited listening skills, but because of limited reading skills, memory span (Hildyard & Olson, 1978), test-taking strategies (Bachman, 1990), or test wiseness (Bachman, 1990; Kunnan, 1995), or because of other constraining influences (Field, 2009). (Aryadoust, in press, p. ##)
Due to the complexity of comprehension mechanisms in WLP tests, there are regrettably few studies investigating their structure. The present investigation seeks to describe the structure of the final section of the IELTS listening test (lecture comprehension) and provides a new window on some attributes affecting WLP lecture comprehension test performance. To serve this goal, I draw on empirical research into listening tests (e.g., Buck & Tatsuoka, 1998; Freedle & Kostin, 1999) as well as anecdotal or speculative taxonomies of listening comprehension sub-skills (e.g., Richards, 1983) and propose a provisional attribute profile for the lecture comprehension section of the IELTS listening test. The profile is then subjected to fusion modeling.
Coding test items. I undertook a qualitative investigation of the test items, exploring their structure and text content. For each test item, I noted a range of attributes including (a) the sub-skills tapped by the item, (b) task-related factors affecting participants' performance, and (c) text-related factors. The analysis was carried out twice, one week apart, to ascertain intra-coder reliability. Using two or more raters would admittedly offer greater reliability. However, because experts familiar with the structure of WLP tests were not available at the time of the study, the coding was performed twice by the same rater (the researcher) and the two rounds were checked for intra-coder reliability.
Table 1
Attribute | Definition | Items associated with the attribute |
1. Paraphrase | Listeners must keep the input in mind, read the test item and keep it in mind, and make a mental paraphrase of the aural message to match it with the written test item. For example, the text on the speed of a type of bird says "there is still some dispute about just how fast they can actually fly". Item 2 reads: "There is disagreement about their maximum ______." The candidate must write (flight/flying) speed, synonymous with "how fast they can actually fly". | 1, 2, 6, 7 |
2. Details | The ability of the listener to understand details such as names, specific pieces of information, and dates is tapped. | 3, 4, 5, 8, 9, 10 |
3. Similar but misleading pieces of information | While the listener is waiting for the right piece of information to arrive, a few pieces of information that could fit the answer precede it, possibly confusing the listener. For example, the answer to Item 1 is Australia, which is a place name, so the listener is waiting for a place name to come up. But a few place names are heard before the answer, such as South Pole and the state of Tasmania. Given that test takers must make a spontaneous paraphrase of the aural stimuli to match it with the item and that they must keep mental track of the place names they hear, they may become distracted and miss the item. | 1, 10 |
4. Paraphrasing the stem (synonyms) | To answer some items, candidates must understand synonyms. For example, the text related to Item 2 uses the word dispute, yet the item stem contains the word disagreement. | 2 |
5. Accurate grammatical forms | Some test items require that the test taker recognize the exact grammatical points. For example, Item 3 requires a present participle: "…the male spends some of his time___________." The answer is looking or searching for food. If -ing is dropped, the test taker might be penalized. | 3 |
6. Low information density | There is a relatively large amount of information not relevant to the answer in the text before the point where the answer lies. Answers do not appear rapidly in this sort of text. | 1, 2, 3, 8 |
7. High information density | The answers to items 4 through 7 are condensed in one paragraph. High information density forces candidates to supply answers more rapidly than they must for items with lower information density. | 4, 5, 6, 7, 9, 10 |
8. Repetition or paraphrase of the answer in the text | It seems that when information density is high, the answer to some - but not all - of the items is repeated or paraphrased in the text. | 5, 9 |
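
The item-attribute associations in Table 1 can be restated as a binary item-by-attribute matrix, the form that cognitive diagnostic analyses usually take as input (often called a Q-matrix in that literature; the term and the helper name `build_q_matrix` are assumptions of this sketch, not taken from the article). The following is a minimal illustration that only re-encodes the "Items associated with the attribute" column:

```python
# Items listed for each attribute in Table 1 (attribute number -> item numbers).
ITEMS_BY_ATTRIBUTE = {
    1: [1, 2, 6, 7],          # Paraphrase
    2: [3, 4, 5, 8, 9, 10],   # Details
    3: [1, 10],               # Similar but misleading pieces of information
    4: [2],                   # Paraphrasing the stem (synonyms)
    5: [3],                   # Accurate grammatical forms
    6: [1, 2, 3, 8],          # Low information density
    7: [4, 5, 6, 7, 9, 10],   # High information density
    8: [5, 9],                # Repetition or paraphrase of the answer in the text
}
N_ITEMS = 10


def build_q_matrix(items_by_attribute, n_items):
    """Return a list of rows, one per item; entry [i][k] is 1 when
    item i+1 is associated with attribute k+1, else 0."""
    n_attributes = len(items_by_attribute)
    q = [[0] * n_attributes for _ in range(n_items)]
    for attribute, items in items_by_attribute.items():
        for item in items:
            q[item - 1][attribute - 1] = 1
    return q


Q = build_q_matrix(ITEMS_BY_ATTRIBUTE, N_ITEMS)

if __name__ == "__main__":
    for item_number, row in enumerate(Q, start=1):
        print(item_number, row)
```

Each row then shows at a glance which attributes an item is coded for, which makes it easier to compare the coding against the estimated item parameters.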
Table 2
Item | π*i | r*1 | r*2 | r*3 | r*4 | r*5 | r*6 | r*7 | r*8 | ci |
1 | 0.69 | 0.68 | 0.60 | 0 | 0 | 0 | 0.75 | 0 | 0 | 2.09 |
2 | 0.69 | 0.70 | 0 | 0 | 0.81 | 0 | 0.68 | 0 | 0 | 2.34 |
3 | 0.79 | 0 | 0.56 | 0 | 0 | 0.90 | 0.79 | 0 | 0 | 2.69 |
4 | 0.89 | 0 | 0.82 | 0 | 0 | 0 | 0 | 0.49 | 0 | 2.70 |
5 | 0.87 | 0 | 0.74 | 0 | 0 | 0 | 0 | 0.53 | 0.88 | 2.56 |
6 | 0.56 | 0.92 | 0 | 0 | 0 | 0 | 0 | 0.37 | 0 | 2.57 |
7 | 0.84 | 0.46 | 0 | 0 | 0 | 0 | 0 | 0.74 | 0 | 1.70 |
8 | 0.99 | 0 | 0.70 | 0 | 0 | 0 | 0.95 | 0 | 0 | 2.51 |
9 | 0.98 | 0 | 0.84 | 0 | 0 | 0 | 0 | 0.96 | 0.46 | 2.62 |
10 | 0.94 | 0 | 0.67 | 0.96 | 0 | 0 | 0 | 0.95 | 0 | 2.36 |
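
To make the parameters in Table 2 concrete, the sketch below uses the attribute part of the fusion model as it is commonly written: the probability of a correct response is π*i multiplied by r*ik for every required attribute k that the test taker has not mastered. This form is assumed from the general fusion-model literature rather than quoted from the article, and the sketch deliberately leaves out the supplemental-ability term governed by ci. The function name `attribute_term` is hypothetical. The values are those printed for Item 2.

```python
from math import prod

# Item 2 parameters as printed in Table 2: pi* and the non-zero r* values
# (attributes 1, 4 and 6 are the ones this item requires).
PI_STAR = 0.69
R_STAR = {1: 0.70, 4: 0.81, 6: 0.68}


def attribute_term(pi_star, r_star, mastered):
    """pi* times r*_k for each required attribute k that is NOT mastered.

    Assumed form of the attribute part of the fusion model; the term
    involving c_i is omitted (treated as 1) in this sketch.
    """
    penalties = [r for k, r in r_star.items() if k not in mastered]
    return pi_star * prod(penalties)  # prod([]) is 1, so full mastery gives pi*


all_mastered = attribute_term(PI_STAR, R_STAR, mastered={1, 4, 6})
lacks_paraphrase = attribute_term(PI_STAR, R_STAR, mastered={4, 6})
none_mastered = attribute_term(PI_STAR, R_STAR, mastered=set())

if __name__ == "__main__":
    print(round(all_mastered, 4), round(lacks_paraphrase, 4), round(none_mastered, 4))
```

Read this way, an r* value close to 1 means that lacking the attribute costs little on that item, while a small r* means the item discriminates sharply between masters and non-masters of the attribute.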
Student | Att. 1 | Att. 2 | Att. 3 | Att. 4 | Att. 5 | Att. 6 | Att. 7 | Att. 8 | Score | Modeled score |
1 | 0.678 | 0.740 | 0.520 | 0.672 | 0.66 | 0.492 | 0.526 | 0.832 | 5 | 7.47 |
2 | 0.450 | 0.736 | 0.616 | 0.808 | 0.748 | 0.758 | 0.836 | 0.704 | 6 | 7.14 |
3 | 0.392 | 0.096 | 0.330 | 0.490 | 0.140 | 0.414 | 0.418 | 0.110 | 2 | 2.96 |
4 | 0.244 | 0.872 | 0.304 | 0.514 | 0.696 | 0.704 | 0.290 | 0.156 | 5 | 4.97 |
5 | 0.708 | 0.994 | 0.604 | 0.892 | 0.91 | 0.842 | 0.956 | 0.942 | 7 | 8.14 |
To evaluate the fit of the model, I calculated the correlation between the estimated and modeled item p-values (item difficulty), which was 0.996 (p < 0.001). This high and statistically significant correlation supports the fit of the model to the data. The computer program also gives a global measure of item fit, namely the average difference between the observed and modeled p-values. In the present study, this index is 0.434 (0.826 - 0.392), which is a tolerable discrepancy. Roussos et al. (2005) argued that because the prime goal of the FM is to estimate the attribute mastery profiles of test takers and students, a slight discrepancy would not have a substantial influence on the results.

"The [fusion model] aids in extracting influential attributes taxing cognitive processes, though the decision on whether or not they are construct-irrelevant factors with high cognitive demands is left to the researcher. We can confidently argue that paraphrasing the stem (synonyms) and accurate grammatical forms are construct-irrelevant factors."
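
The kind of fit check described in the paragraph above (correlating observed with modeled values and averaging their difference) can be sketched in a few lines. The item-level p-values are not reproduced in this excerpt, so the sketch uses the five student rows from the score table above (Score and Modeled score) purely as sample input. The numbers it yields are therefore not the fit statistics reported in the study, and the function names are hypothetical.

```python
from math import sqrt

# Sample input only: observed and modeled scores of the five students shown above.
observed = [5, 6, 2, 5, 7]
modeled = [7.47, 7.14, 2.96, 4.97, 8.14]


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mean_abs_diff(x, y):
    """Average absolute difference between paired observed and modeled values."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


r = pearson(observed, modeled)
mad = mean_abs_diff(observed, modeled)

if __name__ == "__main__":
    print(f"correlation = {r:.3f}, mean absolute difference = {mad:.3f}")
```

Applied to item p-values instead of student scores, the same two functions would give the correlation and the average discrepancy that the fit paragraph refers to.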
Age of falcons | What occurs |
[Items 4 through 6] | [Items 4 through 6] |
1-12 months | More than half of the falcons 7 ________________. [answer = die] |
Procedures used for field research on peregrine falcon chicks | |
First: | Catch chicks |
Second: | 8 ________________ to legs |