The first part of this article, online at www.jalt.org/test/haj_1.htm, introduced several factors
thought to influence reading comprehension. This section shows how a test investigating
judgments about reading complexity was developed, validated, and utilized.
Method
A convenience sample of 99 senior English majors at three universities in Tehran participated in this study. Fifty of the respondents were males and 49 were females.
In Phase 1 of this study, respondents had 120 minutes to rate 55 statements according to six different scales. These statements,
which are online at www.jalt.org/test/haj_p1_0.htm, were rated according to the following criteria: general complexity, imagery, lexical
difficulty, world knowledge demand, semantic complexity, and syntactic complexity. All statements were seventeen words in length.
Five types of statements were examined in Phase 1:
- Non-complex statements - with a low lexical density, a high
degree of imagery, single-clause T-unit constructions, and no
topicalization or pseudo-clefting.
- Lexically dense statements - which resembled the
previous statements, except for a high lexical density.
- Syntactically complex statements - which were like
non-complex statements, but with topicalization and pseudo-clefting.
- T-unit complex statements - which were similar to non-complex
statements, except that they contained two subordinate clauses and one main clause.
- Abstract statements - with the characteristics of the
non-complex sentences, but with less imagery.
If respondents felt unsure how to rate a statement, they were instructed to leave it blank.
In Phase 2 of this study, administered one week later, respondents had
100 minutes to rate the 55 restatements online at www.jalt.org/test/haj_p2_0.htm
according to the same scales used in Phase 1. Again, if respondents felt unsure how to rate an item, they were instructed
to leave it blank.
All restatements were eleven words in length and organized into six
categories. In addition to the five categories explored in Phase 1,
doubly complex restatements were examined. Doubly complex
restatements combined two forms of complexity, such as abstractness and
T-unit complexity. Hence the statements in Phase 1 differed from
the restatements in Phase 2 in two ways: (1) the restatements were
slightly shorter, and (2) some restatements combined two forms of complexity.
In Phase 3 of this study, administered a week after the second phase,
respondents were given as much time as they needed to assess the 55
statements of Phase 1 along with the 55 restatements used in
Phase 2 according to seven scales. In addition to the six scales used
in the first two phases of this study, respondents evaluated
statements/restatements according to a five-point scale of
factuality/inferentiality and then decided whether the restatements seemed true.
All statements and restatements were paired into a
"stem-response" format. In a stem-response format, the second
statement depends on the statement preceding it. A sample
stem-response pair for one item from Phase 3 of this study appears below -
Stem: The old Japanese driver finally parked his small car between the toy shop and the grocery store.
Response: The driver took some time to park his car over there.
Degree of Factuality / Inferentiality:
1 = completely factual
2 = mostly factual
3 = evenly mixed
4 = mostly inferential
5 = completely inferential
Response Veracity: TRUE / FALSE
(Instruction: If the response does not appear to be necessarily true, please select false.)
If respondents were unsure how to rate a stem-response pair, they were instructed to leave it blank.
The full survey is online at www.jalt.org/test/haj_p3_0.htm.
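To make the structure of a Phase 3 record concrete, here is a minimal Python sketch of how a single rated stem-response item could be represented. The field names and the example ratings are my own illustrative assumptions, not values from the study; only the sample stem and response are taken from the item shown above, and the sketch is abridged to three of the rating dimensions rather than all seven scales.

# A sketch of one Phase 3 record; field names and ratings are hypothetical.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Phase3Rating:
    stem: str                          # 17-word Phase 1 statement
    response: str                      # 11-word Phase 2 restatement
    general_complexity: Optional[int]  # 1-5; None if left blank
    factuality: Optional[int]          # 1 = completely factual ... 5 = completely inferential
    response_true: Optional[bool]      # veracity judgment (TRUE/FALSE)

sample = Phase3Rating(
    stem=("The old Japanese driver finally parked his small car "
          "between the toy shop and the grocery store."),
    response="The driver took some time to park his car over there.",
    general_complexity=2,   # hypothetical rating
    factuality=4,           # hypothetical: mostly inferential
    response_true=True,     # hypothetical judgment
)
print(sample)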
Results and Discussion
This study explores five research questions, three of which will be considered in this segment:
- Are respondents able to distinguish between different kinds of complexity?
- Do respondents assign significantly differentiated item complexity orders on the basis of the statement/restatement types?
- How do factuality/inferentiality ratings by respondents pertain to item complexity ratings?
The raw data used in this survey is online at www.jalt.org/test/haj_a2.htm.
Let us now begin our investigation.
Question 1: Are respondents able to distinguish between different kinds of complexity?
Except for two comparisons, the answer to this question is "yes".
Although the respondents did not distinguish (1) lexically complex
sentences from T-unit complex sentences, or (2) syntactically complex
sentences from T-unit complex sentences in any statistically
significant way, they did distinguish the following pairs of sentence
types at a p < .05 level of significance:
- non-complex from lexically complex
- non-complex from abstract complex
- non-complex from syntactically complex
- non-complex from T-unit complex
- lexically complex from abstract complex
- lexically complex from syntactically complex
- abstract complex from syntactically complex
- abstract complex from T-unit complex
Let us consider how this point is illustrated in Phase 1 of this study.
Cross-tabulations revealed that respondents rated abstract complex
statements as the most difficult and non-complex statements as the least
difficult. A summary of the general complexity ratings for the different
statement types in Phase 1 of this study appears in Table 1 below:
Statement type                Mean   S.D.
1. Abstract complex           3.11   1.22
2. Syntactically complex      2.46   1.20
3. T-unit complex             2.33   1.16
4. Lexically complex          2.25   1.17
5. Non-complex                2.07   1.10
Table 1. Mean general complexity ratings for five statement types found at www.jalt.org/test/haj_p1_0.htm
according to a 5-point scale in which "5" denotes very complex and "1" very simple.
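As a rough illustration of how a summary like Table 1 can be produced, the Python sketch below computes per-type means and standard deviations from a long-format ratings frame. The ratings here are randomly generated placeholders, not the study's data, and the column names are assumptions.

# Illustrative only: placeholder ratings, not the study's data.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
types = ["abstract complex", "syntactically complex", "T-unit complex",
         "lexically complex", "non-complex"]
phase1 = pd.DataFrame({
    "statement_type": np.repeat(types, 99),              # 99 placeholder ratings per type
    "rating": rng.integers(1, 6, size=99 * len(types)),  # 5-point scale, 1-5
})

# Mean and S.D. of general complexity ratings by statement type,
# sorted from most to least complex, as in Table 1.
summary = (phase1.groupby("statement_type")["rating"]
           .agg(["mean", "std"])
           .sort_values("mean", ascending=False))
print(summary.round(2))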
To further analyze this question, an ANOVA was performed on
the complexity ratings of the 55 restatements in Phase 2. The
respondents distinguished the same sentence types in this
phase as in the previous one. In addition, they distinguished the
following sentence types at a p < .05 level of significance:
- doubly complex from non-complex
- doubly complex from abstract complex
- doubly complex from syntactically complex
- doubly complex from T-unit complex
The only sentence types they were not able to distinguish were,
as in Phase 1, (1) T-unit complex ones from lexically complex ones,
and (2) abstract complex ones from syntactically complex ones.
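For readers who want to see the kind of analysis described here in code, the sketch below runs a one-way ANOVA over restatement types and then pairwise comparisons at the .05 level. It uses randomly generated placeholder ratings centred on the Table 2 means, so it mimics, but does not reproduce, the study's results; the variable names are assumptions, and the article does not say which post-hoc procedure was used, so Tukey's HSD is shown here as one common choice.

# Illustrative only: placeholder ratings generated around the Table 2 means.
import numpy as np
import pandas as pd
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(1)
type_means = {"doubly complex": 2.76, "abstract complex": 2.45,
              "syntactically complex": 2.43, "T-unit complex": 2.22,
              "lexically complex": 2.21, "non-complex": 1.62}
phase2 = pd.DataFrame([
    {"restatement_type": t, "rating": r}
    for t, m in type_means.items()
    for r in rng.normal(loc=m, scale=1.2, size=99)   # 99 placeholder ratings per type
])

# Overall test: do mean complexity ratings differ across restatement types?
groups = [g["rating"].to_numpy() for _, g in phase2.groupby("restatement_type")]
f_stat, p_value = f_oneway(*groups)
print(f"F = {f_stat:.2f}, p = {p_value:.4g}")

# Pairwise comparisons: pairs with reject=True differ at p < .05,
# analogous to the lists of distinguished sentence types above.
tukey = pairwise_tukeyhsd(phase2["rating"], phase2["restatement_type"], alpha=0.05)
print(tukey.summary())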
How did the respondents rate the six restatement types in
terms of complexity? The complexity ratings are summarized
in Table 2 below -
Restatement type              Mean   S.D.
1. Doubly complex             2.76   1.23
2. Abstract complex           2.45   1.17
3. Syntactically complex      2.43   1.20
4. T-unit complex             2.22   1.12
5. Lexically complex          2.21   1.19
6. Non-complex                1.62    .94
Table 2. Mean general complexity ratings for six restatement types
found at www.jalt.org/test/haj_p2_0.htm
according to a 5-point scale in which "5" denotes very complex and "1" very simple.
To clarify how these two data sources differ, compare the mean values
for each sentence type explored in the first two phases of this study, shown in Figure 1:
Figure 1. Mean general complexity ratings for five statement types
(solid line) explored in Phase 1 and six restatement types (dotted line)
explored in Phase 2 of this study according to a 5-point scale.
Tables 1 and 2 suggest that respondents rated sentences in terms of the
following descending hierarchy of complexity:
1. Doubly complex
2. Abstract complex
3. Syntactically complex
4. T-unit complex
5. Lexically complex
6. Non-complex
Question 2: Do respondents assign significantly differentiated item complexity orders
on the basis of the statement/restatement types?
Generally, the hierarchy of complexity suggested in Question 1 was
also found in the item complexity ratings in Phase 3. Whereas Question 1
relied on data from Phases 1 and 2 of this study, this question
employed data from Phase 3. The statement types found in the
"stem" of each of the paired items at www.jalt.org/test/haj3_0.htm
were treated as independent variables in a one-way ANOVA. In this
ANOVA, the dependent variable was the general complexity ratings for Phase 3,
which are online at www.jalt.org/test/haj3_1.htm. The main hunch
motivating this question was that respondents might rate
stem-response complexity by attending mainly to the
stem rather than to both sentences in the item. Conversely, they might
pay special attention to the response when deciding
how complex an item was. Table 3 summarizes the results of this ANOVA:
Table 3. Mean differences in the complexity ratings for items in
Phase 3 of this study on the basis of the five statement types.
To make sure that Table 3 is clear, let us point out that "mean difference" refers to the
difference between the mean of one variable and that of another. For
example, the mean difference between non-complex statements and
abstract complex ones was -.84; that is, on average,
non-complex statements were rated .84 points less complex than abstract
complex statements. The * mark denotes differences that are statistically significant
at the p < .05 level, meaning that there is less than a 5% probability that
the difference is due to chance.
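A small worked example may help. The means below are invented placeholders (chosen so that the non-complex versus abstract complex entry matches the -.84 quoted above); the sketch simply shows the arithmetic behind a mean-difference table like Table 3. Whether a given difference earns a * would be decided by a separate pairwise significance test, such as the post-hoc comparisons sketched earlier.

# Illustrative only: hypothetical Phase 3 means by statement type.
import pandas as pd

means = pd.Series({"non-complex": 2.10, "lexically complex": 2.28,
                   "T-unit complex": 2.35, "syntactically complex": 2.49,
                   "abstract complex": 2.94})

# Each cell is (row mean) - (column mean), the layout used in Table 3.
diffs = pd.DataFrame(means.to_numpy()[:, None] - means.to_numpy()[None, :],
                     index=means.index, columns=means.index)
print(diffs.round(2))
print(round(diffs.loc["non-complex", "abstract complex"], 2))   # -> -0.84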
The mean differences for the various item types in this study are
also shown graphically in Figure 2:
Figure 2. Means plot of item complexity ratings on the basis of five item types.
Figure 2 suggests that there is a meaningful parallel between item
complexity judgments and statement complexity judgments if statement
types are considered as independent variables. The hierarchy of
complexity suggested in Question 1 also holds here.
As observed, some pairs show significant differences while others do
not. To a large extent, this is because judgments of item complexity may
not, in reality, be made merely on the basis of the statement type.
We also have to consider that respondents might attend to each
sentence-level statement individually; the interaction between the
statements and restatements also needs to be considered.
To further explore Question 2, an ANOVA was performed with the
restatement types from Phase 2 as independent variables and
general complexity ratings for items in Phase 3 as the dependent
variable. The results appear in Table 4.
Table 4. Mean differences of item complexity ratings on the basis of the six restatement types in Phase 2 of
this study.
An interesting point is the similarity between the pattern
observed in this analysis, depicted in Figure 3, and the pattern
found for restatement complexity ratings by restatement type.
Figure 3. Means plot of item complexity ratings on the basis
of the 6 restatement types from Phase 2.
Although there are minor changes in this pattern, there is a meaningful
parallel between item complexity judgments and restatement complexity
judgments if we consider restatement types as independent variables.
When restatement types are considered in this way, the following hierarchy of complexity emerges:
1. Doubly complex
2. Abstract complex
3. T-unit complex
4. Lexically complex
5. Syntactically complex
6. Non-complex
The only difference between the hierarchy above and the one suggested
earlier is that syntactically complex restatements rank lower here.
The findings for Question 2 corroborate those of Question 1, which suggest
that respondents apply a hierarchy of complexity. This hierarchy
is manifested even when various statement and restatement types are
combined. One possible explanation for the findings of
Question 2 is that respondents decided on the general complexity of an item
mainly on the basis of a simple addition of the features of the statement
and restatement rather than paying attention to the pragmatic
interaction of both parts of each item.
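One way to probe this explanation (not an analysis reported in this article) would be to regress item complexity on the complexity ratings of its two parts and check whether an interaction term adds anything beyond the additive model. The sketch below does this with placeholder data that are built additively by construction, so it only demonstrates the form of the test; the variable names are assumptions.

# Illustrative only: placeholder ratings, not the study's data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 500
df = pd.DataFrame({
    "stem_complexity": rng.integers(1, 6, size=n),      # rating of the statement
    "response_complexity": rng.integers(1, 6, size=n),  # rating of the restatement
})
# Placeholder item ratings built additively from the two parts, plus noise.
df["item_complexity"] = (0.5 * df["stem_complexity"]
                         + 0.5 * df["response_complexity"]
                         + rng.normal(0, 0.5, size=n))

additive = smf.ols("item_complexity ~ stem_complexity + response_complexity", df).fit()
interactive = smf.ols("item_complexity ~ stem_complexity * response_complexity", df).fit()

# If respondents simply add the two parts, the interaction term should add little.
print(additive.rsquared, interactive.rsquared)
print(interactive.pvalues["stem_complexity:response_complexity"])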
Question 3: How do factuality/inferentiality ratings by respondents
pertain to item complexity ratings?
The data from Phase 3 suggests that a parallel exists between how
complex respondents perceive an item to be and how factual or
inferential they perceive it to be. Respondents generally considered
factual items to be simpler than inferential ones. Let us summarize
the complexity ratings for the 55 "stem-response" pairs examined
in the final phase of this study:
Factuality rating             Mean   S.D.   % of Total Items
1. Completely factual         1.76    .92   26.9% (N = 1467)
2. Mostly factual             2.16    .95   13.4% (N = 730)
3. Equally mixed              2.12   1.01    9.6% (N = 526)
4. Mostly inferential         2.27    .99    9.4% (N = 516)
5. Completely inferential     2.18   1.17   17.4% (N = 951)
6. No response                 ***   ****   23.0% (N = 1255)
Table 5. Mean complexity ratings for paired "stem-response" statements/restatements
found at www.jalt.org/test/haj_p3_0.htm by 99 respondents in this study.
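To show how a summary like Table 5 can be assembled, the sketch below groups placeholder ratings by factuality category and reports the mean, standard deviation, and share of all 5,445 ratings (99 respondents x 55 items) in each category. The data are randomly generated stand-ins, not the study's ratings, and the column names are assumptions; rows in the "no response" category carry no complexity value, so their mean and S.D. come out as NaN, mirroring the *** entries above.

# Illustrative only: placeholder data, not the study's ratings.
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
categories = ["completely factual", "mostly factual", "equally mixed",
              "mostly inferential", "completely inferential", "no response"]
phase3 = pd.DataFrame({
    "factuality": rng.choice(categories, size=99 * 55),           # 5,445 ratings
    "complexity": rng.integers(1, 6, size=99 * 55).astype(float)  # 5-point scale
})
phase3.loc[phase3["factuality"] == "no response", "complexity"] = np.nan

# Mean, S.D., count, and share of total ratings per factuality category.
table = phase3.groupby("factuality")["complexity"].agg(["mean", "std", "size"])
table["% of total"] = (100 * table["size"] / len(phase3)).round(1)
print(table.round(2))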
As you can see, respondents tended to consider the following pairs
significantly different in terms of general complexity -
- completely factual items from mostly factual items
- completely factual items from equally mixed items
- completely factual items from mostly inferential items
- completely factual items from completely inferential items
Completely factual items were rated as the
easiest item type. Looking at the data in Table 5, what is
especially notable is that completely factual items were
rated as much easier than items that were either mostly or entirely
inferential. This suggests that respondents recognized a parallel
between perceived complexity and perceived factuality/inferentiality.
Final Episode
In the next part of this article we will explore
the way stem-response combinations appear to influence complexity
order rankings, and how complexity ratings by students differ from
those of teachers. Some of the practical implications of this study
will also be underscored.