Student Evaluation of Teachers: Professional Practice or Punitive Policy? (continued)
A self-fulfilling prophecy
One concern in the use of student evaluations is the impact that the act of evaluation itself has on students' perceptions of their teachers and on the teachers themselves.
 There are biases in evaluating a person's personality, performance and competence – biases that can lead to flawed information-gathering strategies that are self-fulfilling (Harris, 1994). A self-fulfilling prophecy, as Merton (1948) defined it, occurs when an incorrect perception, belief or definition of a set of circumstances evokes behaviour that makes the incorrect perception or belief come true.
 In composing SETEs, administrators bring their own expectations about the teachers to the procedure. These expectations profoundly affect the way they design the SETEs and the information-gathering strategies they use.
 In clinical psychology, in the study of interpersonal expectancy effects or behavioural confirmation, the problem of an incorrect diagnosis being supported by a presumptive questioning strategy is a serious ethical issue that remains a central focus. Observers, no matter how well trained and how ethical, can end up carrying out their evaluations on the basis of incorrect hypotheses.
 Snyder and Swann (1978), in a classic study, gave subjects a personality profile describing either an extroverted or an introverted personality and then asked them to choose, from a longer list, the 12 questions that would best allow them to test whether a target person fit the profile they had received. Analysis demonstrated a heavy reliance on hypothesis-confirming strategies.
 Both selecting questions and applying them to the evaluation of a person's behaviour are difficult even for well-trained clinicians to perform objectively – the situation of untrained students, administrators and teachers is even more problematic.
 When an administration or administrator has decided that teachers fit certain stereotypes or engage in certain types of behaviour – negative or constructive – the administrator will select hypothesis-confirming questions for the students to answer.
 For example, students are asked whether the teacher is humorous, whether they like the teacher, whether the teacher stimulates or encourages them, and whether the teacher is enthusiastic and dynamic – an entire battery of subjective parameters appears on SETEs, leading students to believe that the teacher must conform to certain, possibly irrelevant, behavioural parameters that in fact appeal differently to each individual student.
 As students answer objective and subjective questions, what will they rely on – what they feel confident they can answer, or what they are unsure about?
 The nature of objective questions presents certain problems. How can a student know whether a teacher is well prepared – how do they assess preparedness? How can a student evaluate a teacher's expertise in the field – if they knew so much about the field, why would they be the student? Yet students will answer these types of questions, which shows that they will give an opinion even when they do not have a defensible point of view. This is not the way to solicit informed opinions.
    Additionally, it is not necessarily the students' own opinions that have been solicited; they will be answering someone else's questions without having given the matter any thought until the moment they are supposed to 'evaluate' the teacher.
 The administrators' perceptions of the teachers can also profoundly affect the teachers' perceptions of their own effectiveness. Teachers who are told that they are teaching poorly because they do not appeal to the parameters the students are asked to rate on the SETEs may in fact be teaching at a competent level, but the administration can amplify the tainted SETE results by insisting that they are accurate and that they show the teacher to be less than competent.
  
  
 And through all of this runs the underlying belief that the process of education is predominantly the burden of the teacher alone. The assumption that the teacher is primarily responsible completely colours the students' attitude and the evaluation designer's intent. In this scenario, there is no room for a well-rounded evaluation of the students, the management, the facility, or the social pressures and inhibitions – a long list of variables is ignored.
In real classrooms
Students' subjective opinions can be so varied that the overall results 
are untrustworthy. Students who are specifically shown that certain SETE 
parameters have been fulfilled may still evaluate related criteria ambivalently. 
Students may pointedly refer to a teacher's physical characteristics or manner 
in very negative or positive terms and judge the teacher on the basis of these 
characteristics – as if teachers who are not aesthetically acceptable are 
rendered less capable of teaching. 
 The entire SETE process becomes a convenient matter of picking and choosing whatever complies with the original hypothesis of the SETE designer/administrator rather than an honest evaluation. The evaluation becomes rather like a shopping list of potentially conforming characteristics that further the administrator's personal biases.
     
    
A proposed paradigm
 The following paradigm, adapted from Arnoult and Anderson (1988), offers a better approach to evaluating teacher effectiveness in the academic environment because it reduces an evaluator's biases: (a) gather as much evidence as possible; (b) employ multiple evaluators who have different viewpoints and interests; (c) vary the observational circumstances to provide different emphases in the environment; (d) review videotapes for greater accuracy; (e) compare the criteria on balance sheets to establish evidence for and against an evaluation; (f) solicit an explanation of the results and the subsequent conclusions drawn by evaluators, to reveal gaps in reasoning. This paradigm constitutes constructive advice for the evaluations we make of others in a professional setting; a small sketch of how its record-keeping might be organised follows.
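 The paradigm is, of course, a human procedure rather than a computational one, but the balance-sheet step (e) can be made concrete. Below is a minimal sketch, in Python, of how evidence from multiple evaluators and varied circumstances might be recorded and tallied. Every name in it (Observation, balance_sheet, the sample entries) is a hypothetical illustration for this article, not part of Arnoult and Anderson's published procedure.

# Hypothetical sketch of the paradigm's record-keeping: multiple evaluators
# (point b), varied circumstances (points c, d), and a balance sheet of
# evidence for and against an evaluation (points e, f).
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Observation:
    evaluator: str                  # who observed -- point (b)
    circumstance: str               # lecture, seminar, videotape review -- (c), (d)
    evidence_for: List[str] = field(default_factory=list)
    evidence_against: List[str] = field(default_factory=list)
    reasoning: str = ""             # the evaluator's stated explanation -- point (f)

def balance_sheet(observations: List[Observation]) -> Dict[str, int]:
    """Tally evidence for and against across all observations (point e)."""
    return {
        "evidence_for": sum(len(o.evidence_for) for o in observations),
        "evidence_against": sum(len(o.evidence_against) for o in observations),
        "evaluators": len({o.evaluator for o in observations}),
        "circumstances": len({o.circumstance for o in observations}),
    }

if __name__ == "__main__":
    observations = [
        Observation("peer A", "lecture", ["clear explanations"], [],
                    "notes were well organised"),
        Observation("peer B", "videotape review", ["responsive to questions"],
                    ["pacing uneven"], "reviewed the tape twice"),
    ]
    print(balance_sheet(observations))

 Keeping the evidence for and against in one place, alongside who observed it and under what circumstances, makes a lopsided evaluation – one evaluator, one circumstance, confirming evidence only – visible at a glance, which is precisely what steps (a) through (c) are meant to guard against.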
This type of evaluation is a structured attempt to measure professional competence with regard for the various facets of the evaluation process, and it is designed primarily to inform teachers rather than to judge them – a philosophy that does more to encourage improvement than to punish.
References
 
Arnoult, L. & Anderson, C. A. (1988). Identifying and reducing causal reasoning 
biases in clinical practice. In D. C. Turk & P. Salovey (Eds.), Reasoning, 
inference, and judgment in clinical psychology (pp. 209-232). New York: 
Free Press.
Basow, S. A. (1995). Student evaluations of college professors:
When gender matters. Journal of Educational Psychology, 87, 656-665.
Darley, J. M., Fleming, J. H., Hilton, J. L., & Swann, W. B. (1988).
Dispelling negative expectancies: The impact of interaction goals and
target characteristics on the expectancy confirmation process.
Journal of Experimental Social Psychology, 24, 19-36.
Feldman, K. A. (1978). Course characteristics and college students' 
ratings of their teachers: What we know and what we don't. 
Research in Higher Education, 9, 199-242.
Feldman, K. A. (1984). Class size and college students' evaluations of 
teachers and courses: A closer look. Research in Higher Education, 
21, 45-116.
Harris, M. J. (1993). Information gathering strategies in social perception. 
Unpublished manuscript, University of Kentucky, Lexington. Cited in Harris, 1994.
Harris, M. J. (1994). Self-fulfilling prophecies in the clinical context:
Review and implications for clinical practice. Applied and Preventive
Psychology, 3(3), 145-158.
Kayne, N. T., & Alloy, L. B. (1988). Clinician and patient as aberrant
actuaries: Expectation-based distortions in assessment of covariation.
In L. Y. Abramson (Ed.), Social cognition and clinical psychology:
A synthesis (pp. 295-365). New York: Guilford Press.
Kishor, N. (1995). The effect of implicit theories on raters' inference
in performance judgement: Consequences for the validity of student ratings
of instruction. Research in Higher Education, 36(2), 177-195.
    
 
Marsh, H. W., & Dunkin, M. J. (1992). Students' evaluations of university
teaching: A multidimensional perspective. In J. C. Smart (Ed.),
Higher education: Handbook of theory and research
(Vol. 8, pp. 143-233). New York: Agathon Press.
Merton, R. K. (1948). The self-fulfilling prophecy. Antioch Review,
8, 193-210.
Nielsen, R. S. (1993). The impact of the 1985 reform legislation on the
formative evaluation practices of one central Illinois school district.
Doctoral dissertation, University of Illinois at Urbana-Champaign.
Cited in Harris, 1994, p. 148.
O'Connell, D. Q., & Dickinson, D. J. (1993). Student ratings of instruction as
a function of testing conditions and perceptions of amount learned.
Journal of Research and Development in Education, 27(1), 18-23.
Sackett, P. R. (1982). The interviewer as hypothesis tester: The effects of
impressions of an applicant on interviewer questioning strategy.
Personnel Psychology, 35, 789-804.
Seldin, P. (1993, July 21). The use and abuse of student ratings of professors. 
The Chronicle of Higher Education, p. A40.
Shiozawa, T. (1995). The change of the Monbusho guidelines and their
impact on language education. Paper presented at JALT 95, Nagoya, Japan.
Reprinted in PALE Newsletter (1996), 2, 1.
Smith, M. L., & Glass, G. V. (1980). Meta-analysis of research on class size
and its relationship to attitudes and instruction. American Educational
Research Journal, 17, 419-433.
Snyder, M., & Campbell, B. (1980). Testing hypotheses about other people: The
role of the hypothesis. Personality and Social Psychology Bulletin,
6, 421-426.
Snyder, M., & Swann, W. B. (1978). Hypothesis-testing processes in social 
interaction. Journal of Personality and Social Psychology, 36, 1202-1212.
Snyder, M., & Thomsen, C. J. (1988). Interactions between therapists and clients:
Hypothesis testing and behavioural confirmation. In D. C. Turk & P. Salovey (Eds.),
Reasoning, inference, and judgment in clinical psychology. New York:
Free Press.
Stedman, C. H. (1983). The reliability of teaching effectiveness rating scale for
assessing faculty performance. Tennessee Education, 12(3), 25-32.
Sugeno, K. (1992). Japanese Labour Law (L. Kanowitz, Trans.).
Tokyo: University of Tokyo Press.
Swann, W. B., Jr., & Ely, R. J. (1984). A battle of wills: Self-verification 
versus behavioural confirmation. Journal of Personality and Social 
Psychology, 46, 1287-1302.
Swann, W. B., Jr., & Giuliano, T. (1987). Confirmatory search strategies in
social interaction: How, when, why, and with what consequences. Journal of
Social and Clinical Psychology, 5, 511-524.
Tagomori, H. T. (1993). A content analysis of instruments used for student
evaluation of faculty in schools of education at universities and colleges
accredited by the National Council for Accreditation of Teacher Education.
Unpublished doctoral dissertation, University of San Francisco.
Turk, D. C., & Salovey, P. (Eds.). (1988). Reasoning, inference, and judgment in
clinical psychology. New York: Free Press.
Whitten, B. J., & Umble, M. M. (1980). The relationship of class size, class
level and core vs. non-core classification for class to student ratings of
faculty: Implications for validity. Educational and Psychological
Measurement, 40, 419-423.
Wigington, H., Tollefson, N., & Rodriguez, E. (1989). Students' ratings of
instructors revisited: Interactions among class and instructor variables.
Research in Higher Education, 30(3), 331-344.