Here are some suggested answers for the questions about testing, statistics, and assessment from the
May 2009 issue of SHIKEN. If any answer seems unclear or you have further questions, contact the author at newfields55 at yahoo dot com.
Further reading: Kunnan, A. J. (Ed.) (1998). Validation in Language Assessment. Mahwah, NJ: Lawrence Erlbaum Associates. Westen, D. & Rosenthal, R. (2003). Quantifying construct validity: Two simple measures. Journal of Personality and Social Psychology, 84 (3) 608-618. Retrieved April 10, 2009 from http://www.psychsystems.net/lab/Quant_Const.pdf
[ p. 28 ]
Unidimensionality is the flip side of this coin: it implies that only one latent trait is involved in each given analysis. If more than one trait is somehow involved in an analysis (which generally happens to some degree in real life), the principle of unidimensionality is compromised. Many IRT and Rasch measures are robust enough to tolerate small violations of this principle, but if significant violations occur, more confounding errors will arise.
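One rough way to see what a violation looks like is to check how much of the shared variance in a set of items is carried by a single dominant dimension. The sketch below is only an illustration, not a Rasch or IRT fit analysis: it builds an inter-item correlation matrix and uses power iteration to estimate the share of total variance taken by the first eigenvalue. The function names, the toy response patterns, and the use of the first-eigenvalue share as a screening heuristic are all assumptions introduced here rather than anything taken from the article.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def first_eigen_share(responses, iterations=500):
    """Share of total variance carried by the largest eigenvalue of the
    inter-item correlation matrix (rows = persons, columns = items)."""
    items = list(zip(*responses))
    k = len(items)
    corr = [[pearson(items[i], items[j]) for j in range(k)] for i in range(k)]
    v = [1.0] * k
    for _ in range(iterations):  # power iteration toward the leading eigenvector
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    rv = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
    eigenvalue = sum(a * b for a, b in zip(v, rv))  # Rayleigh quotient
    return eigenvalue / k  # the trace of a correlation matrix equals k


# Hypothetical data: six items answered according to a single ability.
one_trait = [[1 if ability > item else 0 for item in range(6)]
             for ability in range(7)]

# Hypothetical data: items 1-3 depend on trait a, items 4-6 on an unrelated trait b.
two_traits = [[1 if a > item else 0 for item in range(3)]
              + [1 if b > item else 0 for item in range(3)]
              for a in range(4) for b in range(4)]

print(first_eigen_share(one_trait))
print(first_eigen_share(two_traits))
```

With these toy patterns the single-trait data put more than half of the variance on the first dimension, while the two-trait data put only about one third there. Real response data are far noisier, so a share like this is at best a first screening step before a proper dimensionality analysis.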
Further reading: Baghaei, P. (2008). Local dependency and Rasch measures. Rasch Measurement Transactions, 21 (3) 1105-6. Retrieved April 11, 2009 from http://www.rasch.org/rmt/rmt213b.htm Beguin, A. A. (2000). Robustness of equating high-stakes tests. Retrieved April 11, 2009 from http://www.cito.nl/share/poc/dissertaties/dissertationbeguin2000.pdf Brannick, M. T. (n.d.) Item Response Theory. Retrieved April 11, 2009 from http://luna.cas.usf.edu/~mbrannic/files/pmet/irt.htm
[ p. 29 ]
Further reading: Brown, J. D. Skewness and kurtosis. Shiken: JALT Testing & Evaluation SIG Newsletter, 1 (1) 20-23. Retrieved April 13, 2009 from http://jalt.org/test/PDF/Brown1.pdf Esty, W. W. & Banfield, J. D. (2003, October). The Box-Percentile Plot. Journal of Statistical Software, 8 (17). Retrieved April 13, 2009 from http://www.jstatsoft.org/v08/i17/paper Pezzullo, J. C. (2009). Free Statistical Software. Retrieved April 13, 2009 from http://www.statpages.org/javasta2.html Wikipedia. (2009). List of statistical packages. Retrieved April 13, 2009 from http://en.wikipedia.org/wiki/Statistical_software
[ p. 30 ]
Further reading: Egghe, L. (2006). An improvement to the h-index: The g-index. ISSI Newsletter 2 (1) 8-9. Retrieved April 14, 2009 from http://stat-athens.aueb.gr/~jpan/Egghe-ISSI-2006.pdf Jin, B-H., Liang, L., Rousseau, R. and Egghe, L. (2007). The R- and AR- indices: Complementing the h-index. Chinese Science Bulletin, 52, 855-863. Retrieved April 14, 2009 from http://dx.doi.org/10.1007/s11434-007-0145-9 Kosmulski, M. (2007). MAXPROD - A new index for assessment of the scientific output of an individual, and a comparison with the h-index. International Journal of Scientometrics, Informetrics and Bibliometrics, 11 (1). Paper 5. Retrieved April 14, 2009 from http://cybermetrics.cindoc.csic.es/articles/v11i1p5.pdf Panaretos, J. & Malesios, C. (2009, January 18). Assessing scientific research performance and impact with single indices. MPRA Paper No. 12842. Retrieved April 14, 2009 from http://mpra.ub.uni-muenchen.de/12842/ Rousseau, R. (2008, June). Reflections on recent developments of the h-index and h-type indices. In H. Kretschmer & F. Havemann (Eds.). Proceedings of WIS 2008, Berlin. Retrieved April 14, 2009 from http://www.tarupublications.com/journals/cjsim/7-Rousseau.pdf
Further reading: Baytekin, O. (2002). A χ² analysis of the Poisson approximation to binomial distribution. Marmara University Journal of Pure and Applied Sciences, 18, 33-36. Retrieved April 15, 2009 from http://fbe.marmara.edu.tr/dergi/pdf/inga02004.pdf Di Raimondo, T. et al. (2007). Discrete distributions: hypergeometric, binomial, and poisson. Retrieved April 15, 2009 from http://controls.engin.umich.edu/wiki/index.php/Discrete_Distributions:_hypergeometric,_ binomial, and_poisson El Sherbiny, M. M. (2007, November 4). Discrete Probability Distributions. Retrieved April 15, 2009 from http://faculty.ksu.edu.sa/73212/Publications/Discrete%20Probability%20Distributions.ppt Nandamurar, K. (n.d.). Poisson Distribution. Retrieved April 15, 2009 from http://www.cse.msu.edu/~nandakum/nrg/Tms/Probability/poisson.htm West Virginia University Department of Statistics. (2006). The Poisson distribution. Retrieved April 15, 2009 from http://ideal.stat.wvu.edu:8080/ideal/resource/modules/1/Poisson/poisson.html
[ p. 31 ]
Further reading: Cramster, Inc. (2009). Q-Q plot. Retrieved April 17, 2009 from http://www.cramster.com/reference/wiki.aspx?wiki_name=Q-Q_plot Simon, K. (n.d.) Pareto Chart. Retrieved April 17, 2009 from http://www.gate2quality.com/quality-tools_2.html United States Department of Commerce Information Technology Laboratory: Statistical Engineering Division. (2006, July 16). NIST/SEMATECH e-Handbook of Statistical Methods: 1.3.3.24. Quantile-Quantile Plot. Retrieved April 19, 2009 from http://www.itl.nist.gov/div898/handbook/eda/section3/qqplot.htm
Further reading: Jacobs, L. C. (1991). Test Reliability. Retrieved April 22, 2009 from http://www.indiana.edu/~best/test_reliability.shtml Winsteps. (2009). Winsteps Help for Rasch Analysis: Reliability and separation of measures. Retrieved April 22, 2009 from http://www.winsteps.com/winman/index.htm?reliability.htm
[ p. 32 ]
Further reading: Andrich, D. (1982). An Index of Person Separation in Latent Trait Theory, the Traditional KR-20 Index, and the Guttman Scale Response Pattern. Education Research and Perspectives, 9 (1) 95-104. Retrieved April 24, 2009 from http://www.rasch.org/erp7.htm Bodner, G. (1980). Statistical Analysis of Multiple Choice Exams: Coefficients of Reliability. Journal of Chemical Education, 57, 188-190. Retrieved April 24, 2009 from http://chemed.chem.purdue.edu/chemed/stats.html Brown, J. D. (2002). Do cloze tests work? Or, is it just an illusion? University of Hawaii Working Papers in Second Language Studies, 21 (1). Retrieved April 26, 2009 from http://www.hawaii.edu/sls/uhwpesl/21(1)/BrownCloze.pdf Hale, C. D. (2009). Active Teaching, Learning, and Assessment: Unit 4: Validity and Reliability. Retrieved April 24, 2009 from charlesdennishale.com/books/eets_ap/3_Psychometrics_Reliatility_Validity_Sampling.pdf Iacobucci, D. & Duhachek, A. (2003). Advancing alpha: Measuring reliability with confidence. Journal of Consumer Psychology, 13 (4), 478-487. Retrieved April 26, 2009 from http://mba.vanderbilt.edu/vanderbilt/data/research/2190full.pdf Linacre, J. M. (1997). KR-20 or Rasch Reliability: Which Tells the "Truth"? Rasch Measurement Transactions, 11 (3) 580-581. Retrieved April 26, 2009 from http://www.rasch.org/rmt/rmt113l.htm
[ p. 33 ]
5 Q: What is an ideal item facility index range for a 4-choice, multiple-choice norm-referenced test item? What about a true-false norm-referenced test item?
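As background for this question, an item facility index is conventionally the proportion of test takers who answer an item correctly, and the score expected from blind guessing is one divided by the number of answer options. The sketch below computes both. The function names and the response data are hypothetical, and the code does not itself say what the ideal range is.

```python
def item_facility(responses):
    """Proportion of test takers who answered the item correctly.

    `responses` is a list of 1 (correct) and 0 (incorrect) scores.
    """
    return sum(responses) / len(responses)


def chance_level(n_options):
    """Expected proportion correct if every test taker guessed blindly."""
    return 1 / n_options


# Hypothetical scores of eight test takers on one item.
scores = [1, 1, 0, 1, 0, 1, 1, 0]

print(item_facility(scores))  # five of eight correct
print(chance_level(4))        # 4-choice multiple-choice item
print(chance_level(2))        # true-false item
```

Comparing an observed facility value with the chance level for the item format shows how much of the result could be due to guessing alone, which is why the two formats in the question need to be considered separately.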
Further reading: Brown, J. D. (2003). Norm-referenced item analysis (item facility and item discrimination). Shiken: JALT Testing & Evaluation SIG Newsletter, 7 (2) 16 – 19. Retrieved April 16, 2009 from http://jalt.org/test/PDF/Brown17.pdf Lord, F. M. (1977). Optimal number of choices per item: A comparison of four approaches. Journal of Educational Measurement, 14 (1), 33-38. The University of Texas at Austin Division of Instructional Innovation and Assessment. (2007, July 16). Analyzing Multiple-Choice Item Responses. Retrieved April 16, 2009 from http://www.utexas.edu/academic/mec/scan/analysis.html Whatley, M. A. (2007). Item Analysis Worksheet. Retrieved April 16, 2009 from http://chiron.valdosta.edu/ mawhatley/3900/itemanalysis.pdf
[ p. 34 ]