Applying Computerized-Scoring Models of Written Biological Explanations across Courses and Colleges: Prospects and Limitations

Minsu Ha

The Ohio State University, School of Teaching and Learning, Columbus, OH 43210

Ross H. Nehm

Michigan State University, 1428 Engineering, East Lansing, MI 48824

Mark Urban-Lurain

Michigan State University, 6171 Biomedical Physical Sciences, East Lansing, MI 48824

John E. Merrill

Michigan State University, 6171 Biomedical Physical Sciences, East Lansing, MI 48824

Vivian Siegel, Monitoring Editor

Published Online: 13 Oct 2017. https://doi.org/10.1187/cbe.11-08-0081

Abstract

Our study explored the prospects and limitations of using machine-learning software to score introductory biology students’ written explanations of evolutionary change. We investigated three research questions: 1) Do scoring models built using student responses at one university function effectively at another university? 2) How many human-scored student responses are needed to build scoring models suitable for cross-institutional application? 3) What factors limit computer-scoring efficacy, and how can these factors be mitigated? To answer these questions, two biology experts scored a corpus of 2556 short-answer explanations (from biology majors and nonmajors) at two universities for the presence or absence of five key concepts of evolution. Human- and computer-generated scores were compared using kappa agreement statistics. We found that machine-learning software was capable in most cases of accurately evaluating the degree of scientific sophistication in undergraduate majors’ and nonmajors’ written explanations of evolutionary change. In cases in which the software did not perform at the benchmark of “near-perfect” agreement (kappa > 0.80), we located the causes of poor performance and identified a series of strategies for their mitigation. Machine-learning software holds promise as an assessment tool for use in undergraduate biology education, but like most assessment tools, it is also characterized by limitations.
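The kappa comparison described in the abstract can be illustrated with a minimal sketch of Cohen's kappa for one concept scored as present (1) or absent (0). This is not the authors' software or data: the function names `cohens_kappa` and `meets_benchmark`, and the ten example scores, are hypothetical and chosen only to show the arithmetic. The 0.80 threshold is the "near-perfect" benchmark stated above.

```python
from collections import Counter


def cohens_kappa(human, computer):
    """Cohen's kappa for two equal-length sequences of categorical scores."""
    if len(human) != len(computer) or len(human) == 0:
        raise ValueError("score lists must be non-empty and of equal length")
    n = len(human)

    # Observed agreement: fraction of responses given the same score.
    observed = sum(h == c for h, c in zip(human, computer)) / n

    # Chance agreement: product of each rater's marginal proportions,
    # summed over the score categories.
    h_counts = Counter(human)
    c_counts = Counter(computer)
    expected = sum(h_counts[k] * c_counts[k] for k in h_counts) / (n * n)

    if expected == 1.0:
        # Both raters used one identical score throughout; kappa is
        # undefined here, and returning 1.0 is a convention of this sketch.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def meets_benchmark(kappa, threshold=0.80):
    """True if kappa exceeds the "near-perfect" benchmark (kappa > 0.80)."""
    return kappa > threshold


# Hypothetical scores for one key concept across ten responses
# (1 = concept present, 0 = concept absent). Illustrative only.
human_scores = [1, 1, 0, 0, 1, 0, 1, 0, 1, 1]
computer_scores = [1, 1, 0, 0, 1, 0, 1, 1, 1, 1]

kappa = cohens_kappa(human_scores, computer_scores)
print(f"kappa = {kappa:.3f}, meets benchmark: {meets_benchmark(kappa)}")
```

In this made-up example the two score lists agree on 9 of 10 responses, yet kappa falls just below 0.80 because much of that agreement would be expected by chance. That is why a chance-corrected statistic, rather than raw percent agreement, is a natural choice for this kind of comparison.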

Vol. 10, No. 4

December 01, 2011

329-435

History

Submitted: 27 August 2011

Revised: 29 September 2011

Accepted: 29 September 2011

Information

© 2011 M. Ha et al. CBE—Life Sciences Education © 2011 The American Society for Cell Biology. This article is distributed by The American Society for Cell Biology under license from the author(s). It is available to the public under an Attribution–Noncommercial–Share Alike 3.0 Unported Creative Commons License (http://creativecommons.org/licenses/by-nc-sa/3.0).

We thank Kristen Smock and Judith Ridgway, OSU, and Donna Koslowsky and Tammy Long, MSU, for assistance with data gathering; Luanna Prevost for comments on the manuscript; and the Automated Analysis of Constructed Response (AACR) group for discussions. R.H.N. thanks Elijah Mayfield and Caroline Rosé and the faculty of the Carnegie Mellon Pittsburgh Science of Learning Center summer school for help with machine-learning methods. We also thank two reviewers for helpful suggestions for improving the manuscript. We thank the National Science Foundation (NSF; REESE grant 090999 to principal investigator [PI] R.H.N.) for funding M.H. and collaborative NSF CCLI 1022653 to PIs Jenny Knight, R.H.N., and M.U.-L. for support of the AACR group. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the view of the NSF. The research was conducted under OSU IRB Protocol # 2008B0080 (R.H.N., PI).
