ASCB logo LSE Logo

Applying Computerized-Scoring Models of Written Biological Explanations across Courses and Colleges: Prospects and Limitations

    Published Online:https://doi.org/10.1187/cbe.11-08-0081

    Our study explored the prospects and limitations of using machine-learning software to score introductory biology students’ written explanations of evolutionary change. We investigated three research questions: 1) Do scoring models built using student responses at one university function effectively at another university? 2) How many human-scored student responses are needed to build scoring models suitable for cross-institutional application? 3) What factors limit computer-scoring efficacy, and how can these factors be mitigated? To answer these questions, two biology experts scored a corpus of 2556 short-answer explanations (from biology majors and nonmajors) at two universities for the presence or absence of five key concepts of evolution. Human- and computer-generated scores were compared using kappa agreement statistics. We found that machine-learning software was capable in most cases of accurately evaluating the degree of scientific sophistication in undergraduate majors’ and nonmajors’ written explanations of evolutionary change. In cases in which the software did not perform at the benchmark of “near-perfect” agreement (kappa > 0.80), we located the causes of poor performance and identified a series of strategies for their mitigation. Machine-learning software holds promise as an assessment tool for use in undergraduate biology education, but like most assessment tools, it is also characterized by limitations.