The Accuracy and Validity of the Simplified Pairs Method of Comparative Judgement in Highly Structured Papers
Authors
Leech, Tony
Gill, Tim
Hughes, Sarah
Benton, Tom
Publication Date
2022-04-28
Journal Title
Frontiers in Education
ISSN
2504-284X
Publisher
Frontiers Media SA
Volume
7
Language
en
Type
Article
This Version
VoR (Version of Record)
Citation
Leech, T., Gill, T., Hughes, S., & Benton, T. (2022). The Accuracy and Validity of the Simplified Pairs Method of Comparative Judgement in Highly Structured Papers. Frontiers in Education, 7. https://doi.org/10.3389/feduc.2022.803040
Abstract
Comparative judgement (CJ) is often said to be more suitable for judging exam questions inviting extended responses, as it is easier for judges to make holistic judgements on a small number of large, extended tasks than on a large number of smaller tasks. On the other hand, there is evidence that it may also be appropriate for judging responses to papers made up of many smaller structured tasks. We report on two CJ exercises on mathematics and science exam papers, which are constructed mainly of highly structured items, exploring whether judgements processed by the simplified pairs version of CJ can approximate the empirical difference in difficulty between pairs of papers. This difference can then be used to maintain standards between exam papers. This use of CJ, rather than its other use as an alternative to marking, is the focus of this paper. In the exercises discussed, panels of experienced judges looked at pairs of scripts from different sessions of the same test, and their judgements were processed via the simplified pairs CJ method, which produces a single figure for the estimated difference in difficulty between versions. We compared this figure to the difference obtained from traditional equating, used as a benchmark. In the mathematics study, the difference derived from judgement via simplified pairs closely approximated the empirical equating difference. In science, however, the CJ outcome did not closely align with the empirical difference in difficulty. Reasons for the discrepancy may include differences in the content of the exams or in the specific judges. Clearly, however, comparative judgement need not lead to an accurate impression of the relative difficulty of different exams. We discuss judges' self-reported views on how they judged, including which questions they focused on, and the implications of these for the validity of CJ. The processes used when judging papers made up of highly structured tasks were varied, but judges were generally sufficiently consistent. Some potential challenges to the validity of comparative judgement arise when judges use re-marking strategies or focus attention on subsets of the paper, and we explore these. A greater understanding of what judges are doing when they judge comparatively brings to the fore questions of judgement validity that remain implicit in marking and non-comparative judgement contexts.
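Because the simplified pairs approach reduces each comparison to a single binary outcome, its core estimate can be illustrated compactly. The following is a minimal sketch in Python of that general logic, not the authors' exact procedure: it assumes simulated judgement data in which each record holds the marks of one script from each test version plus a flag for which was judged better, and it recovers the difficulty difference as the mark gap at which the two versions are judged equally good, via a logistic regression of the judgement outcome on the mark difference. All data and model choices here are illustrative assumptions.

# A hypothetical sketch of simplified-pairs-style analysis; the data,
# model form, and the 3-mark "true shift" are illustrative assumptions,
# not the authors' implementation.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Simulated judging data: 200 paired comparisons of scripts from
# version A and version B of the same test.
true_shift = 3.0                      # assume version A is 3 marks harder
mark_a = rng.integers(20, 60, 200)    # observed marks on version A scripts
mark_b = rng.integers(20, 60, 200)    # observed marks on version B scripts

# Judges are assumed to respond to the latent quality gap: the raw mark
# difference corrected for the difficulty shift between versions.
latent = 0.3 * (mark_a - mark_b + true_shift)
judged_a_better = rng.random(200) < 1 / (1 + np.exp(-latent))

# Logistic regression of the binary judgement on the raw mark difference.
X = sm.add_constant(mark_a - mark_b)
fit = sm.Logit(judged_a_better.astype(int), X).fit(disp=0)
b0, b1 = fit.params

# The mark difference at which the two versions are judged equally good
# (intercept/slope) estimates the difficulty difference between versions.
print(f"estimated difficulty difference: {b0 / b1:.2f} marks")

Under this assumed model, the ratio of intercept to slope gives the mark adjustment needed to place the two versions on a common standard, which corresponds to the single figure the abstract describes being compared against traditional equating.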
Keywords
Education, comparative judgement, pairwise comparisons, standard maintaining, structured exams, educational assessment, simplified pairs
Identifiers
External DOI: https://doi.org/10.3389/feduc.2022.803040
This record's URL: https://www.repository.cam.ac.uk/handle/1810/337064
Rights
Licence:
http://creativecommons.org/licenses/by/4.0/