
dc.contributor.author: Leech, Tony
dc.contributor.author: Gill, Tim
dc.contributor.author: Hughes, Sarah
dc.contributor.author: Benton, Tom
dc.date.accessioned: 2022-05-12T09:00:09Z
dc.date.available: 2022-05-12T09:00:09Z
dc.date.issued: 2022-04-28
dc.date.submitted: 2021-10-27
dc.identifier.issn: 2504-284X
dc.identifier.uri: https://www.repository.cam.ac.uk/handle/1810/337064
dc.description.abstract: Comparative judgement (CJ) is often said to be more suitable for judging exam questions inviting extended responses, as it is easier for judges to make holistic judgements on a small number of large, extended tasks than on a large number of smaller tasks. On the other hand, there is evidence it may also be appropriate for judging responses to papers made up of many smaller structured tasks. We report on two CJ exercises on mathematics and science exam papers constructed mainly of highly structured items, exploring whether judgements processed by the simplified pairs version of CJ can approximate the empirical difference in difficulty between pairs of papers. This difference can then be used to maintain standards between exam papers. This use of CJ, rather than its other use as an alternative to marking, is the focus of this paper. In the exercises discussed, panels of experienced judges looked at pairs of scripts from different sessions of the same test, and their judgements were processed via the simplified pairs CJ method, which produces a single figure for the estimated difference in difficulty between versions. We compared this figure to the difference obtained from traditional equating, used as a benchmark. In the mathematics study the difference derived from judgement via simplified pairs closely approximated the empirical equating difference. In science, however, the CJ outcome did not closely align with the empirical difference in difficulty. Reasons for the discrepancy may include differences in the content of the exams or the specific judges. Clearly, though, comparative judgement need not lead to an accurate impression of the relative difficulty of different exams. We discuss self-reported judge views on how they judged, including which questions they focused on, and the implications of these views for the validity of CJ. The processes used when judging papers made up of highly structured tasks were varied, but judges were generally consistent. Some potential challenges to the validity of comparative judgement arise when judges use re-marking strategies or focus attention on subsets of the paper, and we explore these. A greater understanding of what judges are doing when they judge comparatively brings to the fore questions of judgement validity that remain implicit in marking and non-comparative judgement contexts.
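
The abstract describes reducing many pairwise judgements to a single estimated difficulty difference, expressed in marks, which is then compared against a traditional equating benchmark. The sketch below illustrates one way such a single-figure estimate could be obtained: a logistic regression of judgement outcomes on raw score differences, with the indifference point of the fitted curve read off as the mark adjustment that would align the two papers. This is a minimal sketch under assumed conventions; the data layout, the regression formulation, and all variable names are illustrative and are not the authors' exact simplified-pairs procedure.

```python
# Hedged sketch only: the authors' exact simplified-pairs procedure is not
# reproduced here; the simulated data and logistic model are illustrative.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 400

# Each judgement compares one script from paper A with one from paper B;
# we record the two raw scores and which script the judge preferred.
score_a = rng.integers(20, 61, size=n)
score_b = rng.integers(20, 61, size=n)

# Simulate judges who treat paper A as about 3 marks harder: an A script
# with roughly 3 fewer marks still looks as good as a B script.
true_shift = 3.0
p_a_preferred = 1.0 / (1.0 + np.exp(-0.3 * (score_a - score_b + true_shift)))
a_preferred = (rng.random(n) < p_a_preferred).astype(int)

# Logistic regression of the judgement outcome on the raw score difference.
score_diff = score_a - score_b
design = sm.add_constant(score_diff)
model = sm.Logit(a_preferred, design).fit(disp=0)
intercept, slope = model.params

# The score difference at which judges are indifferent (P = 0.5) estimates
# the mark adjustment needed to put the two papers on a common standard.
indifference_point = -intercept / slope
print(f"Estimated difficulty difference: {indifference_point:.2f} marks")
# With the simulated data this recovers roughly -3: an A script can score
# about 3 marks fewer and still be judged equal, i.e. paper A is harder.
```

In this framing, the fitted indifference point plays the role of the "single figure for the estimated difference in difficulty between versions" mentioned in the abstract, and the paper's comparison is between such a judgement-derived figure and the difference obtained from statistical equating.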
dc.language: en
dc.publisher: Frontiers Media SA
dc.subject: Education
dc.subject: comparative judgement
dc.subject: pairwise comparisons
dc.subject: standard maintaining
dc.subject: structured exams
dc.subject: educational assessment
dc.subject: simplified pairs
dc.title: The Accuracy and Validity of the Simplified Pairs Method of Comparative Judgement in Highly Structured Papers
dc.type: Article
dc.date.updated: 2022-05-12T09:00:08Z
prism.publicationName: Frontiers in Education
prism.volume: 7
dc.identifier.doi: 10.17863/CAM.84487
dcterms.dateAccepted: 2022-04-06
rioxxterms.versionofrecord: 10.3389/feduc.2022.803040
rioxxterms.version: VoR
rioxxterms.licenseref.uri: http://creativecommons.org/licenses/by/4.0/
dc.identifier.eissn: 2504-284X
cam.issuedOnline: 2022-04-28

