Research Matters 33
Recent Submissions
Research Matters 33: Spring 2022
Open Access, Published version, Peer-reviewed. Research Division, Cambridge University Press & Assessment, 2022-03-01. Bramley, Tom.
Research Matters is a free biannual publication which allows Cambridge University Press & Assessment to share its assessment research, in a range of fields, with the wider assessment community. In this edition of Research Matters we see significant refinement in both the application of Comparative Judgement and the thinking associated with it. Genuinely ground-breaking, the wide-ranging studies and projects examine its limits and processes as well as its relation to existing assessment approaches.

The concurrent validity of comparative judgement outcomes compared with marks
Open Access, Published version, Peer-reviewed. Research Division, Cambridge University Press & Assessment, 2022-03-01. Gill, Tim.
In Comparative Judgement (CJ) exercises, examiners are asked to look at a selection of candidate scripts (with marks removed) and order them according to which they believe display the best quality. By including scripts from different examination sessions, the results of these exercises can be used to help maintain standards. Results from previous CJ studies have demonstrated that the method appears to be valid and reliable in many contexts. However, it is not entirely clear whether CJ works as well as it does because of the physical and judgemental processes involved (i.e., placing two scripts next to each other and deciding which is better based on an intuitive, holistic, relative judgement), or because CJ exercises capture a large number of individual paired comparison decisions quickly. This article adds to the research on this question by re-analysing data from previous CJ studies and comparing the concurrent validity of the outcomes of individual CJ paired comparisons with the concurrent validity of outcomes based on the original marks given to scripts. The results show that for 16 of the 20 data sets analysed, mark-based decisions had higher concurrent validity than CJ-based decisions. Two possible explanations are that CJ decisions reward different skills from marks, or that individual CJ decisions are of lower quality than individual decisions based on marks. Either way, the implication is that the CJ method works because many individual paired comparison decisions are captured quickly, rather than because of the physical and psychological processes involved in making holistic judgements.

Moderation of non-exam assessments: is Comparative Judgement a practical alternative?
Open Access, Published version, Peer-reviewed. Research Division, Cambridge University Press & Assessment, 2022-03-01. Vidal Rodeiro, Carmen; Chambers, Lucy.
Many high-stakes qualifications include non-exam assessments that are marked by teachers. Awarding bodies then apply a moderation process to bring the marking of these assessments to an agreed standard. Comparative Judgement (CJ) is a technique in which two (or more) pieces of work are compared at a time, allowing an overall rank order of the work to be generated. This study explored the practical feasibility of using CJ for moderation via an experimental moderation task requiring judgements of pairs of authentic portfolios of work. It examined whether moderators could view and navigate the portfolios well enough to make the comparative judgements, on what basis they made their decisions, whether moderators could make confident CJ judgements on large pieces of candidate work (e.g., portfolios), and how long moderation took.

Judges’ views on pairwise Comparative Judgement and Rank Ordering as alternatives to analytical essay marking
Open Access, Published version, Peer-reviewed. Research Division, Cambridge University Press & Assessment, 2022-03-01. Walland, Emma.
In this article, I report on examiners' views and experiences of using Pairwise Comparative Judgement (PCJ) and Rank Ordering (RO) as alternatives to traditional analytical marking for GCSE English Language essays. Fifteen GCSE English Language examiners took part in the study. After each had judged 100 pairs of essays using PCJ and eight packs of ten essays using RO, I collected data on their experiences and views of the methods through interviews and questionnaires, and analysed the data using thematic content analysis. The findings highlight that, if the methods were to be used as alternatives to marking, examiners and other stakeholders would need reassurance that the methods are fair, valid and reliable. Examiners would also need more training and support to help them judge holistically. The lack of detail about how judgements are made under these methods is a concern worth following up and addressing before any implementation.
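The CJ studies listed above all rest on the same mechanism: many quick, independent paired comparisons are scaled into a single rank order, typically with a Bradley-Terry-style model. As a minimal sketch of that scaling step, the toy below fits a Bradley-Terry quality parameter for each script by gradient ascent. The script IDs, comparison outcomes, learning rate and iteration count are all hypothetical; operational CJ exercises use dedicated tools rather than a loop like this.

```python
import math

# Each tuple (winner, loser) records one judge's decision that the first
# script showed better quality than the second. Hypothetical data.
comparisons = [
    ("A", "B"), ("A", "C"), ("B", "C"),
    ("C", "D"), ("D", "A"), ("B", "D"),
]

scripts = sorted({s for pair in comparisons for s in pair})
quality = {s: 0.0 for s in scripts}  # latent quality parameter per script

# Gradient ascent on the Bradley-Terry log-likelihood, where
# P(i beats j) = exp(q_i) / (exp(q_i) + exp(q_j)).
for _ in range(500):
    grad = {s: 0.0 for s in scripts}
    for winner, loser in comparisons:
        p_win = 1.0 / (1.0 + math.exp(quality[loser] - quality[winner]))
        grad[winner] += 1.0 - p_win
        grad[loser] -= 1.0 - p_win
    for s in scripts:
        quality[s] += 0.1 * grad[s]  # assumed learning rate

# Sorting by the fitted parameters yields the overall rank order.
for s in sorted(scripts, key=quality.get, reverse=True):
    print(f"{s}: {quality[s]:+.2f}")
```

Each fitted parameter acts as a latent quality estimate, so sorting by it produces the overall rank order that the abstracts refer to; the interesting research questions above concern how the individual decisions feeding this model are actually made.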
How do judges in Comparative Judgement exercises make their judgements?
Open Access, Published version, Peer-reviewed. Research Division, Cambridge University Press & Assessment, 2022-03-01. Leech, Tony; Chambers, Lucy.
Two central issues in comparative judgement (CJ), perhaps underexplored compared with questions of the method's reliability and technical quality, are "what processes do judges use to make their decisions?" and "what features do they focus on when making their decisions?" This article discusses both, in the context of CJ for standard maintaining, by reporting the results of a study into the processes judges use when making CJ judgements alongside the outcomes of surveys of judges who have used CJ. First, drawing on observations of judges and on their thinking aloud while judging, we highlight the variety of processes used in making decisions, including comparative reference, re-marking and question-by-question evaluation. We then develop a four-dimensional model to explore what influences what judges attend to, and use survey responses to explore the distinctive ways in which the structure of the question paper, different elements of candidate responses, judges' own preferences and the CJ task itself affect decision-making. We conclude by discussing, in the light of these factors, whether the judgements made in CJ (or in the judgemental element of current standard-maintaining procedures) are meaningfully holistic, and whether judges can properly take account of differences in difficulty between papers.

How are standard-maintaining activities based on Comparative Judgement affected by mismarking in the script evidence?
Open Access, Published version, Peer-reviewed. Research Division, Cambridge University Press & Assessment, 2022-03-01. Williamson, Joanna.
An important application of Comparative Judgement (CJ) methods is to assist in maintaining standards from one series to another in high-stakes qualifications, by informing decisions about where to place grade boundaries or cut scores. This article explores the extent to which standard-maintaining activities based on CJ would be robust to mismarking in the sample of scripts used for the comparison exercise. While extreme marking errors are unlikely, we know that mismarking can occur in live assessments, and that the quality of marking can vary. This research investigates how mismarking could affect the outcomes of CJ-based methods, and therefore contributes to a better understanding of the risks of using CJ-based methods for standard maintaining. The article focuses on the 'simplified pairs' method (Benton et al., 2020), an example of the 'universal method' discussed by Benton (this issue).

A summary of OCR’s pilots of the use of Comparative Judgement in setting grade boundaries
Open Access, Published version, Peer-reviewed. Research Division, Cambridge University Press & Assessment, 2022-03-01. Benton, Tom; Gill, Tim; Hughes, Sarah; Leech, Tony.
The rationale for using comparative judgement (CJ) to help set grade boundaries is to provide a way of using expert judgement to identify and uphold certain minimum standards of performance, rather than relying purely on statistical approaches such as comparable outcomes. This article summarises the results of recent trials of CJ for this purpose in terms of how much difference it might have made to the positions of grade boundaries, the reported precision of the estimates, and the amount of time required from expert judges. The results show that estimated grade boundaries from a CJ approach tend to be fairly close to those that were set (using other forms of evidence) in practice. Occasionally, however, CJ results displayed small but significant differences from existing boundary locations. This implies that adopting a CJ approach to awarding would have a noticeable impact on awarding decisions, but not such a large one as to be implausible. The article also demonstrates that implementing CJ using simplified methods (described by Benton, Cunningham et al., 2020) achieves the same precision as alternative CJ approaches, but in less time. On average, each CJ exercise required roughly 30 judge-hours in total.
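The final two abstracts concern using CJ evidence to locate grade boundaries. As a rough illustration of the general idea, and not of the published 'simplified pairs' procedure itself, the sketch below supposes that each new-session script (with a known mark) was judged against a reference-session script representing the established standard. A logistic model of the judges' decisions against the new-session mark then gives the mark at which a new script is equally likely to win or lose, which can serve as a candidate boundary. All marks, outcomes, learning rates and iteration counts here are invented for illustration.

```python
import math

# (new_session_mark, won) -- 'won' is 1 if the judge preferred the
# new-session script over the reference-session script it was paired with.
# Hypothetical data.
pairs = [
    (35, 0), (38, 0), (41, 0), (44, 1), (47, 0),
    (50, 1), (53, 1), (56, 1), (59, 1), (62, 1),
]

# Fit P(win) = 1 / (1 + exp(-(a + b * x))) by gradient ascent,
# where x is the new-session mark centred for better conditioning.
a, b = 0.0, 0.0
for _ in range(5000):
    grad_a = grad_b = 0.0
    for mark, won in pairs:
        x = mark - 50.0
        p = 1.0 / (1.0 + math.exp(-(a + b * x)))
        grad_a += won - p
        grad_b += (won - p) * x
    a += 0.05 * grad_a   # assumed learning rates
    b += 0.005 * grad_b

# The mark at which P(win) = 0.5 is a candidate cut score equivalent in
# standard to the reference scripts.
print(f"estimated boundary: {50.0 - a / b:.1f} marks")
```

In practice the choice of reference scripts, the quality of the marks attached to them and the precision of the estimate all matter, which is exactly what the mismarking study and the OCR pilots above investigate.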