CUED at ProbSum 2023: Hierarchical Ensemble of Summarization Models

Accepted version
Peer-reviewed

Type

Conference Object

Change log

Authors

Fathullah, Y 
Liusie, A 
Raina, V 
Raina, V 

Abstract

In this paper, we consider the challenge of summarizing patients' medical progress notes in a limited data setting. For the Problem List Summarization (shared task 1A) at the BioNLP Workshop 2023, we demonstrate that Clinical-T5 fine-tuned on 765 medical clinic notes outperforms other extractive, abstractive and zero-shot baselines, yielding reasonable baseline systems for medical note summarization. Further, we introduce Hierarchical Ensemble of Summarization Models (HESM), consisting of token-level ensembles of diverse fine-tuned Clinical-T5 models, followed by Minimum Bayes Risk (MBR) decoding. Our HESM approach led to a considerable summarization performance boost, and when evaluated on held-out challenge data achieved a ROUGE-L of 32.77, the top-performing system on the shared task leaderboard.
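The two components named in the abstract, token-level ensembling of several fine-tuned models and Minimum Bayes Risk (MBR) selection over candidate summaries, can be illustrated with a minimal, self-contained sketch. Everything below is a stand-in rather than the authors' implementation: the toy vocabulary, the randomly initialised "models", and the unigram-overlap utility used in place of ROUGE-L are all hypothetical, whereas the real HESM system ensembles diverse fine-tuned Clinical-T5 models.

```python
# Minimal sketch (hypothetical toy models and vocabulary) of:
#  (1) token-level ensembling: average the members' next-token distributions
#      at every decoding step, then decode from the averaged distribution;
#  (2) MBR decoding: pick the candidate with the highest expected utility
#      against the other candidates (a unigram-F1 stand-in for ROUGE-L).
import random
from typing import Callable, List

VOCAB = ["<eos>", "hypertension", "diabetes", "anemia", "sepsis", "pneumonia"]

def make_toy_model(seed: int) -> Callable[[List[str]], List[float]]:
    """Return a toy 'model': maps a prefix to a next-token distribution over VOCAB."""
    rng = random.Random(seed)
    weights = [rng.random() for _ in VOCAB]
    def next_token_probs(prefix: List[str]) -> List[float]:
        w = list(weights)
        w[0] += 0.5 * len(prefix)       # bias towards <eos> so decoding terminates
        total = sum(w)
        return [x / total for x in w]
    return next_token_probs

def ensemble_decode(models, rng: random.Random, max_len: int = 6) -> List[str]:
    """Token-level ensemble: average the members' distributions at each step."""
    prefix: List[str] = []
    for _ in range(max_len):
        dists = [m(prefix) for m in models]
        avg = [sum(d[i] for d in dists) / len(dists) for i in range(len(VOCAB))]
        tok = rng.choices(VOCAB, weights=avg, k=1)[0]   # sample for diverse candidates
        if tok == "<eos>":
            break
        prefix.append(tok)
    return prefix

def unigram_f1(a: List[str], b: List[str]) -> float:
    """Toy utility standing in for ROUGE-L between two candidate summaries."""
    if not a or not b:
        return 0.0
    overlap = len(set(a) & set(b))
    p, r = overlap / len(set(a)), overlap / len(set(b))
    return 2 * p * r / (p + r) if (p + r) else 0.0

def mbr_select(candidates: List[List[str]]) -> List[str]:
    """MBR decoding: return the candidate with the highest expected utility
    against all other candidates (treated as samples from the ensemble)."""
    def expected_utility(c: List[str]) -> float:
        return sum(unigram_f1(c, other) for other in candidates if other is not c)
    return max(candidates, key=expected_utility)

if __name__ == "__main__":
    models = [make_toy_model(seed) for seed in range(3)]            # diverse members
    cands = [ensemble_decode(models, random.Random(s)) for s in range(8)]
    print("MBR-selected summary:", mbr_select(cands))
```

In the paper's setting, the averaged distributions come from Clinical-T5 checkpoints fine-tuned with different configurations, and the MBR utility is a ROUGE-based similarity rather than the toy unigram F1 used here.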

Description

Keywords

Journal Title

Proceedings of the Annual Meeting of the Association for Computational Linguistics

Conference Name

The 22nd Workshop on Biomedical Language Processing (BioNLP 2023)

Journal ISSN

0736-587X

Volume Title

Publisher

Publisher DOI

Publisher URL

Sponsorship
Cambridge Assessment (unknown)
1. Cambridge University Press & Assessment (CUP&A), a department of The Chancellor, Masters, and Scholars of the University of Cambridge.
2. EPSRC (The Engineering and Physical Sciences Research Council) Doctoral Training Partnership (DTP) PhD studentship.
3. Cambridge International & St John’s College scholarship, and the Gates Cambridge Scholarship.