Quantifying gender bias towards politicians in cross-lingual language models.

Published version
Peer-reviewed

Authors

Stańczak, Karolina (ORCID: https://orcid.org/0000-0001-7326-9594)
Ray Choudhury, Sagnik 
Pimentel, Tiago 
Cotterell, Ryan 
Augenstein, Isabelle 

Abstract

Recent research has demonstrated that large pre-trained language models reflect societal biases expressed in natural language. The present paper introduces a simple method for probing language models to conduct a multilingual study of gender bias towards politicians. We quantify the usage of adjectives and verbs generated by language models surrounding the names of politicians as a function of their gender. To this end, we curate a dataset of 250k politicians worldwide, including their names and gender. Our study is conducted in seven languages across six different language modeling architectures. The results demonstrate that pre-trained language models' stance towards politicians varies strongly across the analyzed languages. We find that while some words, such as dead and designated, are associated with both male and female politicians, a few specific words, such as beautiful and divorced, are predominantly associated with female politicians. Finally, and contrary to previous findings, our study suggests that larger language models do not tend to be significantly more gender-biased than smaller ones.
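
As a rough illustration of what "quantifying word usage as a function of gender" can look like, the sketch below scores (gender, word) pairs with pointwise mutual information (PMI). This is an assumption made for illustration only: the abstract does not name the estimator the authors use, and the function name `gender_pmi` and the toy pairs are hypothetical, not data or results from the paper.

```python
import math
from collections import Counter


def gender_pmi(pairs):
    """Return {(gender, word): PMI in bits} for an iterable of (gender, word) pairs.

    PMI is a stand-in association measure chosen for this sketch;
    it is not claimed to be the method used in the paper.
    """
    pairs = list(pairs)
    total = len(pairs)
    joint = Counter(pairs)                        # count of each (gender, word) pair
    gender_counts = Counter(g for g, _ in pairs)  # marginal count per gender
    word_counts = Counter(w for _, w in pairs)    # marginal count per word
    return {
        (g, w): math.log2(
            (n / total) / ((gender_counts[g] / total) * (word_counts[w] / total))
        )
        for (g, w), n in joint.items()
    }


# Hypothetical toy input: words a model might generate next to a politician's
# name, each paired with that politician's recorded gender.
toy_pairs = [
    ("female", "beautiful"), ("female", "beautiful"),
    ("female", "designated"), ("male", "designated"), ("male", "designated"),
    ("female", "dead"), ("male", "dead"),
    ("male", "elected"),
]
scores = gender_pmi(toy_pairs)
```

In this toy input, a word that occurs only with one gender receives a positive score for that gender, while a word split evenly across both genders scores zero. Real use would require far larger samples, since PMI is unstable for rare words.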

Description

Acknowledgements: The authors would like to thank Eleanor Chodroff, Clara Meister, and Zeerak Talat for their feedback on the manuscript.

Keywords

Humans, Female, Male, Sexism, Language, Multilingualism, Names, Bias

Journal Title

PLoS One

Journal ISSN

1932-6203

Volume

18

Publisher

Public Library of Science (PLoS)

Sponsorship

Danmarks Frie Forskningsfond (9130-00092B)