
The Indo-European Cognate Relationships dataset.

Published version
Peer-reviewed

Abstract

The Indo-European Cognate Relationships (IE-CoR) dataset is an open-access relational dataset showing how related, inherited words ('cognates') pattern across 160 languages of the Indo-European family. IE-CoR is intended as a benchmark dataset for computational research into the evolution of the Indo-European languages. It is structured around 170 reference meanings in core lexicon, and contains 25731 lexeme entries, analysed into 4981 cognate sets. Novel, dedicated structures are used to code all known cases of horizontal transfer. All 13 main documented clades of Indo-European, and their main subclades, are well represented. Time calibration data for each language are also included, as are relevant geographical and social metadata. Data collection was performed by an expert consortium of 89 linguists drawing on 355 cited sources. The dataset is extendable to further languages and meanings and follows the Cross-Linguistic Data Format (CLDF) protocols for linguistic data. It is designed to be interoperable with other cross-linguistic datasets and catalogues, and provides a reference framework for similar initiatives for other language families.

Description

Acknowledgements: The IE-CoR dataset was developed as a collaborative enterprise by a consortium of contributors who provided language data by making lexeme determinations for individual languages and/or cognacy determinations between languages. We thank all contributors to the IE-CoR dataset. The basic relational dataset structure for IE-CoR was inherited from the LexDB system developed by Michael Dunn. We thank Michelle O’Reilly for preparation of the figures in this paper. This research was funded by the Max Planck Society, through the Department of Linguistic and Cultural Evolution at the Max Planck Institute for the Science of Human History (Jena, Germany) and thereafter at the Max Planck Institute for Evolutionary Anthropology (Leipzig, Germany). From 1 January 2024 to 30 June 2024, C.A. was funded by BA grant GP300169. From 11 September 2021 to 10 September 2022, P.H. was funded by the ERC Starting Grant “Waves” (ERC758967). E.A. and G.H. were partially funded by an Alexander von Humboldt Research Fellowship for Experienced Researchers (2016-2018, grant No. 3.1-CAN-1164714-HFST-E). E.A. was also partially funded by a Social Sciences and Humanities Research Council of Canada (SSHRC) Insight Development Grant (2015-2017, grant No. 430-2015-00031).

Journal Title

Sci Data

Journal ISSN

2052-4463

Volume Title

12

Publisher

Springer Nature

Rights and licensing

Except where otherwise noted, this item's license is described as http://creativecommons.org/licenses/by/4.0/