Research data supporting "A Simple and Efficient Approach to Unsupervised Instance Matching and its Application to Linked Data of Power Plants"


Type

Dataset

Authors

Eibeck, Andreas 
Shaocong, Zhang 
Mei Qi, Lim 

Description

The ZIP archive contains files for two new domains: power plants in the United Kingdom and power plants in Germany. The files were used to train and evaluate instance matching algorithms. Each domain has its own subdirectory containing two data files (tableA.csv and tableB.csv) and three labelled sample files (train.csv, valid.csv and test.csv). The files use Magellan's CSV format (see https://github.com/anhaidgroup/deepmatcher/blob/master/Datasets.md). The README.txt file contains additional information. The instance matching code is published at https://github.com/cambridge-cares/TheWorldAvatar.
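
As a rough illustration of how the files in one domain subdirectory fit together, the following Python sketch joins the labelled pairs to the full records of both tables. It is not taken from the published instance matching code. It assumes the column names described in the linked format documentation (an `id` column in tableA.csv and tableB.csv, and `ltable_id`, `rtable_id` and `label` columns in the sample files); the function names and the directory argument are hypothetical, so check README.txt for the actual layout.

```python
import csv
from pathlib import Path


def read_rows(source):
    """Read a CSV file (given as a path or an open text file) into a list of dicts."""
    if isinstance(source, (str, Path)):
        with open(source, newline="", encoding="utf-8") as handle:
            return list(csv.DictReader(handle))
    return list(csv.DictReader(source))


def join_labelled_pairs(table_a, table_b, pairs):
    """Attach the full records from both tables to each labelled pair.

    Assumes an 'id' column in both tables and 'ltable_id', 'rtable_id'
    and 'label' columns in the pair file (label '1' means a match).
    """
    a_by_id = {row["id"]: row for row in table_a}
    b_by_id = {row["id"]: row for row in table_b}
    joined = []
    for pair in pairs:
        joined.append({
            "left": a_by_id[pair["ltable_id"]],
            "right": b_by_id[pair["rtable_id"]],
            "is_match": pair["label"] == "1",
        })
    return joined


def load_split(domain_dir, split="train"):
    """Load one labelled split (train, valid or test) for a domain directory.

    'domain_dir' is a hypothetical path to one of the two subdirectories.
    """
    domain_dir = Path(domain_dir)
    table_a = read_rows(domain_dir / "tableA.csv")
    table_b = read_rows(domain_dir / "tableB.csv")
    pairs = read_rows(domain_dir / f"{split}.csv")
    return join_labelled_pairs(table_a, table_b, pairs)
```

Calling `load_split` with the path of an extracted domain subdirectory and `split="test"` would return one dictionary per labelled pair, each holding the left record, the right record and a boolean match flag. The sketch uses only the Python standard library so that it runs without extra dependencies.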

Version

Software / Usage instructions

The files in csv and txt format can be opened with any standard text editor; files in xls or pdf format, if present, require a spreadsheet application or a PDF viewer.

Keywords

instance matching, knowledge graph, power plants, record linkage, Semantic web

Publisher

Sponsorship
This project is funded by the National Research Foundation (NRF), Prime Minister’s Office, Singapore under its Campus for Research Excellence and Technological Enterprise (CREATE) programme. Part of this work was supported by Towards Turing 2.0 under the EPSRC Grant EP/W037211/1 & The Alan Turing Institute. M.K. gratefully acknowledges the support of the Alexander von Humboldt Foundation.
Relationships
Supplements: