A Framework for Understanding Selection Bias in Real-World Healthcare Data

Accepted version
Peer-reviewed

Authors

Kundu, Ritoban 
Shi, Xu 
Morrison, Jean 
Mukherjee, Bhramar 

Abstract

Administrative patient-care data such as Electronic Health Records (EHR) and medical/pharmaceutical claims are increasingly used for population-based scientific research. Because vast sample sizes yield very small standard errors, researchers need to pay more attention to potential biases in the estimates of association parameters of interest, specifically biases that do not diminish with increasing sample size. Among these sources of bias, this paper focuses on selection bias. We present an analytic framework using directed acyclic graphs to guide applied researchers in dissecting how different sources of selection bias may affect estimates of the association between a binary outcome and an exposure (continuous or categorical) of interest. We consider four easy-to-implement weighting approaches to reduce selection bias, with accompanying variance formulae. We demonstrate through a simulation study when these approaches can rescue the analysis of real-world data in practice. We compare the methods in a data example whose goal is to estimate the well-known association between cancer and biological sex, using EHR data from a longitudinal biorepository at the University of Michigan healthcare system. We provide annotated R code to implement these weighted methods with associated inference.
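
To make the idea of weighting against selection bias concrete, the following is a minimal sketch of inverse probability weighting on a 2x2 table. It is a generic illustration, not the authors' implementation, and the abstract does not say whether this exact scheme is one of the paper's four approaches. The population counts, the selection probabilities, and all names (`population`, `selection_prob`, `odds_ratio`) are hypothetical. The sketch also assumes the selection probabilities are known, whereas in practice they would have to be estimated.

```python
from typing import Dict, Tuple

# A cell is (outcome D, exposure E), both binary.
Cell = Tuple[int, int]


def odds_ratio(counts: Dict[Cell, float]) -> float:
    """Outcome-exposure odds ratio from 2x2 cell counts."""
    return (counts[(1, 1)] * counts[(0, 0)]) / (counts[(1, 0)] * counts[(0, 1)])


# Hypothetical target population (illustrative counts only).
population: Dict[Cell, float] = {
    (1, 1): 2000.0,
    (1, 0): 1000.0,
    (0, 1): 8000.0,
    (0, 0): 9000.0,
}

# Hypothetical selection probabilities P(S = 1 | D, E). Selection depends on
# outcome and exposure jointly, which is what distorts the odds ratio here.
selection_prob: Dict[Cell, float] = {
    (1, 1): 0.8,
    (1, 0): 0.4,
    (0, 1): 0.2,
    (0, 0): 0.2,
}

# Expected cell counts in the selected sample (no sampling noise, for clarity).
selected = {cell: n * selection_prob[cell] for cell, n in population.items()}

# Inverse probability weighting: each selected unit counts 1 / P(S = 1 | D, E).
weighted = {cell: n / selection_prob[cell] for cell, n in selected.items()}

true_or = odds_ratio(population)   # odds ratio in the target population
naive_or = odds_ratio(selected)    # unweighted odds ratio in the selected sample
ipw_or = odds_ratio(weighted)      # weighted odds ratio in the selected sample

print(f"population OR: {true_or:.2f}")
print(f"naive OR:      {naive_or:.2f}")
print(f"IPW OR:        {ipw_or:.2f}")
```

In this toy setup the unweighted odds ratio is about twice the population value, and reweighting by the known selection probabilities recovers the population value. Expected counts stand in for a random sample, so the sketch shows only the bias, not the variance that the paper's formulae address.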

Journal Title

Journal of the Royal Statistical Society: Series A (Statistics in Society)

Journal ISSN

0964-1998
1467-985X

Publisher

Royal Statistical Society

Sponsorship

MRC (unknown)