Using genetics to disentangle the complex relationship between food choices and health status

Despite food choices being one of the most important factors influencing health, efforts to identify individual food groups and dietary patterns that cause disease have been challenging, with traditional nutritional epidemiological approaches plagued by biases and confounding. After identifying 302 (289 novel) individual genetic determinants of dietary intake in 445,779 individuals in the UK Biobank study, we develop a statistical genetics framework that enables us, for the first time, to directly assess the impact of food choices on health outcomes. We show that the biases which affect observational studies extend also to GWAS, genetic correlations and causal inference through genetics, which can be corrected by applying our methods. Finally, by applying Mendelian Randomization approaches to the corrected results we identify some of the first robust causal associations between eating patterns and risks of cancer, heart disease and obesity, distinguishing between the effects of specific foods or dietary patterns.


Introduction
Given their profound impact on human well-being, nutritional choices and their effects on health are among the most studied human behaviours. Quality and quantity of food consumption are associated with a wide range of medical conditions including metabolic syndrome and cardiovascular disease 1, cancer 1, liver disease 2, inflammatory bowel disease 3 and depression 4. Food choice is becoming increasingly significant for global health as energy-dense, low-fibre western diets proliferate across the globe and an obesity epidemic follows 4. Despite the extremely high number of studies reporting food/health associations, it has been hard to establish causal relationships due to difficulty in measurement, recall bias and confounding.

[...] randomised controlled trials 5. It is thus appealing to use Mendelian randomization (MR) to assess the causal relationship between food and health. Unfortunately, genetic variants predicting dietary consumption have been limited to a few food groups, such as alcoholic beverages 6, coffee 7 and milk 8,9, and existing evidence from dietary MR studies remains unremarkable 10,11. More importantly, previous studies on a single food group have not accounted for interrelationships between different food groups. We therefore aimed to assess the causal relationship between food and several health outcomes by exploiting consumption patterns of multiple food groups in the UK Biobank (UKB) to create a new set of genetic instruments for MR analysis and then testing the causal effect of food consumption on health 12.

GWAS of food traits
The first step in MR is to identify the genetic variants associated with the exposure of interest (food consumption in our case). We thus conducted a genome-wide association study (GWAS) on 29 food consumption traits, such as "beef" and "cheese" intake, using a mixed linear model in the white European participants of UKB 13 (up to N=445,779), including only sex and age as covariates to avoid collider bias 14. For a full description of the traits see Tables S1 and S2. The GWAS identified 414 phenotype-genotype associations divided into 260 independent loci with p < 1 x 10^-8, summarized in Table S3 and Figure 1.

[...] were available. Educational attainment was also included as a proxy for socioeconomic status. Using MR we identified 81 instances where we had evidence of health-related traits significantly influencing food choice (Fig. 2).

[...] For example, higher genetically determined BMI is associated with higher consumption of poultry, vegetables (both raw and cooked) and non-oily fish (and also spirits and coffee), but lower consumption of beef, processed meat, bread and fatty foods. Similarly, those genetically predisposed to CHD report lower consumption of whole milk, salt and lamb, and higher consumption of fish and red wine. This last case is particularly interesting, reflecting the standard dietary advice (lower intake of fat and salt but higher intake of fish as a means to increase omega-3 fatty acid intake 18), but also higher [...]

The combined results from all traits before and after adjustment for the effect of health status on food preference are shown in Fig. 1 (see Supplementary file 1 for trait-specific plots). In many loci previously associated with health-related traits, the effect changed dramatically, suggesting that the effect of the SNP on the food traits is mediated through health status.
For example, the effect size of the lead FTO variant (rs55872725) on percentage fat in milk falls three-fold, from 0.0045 to 0.0015 log units (p=2x10^-29 and p=7x10^-5, respectively). We observed similar effects for other associations at the same locus, which suggests that in general the associations we are observing near FTO are primarily mediated through its strong association with BMI 22.

This insight is crucial: a naïve approach would conclude that eating less-healthy and more calorie-dense foods leads to a lower BMI, while in fact our analysis suggests that it is having a higher BMI that leads to either having a healthier diet or reporting one. This [...]

To further explore the effects of the correction procedure, we compared the correlation patterns between the food traits and 832 phenotypes present in the LD hub 25 database using the raw and [...] that the correction produced more meaningful food clusters and that in many cases the genetic correlations with other traits changed greatly (see https://npirastu.shinyapps.io/rg_plotter_2/ for a graphical representation of these results). For example, if we look at the relationship between the two fat intake traits (percentage fat in milk and adding spread to bread) and body fat percentage, we can see that both have a seemingly beneficial effect before correction (r_G = -0.43 and -0.10, respectively), which diminishes to near zero (r_G = -0.04 and 0.07) after applying the correction, suggesting that the apparent protective effect is likely due to confounding.

[...] temperature and tea; and cheese and bread; these were not used for the MV analysis. In order to explore whether additional loci influence these groups, we ran a multivariate GWAS using the package MultiABEL, which performs MANOVA on summary statistics.
An additional 168 associations, including 42 novel loci not identified in the single-trait analysis, were identified in the multivariate analysis of the three main food groups (Table S5).
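The adjustment for health status discussed in this section can be illustrated with a minimal sketch. It assumes, and this form is an assumption rather than something stated in the text, that the corrected effect is the raw SNP-food effect minus the part transmitted through each health trait (the SNP's effect on the health trait multiplied by the MR estimate of that trait's effect on the food). The function name and every number except the FTO raw and corrected effects (0.0045 and 0.0015) are hypothetical.

```python
def corrected_effect(beta_raw, snp_on_health, health_on_food):
    """Remove the health-mediated part from a raw SNP-food effect.

    beta_raw       : raw GWAS effect of the SNP on the food trait
    snp_on_health  : {trait: effect of the SNP on that health trait}
    health_on_food : {trait: causal (MR) effect of that trait on the food}

    Assumed form (not taken from the paper):
        beta_corrected = beta_raw - sum_k snp_on_health[k] * health_on_food[k]
    """
    mediated = sum(snp_on_health[t] * health_on_food[t] for t in snp_on_health)
    return beta_raw - mediated


# FTO lead variant and percentage fat in milk: raw effect from the text.
beta_raw = 0.0045

# Hypothetical inputs, chosen only so that the mediated part equals 0.003
# and the sketch reproduces the reported corrected effect of 0.0015.
snp_on_health = {"BMI": 0.08}
health_on_food = {"BMI": 0.0375}

beta_corr = corrected_effect(beta_raw, snp_on_health, health_on_food)
fold_reduction = beta_raw / beta_corr

print(f"corrected effect: {beta_corr:.4f}")
print(f"fold reduction:   {fold_reduction:.1f}")
```

With these illustrative inputs two thirds of the raw association is attributed to BMI, which corresponds to the three-fold reduction reported for the FTO locus.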

Selection of instruments for MR
The primary objective of our study is to use MR to assess causal relationships between food choices and health. To achieve this goal we need to identify the SNPs which have only a direct effect on the food trait, one that is not mediated through other possible confounders. We hypothesised that if a SNP is biologically associated with a food behaviour, without mediation by health, its effect should not change strongly after the adjustment procedure. To distinguish the variants with only a direct effect from those with effects at least partly mediated through other traits, we defined the corrected-to-raw ratio (CRR) as the ratio between the corrected effect and the raw uncorrected one.

[...] balance of mediated to non-mediated SNP associations varied by foodstuff, ranging from none mediated for tea, spirits and processed meat to all mediated for percentage fat in milk and adding spread to bread (see Table S3). The necessity of using the CRR filtering instead of existing methods is further outlined in additional paragraph 2.7.

Causal inference
We proceeded to perform two-sample MR using the food traits as exposures and 78 traits (see Table S17 for a list and description) as outcomes (chosen to include those for which diet could be a causal factor, that were in MR-base and for which full GWAS summary statistics were available). As well as using each single food trait as an exposure, we also assessed the effect of 16 [...] performed, selecting instrumental variables with or without filtering by CRR or using corrected or uncorrected betas. We considered the CRR-filtered analysis using uncorrected betas as the main analysis and used the others for comparison. Finally, we considered as significant the exposure-outcome pairs that passed multiple-test correction of the main analysis using Storey's q-value at q<0.05.
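The instrument selection and main analysis described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the CRR acceptance window (0.8 to 1.2) is hypothetical because the threshold is not given in this section, the estimator shown is a standard fixed-effect inverse-variance weighted (IVW) estimate, and all SNP names and numbers are invented.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Instrument:
    snp: str
    beta_raw: float        # SNP effect on the food trait, uncorrected
    beta_corrected: float  # SNP effect after adjustment for health status
    beta_outcome: float    # SNP effect on the health outcome
    se_outcome: float      # standard error of beta_outcome


def crr(inst: Instrument) -> float:
    """Corrected-to-raw ratio: corrected effect divided by raw effect."""
    return inst.beta_corrected / inst.beta_raw


def filter_by_crr(instruments, low=0.8, high=1.2):
    """Keep SNPs whose effect barely changes after correction.

    The window [low, high] is a hypothetical choice for illustration.
    """
    return [i for i in instruments if low <= crr(i) <= high]


def ivw(instruments):
    """Fixed-effect IVW estimate and standard error, using the raw
    (uncorrected) exposure betas as in the main analysis."""
    num = sum(i.beta_raw * i.beta_outcome / i.se_outcome ** 2 for i in instruments)
    den = sum(i.beta_raw ** 2 / i.se_outcome ** 2 for i in instruments)
    return num / den, sqrt(1.0 / den)


# Invented example data; snpB mimics a health-mediated locus (CRR = 1/3).
candidates = [
    Instrument("snpA", 0.0200, 0.0190, -0.004, 0.001),
    Instrument("snpB", 0.0045, 0.0015, 0.010, 0.001),
    Instrument("snpC", 0.0300, 0.0310, -0.006, 0.001),
]

kept = filter_by_crr(candidates)
theta, se = ivw(kept)

print([i.snp for i in kept])
print(round(theta, 3), round(se, 4))
```

In this toy example the mediated SNP is dropped by the filter, and the two remaining instruments, which share the same outcome-to-exposure ratio, give an IVW estimate equal to that ratio.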
Table 1 [...]

Looking at the significant MR results, we detected no sign of directional pleiotropy using the MR-[...]

[...] describes a general healthy-unhealthy diet continuum. All PC1 showed the largest number of associations (15; Fig. S22a), with a healthy value of All PC1 lowering most risk factors linked to obesity and lipid profile (and likely consequently lowering cardiovascular disease risk) and having a positive effect on height and education. With the exception of educational attainment, these results may not be surprising as they broadly overlap with general dietary advice. However, when we decompose these effects into food groups or single foods, we detect differences amongst traits. For example, All PC1 leads to very similar effects across different obesity/adiposity measures: body fat % (b=-0.080, p=3.2x10^-4), body mass index (b=-0.087, p=8.1x10^-5), waist-to-hip ratio (b=-0.104, p=2.4x10^-6) and BMI-adjusted waist-to-hip ratio (b=-0.078, p=2.9x10^-4). Figure S23 shows the comparative effects of each food on the four obesity measures: generally, the individual foods affect all four in very similar ways, showing that the estimates are stable regardless of the outcome. However, there are some exceptions; for example, both Fresh Fruit and Oily Fish affect Body Fat and both waist:hip ratio measures but not BMI, suggesting that their effect is specifically on adiposity and not body size.

As a whole, alcohol does not seem to impact any of the four obesity traits beyond a very small effect on the waist-to-hip ratios. However, looking at each alcoholic beverage individually, beer has a substantial and specific effect on BMI not seen for the other alcoholic beverages, suggesting that this effect is independent of alcohol content.

Another notable result is the association of oily fish consumption with height (b=0.2, p=1.76x10^-8) (Fig. S22c).
It is unclear, however, if this is the result of general healthy eating or the effect of a specific food. In particular, if we look at the effects of All PC1-3, we see a height-raising effect of PC1 (more healthy foods, less alcohol/coffee and meat; b=0.09, p=1.35x10^-4), a height-lowering effect of PC2 (lower consumption of healthy food and meat and more alcohol/coffee; b=-0.1, p=1.34x10^-3), but no effect of PC3 (more meat and less alcohol/coffee and healthy foods; b=-0.02, p=0.65), suggesting that the effect on height is led by healthy foods and alcohol/coffee but is independent of meat. Looking at the associations of Healthy PC1-3, we see an association only with the first, which represents the overall consumption of fish, fruit and vegetables. Finally, comparing these three we find that higher consumption of both vegetables and fish is associated with being taller, with similar effect sizes (Fish PC1, b=0.17, p=4.99x10^-4 and Vegetables PC1, b=0.15, p=1.30x10^-3), while fruit has no effect (b=0, p=0.96), which makes the effects of fish and vegetables [...]

Our results emphasise how complicated the relationships among dietary traits are. We have clearly shown that the causal path between food and health is not unidirectional and that in fact genes may affect food behaviours in many different and unexpected ways. Understanding the origins of these effects is fundamental not only for prioritizing loci for functional follow-up, but also for understanding why genetic correlations and GWAS results change when different datasets or populations are used. In fact, given that many of the effects we see are likely due to confounding, if the health advice in different populations changes this could alter the architecture of the studied trait and thus the GWAS results, which would appear as allelic heterogeneity.
It is unclear whether these effects are limited to dietary phenotypes or if they extend to other traits, and further studies are needed to resolve this issue. Recent similar studies 10,11 on the genetic bases of dietary patterns reported having detected no reverse causality. We believe that this difference is due to our novel approach, which is not based on using the potential confounders as covariates, but rather exploits MR, which should be able to distinguish the forward and reverse effects when the causal relationship is bidirectional. Nevertheless, extreme care is required when claiming causal relationships between food and health, as the level and complexity of the biases and confounding are so high that they affect even MR, which is known to be more robust than other approaches to these types of effects.

[...] If we look at which foods have the greatest effect on triglycerides, it is fruit, vegetables and fish, all with lowering effects (Fig. S22f), not sources of carbohydrates or alcohol, the known drivers of de novo lipogenesis. This seems to be confirmed by looking at the results with the overall PC traits (All-[...]

[...] triglycerides. This has been implied in previous studies, including the European study on the lactase persistence gene 9. There, while the MR relating the lactase-persistence gene to diabetes incidence provided no evidence for a causal effect of milk consumption, the secondary analyses found that the lactase-persistence variant was related to consumption of potatoes, poultry and cereals. These pieces of genetic evidence highlight the importance of a dietary pattern rather than single foods or nutrients. Any health claim from observational studies regarding one or the other should always take these facts into account. For further details of specific results, our online app allows exploration of hypotheses.

Our study was limited by the number of items available in the dietary questionnaire in the UK Biobank and thus has not explored the full extent of human nutrition. Unfortunately, apart from bread consumption no carbohydrate or sugar sources were measured, limiting our ability to explore these macronutrients and thus capture the overall diet. Nonetheless, this limitation is unlikely to overturn the abovementioned cautionary interpretation of the dietary MR results. Another important limitation is that effect sizes could be inflated, because underestimation of the SNP effects on the food traits will increase the MR effect estimates. This underestimation is due to the noise in the questionnaire responses, which warrants further statistical investigation. Of note, as we have no reason to suspect non-random measurement error, this is unlikely to hinder the detection of a causal effect or its direction, but further studies are needed to assess the precise effect sizes. Before translation of our findings into policy, more studies using different methodologies will be required.
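The inflation mentioned above follows from a short calculation. As a sketch, assume a single instrument, a Wald-ratio estimator, a true causal effect θ (so that the SNP-outcome effect is θ times the true SNP-food effect), and questionnaire noise that attenuates the estimated SNP-food effect by a hypothetical factor λ. Neither λ nor this simplified single-instrument setting is taken from the study.

```latex
\hat{\theta}_{\mathrm{Wald}}
  = \frac{\beta_{Y}}{\beta_{X}^{\mathrm{obs}}}
  = \frac{\theta\,\beta_{X}}{\lambda\,\beta_{X}}
  = \frac{\theta}{\lambda},
\qquad 0 < \lambda \le 1 .
```

Under these assumptions the sign of the estimate is preserved while its magnitude is inflated by 1/λ; for instance, λ = 0.5 would double the estimated effect. This is consistent with the statement that measurement noise should not hinder detection of an effect or its direction but does affect the precise effect size.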

In conclusion, we have developed an important framework and new tools to help illuminate the effects of nutrition on health and have shown that, despite the existing belief that certain dietary assessments provide low-quality data, it is still possible to extract useful information using our methods. It will be interesting to learn to what degree the confounding of food choice reporting by