Statistical models for estimating the intake of nutrients and foods from complex survey data
Background: The consequences of poor nutrition are well known and of wide concern. Governments and public health agencies use food and diet surveillance data to make decisions that lead to improvements in nutrition. These surveys often use complex sample designs for efficient data collection. The statistical analysis of dietary intake data collected using complex survey designs poses several challenges that current methods have not fully addressed. Firstly, the distribution of intake can be highly skewed because of outlying observations and a large proportion of zero observations, which arise when the food diary fails to capture consumption within the period of observation. Secondly, dietary data are subject to variability arising from day-to-day individual variation in food consumption and from measurement error, both of which must be accounted for in the estimation procedure for inferences to be correct. Thirdly, the complex sample design needs to be incorporated into the estimation procedure so that results can be extrapolated to the target population. This thesis aims to develop novel statistical methods to address these challenges and applies them to the analysis of iron intake data from the UK National Diet and Nutrition Survey Rolling Programme (NDNS RP) and UK national prescription data on iron deficiency medication.
Methods: 1) To assess the nutritional status of particular population groups, a two-part model with a generalised gamma (GG) distribution was developed for intakes that show high frequencies of zero observations. The two-part model accommodated the sources of variation in dietary intake data with a random intercept in each component; the two random intercepts could be correlated to allow a correlation between the probability of consuming and the amount consumed. 2) To identify population groups at risk of low nutrient intakes, a linear quantile mixed-effects model was developed to model quantiles of the distribution of intake as a function of explanatory variables. The proposed approach was illustrated by comparing the quantiles of iron intake with the Lower Reference Nutrient Intake (LRNI) recommendations using NDNS RP data.
This thesis extended the estimation procedures of both the two-part model with GG distribution and the linear quantile mixed-effects model to incorporate the complex sample design in three steps: the likelihood function was weighted by the sampling weights; bootstrap methods were used to estimate the variance; and, finally, the variance estimation of the model parameters was stratified by the survey strata.
3) To evaluate the allocation of resources to alleviate nutritional deficiencies, a linear quantile mixed-effects model was used to analyse the distribution of expenditure on iron deficiency medication across health boards in the UK. Expenditure is likely to depend on the iron status of the region; therefore, for a fair comparison among health boards, iron status was estimated using the method developed in objective 2) and used in the specification of the median amount spent. Each health board is formed by a set of general practices (GPs), so a random intercept was used to induce correlation between the expenditure of any two GPs from the same health board.
Finally, the approaches in objectives 1) and 2) were compared with the traditional approach based on weighted linear regression modelling used in the NDNS RP reports. All analyses were implemented using SAS and R.
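As a rough illustration of the survey-weighting steps described above, the following Python sketch estimates a weighted quantile by minimising a weighted check loss and obtains a bootstrap standard error by resampling within strata. It is a simplified stand-in rather than the thesis's implementation (which used SAS and R): it has no explanatory variables or random effects, it resamples individual observations rather than primary sampling units, and all function names are hypothetical.

```python
import random


def weighted_check_loss(values, weights, q, tau):
    """Weighted quantile objective: sum_i w_i * rho_tau(y_i - q),
    where rho_tau(r) = r * (tau - 1[r < 0])."""
    total = 0.0
    for y, w in zip(values, weights):
        r = y - q
        total += w * r * (tau - (1.0 if r < 0 else 0.0))
    return total


def weighted_quantile(values, weights, tau):
    """Return a minimiser of the weighted check loss.

    A minimiser is always attained at a data point: the first sorted
    value whose cumulative weight reaches tau times the total weight.
    """
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    cumulative = 0.0
    for y, w in pairs:
        cumulative += w
        if cumulative >= tau * total:
            return y
    return pairs[-1][0]


def stratified_bootstrap_se(values, weights, strata, tau, n_boot=200, seed=0):
    """Bootstrap standard error of the weighted quantile.

    Assumption: observations are resampled with replacement separately
    within each stratum and keep their original weights. A real survey
    bootstrap would typically resample primary sampling units and
    rescale the weights.
    """
    rng = random.Random(seed)
    by_stratum = {}
    for y, w, s in zip(values, weights, strata):
        by_stratum.setdefault(s, []).append((y, w))

    estimates = []
    for _ in range(n_boot):
        resample = []
        for units in by_stratum.values():
            resample.extend(rng.choices(units, k=len(units)))
        ys, ws = zip(*resample)
        estimates.append(weighted_quantile(ys, ws, tau))

    mean = sum(estimates) / n_boot
    variance = sum((e - mean) ** 2 for e in estimates) / (n_boot - 1)
    return variance ** 0.5
```

With made-up values `[1, 2, 3, 4]` and weights `[1, 1, 1, 5]`, `weighted_quantile(..., tau=0.5)` returns 4 rather than the unweighted value 2, which shows how the sampling weights shift the estimate towards observations that represent more of the target population.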
Results: The two-part model with GG distribution, fitted to the amount of iron consumed from selected episodically consumed foods, showed that females tended to have greater odds of consuming iron from foods but consumed smaller amounts. In older age groups, the amount consumed tended to be higher relative to the reference group, although the odds of consumption varied. Iron consumption also appeared to depend on National Statistics Socio-Economic Classification (NSSEC) group, with lower social groups generally consuming less. The quantiles of iron intake estimated using the linear quantile mixed-effects model showed that more than 25% of females aged 11-50y were below the LRNI, and that girls aged 11-18y were the group at highest risk of deficiency in the UK. Predictions of spending on iron medication in the UK based on the linear quantile mixed-effects model showed that areas of higher iron intake had lower spending on treating iron deficiency. In a geographical display of expenditure, Northern Ireland had the lowest amount spent. Comparing the results from the methods proposed here with those from the traditional approach showed that weighted regression analysis could produce spurious associations.
Discussion: This thesis developed novel approaches to the analysis of dietary complex survey data to address three important objectives of diet surveillance, namely the estimation of mean food intake by population group, the identification of groups at high risk of nutrient deficiency and the allocation of resources to alleviate nutrient deficiencies. The methods provided models that fit dietary data well, accounted for the sources of data variability and extended the estimation procedures to incorporate the complex sample survey design. The use of a GG distribution for modelling intake is an important improvement over existing methods, as it includes many distributions with different shapes and its domain is restricted to non-negative values. The two-part model accommodated the sources of variation in dietary intake data with a random intercept in each component; the two random intercepts could be correlated to allow a correlation between the probability of consuming and the amount consumed. This also improves on existing approaches that assume a zero correlation. The linear quantile mixed-effects model uses the asymmetric Laplace distribution, which can also accommodate many different distributional shapes, and likelihood-based estimation is robust to model misspecification. This method is an important improvement over existing methods used in nutritional research, as it explicitly models the quantiles in terms of explanatory variables using a novel quantile regression model with random effects. The application of these models to UK national data confirmed the association between poorer diets and lower social class, identified females aged 11-50y as a group at high risk of iron deficiency, and highlighted Northern Ireland as the region with the lowest expenditure on iron prescriptions.
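For orientation, the structure of a two-part model with correlated random intercepts and a weighted likelihood can be written schematically as below. This is one common way of setting out such a model and not necessarily the exact specification used in the thesis. The notation is assumed for illustration: $Y_{ij}$ is the intake of individual $i$ on day $j$, $\mathbf{x}_{ij}$ the explanatory variables, $u_i$ and $v_i$ the random intercepts, $\mu_{ij}$, $\sigma$ and $\kappa$ the GG location, scale and shape parameters, $w_i$ the sampling weight, and $L_i$ the likelihood contribution of individual $i$.

```latex
\begin{align}
\operatorname{logit}\Pr(Y_{ij} > 0 \mid u_i)
  &= \mathbf{x}_{ij}^{\top}\boldsymbol{\alpha} + u_i, \\
Y_{ij} \mid (Y_{ij} > 0,\; v_i)
  &\sim \mathrm{GG}(\mu_{ij}, \sigma, \kappa),
  \qquad \mu_{ij} = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + v_i, \\
\begin{pmatrix} u_i \\ v_i \end{pmatrix}
  &\sim N\!\left(
     \begin{pmatrix} 0 \\ 0 \end{pmatrix},
     \begin{pmatrix}
       \sigma_u^2 & \rho\,\sigma_u\sigma_v \\
       \rho\,\sigma_u\sigma_v & \sigma_v^2
     \end{pmatrix}
   \right), \\
\ell_w(\boldsymbol{\theta})
  &= \sum_{i} w_i \,\log L_i(\boldsymbol{\theta}).
\end{align}
```

In this notation, setting $\rho = 0$ corresponds to the zero-correlation assumption of the existing approaches mentioned above, while a non-zero $\rho$ links the probability of consuming to the amount consumed.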