This study investigates whether quantitative breast density (BD) serves as an imaging biomarker for more intensive breast cancer screening by predicting interval and node-positive cancers.
This case–control study of 1204 women aged 47–73 includes 599 cancer cases (302 screen-detected, 297 interval; 239 node-positive, 360 node-negative) and 605 controls. Automated BD software calculated fibroglandular volume (FGV), volumetric breast density (VBD) and density grade (DG). A radiologist assessed BD using a visual analogue scale (VAS) from 0 to 100. Logistic regression and area under the receiver operating characteristic curves (AUC) determined whether BD could predict mode of detection (screen-detected or interval); node-negative cancers; node-positive cancers, and all cancers vs. controls.
FGV, VBD, VAS, and DG all discriminated interval cancers (all
FGV, VBD, VAS and DG discriminate interval cancers from controls, reflecting some masking risk. Only FGV discriminates screen-detected cancers, perhaps adding a unique component of breast cancer risk.
The aim of stratified, or risk-based, breast cancer screening [
Breast density (BD) reflects the amount of glandular and fibrous connective tissue compared with the amount of fatty tissue in the breasts, as seen on a mammogram. BD has three attributes that support use in stratification of population screening. First, increased BD, conditional on age and body mass index (BMI), is a strong risk factor for breast cancer [
To fill this gap in the literature, we compared women diagnosed with cancer (interval, node-positive, and screen-detected) to disease-free women with respect to BD. We measured BD using automated BD assessments and radiologists’ quantitative visual BD assessments to compare the predictive ability of each BD assessment method. We hypothesised that quantitative BD can predict interval cancers and node-positive screen-detected cancers in order to serve as an imaging biomarker with the potential to personalise breast cancer screening.
Ethical approval for the establishment and use of the OPTIMAM image database [
In the National Health Service Breast Screening Programme (NHSBSP), women aged 50–70 years are invited for screening every three years, with an age extension being piloted in a randomised controlled trial of women aged 47–73 years conducted from 2009 to 2022 [
The images were acquired on five Hologic Selenia systems, two Hologic Selenia Dimensions systems (Hologic Inc., Bedford, USA), one GE Senographe Essential system (GE Healthcare Inc., Chicago, USA) and one Sectra MDM-L30 (Philips Healthcare, Cambridge, Massachusetts, USA). All the digital mammograms in the study were de-identified. Both unprocessed and processed images were collected, when available. To be included in the study, women needed at least one negative digital mammogram prior to the screening mammogram that detected their cancer or the diagnostic mammogram that diagnosed their interval cancer. For the screen-detected cancers, the prior mammogram was used in the study in order to provide an assessment, whether by the radiologist or quantitative imaging, that was ‘blind’ to the cancer.

Selection of controls for each case followed a prescribed protocol. Cancer-free controls were selected based on the same equipment and ‘date of acquisition’ as the cases. For screen-detected cases, ‘date of acquisition’ was the date of the screening examination at which the cancer was detected. For interval cancers, there were no screening images for detection of cancer (by definition), so ‘date of acquisition’ was the date of the prior screening images for that individual. From the group of controls meeting these requirements for each case (machine and ‘date of acquisition’), the closest available age was selected. This resulted in 99.4% of cases and controls being within 4 years of age. Because of the limited number of normal cases in the OPTIMAM database at the time of case/control selection, a one-to-one match protocol was not possible for all. In total, 542 cases had matched controls and 57 cases did not. Thus, 63 unmatched controls were included. Matching on other characteristics (e.g. ethnicity or BMI) was not possible because such variables were not available. All the controls were followed up and remained cancer free for at least 3 years.
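The control-selection protocol described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `Exam` record, its field names and the `pick_control` helper are hypothetical, the ‘date of acquisition’ is compared for exact equality, and each control is used at most once.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Exam:
    """Hypothetical record for one woman's study mammogram."""
    subject_id: str
    machine: str    # mammography unit the exam was acquired on
    acquired: str   # 'date of acquisition' as defined in the text
    age: int


def pick_control(case: Exam, pool: List[Exam]) -> Optional[Exam]:
    """Return the eligible control closest in age to the case, or None.

    Eligible means same machine and same 'date of acquisition'.
    The chosen control is removed from the pool, so it cannot be
    matched twice (an assumption, not stated in the source).
    """
    eligible = [c for c in pool
                if c.machine == case.machine and c.acquired == case.acquired]
    if not eligible:
        return None  # the case stays unmatched
    best = min(eligible, key=lambda c: abs(c.age - case.age))
    pool.remove(best)
    return best
```

A case for which `pick_control` returns `None` corresponds to one of the cases that could not be matched.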
Pathological data were collected from England’s National Breast Screening System.
We required adequate statistical power for comparison of controls with two specific subgroups of cases: interval cancers and node-positive cancers. For both these case groups, we posited that ~20% of controls and 30% of cases would be in the highest density category. Estimating that the total number of controls would be at least double the number of cases in either of these subgroups, 291 cases would give 90% power and 216 cases would give 80% power. We, therefore, aimed to have at least 216 cases in each subgroup. Anticipating that, for some cases and controls, the unprocessed mammograms might not be available, we obtained 599 cases in total, comprising 302 screen-detected cancers and 297 interval cancers. We sought to enrich the dataset for node-positive cases, so all available node-positive cases (
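The sample-size figures above can be checked with a standard two-proportion calculation. The sketch below is one common formula (pooled variance under the null, unpooled under the alternative) with a two-sided 5% significance level; both choices are assumptions, since the source does not state which formula or significance level was used. The function name `cases_needed` is illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist


def cases_needed(p_controls, p_cases, controls_per_case, power, alpha=0.05):
    """Cases required to detect p_cases vs p_controls in the top density category.

    Assumes a two-sided test at level alpha and `controls_per_case`
    controls for every case.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    k = controls_per_case
    # Pooled proportion across cases and controls (null hypothesis).
    p_bar = (p_cases + k * p_controls) / (1 + k)
    null_sd = sqrt(p_bar * (1 - p_bar) * (1 + 1 / k))
    # Separate variances for each group (alternative hypothesis).
    alt_sd = sqrt(p_cases * (1 - p_cases) + p_controls * (1 - p_controls) / k)
    n = ((z_alpha * null_sd + z_beta * alt_sd) / (p_cases - p_controls)) ** 2
    return ceil(n)


print(cases_needed(0.20, 0.30, 2, power=0.80))
print(cases_needed(0.20, 0.30, 2, power=0.90))
```

Under these assumptions the calculation gives 216 cases for 80% power and 291 cases for 90% power, matching the numbers quoted in the text.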
Automated BD software (Volpara Health Technologies Ltd: Version 1.5.1, New Zealand) was used to calculate fibroglandular volume (FGV) in cm³, volumetric breast density (VBD) in percent and 5th Edition Volpara Density Grade (DG) from the unprocessed images at the exam level. Volpara is FDA-approved, fully automated software for estimating volumetric breast density [
In addition, VBDmax is calculated as the higher of the left and right breast VBD values. Volpara software applies preset cut-off points to VBDmax (to mimic BI-RADS 5th Edition) and reports a study-level 5th Edition Volpara Density Grade (DG), where DG a: 0% ≤ VBD < 3.5%; DG b: 3.5% ≤ VBD < 7.5%; DG c: 7.5% ≤ VBD < 15.5%; DG d: VBD ≥ 15.5%. Typically, the Volpara Density Grades are denoted as VDG a/b/c/d. However, to avoid confusion between acronyms that designate ‘V’ as ‘volume’ or ‘volumetric’, the acronym DG is used throughout this paper rather than VDG. Volpara software has been validated [
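The grading rule just described amounts to a simple threshold lookup on VBDmax. The following is a minimal sketch of that rule as stated in the text, not the vendor's implementation; the function name `density_grade` is hypothetical.

```python
def density_grade(vbd_left: float, vbd_right: float) -> str:
    """Map per-breast VBD values (in percent) to a study-level grade a-d."""
    vbd_max = max(vbd_left, vbd_right)  # VBDmax: the denser breast
    if vbd_max < 0:
        raise ValueError("VBD cannot be negative")
    if vbd_max < 3.5:
        return "a"
    if vbd_max < 7.5:
        return "b"
    if vbd_max < 15.5:
        return "c"
    return "d"


print(density_grade(6.9, 8.2))  # the denser breast (8.2%) decides the grade
```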
A radiologist (ESB), blinded to case–control status, was shown the images using MedXViewer [
We took the continuous variables (FGV, VBD and VAS) and determined categorical quartiles using thresholds determined by the distribution for all cases and controls combined (excluding those missing raw images). DG is a categorical variable, already divided by the Volpara software into categories with pre-determined thresholds. We then estimated how well these four categorical measures of BD (FGV-quartile, VBD-quartile, VAS-quartile and DG) and the three continuous BD measures (FGV, VBD and VAS) discriminated between cases and controls. We estimated the effects of these BD variables on risk of cancer overall and on the risk of particular subsets of cancers (node-positive, node-negative, interval, and screen-detected) using logistic regression, adjusting for age. For each subgroup of cases, we used all controls as the comparator group.
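The quartile categorisation can be illustrated as follows. This is a sketch only: the quantile method (`"inclusive"`) and the handling of values that fall exactly on a cut-point are assumptions, as the source does not specify them, and the helper names are hypothetical.

```python
from bisect import bisect_left
from statistics import quantiles


def quartile_cutpoints(pooled_values):
    """Return the three internal cut-points (Q1, median, Q3) of the
    pooled case and control values."""
    return quantiles(pooled_values, n=4, method="inclusive")


def assign_quartile(value, cutpoints):
    """Map a value to quartile 1-4. A value equal to a cut-point is
    placed in the lower quartile (an assumption)."""
    return bisect_left(cutpoints, value) + 1


# Internal FGV cut-points (cm3) as reported with the study's results.
FGV_CUTS = [37.95, 51.30, 73.35]
print(assign_quartile(53.7, FGV_CUTS))  # mean FGV of controls
```

With the reported FGV cut-points, the mean FGV of the controls (53.7 cm³) falls in the 3rd quartile.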
In addition, we carried out receiver operating characteristic (ROC) analysis, by estimating and comparing areas under the ROC curve (AUCs). We used the De Long et al. [
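For readers unfamiliar with the AUC, it can be read as the probability that a randomly chosen case has a higher density value than a randomly chosen control, with ties counted as one half. The sketch below computes only this point estimate by direct pairwise comparison; it does not implement the variance estimation or the AUC comparison test, and the function name `auc` is illustrative.

```python
def auc(case_scores, control_scores):
    """Area under the ROC curve as the proportion of case/control pairs
    in which the case has the higher score (ties count as 0.5)."""
    wins = 0.0
    for x in case_scores:
        for y in control_scores:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Toy density values, not study data.
print(auc([2, 3], [1, 3]))
```

An AUC of 0.50 corresponds to no discrimination, which is why the confidence intervals in the results are judged against 0.50.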
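Our study included 1204 women aged 47–73 years (599 with cancer, 605 controls). Dates of mammograms included in this study ranged from 2010 to 2015 (see table below).

Table. Description of the study population and cancer cases.

| | Control # | Control (%) | Screen-detected # | Screen-detected (%) | Interval # | Interval (%) |
|---|---|---|---|---|---|---|
| Mammograms | | | | | | |
| Age | | | | | | |
| 47–49 | 37 | 6.1 | 14 | 4.6 | 22 | 7.4 |
| 50–54 | 122 | 20.2 | 63 | 20.9 | 68 | 22.9 |
| 55–59 | 126 | 20.8 | 62 | 20.5 | 62 | 20.9 |
| 60–64 | 154 | 25.5 | 79 | 26.2 | 63 | 21.2 |
| 65–69 | 138 | 22.8 | 76 | 25.2 | 64 | 21.5 |
| 70–73 | 28 | 4.6 | 8 | 2.6 | 18 | 6.1 |
| Date of ‘prior’ mammogram | | | | | | |
| 2010 | 27 | 4.5 | 12 | 4 | 17 | 5.7 |
| 2011 | 84 | 13.9 | 46 | 15.2 | 41 | 13.8 |
| 2012 | 292 | 48.3 | 204 | 67.5 | 81 | 27.3 |
| 2013 | 138 | 22.8 | 38 | 12.6 | 101 | 34 |
| 2014 | 51 | 8.4 | 2 | 0.7 | 45 | 15.2 |
| 2015 | 13 | 2.1 | 0 | 0 | 12 | 4 |
| Machine | | | | | | |
| Hologic | 593 | 98.0 | 296 | 98.0 | 288 | 97.0 |
| GE | 9 | 1.5 | 6 | 2.0 | 9 | 3.0 |
| Sectra | 3 | 0.5 | 0 | 0 | 0 | 0 |
| Invasive/In situ | | | | | | |
| Invasive | | | 245 | 81.1 | 279 | 93.9 |
| In situ | | | 57 | 18.9 | 18 | 6.1 |
| Nodal status | | | | | | |
| Positive | | | 116 | 38.4 | 123 | 41.4 |
| Negative | | | 186 | 61.6 | 174 | 58.6 |
| Number of nodes positive | | | | | | |
| None | | | 186 | 61.6 | 174 | 58.6 |
| 1, 2 or 3 | | | 101 | 33.4 | 89 | 30 |
| 4 or more | | | 15 | 5 | 34 | 11.4 |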
Unprocessed images needed for automated BD measures were available for 429 (72%) cases and 418 (69%) controls. FGV-quartile, VAS-quartile, and DG predicted all cancers versus controls, while VBD-quartile did not (see table below).

Table. Association of categorical measures of density with cancer risk (all cancers).

| | Controls # | Controls % | All cancers # | All cancers % | OR | 95% CI |
|---|---|---|---|---|---|---|
| FGV (cm³)a | | | | | | |
| 1st quartile | 137 | 22.6 | 75 | 12.5 | 1 | |
| 2nd quartile | 114 | 18.8 | 98 | 16.4 | 1.6 | (1.1, 2.3) |
| 3rd quartile | 95 | 15.7 | 116 | 19.4 | 2.3 | (1.5, 3.4) |
| 4th quartile | 72 | 11.9 | 140 | 23.4 | 3.7 | (2.5, 5.6) |
| Missing | 187 | 30.9 | 170 | 28.4 | | |
| VBD (%)b | | | | | | |
| 1st quartile | 119 | 19.7 | 100 | 16.7 | 1 | |
| 2nd quartile | 107 | 17.7 | 99 | 16.5 | 1.1 | (0.7, 1.6) |
| 3rd quartile | 101 | 16.7 | 111 | 18.5 | 1.3 | (0.9, 1.9) |
| 4th quartile | 91 | 15.0 | 119 | 19.9 | 1.6 | (1.0, 2.3) |
| Missing | 187 | 30.9 | 170 | 28.4 | | |
| VAS (%)c | | | | | | |
| 1st quartile | 174 | 28.8 | 143 | 23.9 | 1 | |
| 2nd quartile | 157 | 26 | 137 | 22.9 | 1.1 | (0.8, 1.5) |
| 3rd quartile | 132 | 21.8 | 165 | 27.5 | 1.5 | (1.1, 2.1) |
| 4th quartile | 142 | 23.5 | 154 | 25.7 | 1.3 | (0.9, 1.8) |
| DG | | | | | | |
| 1 | 27 | 4.5 | 14 | 2.3 | 1 | |
| 2 | 206 | 34.0 | 193 | 32.2 | 1.7 | (0.9, 3.5) |
| 3 | 135 | 22.3 | 151 | 25.2 | 2.1 | (1.0, 4.2) |
| 4 | 50 | 8.3 | 71 | 11.9 | 2.6 | (1.3, 5.7) |
| Missing | 187 | 30.9 | 170 | 28.4 | | |

Quartile cut-points. a FGV: 11.70, 37.95, 51.30, 73.35, 306.50. b VBD: 2.4, 4.8, 6.9, 10.9, 30.0. c VAS: 1.9, 29.0, 47.0, 64.0, 96.1.
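As a rough check on the table above, an unadjusted (crude) odds ratio can be computed directly from the counts. The sketch below uses the standard cross-product ratio with a Woolf (log-scale) confidence interval. It is not the age-adjusted logistic regression used in the study, so the result is expected to be close to, but not identical with, the reported value; the function name is illustrative.

```python
from math import exp, log, sqrt


def crude_odds_ratio(cases_exposed, controls_exposed, cases_ref, controls_ref, z=1.96):
    """Unadjusted odds ratio of the exposed category versus the reference
    category, with an approximate 95% confidence interval (Woolf method)."""
    odds_ratio = (cases_exposed * controls_ref) / (cases_ref * controls_exposed)
    se_log = sqrt(1 / cases_exposed + 1 / controls_exposed
                  + 1 / cases_ref + 1 / controls_ref)
    lower = exp(log(odds_ratio) - z * se_log)
    upper = exp(log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# FGV 4th quartile versus 1st quartile, all cancers versus controls.
print(crude_odds_ratio(cases_exposed=140, controls_exposed=72,
                       cases_ref=75, controls_ref=137))
```

For the top versus bottom FGV quartile the crude odds ratio is approximately 3.55, compared with the age-adjusted 3.7 reported in the table.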
VAS-quartile was not associated with node-positive cancers. In contrast, all categorical automated BD predicted interval cancers and ‘node-positive or interval’ cancers (henceforth referred to as ‘combined’ cancers) with statistical significance (see table below).

Table. Association of categorical measures of density with risk of interval, node-positive, and combined (interval or node-positive) cancers.

| | Controls # | Controls % | Interval # | Interval % | Interval OR | Interval 95% CI | Node-positive # | Node-positive % | Node-positive OR | Node-positive 95% CI | Combined # | Combined % | Combined OR | Combined 95% CI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FGV (cm³) | | | | | | | | | | | | | | |
| 1st quartile | 137 | 22.6 | 26 | 8.8 | 1 | | 17 | 7.1 | 1 | | 34 | 8.2 | 1 | |
| 2nd quartile | 114 | 18.8 | 56 | 18.9 | 2.6 | (1.5, 4.4) | 34 | 14.2 | 2.4 | (1.3, 4.6) | 66 | 16.0 | 2.3 | (1.4, 3.8) |
| 3rd quartile | 95 | 15.7 | 55 | 18.5 | 3 | (1.8, 5.3) | 31 | 13.0 | 2.6 | (1.4, 5.1) | 63 | 15.3 | 2.7 | (1.6, 4.4) |
| 4th quartile | 72 | 11.9 | 72 | 24.2 | 5.3 | (3.1, 9.1) | 42 | 17.6 | 4.7 | (2.5, 9.0) | 84 | 20.3 | 4.7 | (2.9, 7.8) |
| Missinga | 187 | 30.9 | 88 | 29.6 | | | 115 | 48.1 | | | 166 | 40.2 | | |
| VBD (%) | | | | | | | | | | | | | | |
| 1st quartile | 119 | 19.7 | 35 | 11.8 | 1 | | 24 | 10.0 | 1 | | 46 | 11.1 | 1 | |
| 2nd quartile | 107 | 17.7 | 37 | 12.5 | 1.2 | (0.7, 2.0) | 27 | 11.3 | 1.2 | (0.7, 2.3) | 47 | 11.4 | 1.1 | (0.7, 1.8) |
| 3rd quartile | 101 | 16.7 | 65 | 21.9 | 2.1 | (1.3, 3.5) | 29 | 12.1 | 1.4 | (0.8, 2.6) | 73 | 17.7 | 1.8 | (1.1, 2.9) |
| 4th quartile | 91 | 15.0 | 72 | 24.2 | 2.7 | (1.6, 4.4) | 44 | 18.4 | 2.3 | (1.3, 4.2) | 81 | 19.6 | 2.3 | (1.4, 3.6) |
| Missinga | 187 | 30.9 | 88 | 29.6 | | | 115 | 48.1 | | | 166 | 40.2 | | |
| VAS (%) | | | | | | | | | | | | | | |
| 1st quartile | 174 | 28.8 | 53 | 17.8 | 1 | | 55 | 23 | 1 | | 86 | 20.8 | 1 | |
| 2nd quartile | 157 | 25.9 | 59 | 19.9 | 1.2 | (0.8, 1.9) | 55 | 23 | 1.1 | (0.7, 1.7) | 89 | 21.6 | 1.1 | (0.8, 1.6) |
| 3rd quartile | 132 | 21.8 | 85 | 28.6 | 2.1 | (1.4, 3.2) | 63 | 26.4 | 1.5 | (1.0, 2.3) | 116 | 28.1 | 1.8 | (1.2, 2.5) |
| 4th quartile | 142 | 23.5 | 100 | 33.7 | 2.2 | (1.5, 3.4) | 66 | 27.6 | 1.5 | (1.0, 2.3) | 122 | 29.5 | 1.7 | (1.2, 2.4) |
| DG | | | | | | | | | | | | | | |
| 1 | 27 | 4.5 | 5 | 1.7 | 1 | | 4 | 1.7 | 1 | | 6 | 1.5 | 1 | |
| 2 | 206 | 34.0 | 70 | 23.6 | 1.7 | (0.7, 5.3) | 48 | 20.1 | 1.5 | (0.6, 5.4) | 91 | 22.0 | 1.9 | (0.8, 5.2) |
| 3 | 135 | 22.3 | 88 | 29.6 | 3.3 | (1.3, 10.1) | 44 | 18.4 | 2.1 | (0.8, 7.5) | 99 | 24.0 | 3.1 | (1.3, 8.6) |
| 4 | 50 | 8.3 | 46 | 15.5 | 4.7 | (1.8, 15.0) | 28 | 11.7 | 3.6 | (1.2, 13.1) | 51 | 12.3 | 4.4 | (1.7, 12.6) |
| Missinga | 187 | 30.9 | 88 | 29.6 | | | 115 | 48.1 | | | 166 | 40.2 | | |

a Missing applies to all density measures except VAS.
Associations between categorical mammographic measures of breast density and breast cancer risk are described by odds ratios for all cancers, screen-detected and interval cancers as compared to controls.
For continuous BD measures (FGV, VBD and VAS), the differences in means between cases and controls were statistically significant for all, interval, node-positive, and combined cancers (see table below).

Table. Associations of all cancers, screen-detected, interval, node-negative, node-positive and combined (node-positive or interval) cancers with continuous breast density measures.

| Comparison | Measure | Mean (controls) | Mean (cases) | Difference | CI | AUC | 95% CIa |
|---|---|---|---|---|---|---|---|
| Controls vs. all cancers | FGV (cm³) | 53.7 | 66.3 | 12.6 | (8.1, 17.1) | 0.63 | (0.59, 0.67) |
| | VBD (%) | 8.2 | 9.2 | 1.0 | (0.3, 1.7) | 0.56 | (0.51, 0.60) |
| | VAS (%) | 44.4 | 48.2 | 3.8 | (1.3, 6.4) | 0.55 | (0.51, 0.59) |
| Controls vs. screen-detected | FGV (cm³) | 53.7 | 64.4 | 10.8 | (5.0, 16.5) | 0.61 | (0.56, 0.66) |
| | VBD (%) | 8.2 | 8.1 | −0.1 | (−0.9, 0.8) | 0.51 | (0.46, 0.56) |
| | VAS (%) | 44.4 | 44.1 | −0.3 | (−3.3, 2.7) | 0.50 | (0.46, 0.55) |
| Controls vs. interval cancers | FGV (cm³) | 53.7 | 68.2 | 14.5 | (8.9, 20.1) | 0.65 | (0.60, 0.70) |
| | VBD (%) | 8.2 | 10.3 | 2.1 | (1.2, 3.0) | 0.63 | (0.58, 0.68) |
| | VAS (%) | 44.4 | 52.4 | 8.1 | (5.0, 11.1) | 0.60 | (0.56, 0.65) |
| Controls vs. node-negative cancers | FGV (cm³) | 53.7 | 64.1 | 10.4 | (5.8, 15.0) | 0.62 | (0.58, 0.67) |
| | VBD (%) | 8.2 | 8.8 | 0.6 | (−0.2, 1.4) | 0.54 | (0.49, 0.59) |
| | VAS (%) | 44.4 | 47.5 | 3.1 | (0.2, 6.0) | 0.54 | (0.50, 0.58) |
| Controls vs. node-positive cancers | FGV (cm³) | 53.7 | 71.7 | 18.0 | (9.5, 26.4) | 0.65 | (0.59, 0.71) |
| | VBD (%) | 8.2 | 10.1 | 1.9 | (0.7, 3.1) | 0.60 | (0.54, 0.66) |
| | VAS (%) | 44.4 | 49.3 | 4.9 | (1.6, 8.2) | 0.56 | (0.51, 0.61) |
| Controls vs. combined | FGV (cm³) | 53.7 | 69.2 | 15.5 | (9.8, 21.2) | 0.65 | (0.60, 0.69) |
| | VBD (%) | 8.2 | 10.0 | 1.8 | (0.9, 2.7) | 0.61 | (0.56, 0.66) |
| | VAS (%) | 44.4 | 50.5 | 6.2 | (3.4, 9.0) | 0.58 | (0.54, 0.62) |

a 95% confidence intervals that do not include 0.50 demonstrate a statistically significantly better discriminatory ability compared to chance. bThis
AUC analysis (Fig. Receiver operating characteristic (ROC) curves for continuous mammographic measures of breast density to discriminate
There were clear differences between the AUCs of the three BD measures for all (
To provide a metric that may be more clinically relevant than AUC, we determined the numbers of each ‘type’ of cancer by risk quartile: the lowest risk 25% (1st quartile) and the highest risk 25% (4th quartile). Results showing the highest risk 25% (4th quartile) for all subcategories of cancers including screen-detected, interval, node-positive, and node-negative demonstrate that FGV captures at least as high a percentage of these cancers as VBD and VAS (see table below).

Table. The numbers and percentage of each type of cancer by quantitative density risk quartile. Values are number (%).

| Density measure | Category | Controls | Screen-detected cancers | Interval cancers | Node-positive cancers | Node-negative cancers |
|---|---|---|---|---|---|---|
| FGV | 1st quartile | 137 (32.8) | 49 (22.3) | 26 (12.4) | 17 (13.7) | 58 (19) |
| | 2nd & 3rd quartile | 209 (50) | 103 (46.8) | 111 (53.1) | 65 (52.4) | 149 (48.9) |
| VBD | 1st quartile | 119 (28.5) | 65 (29.5) | 35 (16.7) | 24 (19.4) | 76 (24.9) |
| | 2nd & 3rd quartile | 208 (49.8) | 108 (49.1) | 102 (48.8) | 56 (45.2) | 154 (50.5) |
| VAS | 1st quartile | 174 (28.8) | 90 (29.8) | 53 (17.8) | 55 (23) | 88 (24.4) |
| | 2nd & 3rd quartile | 289 (47.8) | 158 (52.3) | 144 (48.5) | 118 (49.4) | 184 (51.1) |

The highest-risk women (4th quartile) are shown in bold.
FGV, VBD and VAS were all significantly more discriminative of interval cancers than of screen-detected cancers (
FGV significantly discriminated all, interval, screen-detected, node-positive and node-negative cancers compared to controls. VBD, VAS and DG discriminated interval or node-positive cancers but did not consistently discriminate screen-detected or node-negative cancers. The relative discriminative ability of FGV, overall and for each cancer subgroup, was either equivalent to or, in most cases, greater than that of VAS or VBD, whether using logistic regression (captured by the steepness of the odds ratio gradient), ROC analysis (captured by AUC), or number of cancers included in the highest risk category (4th quartile). Of note, for VBD and VAS, interval cancer prediction was significantly greater (by AUC) than screen-detected cancer prediction, while FGV only showed a statistical trend. This phenomenon underscores the differential ability of FGV to discriminate screen-detected cancers, knowing that FGV has generally higher AUCs for virtually all comparisons (Table
If quantitative breast density is to be successfully used for stratified screening protocols to decrease interval and advanced breast cancers, prediction of both the risk of breast cancer and the risk of masking by mammographic breast density will be important. It stands to reason that screen-detected cancers are less affected by masking because they were detected on mammography and, thus, not sufficiently obscured by dense fibroglandular tissue to preclude detection. On the other hand, interval cancers are likely to be more affected by masking because they were not detected by mammography. However, this relationship between interval cancers and masking is far from perfect because interval cancers may also be related to rapid growth between screening examinations or to an interpretation error. Therefore, screen-detected cancers may map more strongly to breast cancer risk than to masking. Correspondingly, interval cancers may map more strongly to masking but also involve a component of breast cancer risk. In our study, because VBD and VAS only discriminate interval or node-positive cancers from controls, these measures may correlate more strongly with masking. On the other hand, FGV, which additionally discriminates screen-detected cancers from controls, may have an added correlation to breast cancer risk. Perhaps FGV maps to both breast cancer and masking risk by measuring absolute BD volume, whereas VBD and VAS measure percent BD. There is a precedent for stronger prediction of breast cancer risk generally from absolute rather than percentage density measures [
Our results are comparable to results of the single study that analysed interval cancers in a screening programme with a long screening interval (3 years) and tested several quantitative BD techniques [
Whatever the mechanism for measuring BD, women with high levels of BD have an increased risk of interval or node-positive cancers, motivating the need to augment the screening regimen. Women at high breast cancer risk but not at high masking risk may benefit from increased mammography screening frequency. Women at high masking risk only, or at high cancer and masking risk, may be better served by screening with modalities supplementary to mammography, like MRI or ultrasound. In fact, there is interest in determining and targeting these different opportunities for improved screening outcomes (masking versus breast cancer risk) and modelling these strategies [
The strengths of our study include our assessment of the discriminative ability of several measures of BD and risk of breast cancer. We also provide an important analysis of volumetric BD related to interval cancer risk [
We did not collect detailed information in relation to a number of covariates (demographic, hormonal, reproductive, lifestyle and family history). We also did not have BMI, which is known to improve discriminatory capacity of quantitative BD measurements [
We find that FGV has the potential to predict the important components of risk that may provide the foundation for stratified screening: risk of cancer, risk of aggressive cancer, and risk of masking effects. While any quantitative BD measure will undoubtedly be one variable among many predictive variables that will contribute to decisions about breast cancer screening, we believe that our analysis adds to the literature that will inform a more comprehensive model to be tested in the future. Our findings suggest that FGV may be a comparatively better imaging biomarker suited to provide guidance for more intensive stratified mammographic screening, such as a shortened screening interval. VBD, VAS and DG, by predominantly predicting interval cancers and node-positive cancers, may selectively correlate with masking risk and be more suited to directing women to supplemental screening modalities other than mammography.
The authors thank Volpara Solutions (Volpara Health Technologies Ltd., New Zealand) for providing Volpara Data Manager software (Version 1.5.1) for this work. This study was performed in accordance with the Declaration of Helsinki.
Elizabeth Burnside: Conceptualisation, formal analysis, investigation, methodology, project administration, resources, supervision, visualisation, writing—original draft, writing—review and editing. Lucy Warren: Data curation, formal analysis, methodology, software, validation, writing—original draft. Jonathan Myles: Formal analysis, methodology, software, validation. Louise Wilkinson: Conceptualisation, data curation, investigation, resources, visualisation, writing—review and editing. Kenneth Young: Conceptualisation, data curation, funding acquisition, investigation, resources, software, supervision, visualisation, writing—review and editing. Robert Smith: Conceptualisation, funding acquisition, resources, visualisation, writing—review and editing. Nathalie Massat: Conceptualisation, funding acquisition, visualisation, writing—review and editing. Matthew Wallis: Conceptualisation, visualisation, writing—review and editing. Mishal Patel: Conceptualisation, data curation, software, visualisation, writing—review and editing. Stephen Duffy: Conceptualisation, formal analysis, funding acquisition, investigation, methodology, project administration, resources, software, supervision, visualisation, writing—original draft, writing—review and editing.
The fieldwork for this study was funded by the American Cancer Society (Grant Reference NHPDCSGBR-GBRLONG) and the National Institutes of Health (K24CA194251). This work was part funded by Cancer Research UK, as part of the OPTIMAM2 research programme (Grant Reference C30682/A17321). Stephen Duffy, Nathalie Massat and Jonathan Myles contributed to this work as part of the programme of the Policy Research Unit in Cancer Awareness, Screening and early Diagnosis, PR-PRU-1217-21601, which is funded by the National Institute for Health Research (NIHR) Policy Research Programme. Dr. Wallis was supported by the NIHR Cambridge Biomedical Research Centre (BRC-1215-20014). The views expressed are those of the author(s) and not necessarily those of the NIHR, the Department of Health and Social Care, its arm’s length bodies, or other Government Departments. The sponsors of this research did not have a role in the study design, data collection, analysis, the interpretation of the data, or writing of this manuscript. Mishal Patel is an employee of AstraZeneca. The views expressed are those of the authors and not necessarily those of AstraZeneca.
Mammographic screening images and associated pathological data were collected as part of the research image database called the OPTIMAM Mammography Image Database, cited in the methods section of the manuscript. The OPTIMAM Mammography Image Database, funded by Cancer Research UK and used in the current study, is available and can be found here
Ethical approval for the establishment and use of the OPTIMAM image database was obtained from the NHS National Research Ethics Service. This manuscript does not contain any individual person’s data in any form (individual details, images or videos), therefore, no written consent for publication was necessary.
The authors declare no competing interests.
Not applicable.
Supplementary Information
The online version contains supplementary material available at