The type and genomic context of cancer mutations depend on their causes. These causes have been characterized using signatures that represent mutation types that co-occur in the same tumours. However, it remains unclear how mutation processes change during cancer evolution due to the lack of reliable methods to reconstruct evolutionary trajectories of mutational signature activity. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which aggregated whole-genome sequencing data from 2658 cancers across 38 tumour types, we present TrackSig, a new method that reconstructs these trajectories using optimal, joint segmentation and deconvolution of mutation type and allele frequencies from a single tumour sample. In simulations, TrackSig has a 3–5% activity reconstruction error and a 12% false detection rate. It outperforms an aggressive baseline in situations with branching evolution, CNA gain, and neutral mutations. Applied to data from 2658 tumours and 38 cancer types, TrackSig permits pan-cancer insight into evolutionary changes in mutational processes.
Cancers evolve as they progress under differing selective pressures. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, the authors present TrackSig, a method that estimates evolutionary trajectories of somatic mutational processes from single bulk tumour samples.
PCAWG Evolution and Heterogeneity Working Group authors and their affiliations appear at the end of the paper.
PCAWG Consortium members and their affiliations appear in the Supplementary Information.
Somatic mutations accumulate throughout our lifetime, arising from external sources or from processes intrinsic to the cell.
One can estimate the contribution of different mutation processes to the collection of somatic mutations present in a sample through mutational signature analysis. In this type of analysis, single nucleotide variants (SNVs) are classified into 96 types based on the type of substitution and tri-nucleotide context (e.g. ACG–ATG).
Formally, a “mutational signature” is a probability distribution over a categorical variable representing a mutation type, where each element is the probability of generating a mutation of the corresponding type.
Mutational sources can change over time.
Here we introduce TrackSig, a new method to reconstruct signature activities across time without VAF clustering. We use VAF to approximately order mutations based on their prevalence within the cancer cell population and then track changes in signature activity that are consistent with this ordering.
We use realistic simulations and bootstrap analysis to help assess the accuracy of signature activity reconstructions under a variety of different evolutionary scenarios. Using TrackSig and Pan-cancer Analysis of Whole Genomes (PCAWG) dataset of 2658 cancers, we have previously demonstrated
The PCAWG Consortium aggregated whole-genome sequencing data from 2658 cancers across 38 tumour types generated by the ICGC and TCGA projects. These sequencing data were re-analysed with standardised, high-accuracy pipelines to align to the human genome (reference build hs37d5) and identify germline variants and somatically acquired mutations, as described by PCAWG Network
In this paper, we use realistic simulations to evaluate TrackSig’s performance at reconstructing signature activities, detecting the number of mutation clusters, and correctly placing the changepoints under different scenarios, including violations of TrackSig’s assumptions.
TrackSig was applied to the 2552 whole-genome sequencing samples with more than 600 SNVs contained within the white and grey lists of the PCAWG group. Here we provide methodological details of TrackSig’s use on real data (PCAWG). The analysis of signature trends and the relation of changepoints found by TrackSig to subclonal boundaries are described elsewhere.
Figure caption: Each plot is constructed from VAF data from a single tumour sample. Each line is an activity trajectory that depicts inferred activities for a single signature.
By default, following Alexandrov et al.
Within the context of PCAWG, we use the set of 48 single-base signatures (SBS) developed by PCAWG-Signature group. The first 30 of those signatures are slightly modified versions of original signatures defined by Alexandrov et al.
We tested the sensitivity of TrackSig in multiple error scenarios using simulated data with known ground truth. Because signatures overlap in the mutation types that they can produce, we first test reconstruction accuracy when SNVs are correctly assigned to time points; this isolates errors due to the inability to correctly assign signature activity. We describe these non-parametric simulations in the next section.
In the “Results” section, we assess reconstructions when the mutation ordering is inferred from mutation VAFs. In this scenario, reconstruction errors can occur when (i) cancer cell fraction (CCF) estimates are inaccurate, and (ii) two SNV clusters overlap in CCF space but have different signature activity profiles. In the latter case, SNVs from both clusters will be located in the same or adjacent time point bins and will show a mixture of the two clusters’ signature activity profiles. To test reconstruction errors in these two scenarios, we produce clonal evolution simulations in which we sample VAF data from a clonal evolution model with binomial sequencing noise. To simulate the VAF detection limit imposed by somatic mutation calling, we remove any mutation with fewer than three variant reads.
Finally, as part of the clonal evolution simulations, we assess model misspecification error by introducing violations of the infinite sites assumption and of the assumed relationship between CCF and the timing of mutation occurrence. In some simulations, we also introduce mutations under neutral selection (i.e. neutrally evolving mutations). We computed the number and VAFs of these mutations using the model and effective mutation rates derived by Williams et al.
In the non-parametric simulations, we test the ideal scenario in which SNVs are correctly ordered and assigned to time point bins. Here we want to assess the ability of TrackSig to reconstruct signature activities from the distribution of mutation types and to place changepoints at the correct locations.
Each simulation has 50 time points, and each time point is a bin of 100 mutations (5000 mutations in total), which corresponds to the average number of somatic mutations detected in PCAWG. Each sample also contains four active signatures. Two of these are signatures 1 and 5, which are nearly always active in PCAWG samples. For the remaining two signatures, we test all 1035 possible pairs drawn from the other 46 signatures.
We generate simulations with 0–3 changepoints that are placed randomly on the timeline. For each segment on the timeline, we sample signature activities from a uniform distribution over activity vectors. Finally, we sample 100 mutation types per time point from the discrete distribution derived using the sampled activities as mixing coefficients for the four signatures.
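The generative process described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the paper: the function name `simulate_nonparametric` is hypothetical, changepoints are assumed to be distinct interior time points drawn uniformly at random, and the uniform distribution over activity vectors is implemented as a Dirichlet distribution with all concentration parameters equal to 1.

```python
import itertools

import numpy as np


def simulate_nonparametric(signatures, n_timepoints=50, bin_size=100,
                           n_changepoints=2, rng=None):
    """Hypothetical sketch of one non-parametric simulation.

    signatures: (K, n_types) array; each row is a probability distribution
    over mutation types. Returns (counts, activities, changepoints).
    """
    rng = np.random.default_rng(rng)
    n_sigs = signatures.shape[0]
    # Assumption: changepoints are distinct interior time points, chosen uniformly.
    changepoints = np.sort(rng.choice(np.arange(1, n_timepoints),
                                      size=n_changepoints, replace=False))
    bounds = np.concatenate(([0], changepoints, [n_timepoints]))
    activities = np.empty((n_timepoints, n_sigs))
    for start, end in zip(bounds[:-1], bounds[1:]):
        # Dirichlet(1, ..., 1) is the uniform distribution over activity vectors.
        activities[start:end] = rng.dirichlet(np.ones(n_sigs))
    # Each time point's mutation-type distribution mixes the signatures,
    # using the activities as mixing coefficients.
    mixtures = activities @ signatures
    mixtures /= mixtures.sum(axis=1, keepdims=True)  # guard against rounding
    counts = np.vstack([rng.multinomial(bin_size, p) for p in mixtures])
    return counts, activities, changepoints


# All choices for the two non-clock signatures: every pair of the other 46.
PAIRS = list(itertools.combinations(range(46), 2))
```

Drawing one Dirichlet sample per segment makes activities exactly constant between changepoints, which is the piece-wise constant ground truth the reconstructions are compared against.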
Next, we run TrackSig on the simulated data and compare the reconstructed activity trajectories to the ground truth. We remove changepoints with a small change, that is, changepoints at which the activities of all signatures change by <5% in the reconstructed trajectories. This threshold is derived in the “Results” section from a permutation analysis.
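The 5% filter can be expressed as a one-line check per changepoint. In this sketch, `filter_small_changepoints` is a hypothetical helper; it assumes changepoints are given as the index of the first time point of each new segment and, unlike a full pipeline, it does not refit the merged segments after a changepoint is dropped.

```python
import numpy as np


def filter_small_changepoints(activities, changepoints, threshold=0.05):
    """Keep changepoints where at least one signature changes by >= threshold.

    activities: (T, K) reconstructed, piece-wise constant trajectory.
    changepoints: time-point indices t at which a new segment starts.
    Note: this sketch only filters; it does not refit the merged segments.
    """
    acts = np.asarray(activities, dtype=float)
    return [t for t in changepoints
            if np.abs(acts[t] - acts[t - 1]).max() >= threshold]
```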
We computed the absolute difference between the predicted activities and the ground truth at each time point and took the median across all time points and all four signatures; we call this the median activity difference per simulation. On simulations with no changepoints, the median of these per-simulation median differences is 0.7%. On simulations with 1–3 changepoints, it increases slightly to 2%. The cumulative distribution of the per-simulation median differences is shown in Fig.
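The error summary above is a median of medians. A minimal sketch, with hypothetical helper names, assuming predicted and true trajectories are stored as (time points × signatures) arrays:

```python
import numpy as np


def median_activity_difference(predicted, truth):
    """Median of |predicted - truth| over all time points and signatures."""
    predicted = np.asarray(predicted, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return float(np.median(np.abs(predicted - truth)))


def median_of_medians(simulations):
    """Median over simulations of the per-simulation median difference.

    simulations: iterable of (predicted, truth) pairs.
    """
    return float(np.median([median_activity_difference(p, t)
                            for p, t in simulations]))
```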
For the PCAWG data, we report the maximum activity change (MAC) across activity trajectory
To compare the direction of the activity change, we divide signatures into those with decreasing activity, increasing activity, and no activity change (i.e. maximum absolute change <5%). The direction of maximum change is consistent in 95.2% of all signatures across all simulations.
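One way to make the three-way classification concrete is sketched below. The rule that the direction is set by whether the maximum activity occurs after or before the minimum is an assumption of this sketch, not a definition stated in the text; the helper names are hypothetical.

```python
import numpy as np


def change_direction(trajectory, threshold=0.05):
    """Direction of the maximum change of one signature's trajectory.

    Assumption: 'increasing' if the maximum occurs after the minimum,
    'decreasing' otherwise, and 'none' if max - min < threshold (5%).
    """
    traj = np.asarray(trajectory, dtype=float)
    if traj.max() - traj.min() < threshold:
        return "none"
    return "increasing" if traj.argmax() > traj.argmin() else "decreasing"


def direction_agreement(predicted, truth, threshold=0.05):
    """Fraction of signatures whose predicted direction matches the truth.

    predicted, truth: (T, K) activity trajectories of the same simulation.
    """
    predicted, truth = np.asarray(predicted), np.asarray(truth)
    matches = [change_direction(predicted[:, k], threshold)
               == change_direction(truth[:, k], threshold)
               for k in range(truth.shape[1])]
    return float(np.mean(matches))
```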
To compute the number of false positives and false negatives, we count a true positive detection if at least one of the predicted changepoints occurs within three time points of an actual one. A false negative is a true changepoint with no predicted changepoint within three time points. This criterion is identical to the one we use to evaluate whether a changepoint supports a subclonal boundary.
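The matching rule can be sketched as follows. Two points are assumptions of this sketch: "within three time points" is read as a distance of at most 3, and a false positive is taken to be a predicted changepoint with no true changepoint within that distance (the text defers the exact false-positive definition to the main text).

```python
def count_changepoint_errors(predicted, actual, tolerance=3):
    """Return (TP, FN, FP) for changepoint detection in one simulation.

    Assumptions: "within" means a distance <= tolerance, and a false
    positive is a predicted changepoint with no true changepoint nearby.
    """
    def near(x, others):
        return any(abs(x - y) <= tolerance for y in others)

    tp = sum(near(a, predicted) for a in actual)  # detected true changepoints
    fn = len(actual) - tp                         # missed true changepoints
    fp = sum(not near(p, actual) for p in predicted)
    return tp, fn, fp
```

For example, predictions at time points 10 and 30 against true changepoints at 12 and 20 give one detection (12), one miss (20), and one false positive (30).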
False negative rates in non-parametric simulations.

| | 0 true changepoints | 1 true changepoint | 2 true changepoints | 3 true changepoints |
|---|---|---|---|---|
| Avg. no. of FN per simulation | 0.0 | 0.008 | 0.038 | 0.058 |
| 0 FN | 1 | 0.992 | 0.962 | 0.947 |
| 1 FN | 0 | 0.008 | 0.038 | 0.049 |
| 2 FN | 0 | 0 | 0 | 0.003 |

Each cell shows the proportion of simulations that have a given number of false negative changepoints (FN), normalized within the column. See main text for the definition of positive and negative time points. The first row of the table shows the average number of false negatives per simulation.

False positive rates in non-parametric simulations.

| | 0 true changepoints | 1 true changepoint | 2 true changepoints | 3 true changepoints |
|---|---|---|---|---|
| Avg. no. of FP per simulation | 0.130 | 0.128 | 0.118 | 0.116 |
| 0 FP | 0.909 | 0.896 | 0.9 | 0.889 |
| 1 FP | 0.06 | 0.083 | 0.087 | 0.106 |
| 2 FP | 0.024 | 0.019 | 0.011 | 0.005 |
| 3 FP | 0.005 | 0.002 | 0.002 | 0 |
| 4 FP | 0.002 | 0 | 0.001 | 0 |

Each cell shows the proportion of simulations that have a given number of false positive changepoints (FP), normalized within the column. See main text for the definition of positive and negative time points. The first row of the table shows the average number of false positives per simulation.
Generating realistic simulated data requires making some assumptions about how tumours evolve. In this section, we simulate VAF data consistent with clonal evolution theory
We also include some simulations that violate the clonal evolution assumptions and test TrackSig’s robustness to these violations. We performed six different simulations, described briefly here (see Supplementary Note
First, we aim to evaluate the false negative and false positive rates of identifying subclones with TrackSig. We simulated VAF data from (a) one clonal population and no subclones, and (b) one clonal population and one subclone, with subclone CCF values sampled from a uniform distribution, assuming a linear clonal tree. We sample the variant allele counts for the mutations from a binomial distribution in accordance with the cluster CCFs. We create simulations with four signatures: age-related signatures 1 and 5 and two other randomly chosen signatures, which we refer to as A1 and A2. The activities of A1 and A2 are sampled uniformly in each clone, under the constraint that at least one of them changes by at least 30%. We sample mutation types from the signature mixture, treating it as a multinomial distribution. We simulated mean read depths of 10, 30, and 100.
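The read-count sampling for these clonal evolution simulations, including the three-variant-read detection limit, can be sketched as below. Several details are assumptions of this sketch rather than statements from the text: 100% tumour purity, diploid loci carrying one mutant copy (so the expected VAF is CCF/2), and Poisson-distributed read depth around the chosen mean. The function name `simulate_vafs` is hypothetical.

```python
import numpy as np


def simulate_vafs(ccfs, mean_depth=30, min_alt_reads=3, rng=None):
    """Sample variant and total read counts for mutations with given CCFs.

    Assumptions: 100% purity, one mutant copy at a diploid locus
    (expected VAF = CCF / 2), and Poisson-distributed depth.
    Mutations with fewer than `min_alt_reads` variant reads are dropped
    to mimic the detection limit of somatic mutation calling.
    """
    rng = np.random.default_rng(rng)
    ccfs = np.asarray(ccfs, dtype=float)
    depth = rng.poisson(mean_depth, size=ccfs.shape)
    alt = rng.binomial(depth, ccfs / 2.0)  # binomial sequencing noise
    keep = alt >= min_alt_reads
    return alt[keep], depth[keep], keep
```

For a clone plus one subclone, one would concatenate CCFs of 1.0 for clonal mutations with a subclonal CCF drawn from a uniform distribution, and call `simulate_vafs` at mean depths of 10, 30 and 100.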
The performance of TrackSig was also assessed under conditions of neutral evolution. We sampled mutation VAFs as described in the previous paragraph, but also added neutrally evolving mutations to the clonal cluster. We determined the number of neutral mutations to add, and sampled their VAFs, according to the model of Williams et al.
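Under the neutral model of Williams et al., the cumulative number of mutations with allele frequency at least f grows as (μ/β)(1/f − 1/f_max), so subclonal VAFs have a density proportional to 1/f² on the fitted interval. The sketch below samples from that density by inverting its CDF. The parameter names `mu_eff`, `f_min` and `f_max`, and the choice to round the expected count rather than sample it, are assumptions made for illustration.

```python
import numpy as np


def sample_neutral_vafs(mu_eff, f_min, f_max, rng=None):
    """Sample VAFs of neutrally evolving mutations on [f_min, f_max).

    mu_eff: effective mutation rate (mu / beta in Williams et al.).
    The expected number of mutations on the interval is
    mu_eff * (1/f_min - 1/f_max); this sketch rounds it to an integer.
    """
    rng = np.random.default_rng(rng)
    n = int(round(mu_eff * (1.0 / f_min - 1.0 / f_max)))
    u = rng.uniform(size=n)
    # Inverse CDF of a density proportional to 1/f**2 on [f_min, f_max):
    # CDF(f) = (1/f_min - 1/f) / (1/f_min - 1/f_max).
    inv_f = 1.0 / f_min - u * (1.0 / f_min - 1.0 / f_max)
    return 1.0 / inv_f
```

The sampled VAFs can then be converted to variant read counts with the same binomial sampling used for the clonal and subclonal mutations.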
To assess TrackSig’s accuracy when the timeline does not reflect the order in which SNVs were acquired, we generated VAF data from a branched phylogeny. In the branching simulation, we generated VAF data assuming a branching clonal tree with two subclones. We force the sum of the subclonal CCFs to be <1; otherwise, the infinite sites assumption would be violated.
We next assess TrackSig’s accuracy in reconstructing activity changes when SNV VAFs are affected by a copy number aberration (CNA). We generated VAF data with a clonal CNA gain affecting 10% of the SNVs: in 5% of the mutations the gain affects the mutant allele, and in the other 5% it affects the reference allele. This simulation is created similarly to the branching simulation with three clusters; the difference is that we modify the probability of sampling a mutant allele to account for the altered mutant and reference copy numbers.
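The modified sampling probability can be illustrated as follows. This sketch assumes 100% purity and a clonal single-copy gain (total copy number 3): if the gained copy carries the variant, a mutated cell has 2 of 3 copies mutant; if the gained copy is the reference allele, it has 1 of 3. Biologically, only mutations acquired before the gain can be duplicated by it, and this sketch does not check that. The helper names are hypothetical.

```python
import numpy as np


def expected_vaf(ccf, mutant_copies=1, total_copies=2):
    """Expected VAF at CCF `ccf`, assuming 100% purity and `total_copies`
    copies in every tumour cell, `mutant_copies` of them mutant."""
    return np.asarray(ccf, dtype=float) * mutant_copies / total_copies


def simulate_cna_reads(ccfs, depth=30, frac_mut_gain=0.05,
                       frac_ref_gain=0.05, rng=None):
    """Binomial read counts where a clonal gain duplicates the mutant allele
    for one random subset of SNVs and the reference allele for another."""
    rng = np.random.default_rng(rng)
    ccfs = np.asarray(ccfs, dtype=float)
    n = len(ccfs)
    mutant = np.ones(n)
    total = np.full(n, 2.0)
    order = rng.permutation(n)
    n_mut = int(round(frac_mut_gain * n))
    n_ref = int(round(frac_ref_gain * n))
    mut_gain, ref_gain = order[:n_mut], order[n_mut:n_mut + n_ref]
    mutant[mut_gain] = 2  # the gained copy carries the variant
    total[mut_gain] = 3
    total[ref_gain] = 3   # the gained copy is the reference allele
    alt = rng.binomial(depth, expected_vaf(ccfs, mutant, total))
    return alt, mutant, total
```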
Finally, we create simulations that violate the infinite sites assumption, in which the same mutation occurs independently in two branched subclones. To model this, we set the CCFs of 3% of the mutations equal to the sum of the CCFs of the two subclones.
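The infinite sites violation amounts to overwriting a small fraction of CCFs. In this sketch the affected mutations are chosen uniformly at random from the input list, which is an assumption; the helper name is hypothetical. The guard reflects the branching constraint that sibling subclone CCFs sum to less than 1.

```python
import numpy as np


def violate_infinite_sites(ccfs, ccf_a, ccf_b, frac=0.03, rng=None):
    """Give a fraction of mutations a CCF equal to ccf_a + ccf_b, as if
    each arose independently in both sibling subclones.

    Assumption: the affected mutations are chosen uniformly at random.
    """
    if ccf_a + ccf_b >= 1.0:
        raise ValueError("sibling subclone CCFs must sum to < 1")
    rng = np.random.default_rng(rng)
    ccfs = np.array(ccfs, dtype=float)  # copy; the input is left unchanged
    n = int(round(frac * len(ccfs)))
    idx = rng.choice(len(ccfs), size=n, replace=False)
    ccfs[idx] = ccf_a + ccf_b
    return ccfs, idx
```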
We compared results of TrackSig to the widely used approach of first clustering mutations by CCF and then inferring signature activities within each cluster
We compared TrackSig with the SciClone + DeconstructSigs pipeline (hereafter SciClone for brevity, see Supplementary Note
We then compared methods based on their ability to detect subclones. Figure
As expected, the SciClone binomial-beta performs nearly perfectly on the depth 30 simulations which match the assumptions of this model (Fig
Supplementary Fig.
We analyze the variation of signature activities in PCAWG data across time and across samples. We compute the maximum change of each signature in each sample, which is simply the difference between its maximum and minimum activity. To assess whether a signature change is statistically significant, we permute the mutations in each sample and run the trajectory estimation on the permuted set. Since permuted mutations are not sorted in time, we expect no change in the activity trajectories over time. The MAC observed on permuted sets of mutations does not exceed 5% in any sample. Therefore, we only consider signature changes above 5% to be significant (Fig.
Figure caption: The red line shows the threshold of 5%, above which we consider changes to be significant.
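The MAC statistic and the permutation control can be sketched as below. `fit_trajectory` is a hypothetical stand-in for TrackSig's trajectory fitting (ordered mutations in, a time points × signatures activity array out); it is passed as a parameter because the fitting step itself is not reproduced here.

```python
import numpy as np


def max_activity_change(trajectory):
    """MAC per signature: max minus min activity over a (T, K) trajectory."""
    traj = np.asarray(trajectory, dtype=float)
    return traj.max(axis=0) - traj.min(axis=0)


def significant_signatures(trajectory, threshold=0.05):
    """Indices of signatures whose MAC exceeds the permutation-derived 5%."""
    return np.flatnonzero(max_activity_change(trajectory) > threshold)


def permuted_mac(mutations, fit_trajectory, rng=None):
    """MAC after shuffling the mutation order, which removes temporal signal.

    fit_trajectory is a hypothetical callable standing in for TrackSig's
    fitting step: ordered mutations in, (T, K) trajectory out.
    """
    rng = np.random.default_rng(rng)
    shuffled = rng.permutation(np.asarray(mutations))
    return max_activity_change(fit_trajectory(shuffled))
```

Because max minus min ignores where along the timeline the extremes fall, the same statistic applies unchanged to the observed and the permuted orderings.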
We assess the variability in activity trajectories by performing bootstrap on the PCAWG data. We sample mutations with replacement from the original set and re-calculate their activities and changepoints. We perform 30 bootstrap runs for each sample. Figure
Signature trajectories calculated on bootstrap data are stable. The mean standard deviation of activity values calculated at each time point is 2.9%. We also evaluate the consistency of signature changes across the entire activity trajectory: size of signature change and location of the changepoint. The mean standard deviation of the change in signature activity is 5.3% across the bootstraps. This standard deviation does not exceed 5% in 55.8% of samples (does not exceed 10% in 94.3% of samples, Supplementary Fig.
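The bootstrap procedure and the per-time-point standard deviation summary can be sketched as follows. `fit_trajectory` is again a hypothetical stand-in for TrackSig's fitting; the sketch assumes it bins by a fixed number of mutations, so every bootstrap run yields the same number of time points. Using the population standard deviation (`ddof=0`) is also an assumption.

```python
import numpy as np


def bootstrap_trajectories(mutations, fit_trajectory, n_boot=30, rng=None):
    """Resample mutations with replacement and refit the trajectory.

    fit_trajectory: hypothetical callable, array of mutations -> (T, K)
    activities. Resampling keeps the number of mutations, so T is the
    same in every run when bins hold a fixed number of mutations.
    """
    rng = np.random.default_rng(rng)
    mutations = np.asarray(mutations)
    runs = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(mutations), size=len(mutations))
        runs.append(fit_trajectory(mutations[idx]))
    return np.stack(runs)  # shape (n_boot, T, K)


def mean_activity_sd(runs):
    """SD across runs at each time point and signature, then averaged."""
    return float(np.std(runs, axis=0).mean())
```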
In TrackSig, the number of changepoints found during activity fitting varies across bootstrap samples, with a standard deviation of 1.02 changepoints. To assess the variability in changepoint locations, we matched nearby changepoints between bootstrap samples and measured their average distance in CCF. Because the number of changepoints can differ between samples, we randomly choose as a reference one of the samples whose number of changepoints equals the median number across all samples. Then, in all other bootstrap runs, we match each changepoint to the closest changepoint in the reference. We found that changepoint locations are consistent across bootstraps: on average, changepoints are located 0.093 CCF from the closest reference changepoint.
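The matching procedure can be sketched as below. When the median number of changepoints is not an integer (possible with an even number of runs), this sketch picks a run whose count is closest to the median; that tie-breaking rule, and the hypothetical helper name, are assumptions.

```python
import numpy as np


def changepoint_spread(bootstrap_cps, rng=None):
    """Mean CCF distance from bootstrap changepoints to a reference run.

    bootstrap_cps: list (one entry per bootstrap run) of changepoint
    locations in CCF. The reference is a random run whose number of
    changepoints is closest to the median (assumption for ties).
    """
    rng = np.random.default_rng(rng)
    counts = np.array([len(c) for c in bootstrap_cps])
    gap = np.abs(counts - np.median(counts))
    ref_idx = int(rng.choice(np.flatnonzero(gap == gap.min())))
    reference = np.asarray(bootstrap_cps[ref_idx], dtype=float)
    distances = []
    for i, cps in enumerate(bootstrap_cps):
        if i == ref_idx or len(reference) == 0:
            continue
        # Match each changepoint to the closest reference changepoint.
        distances.extend(np.min(np.abs(reference - c)) for c in cps)
    return float(np.mean(distances)) if distances else 0.0
```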
As shown by Fig.
The majority of PCAWG samples have a signature change: 76.1% of samples have a maximum change >5% in at least one signature, and 48.4% have a change >10%. However, the number of signature changes depends on the number of mutations in the sample. Among samples with fewer than 10 time points, only 26.3% have a change >5%, compared with 80.4% among the remaining samples (see distribution in Fig.
Figure caption: Proportion of tumours that have a significant change greater than 5% activity, depending on the number of time points in a sample. Each bar corresponds to a range of the number of time points in a sample; each time point contains 100 mutations.
TrackSig reconstructs the evolutionary trajectories of mutational signature activities by sorting point mutations according to their inferred CCF and then partitioning this sorted list into groups of mutations with constant signature activities. TrackSig estimates uncertainty in the location of the changepoints using bootstrap. TrackSig is designed to be applied to VAF data on SNVs from a single sample; however, it can also be applied either to sorted lists of point mutations derived from subclonal reconstruction algorithms, or to CCFs from a single cancer sample derived from methods that perform multi-sample reconstructions or subclonal CNA reconstructions.
Changepoints often correspond to boundaries between subclones
Previous approaches estimate signature activities for a group of mutations without considering their timing (e.g. eMu
In contrast, TrackSig uses the distributions of mutation types to group mutations, which permits more accurate reconstruction of signature activities than clustering mutations by VAF alone. Indeed, as our simulations demonstrate, not only are the signature activities reconstructed more accurately, but in some cases TrackSig is also a more sensitive detector of subclones. Furthermore, TrackSig makes fewer assumptions about the underlying VAF distribution, so it can be readily applied to data from neutrally evolving tumour populations.
Clustering methods applied to VAFs from single bulk samples require high read depth for accuracy
Another important innovation of TrackSig is the use of CCF as a surrogate for evolutionary timing. Similar ideas have been used in human population genetics, where variant allele frequency is used to obtain the relative order of mutations along the ancestral lineage.
In TrackSig, the number of mutation types is provided as a parameter and is not fixed to 96 types. Because of this, it is straightforward to generalize TrackSig to reconstruct the activities of different mutation signatures or different mutations, so long as these mutations can be approximately ordered by their evolutionary time and each mutation can be classified into one of a fixed number of categories. In this paper, we ordered SNVs by decreasing CCF. This same strategy could be naturally extended to indels, for which the infinite sites assumption is also valid. The infinite sites assumption should also be valid for structural variants (SVs) associated with well-defined breakpoints, thus permitting TrackSig to be used to track the activities of recently defined SV signatures.
TrackSig also requires a pre-defined set of mutation signatures, each of which is a probability distribution over the mutation types. However, if these signatures are unavailable, they can be defined by non-negative matrix factorization, or Latent Dirichlet Allocation
TrackSig can be applied to VAFs from multi-region or longitudinal bulk sequencing data by simply running it on each sample separately. In preliminary experiments testing this approach, we found broad consistency in the active signatures selected and in the signature activities of the clonal mutations in each sample: we observe only a 0.03% mean absolute activity difference (0.017 KL divergence) between the signature activities of the clonal cluster across different samples. See Supplementary Note
For ease of presentation, we have assumed that ordering SNVs by CCF recovers the order in which they accumulated in the genomes of ancestral cells. However, this assumption is not critical for correct reconstruction of signature activity changes.
First, we have shown through bootstrap sampling and the clonal evolution simulations that errors in the estimation of SNV CCFs due to sampling noise have a limited impact on TrackSig’s ability to estimate accurate activity trajectories. We have similarly shown that these activity trajectories are not impacted if a small fraction (3%) of the SNVs violate the infinite sites assumption.
However, these trajectories can be affected by incorrect ordering of a large number of SNVs. This can occur in two ways. First, misordering can occur if a CNA changes the number of SNV alleles per cell. For example, daughter cells can fail to inherit SNVs present in their mother cells due to a loss of heterozygosity (LOH). If a CNA reconstruction is available, TrackSig corrects for any detected clonal LOH when ordering SNVs and does not attempt to order SNVs in regions affected by subclonal CNAs, thereby resolving this difficulty. However, if a CNA reconstruction is not available, or is inaccurate, the accuracy of the activity trajectories can suffer. As such, we recommend using TrackSig only when reliable CNA reconstructions are available.
Second, SNV ordering need not correspond to the time of acquisition when a single sample contains SNVs from subclones on different branches of the cancer phylogeny. In these circumstances, there is no single linear order for the activities; furthermore, late-occurring subclones on one branch can have a higher CCF than earlier subclones on another branch in the same sample. This situation also occurs when the sample contains a large number of neutrally evolving mutations from multiple subclonal lineages, as seen in the two-cluster, depth-100 simulations. Note that such circumstances are rare in single biopsies.
Even in the rare circumstance that SNV misordering does occur, it may be possible to detect it and to interpret the activity changes correctly. For example, if late-occurring but misordered SNVs manifest a more drastic change in signature activity, this misordering may be detectable by the presence of oscillations in the activity trajectories. To address this issue, when assessing the overall change in signature activity, we computed the difference between the lowest and highest activities of each signature, which does not depend on the ordering.
The timelines reconstructed by TrackSig are computed with a fixed number of mutations in each bin. If the overall rate at which the tumour generates mutations were constant, our timeline would correspond to real time. However, tumour mutation rate often accelerates throughout development
Estimating changes in the overall mutation rate is difficult. One possible correction is to adjust the timeline based on the activities of signatures 1 and 5. Some studies report that signatures 1 and 5 operate as a cellular clock, as the number of mutations contributed by these signatures is proportional to the age of the individual
Our method, TrackSig, provides further insight into how signature profiles change throughout tumour development. We show that signature analysis can detect major events in tumour evolution, notably transitions to a new subclone. Mutational signatures provide a unique way to recover a tumour’s evolutionary path, track the activities of mutational processes, adjust treatment strategy, and detect changes in therapy response.
TrackSig is designed to be applied to VAF data from a single, heterogeneous tumour sample. The method consists of two stages. First, we sort SNVs by their CCFs, which we estimate from their VAFs and a CNA reconstruction of the sample. Next, we infer a trajectory of the mutational signature activities over the estimated ordering of the SNVs. We estimate the activity trajectory of each signature as a piece-wise constant function of the SNV ordering with a small number of changepoints. These stages are described in detail below. Note that TrackSig does not rely on any methods for clustering mutations, such as phylogeny reconstruction.
No single evolutionary model can yet explain all of the observed VAF distributions in bulk tumour samples
In the following sections, we describe how the SNV VAFs are used to create a timeline to which TrackSig is applied. For ease of presentation, we will assume that the time of SNV occurrence increases approximately monotonically with position on the timeline. This interpretation is valid under the infinite sites assumption and either a neutral evolution model or a clonal one when all subclones are from the same branch, as is often the case in single samples
Estimating an SNV’s CCF requires both an estimate of its VAF and an estimate of the average numbers of mutant and reference alleles per cell at the locus where the SNV occurs. In TrackSig, we derive the latter from a CNA reconstruction provided with the VAF inputs.
To account for uncertainty in an SNV’s VAF due to finite read sampling, we model the posterior distribution over its VAF using a Beta distribution:
If no CNA reconstruction is available, TrackSig assumes that each SNV is in a region of normal copy number and TrackSig estimates CCFs in autosomal regions by setting:
If a CNA reconstruction is available, TrackSig uses it when converting from VAF to CCF. TrackSig assumes there is a maximum of one copy of the variant allele per cell, and thus estimates CCF by setting:
In regions of subclonal CNAs, estimating CCF requires a phylogenetic reconstruction in order to determine whether the subclonal CNA influences the number of variant alleles in the affected cells
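The equations for these conversions are not reproduced in this section. As a hedged illustration only, the sketch below uses two textbook relations that are not necessarily TrackSig’s exact formulas. First, with a uniform prior, v variant reads out of d total reads give the posterior VAF ~ Beta(v + 1, d - v + 1). Second, assuming a tumour purity ρ (a quantity not discussed in this section), diploid normal cells, one mutant copy per mutated cell and a clonal tumour total copy number N, VAF = ρ·CCF / (ρN + 2(1 - ρ)); inverting gives CCF = VAF·(ρN + 2(1 - ρ)) / ρ, which reduces to 2·VAF/ρ at normal copy number.

```python
import numpy as np


def vaf_posterior_samples(alt_reads, depth, n_samples=1000, rng=None):
    """Samples from Beta(v + 1, d - v + 1), the VAF posterior under a
    uniform prior (an assumption of this sketch)."""
    rng = np.random.default_rng(rng)
    return rng.beta(alt_reads + 1, depth - alt_reads + 1, size=n_samples)


def vaf_to_ccf(vaf, purity=1.0, tumour_total_cn=2):
    """Convert VAF to CCF, assuming one mutant copy per mutated cell,
    diploid normal cells and a clonal tumour copy number `tumour_total_cn`.
    With tumour_total_cn=2 this reduces to 2 * vaf / purity."""
    vaf = np.asarray(vaf, dtype=float)
    return vaf * (purity * tumour_total_cn + 2.0 * (1.0 - purity)) / purity
```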
TrackSig sorts SNVs in order of decreasing estimated CCF and uses each SNV’s rank in this list as a “pseudo-time” estimate of its time of appearance. Note that this estimate will have a non-linear relationship to real time if the overall mutation rate varies during the tumour’s development. If some of the SNVs can be interpreted as clock mutations, an SNV’s rank can be converted into an estimate of real time.
To derive an estimate of the activity trajectory, TrackSig converts the SNV ordering into a set of time points, each containing a non-overlapping subset of the SNVs. We do this for two reasons. First, stable estimation of signature activities requires a minimum number of mutations; by binning mutations into time points and requiring a minimum number of time points per segment, TrackSig enforces a minimum of 100 mutations per segment. Second, the time complexity of TrackSig scales with the number of time points, so binning mutations speeds up TrackSig. By default, the bin size is 100, but the user can set it as low as 1. As we show in the “Results” section, TrackSig’s signature activity reconstructions are relatively insensitive to the choice of bin size.
TrackSig first partitions the ordered mutations into bins and interprets each bin as one time point. The “timeline” of the cancer is the collection of the time points. TrackSig reports signature activity trajectories as a function of points in the timeline. We emphasize that TrackSig does not use any information about subclones when partitioning the SNVs, and that TrackSig only uses the CCFs of SNVs from a single sample.
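The ordering and binning steps can be sketched as follows. The helper `make_timeline` is hypothetical, and it assumes mutation types are already encoded as integers 0 to 95. How TrackSig handles SNVs left over when the total is not a multiple of the bin size is not stated here; this sketch adds them to the last time point.

```python
import numpy as np


def make_timeline(ccfs, mutation_types, bin_size=100, n_types=96):
    """Order SNVs by decreasing CCF and bin them into time points.

    mutation_types: integer type index (0..n_types-1) of each SNV.
    Returns (counts, mean_ccf): counts[t, j] is the number of SNVs of type j
    in time point t; mean_ccf[t] is that time point's mean CCF.
    """
    ccfs = np.asarray(ccfs, dtype=float)
    types = np.asarray(mutation_types, dtype=int)
    # An SNV's rank in this order is its pseudo-time.
    order = np.argsort(-ccfs, kind="stable")
    n_bins = max(1, len(order) // bin_size)
    bins = [order[i * bin_size:(i + 1) * bin_size] for i in range(n_bins)]
    # Assumption: leftover SNVs (fewer than bin_size) join the last time point.
    bins[-1] = order[(n_bins - 1) * bin_size:]
    counts = np.vstack([np.bincount(types[b], minlength=n_types) for b in bins])
    mean_ccf = np.array([ccfs[b].mean() for b in bins])
    return counts, mean_ccf
```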
To estimate activity trajectories, TrackSig partitions the timeline into segments containing one or more time points. Within each of these segments, it estimates signature activities using a mixture of discrete distributions. Full details of the model are provided in the Supplementary Note
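For a fixed set of signatures, a mixture of discrete distributions can be fitted with the standard EM updates sketched below: the E-step splits each mutation type’s count among signatures in proportion to π_k·S_kj, and the M-step sets each activity to that signature’s share of the mutations. This is a generic sketch under the stated model, not TrackSig’s own implementation; `fit_activities` is a hypothetical name.

```python
import numpy as np


def fit_activities(counts, signatures, n_iter=1000, tol=1e-10):
    """Estimate one segment's signature activities by EM.

    counts: (n_types,) mutation-type counts summed over the segment.
    signatures: (K, n_types) fixed signature matrix.
    Model: each mutation's type is drawn from sum_k pi_k * signatures[k].
    """
    counts = np.asarray(counts, dtype=float)
    sigs = np.asarray(signatures, dtype=float)
    pi = np.full(sigs.shape[0], 1.0 / sigs.shape[0])
    for _ in range(n_iter):
        mix = pi @ sigs  # probability of each mutation type under pi
        # E-step: responsibility of signature k for mutation type j.
        with np.errstate(divide="ignore", invalid="ignore"):
            resp = np.where(mix > 0, pi[:, None] * sigs / mix, 0.0)
        # M-step: each activity is the expected share of mutations.
        new_pi = resp @ counts / counts.sum()
        converged = np.abs(new_pi - pi).max() < tol
        pi = new_pi
        if converged:
            break
    return pi
```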
TrackSig identifies changepoints in the timeline where there are discernible differences in signature activities between the time points before and after the changepoint. Specifically, the changepoints partition the timeline into segments of mutations with approximately constant activities. TrackSig fits the activities of each segment using the EM algorithm described above. This procedure generates piece-wise constant activity trajectories for each signature. To select changepoints, we adapt pruned exact linear time (PELT)
We compute the BIC as follows. Changepoints split the timeline into (
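The changepoint search can be illustrated with the standard PELT recursion, sketched below with a caller-supplied segment cost and a penalty paid once per changepoint. TrackSig’s actual cost (the fitted mixture likelihood) and its exact BIC penalty are not reproduced here; the squared-error cost in the usage check is purely illustrative. Pruning with K = 0 assumes that splitting a segment never increases the total cost, which holds for maximum-likelihood segment costs.

```python
import math


def pelt(n, cost, penalty):
    """Optimal segmentation of time points 0..n-1 by PELT.

    cost(s, t): cost of one segment covering time points s..t-1.
    penalty: added once per changepoint (e.g. a BIC-style term).
    Returns the sorted list of changepoints (indices where segments start).
    """
    best = [-penalty] + [math.inf] * n  # best[t]: optimal cost of 0..t-1
    last = [0] * (n + 1)                # start of the final segment
    candidates = [0]
    for t in range(1, n + 1):
        seg = {s: best[s] + cost(s, t) for s in candidates}
        s_star = min(seg, key=seg.get)
        best[t] = seg[s_star] + penalty
        last[t] = s_star
        # Prune start points that can never be optimal for a later t.
        candidates = [s for s in candidates if seg[s] <= best[t]] + [t]
    changepoints, t = [], n
    while t > 0:
        t = last[t]
        if t > 0:
            changepoints.append(t)
    return sorted(changepoints)
```

The `-penalty` initialisation means the first segment is free, so the total objective is the sum of segment costs plus the penalty times the number of changepoints.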
If the number of variant alleles per cell is increased by a clonal copy number change, TrackSig’s CCF estimates might be >1. To correct for this, when displaying activity trajectories, it merges all the time points that have average CCF ≥ 1 into one time point. As such, the first time point can contain more than 100 mutations. To determine a signature activity at this new time point, TrackSig simply takes an average activity of all merged time points (those having CCF ≥ 1).
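The display-time merge can be sketched as follows. The merged activity is the plain average of the merged time points, as described above; assigning the merged point their average CCF is an assumption of this sketch, and the helper name is hypothetical.

```python
import numpy as np


def merge_amplified_timepoints(mean_ccf, activities):
    """Merge all time points with mean CCF >= 1 into one leading time point.

    mean_ccf: (T,) mean CCF per time point (decreasing along the timeline).
    activities: (T, K) activities per time point.
    Assumption: the merged point's CCF is the mean of the merged CCFs.
    """
    ccf = np.asarray(mean_ccf, dtype=float)
    acts = np.asarray(activities, dtype=float)
    high = ccf >= 1.0
    if not high.any():
        return ccf, acts
    keep = ~high
    new_ccf = np.concatenate(([ccf[high].mean()], ccf[keep]))
    new_acts = np.vstack([acts[high].mean(axis=0), acts[keep]])
    return new_ccf, new_acts
```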
To compute the number of distinct subclones, we adjust the number of detected changepoints to correct for overlap in CCF space between mutations from different subclones. Consider two subclones whose mutations overlap substantially in CCF space. In this case, TrackSig might find three segments instead of two: one with signature activities reflecting the first subclone, another with activities reflecting a mixture of the two subclones, and the last with activities reflecting the second subclone. If this happens, all signatures change in the same direction at both changepoints. As such, when counting the number of distinct subclones, we treat each such pair of changepoints as a single subclone boundary. This situation occurs in only 2.6% of the 2552 PCAWG tumour samples to which we applied TrackSig; in 77% of those cases, we remove a single changepoint.
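The pair-merging rule can be sketched by comparing the sign of each signature’s change at consecutive changepoints. How a signature with exactly zero change should be treated is not specified; this sketch requires it to be unchanged at both changepoints. The helper name is hypothetical.

```python
import numpy as np


def count_subclone_boundaries(segment_activities):
    """Count subclone boundaries from consecutive segment activities.

    segment_activities: (S, K) activities of S consecutive segments
    (S - 1 changepoints). Two consecutive changepoints at which every
    signature moves in the same direction count as one boundary.
    Assumption: a zero change must also be zero at the paired changepoint.
    """
    acts = np.asarray(segment_activities, dtype=float)
    signs = np.sign(np.diff(acts, axis=0))  # change direction per changepoint
    boundaries, i = 0, 0
    while i < len(signs):
        boundaries += 1
        if i + 1 < len(signs) and np.array_equal(signs[i], signs[i + 1]):
            i += 2  # merge the pair: the middle segment is a mixture
        else:
            i += 1
    return boundaries
```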
TrackSig estimates uncertainty in the activity estimates by bootstrapping the mutations and refitting the activity trajectories. Specifically, it takes the random subset of
Only a subset of signatures is active in a particular sample, and this subset is largely determined by the cancer type. For the analyses reported above, we use a set of active signatures provided by PCAWG
Here we evaluate three different ways to select the active signatures, all supported by TrackSig. The first strategy, “all-sigs”, simply computes activity trajectories for all signatures. The second, “cancer-type-specific-sigs”, uses all signatures reported as active in the cancer type under consideration. The final strategy, “sample-specific-sigs”, first fits signature activities to the full set of mutation counts using an initial set of signatures, and sets the active signatures to be those with activities greater than a threshold (by default, 5%) in the initial fit. Then TrackSig computes activity trajectories only for the active signatures. In the following, we evaluate “sample-specific-sigs” when the initial set is “all-sigs”, however, we suspect this approach will also work well with “cancer-type-specific-sigs” as the initial set. We evaluate each strategy by comparing the active signatures selected by TrackSig with those reported by PCAWG-Signature group on the PCAWG tumour set
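The “sample-specific-sigs” strategy can be sketched as an initial fit on the sample’s total mutation counts followed by a 5% activity threshold. The initial fit here uses the standard EM updates for a mixture of fixed signatures; whether TrackSig’s initial fit uses exactly this procedure is an assumption, and `select_sample_signatures` is a hypothetical name.

```python
import numpy as np


def select_sample_signatures(counts, signatures, names, threshold=0.05,
                             n_iter=1000):
    """Fit activities to the sample's total counts, keep those > threshold.

    counts: (n_types,) total mutation-type counts for the sample.
    signatures: (K, n_types) initial signature set; names: K labels.
    """
    counts = np.asarray(counts, dtype=float)
    sigs = np.asarray(signatures, dtype=float)
    pi = np.full(len(sigs), 1.0 / len(sigs))
    for _ in range(n_iter):  # EM for a mixture of fixed signatures
        mix = pi @ sigs
        with np.errstate(divide="ignore", invalid="ignore"):
            resp = np.where(mix > 0, pi[:, None] * sigs / mix, 0.0)
        pi = resp @ counts / counts.sum()
    return [name for name, a in zip(names, pi) if a > threshold]
```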
For “all-sigs”, we used all 48 signatures and found that, on average, 44.7% of the overall activity assigned by TrackSig is assigned to the active signatures selected by the PCAWG-Signature group. Each incorrect signature receives 1.3% of the activity on average; in other words, the incorrect activity is widely distributed among the signatures. Using “cancer-type-specific-sigs” improves the correspondence to 68.7% of the total activity on average. This strategy reduces the initial set of potentially active signatures from 48 to 12 on average (ranging from 4 signatures in Lower Grade Glioma to 24 in Liver Cancer). Here, we observe that signatures 5 and 40 are the most prevalent among the incorrect signatures, with average activities of 14% and 12.6%, respectively, in the samples where they are supposed to be inactive. Finally, if we use the “sample-specific-sigs” strategy with “all-sigs” as the initial set, we exactly recover the active signatures reported by the PCAWG-Signature group.
Fitting either cancer-type-specific or sample-specific signatures places more activity mass on the correct signatures and speeds up the computations. Therefore, we recommend using one of these strategies instead of fitting the full set of signatures.
Further information on research design is available in the
We thank the Pan-cancer Analysis of Whole Genomes (PCAWG) network, and in particular the PCAWG Evolution and Heterogeneity working group, for providing data, analysis and valuable input on this project. We would in particular like to highlight Peter Van Loo, Clemency Jolly, Stefan Dentro, David Wedge, Paul Boutros, Lydia Liu, and Moritz Gerstung, who provided valuable feedback during the development of the TrackSig methodology. We acknowledge the contributions of the many clinical networks across ICGC and TCGA who provided samples and data to the PCAWG Consortium, and the contributions of the Technical Working Group and the Germline Working Group of the PCAWG Consortium for collation, realignment and harmonised variant calling of the cancer genomes used in this study. We thank the patients and their families for their participation in the individual ICGC and TCGA projects. We would like to acknowledge SciNet as part of Compute Canada for providing computational resources. This research was partially supported by a Natural Sciences and Engineering Research Council operating grant; an Associate Investigator award from the Ontario Institute of Cancer Research; and a subgrant from the Canadian Centre for Computational Genomics genomics technology platform funded by Genome Canada, all to QDM. It also received funding from the University of Toronto’s Medicine by Design initiative, which is funded in part by the Canada First Research Excellence Fund (CFREF), and from the Compute the Cure gift from the NVIDIA foundation. QDM is a Canada CIFAR AI chair at the Vector Institute.
Q.D.M. designed the project and supervised the study. Y.R. designed and implemented the method and performed the experiments. Y.R. and Q.D.M. wrote the manuscript with assistance from C.H. and R.S. Y.R. and C.H. made figures. R.S. implemented the PELT algorithm. R.L. performed the non-parametric simulations. C.H. and Y.R. performed clonal evolution simulations. C.H. implemented the SciClone+DeconstructSigs baseline. J.W. and A.D. provided assistance with tumour phylogeny reconstruction. N.L. wrote the script to extract tri-nucleotide counts. The PCAWG Evolution and Heterogeneity Working Group (co-led by Paul T Spellman, Peter Van Loo, and David C Wedge) provided critical feedback during the development of TrackSig. The PCAWG Consortium, as a whole, provided analysis of the whole-genome sequencing data used herein, the mutational signatures, and feedback on the method. All authors read and approved the final manuscript.
Somatic and germline variant calls, mutational signatures, subclonal reconstructions, transcript abundance, splice calls and other core data generated by the ICGC/TCGA Pan-cancer Analysis of Whole Genomes Consortium is described here
TrackSig Code is available at
The authors declare no competing interests.