Single-cell technologies are transforming biomedical research, including the recent demonstration that unspliced pre-mRNA present in single-cell RNA-Seq permits prediction of future expression states. Here we apply this RNA velocity concept to an extended timecourse dataset covering mouse gastrulation and early organogenesis.
Intriguingly, RNA velocity correctly identifies epiblast cells as the starting point, but several trajectory predictions at later stages are inconsistent with both real-time ordering and existing knowledge. The most striking discrepancy concerns red blood cell maturation, with velocity-inferred trajectories opposing the true differentiation path. Investigating the underlying causes reveals a group of genes with a coordinated step-change in transcription, thus violating the assumptions behind current velocity analysis suites, which do not accommodate time-dependent changes in expression dynamics. Using scRNA-Seq analysis of chimeric mouse embryos lacking the major erythroid regulator Gata1, we show that genes with the step-changes in expression dynamics during erythroid differentiation fail to be upregulated in the mutant cells, thus underscoring the coordination of modulating transcription rate along a differentiation trajectory. In addition to the expected block in erythroid maturation, the Gata1-chimera dataset reveals induction of PU.1 and expansion of megakaryocyte progenitors. Finally, we show that erythropoiesis in human fetal liver is similarly characterized by a coordinated step-change in gene expression.
By identifying a limitation of the current velocity framework coupled with in vivo analysis of mutant cells, we reveal a coordinated step-change in gene expression kinetics during erythropoiesis, with likely implications for many other differentiation processes.
The online version contains supplementary material available at
Cellular differentiation into diverse cell types underpins all metazoan development. Moreover, cellular differentiation processes are also crucial for stem cell-mediated tissue maintenance, and their perturbation has been implicated in ageing-associated regenerative failure as well as malignant transformation [
Comprehensive molecular profiling necessarily entails the generation of snapshot data, because cells need to be fixed to examine their molecular content. This in turn represents a major drawback for the study of differentiation processes, which commonly occur over extended timeframes via complex trajectories underpinned by intricate decision-making processes. Much excitement was therefore generated by a recent seminal study [
One system where the RNA velocity concept has particular potential is erythropoiesis, the process whereby oxygen-transporting red blood cells are generated from multipotent hematopoietic progenitors. Research into the transcriptional control processes of erythropoiesis led to several paradigmatic discoveries, including the dissection of distal transcriptional control elements [
Here, we have applied RNA velocity to a recently published scRNA-Seq dataset of nine sequential timepoints, spaced 6 h apart, which encompass mouse gastrulation and early organogenesis [
To evaluate RNA velocity-based trajectory inference with a complex dataset, we applied the scVelo analysis pipeline [
Inferring differentiation trajectories at organismal scale.
Taken together, we found that, for erythroid development, the output of scVelo is inconsistent with the timecourse information provided by the experimental design of the gastrulation atlas.
We next asked whether this issue is due to a general lack of biologically meaningful information captured in the unspliced reads.
To this end, we exploited two variance-based dimensionality reduction methods, principal component analysis (PCA) and Multi-Omics Factor Analysis (MOFA [
Unspliced counts contribute to explaining the variability among cell types.
Multi-Omics Factor Analysis therefore not only demonstrates that the unspliced reads in the gastrulation atlas dataset contain biologically relevant information, but also suggests that integrated analysis of spliced and unspliced reads may more broadly facilitate the interpretation of complex scRNA-Seq datasets.
Having confirmed the utility of unspliced reads, we next explored whether the inability to recover real-time progression in whole embryo trajectory inference using scVelo might be related to the assumptions made by the current RNA velocity analysis tools. The derivation of gene-specific expression kinetics underpins the scVelo analysis pipeline, as illustrated by so-called phase plots that depict the amounts of spliced versus unspliced reads within a population of cells [
A set of genes with complex expression kinetics confounds velocity estimation in erythropoiesis.
We next set out to identify all genes exhibiting this rapid increase in expression levels in the Erythroid 3 population (Fig.
Having identified a set of genes with a coordinated increase in expression rate midway through erythropoiesis, we next asked what function these genes might play in the broader transcriptional program of red blood cell maturation. Visual inspection of the gene list revealed it to contain archetypal red blood cell genes including the globin genes
We next removed this set of MURK genes and recalculated the RNA velocity-inferred trajectories. As can be seen in Fig.
The scVelo suite also calculates a so-called latent time, which represents the pseudotime ordering hidden in the spliced and unspliced dynamics, and is more powerful than previously described pseudotime inference approaches because it incorporates both the gene dynamics and the spliced and unspliced information [
Taken together, this analysis shows that inconsistent RNA velocity-inferred trajectories can be remedied by the removal of genes with complex expression kinetics.
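The effect of a step-change in transcription rate on velocity estimates can be illustrated with a toy simulation. The sketch below is not the scVelo dynamical model used in this study. It assumes the standard kinetic equations du/dt = α(t) − βu and ds/dt = βu − γs, and replaces the fitting procedure with a simplified steady-state style estimate (a single unspliced/spliced ratio fitted by least squares through the origin). All parameter values and function names are hypothetical and chosen for illustration only.

```python
def simulate(alpha_before=1.0, alpha_after=10.0, beta=1.0, gamma=0.5,
             t_switch=6.0, t_end=8.0, dt=0.001, sample_every=100):
    """Euler integration of du/dt = alpha(t) - beta*u, ds/dt = beta*u - gamma*s.

    The transcription rate alpha jumps from alpha_before to alpha_after
    at t_switch (illustrative values, not estimated from data).
    Returns a list of sampled "cells" as (time, unspliced, spliced).
    """
    u, s, cells = 0.0, 0.0, []
    switch_step = int(round(t_switch / dt))
    n_steps = int(round(t_end / dt))
    for step in range(n_steps + 1):
        if step % sample_every == 0:
            cells.append((step * dt, u, s))
        alpha = alpha_before if step < switch_step else alpha_after
        du = alpha - beta * u
        ds = beta * u - gamma * s
        u += du * dt
        s += ds * dt
    return cells


def fitted_ratio(cells):
    """Single unspliced/spliced ratio: least-squares slope through the origin."""
    num = sum(u * s for _, u, s in cells)
    den = sum(s * s for _, _, s in cells)
    return num / den


def compare_velocities(cells, beta=1.0, gamma=0.5):
    """Return the fitted ratio and, per cell, (time, true ds/dt, inferred velocity)."""
    ratio = fitted_ratio(cells)
    rows = []
    for t, u, s in cells:
        true_ds = beta * u - gamma * s   # ground truth from the simulated kinetics
        inferred = u - ratio * s         # velocity under the single-ratio assumption
        rows.append((t, true_ds, inferred))
    return ratio, rows


cells = simulate()
ratio, rows = compare_velocities(cells)
# Cells sampled shortly before the step-change
before_switch = [r for r in rows if 5.0 <= r[0] < 6.0]
```

In this toy setting the fitted ratio is pulled above the true γ/β by the cells sampled after the boost, so cells just before the step-change receive a negative inferred velocity although their spliced abundance is still increasing. This mirrors, in simplified form, the reversed arrows discussed above.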
To corroborate upregulation of our identified MURK genes during erythropoiesis, we interrogated a previously published dataset with transcriptomic analysis of a loss of function model for the erythropoiesis master-regulator
In vivo analysis of Gata1 function using a chimaera assay coupled with scRNA-Seq.
Our newly identified erythropoietic MURK genes therefore perform key roles in red blood cell function, and their upregulation was validated in an independent model of red blood cell maturation.
The G1ER cell line represents an in vitro model, and the published differential gene expression data were from bulk microarray profiling, thus precluding any analysis of single-cell gene expression kinetics. We therefore turned to our recently reported Chimaera-Seq approach, whereby scRNA-Seq is coupled with mouse chimeric embryo technology, to define both cellular and molecular consequences of gene knockouts in vivo [
We then concatenated the chimera data with the Pijuan-Sala et al. [
The newly generated
Although the role of Gata1 is well documented in developmental erythropoiesis [
Regarding the megakaryocytic subset, we observed upregulation of progenitor markers
Gata1 chimaera assay reveals disruption of MURK genes and perturbed yolk sac hematopoiesis.
Interestingly, all hemato-endothelial cell subsets displayed upregulation of
In the early erythroid subset, Ery1, we again noted that the mutant cells displayed increased expression of genes characteristic of a progenitor signature. Conversely, erythroid maturation hallmark genes such as
In addition to the failure of inducing genes associated with erythroid maturation, single-cell resolution molecular analysis also revealed a striking failure to downregulate genes associated with alternative lineage programs such as Pu.1, consistent with the notion that the earliest wave of primitive hematopoiesis produces erythroid cells, megakaryocytes, and macrophages, with evidence for at least bipotential progenitor cells [
Having generated the Chimaera-Seq single-cell data for both wildtype and Gata1 knockout cells, we next used the ratio of spliced/unspliced reads to explore differences in expression kinetics between the wildtype and mutant cells. As can be seen in Fig.
However, preliminary modelling analysis suggests that the change observed in MURK gene dynamics is due to altered transcription rates (see Additional file
Having identified a coordinated increase in transcription rate during mouse yolk sac erythropoiesis, we next wanted to ascertain whether the same phenomenon could also be seen in human cells. Moreover, we were keen to explore an scRNA-Seq dataset generated by a different laboratory, to exclude any potential technical bias caused by our own experimental protocols. We therefore turned to a recently published comprehensive dataset of human fetal liver erythropoiesis [
Concept of dual kinetics of gene expression is also revealed in human fetal liver hematopoiesis.
There is no doubt that single-cell molecular profiling constitutes a transformative technology. It suffers however from the major drawback that cells need to be fixed in order to profile them, with the consequence that measurements are by necessity static snapshots. To decipher complex biological processes, however, temporal information is commonly required. The single-cell RNA velocity concept raised the prospect of overcoming some of the limitations associated with static measurements, by providing a strategy that can infer future cellular states. The RNA velocity framework is based on an explicit model of transcriptional processes (transcription, splicing, degradation). The notion that physical parameters of gene expression can be deduced from single-cell gene expression data had been explored before the single-cell RNA velocity concept was introduced [
As to the precise mechanisms, at this stage we can only confidently assert that this coordinated change in expression dynamics occurs downstream of Gata1 during erythropoiesis. Of note, comprehensive analysis of the G1ER erythroid differentiation model has shown that Gata1-induced maturation triggers increased enhancer/promoter interactions for upregulated genes and that the most highly enriched motifs in the promoters of these genes are GATA sites [
Our observations regarding the Gata1 knockout phenotype also warrant some discussion. With embryonically lethal phenotypes such as Gata1 knockout, conventional analysis tends to be limited, since the embryos die from the absence of red blood cells. By contrast, the Chimaera-Seq assay enables both quantification of cell numbers and characterization of their molecular profiles. Moreover, there are no secondary effects caused by the dying embryo, because the wildtype host cells rescue overall fetal development, thus allowing a focused analysis of cell-intrinsic molecular defects. One noteworthy observation from our data is that erythroid differentiation proceeds substantially beyond the stage where
Within hematopoiesis, Pu.1 is recognized as a key regulator of myeloid and T cell lineages, but not erythroid cells, even though a role in the proliferation of immature erythroid progenitors has been reported ( [
Our observation of an expanded pool of megakaryocyte progenitors may also be of direct relevance to our understanding of the pre-leukemic transient myeloproliferative disease (TMD) that is prevalent in newborns with trisomy 21 [
Application of the single-cell RNA velocity concept has commonly been “confirmatory”, whereby a differentiation path proposed by other means was shown to be consistent with RNA velocity inference. When we applied the RNA velocity framework to the entire mouse gastrulation atlas, some inferred vectors of differentiation agreed with our current understanding of developmental biology, but others disagreed. Deeper interrogation of predictions that conflicted with our current understanding of erythropoiesis showed that the RNA velocity predictions could not be correct, not only because they ran counter to the known expression changes that accompany red blood cell differentiation, but also because they contradicted the real-time sampling of the data. Our results thus highlight certain limitations of the current implementation of this framework for identification of novel trajectories. Importantly however, it is through our observation of the inconsistent predictions that we were led to identify the previously unrecognized dynamic nature of the transcriptional control of erythropoiesis. Our extension to the scVelo implementation reveals the presence of such time-dependent changes of gene expression parameters and retrieves the concerned MURK genes in developmental trajectories of interest. To verify whether other developmental processes beyond erythropoiesis may involve time-dependent changes of gene expression parameters, we interrogated two additional trajectories where application of scVelo to the whole Atlas reference had resulted in arrow predictions contrary to real-time progression (see Additional file
One of the major attractions of current usage of the RNA velocity framework is that the added information on unspliced reads comes essentially “for free,” as it is extracted from the raw scRNA-Seq counts. It is however worth remembering that technologies reliant on oligo dT priming are not designed to capture intronic reads with high efficiency, a problem exacerbated when using current droplet methods that sequence specifically either the 3′ or 5′ ends of genes, but not the rest. A likely reason for the capture of intronic reads may be the priming of oligo dT onto small stretches of A present in introns, but there certainly seems scope for the development of future methods specifically designed to increase the capture of unspliced / intronic sequences. It is noteworthy however that our MOFA analysis in Fig.
Of note, current RNA velocity frameworks consider only a single reason for the presence of introns, namely that a pre-mRNA has not been fully processed. However, it is known that other processes such as intron retention can result in the presence of intronic sequences in otherwise fully processed cytoplasmic mRNA molecules [
Taken together, this study reports how the RNA velocity framework can be extended to delve into the transcriptional mechanisms of tissue differentiation, complemented with single-cell resolution and in vivo analysis of Gata1 function, which revealed a number of previously unknown facets of this canonical regulator of red blood cell development.
To obtain separated count matrices for spliced and unspliced mRNAs, we ran velocyto 0.17.17 [
We first downloaded raw reads from Popescu et al. ([
We ran MOFA+ v1.4.0 [
To identify MURK genes, we considered the imputed counts resulting from the scVelo standard pipeline. Then, for each gene and each population in the Erythroid lineage, we calculated the slope of unspliced versus spliced counts by linear regression, as well as the standard error of the slope. In the mouse dataset, we selected all genes for which (i) the slope in Erythroid3 is significantly higher than the slope in Erythroid2 (one-sided t-test, p value < 0.05), (ii) the average spliced counts in Erythroid3 are higher than the average spliced counts in every other population, and (iii) the slope in Erythroid3 is positive. We found 89 genes that meet all these criteria.
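The three selection criteria can be sketched for a single gene as follows. This is an illustrative outline rather than the code used in the study. It takes plain Python lists in place of the scVelo imputed counts, and it approximates the one-sided t-test by a normal (large-sample) approximation on the difference of the two slopes, which is an assumption made here to keep the example dependency-free. The function names and the synthetic values are hypothetical.

```python
import math


def slope_and_se(spliced, unspliced):
    """Least-squares slope of unspliced on spliced, with its standard error."""
    n = len(spliced)
    mean_s = sum(spliced) / n
    mean_u = sum(unspliced) / n
    sxx = sum((s - mean_s) ** 2 for s in spliced)
    sxy = sum((s - mean_s) * (u - mean_u) for s, u in zip(spliced, unspliced))
    slope = sxy / sxx
    intercept = mean_u - slope * mean_s
    rss = sum((u - intercept - slope * s) ** 2
              for s, u in zip(spliced, unspliced))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, se


def is_murk_candidate(populations, target="Erythroid3",
                      previous="Erythroid2", alpha=0.05):
    """Apply the three criteria to one gene.

    populations maps a population name to (spliced, unspliced) lists.
    """
    b_t, se_t = slope_and_se(*populations[target])
    b_p, se_p = slope_and_se(*populations[previous])
    # One-sided test that the target slope exceeds the previous slope
    # (normal approximation; an assumption of this sketch).
    z = (b_t - b_p) / math.sqrt(se_t ** 2 + se_p ** 2)
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    mean_spliced = {name: sum(s) / len(s)
                    for name, (s, _) in populations.items()}
    highest = all(mean_spliced[target] > m
                  for name, m in mean_spliced.items() if name != target)
    return p_value < alpha and highest and b_t > 0


# Synthetic example: the slope rises from about 0.2 to about 1.0.
noise = [0.1, -0.1] * 5
gene = {
    "Erythroid2": ([float(i) for i in range(1, 11)],
                   [0.2 * i + e for i, e in zip(range(1, 11), noise)]),
    "Erythroid3": ([float(i) for i in range(11, 21)],
                   [1.0 * i + e for i, e in zip(range(11, 21), noise)]),
}
candidate = is_murk_candidate(gene)
```

Applied to every gene in turn, a filter of this kind would yield a candidate list analogous to the 89 genes reported above, although exact membership depends on the test and the imputation actually used.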
In the human dataset, in order to obtain erythroid populations more comparable to our mouse data, we re-clustered the erythroid clusters (Fig.
We performed gene ontology enrichment analysis using the
Overlap was tested with Fisher exact test. We calculated the probability of having m = 55 genes of our n = 89 MURK genes mapping to the A = 1022 high response genes (out of N = 4195 genes) in the Wu et al. [
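For reference, the one-sided Fisher exact test for enrichment is equivalent to the upper tail of a hypergeometric distribution, which can be evaluated directly from the four numbers quoted above. The sketch below uses only the Python standard library; the function name is hypothetical, and the calculation is shown as an illustration rather than as the exact procedure used in the study.

```python
from math import comb


def hypergeom_upper_tail(m, n, A, N):
    """P(overlap >= m) when n genes are drawn without replacement from N genes,
    A of which belong to the reference set (here, high response genes)."""
    total = comb(N, n)
    upper = min(n, A)
    favourable = sum(comb(A, k) * comb(N - A, n - k)
                     for k in range(m, upper + 1))
    return favourable / total


# Values quoted in the text: m = 55, n = 89, A = 1022, N = 4195
p_value = hypergeom_upper_tail(m=55, n=89, A=1022, N=4195)

# Overlap expected by chance, for comparison with the observed 55 genes
expected_overlap = 89 * 1022 / 4195
```

With these values, chance alone would give an overlap of about 22 genes, so the observed overlap of 55 lies far in the upper tail.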
All procedures were performed in strict accordance with the UK Home Office regulations for animal research under the project license number PPL 70/8406.
TdTomato-expressing mouse embryonic stem cells (ESC) were derived as previously described [
Raw files were processed with Cell Ranger 3.0.2 using default mapping arguments. Reads were mapped to the mm10 genome and counted with GRCm38.92 annotation, including tdTomato sequence for chimera cells. Cell barcodes with expression profiles significantly different to the ambient mRNA expression profile were identified using emptyDrops [
We mapped the chimera cells to the mouse atlas following almost exactly the procedure used in the original publication to map the
For differential gene expression analysis, we took samples that included at least 7 cells per tdTom status per cell population (e.g., Erythroid3). We ran the analysis in scanpy v1.5.1 [
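The sample inclusion rule (at least 7 cells per tdTom status per cell population) can be expressed as a simple counting filter. The sketch below assumes that each cell is described by a hypothetical tuple of sample identifier, population label, and tdTom status; it is independent of scanpy and only illustrates the thresholding step.

```python
from collections import Counter


def eligible_samples(cells, min_cells=7):
    """Return the (sample, population) pairs with enough cells of both statuses.

    cells is an iterable of (sample, population, is_tdtom_positive) tuples,
    a hypothetical layout chosen for this sketch.
    """
    counts = Counter(cells)
    keep = set()
    for sample, population, _status in counts:
        # A missing key in a Counter counts as zero.
        positive = counts[(sample, population, True)]
        negative = counts[(sample, population, False)]
        if positive >= min_cells and negative >= min_cells:
            keep.add((sample, population))
    return keep


# Synthetic example: sample "s1" passes for Erythroid3, sample "s2" does not.
example = (
    [("s1", "Erythroid3", True)] * 8
    + [("s1", "Erythroid3", False)] * 7
    + [("s2", "Erythroid3", True)] * 10
    + [("s2", "Erythroid3", False)] * 3
)
kept = eligible_samples(example)
```

Only the retained sample and population pairs would then be passed to the differential expression step.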
We would like to thank Prof. Fabian Theis and Volker Bergen for discussions and valuable input on the scVelo implementation. We thank Prof. Ross Hardison for providing the list of Gata1-regulated genes from Wu et al. [
The review history is available as Additional file
Barbara Cheifet was the primary editor of this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.
M.B. performed scVelo implementations in mouse and human datasets, mathematical modelling, and analysis of Gata1 embryonic chimera dataset; I.I-R. assisted on the scVelo implementation in mouse datasets; I.I. performed Gata1 CRISPR/Cas9 targeting and expansion of the resulting mutant lines; S.G. performed quality controls of the Gata1 embryonic chimera dataset; S.G. and C.G. performed initial analysis of the Gata1 embryonic chimera dataset; C.G. designed and optimized the mutant chimera single-cell profiling experiments; B.G. wrote the initial draft of the manuscript; M.B., C.G., and J.C.M. edited the manuscript; J.N., J.C.M., C.G., and B.G. supervised the study. All authors read and approved the final manuscript.
Twitter handles: @ivansk8pk (Ivan Imaz-Rosshandler); @shazanfar (Shila Ghazanfar)
Research in the authors’ laboratories is supported by the Wellcome Trust, MRC, CRUK, Blood Cancer UK, NIH-NIDDK, and the Sanger-EBI Single Cell Centre; by core support grants by the Wellcome Trust to the Cambridge Institute for Medical Research and Wellcome Trust-MRC Cambridge Stem Cell Institute; and by core funding from Cancer Research UK and the European Molecular Biology Laboratory. C.G. was funded by the Swedish Research Council (2017-06278), I.I. was funded by a British Heart Foundation studentship (FS/18/56/35177), S.G. was supported by a Royal Society Newton International Fellowship (NIF\R1\181950). This work was funded as part of a Wellcome Strategic Award (105031/D/14/Z) awarded to Wolf Reik, Berthold Göttgens, John Marioni, Jennifer Nichols, Ludovic Vallier, Shankar Srinivas, Benjamin Simons, Sarah Teichmann, and Thierry Voet.
This research was funded in part by the Wellcome Trust (105031/D/14/Z, 097922/Z/11/Z, 206328/Z/17/Z). For the purpose of Open Access, the author has applied a CC BY public copyright license to any Author Accepted Manuscript version arising from this submission.
The dataset of Gata1 embryonic chimeras generated in the current study (Figs.
The dataset analyzed in Figs.
The dataset analyzed in Fig.
The dataset analyzed in Fig.
All animal experiments were performed in strict accordance with the UK Home Office regulations for animal research under the project license number PPL 70/8406.
Not applicable.
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.