In this paper we characterize the performance of venture capital-backed firms based on their ability to attract investment. The aim of the study is to identify relevant predictors of success built from the network structure of firms’ and investors’ relations. Focusing on deal-level data for the health sector, we first create a bipartite network among firms and investors, and then apply functional data analysis to derive progressively more refined indicators of success captured by a binary, a scalar and a functional outcome. More specifically, we use different network centrality measures to capture the role of early investments in the success of the firm. Our results, which are robust to different specifications, suggest that success has a strong positive association with centrality measures of the firm and of its large investors, and a weaker but still detectable association with centrality measures of small investors and with features describing firms as knowledge bridges. Finally, based on our analyses, success is associated neither with firms’ and investors’ spreading power (harmonic centrality) nor with the tightness of investors’ communities (clustering coefficient) or their spreading ability (VoteRank).

Christian Esposito, Marco Gortan and Lorenzo Testa contributed equally to this work.

The online version contains supplementary material available at

In their pursuit of entrepreneurial opportunities, new firms rely on a variety of financing sources. When internal means of financing (e.g. owner capital and cash flow) are insufficient to support the growth of the business, firms will seek external capital (Hall and Lerner

In the VC investment model, success implies a positive exit outcome of the investee firm through an IPO or trade sale, generating (ideally optimal) returns for the investors (Gompers

In this study we take a different approach. We consider a firm’s ability to raise funds as an indication of two things: that the firm is achieving the milestones that mark the path to exit events, and that it remains, over time, a promising vehicle for future returns. We implement this procedural view of success by investigating specific features of the network of firms and investors built from deal-level data.

A number of studies employ network tools to describe interactions among these economic agents. For instance, Bonaventura et al. (

We retrieve data from CB Insights (

The remainder of this article is organized as follows. After describing the complex, time-varying structure of the bipartite network of investors and firms, we show how our definitions of success relate to standard definitions in the literature. Next, we introduce statistics computed on the projections of the bipartite network and show their association with our definitions of success. We do this using a 10-year temporal window, over which we consider a firm’s financing rounds after its first investment, and demonstrate the advantages of our approach. Since the length of the temporal window is somewhat arbitrary, we repeat the analysis with lengths ranging from 5 to 12 years to ascertain the robustness of our findings. We apply a similar stability check by varying the set of covariates employed in our regression models. Finally, we discuss our main results and provide some concluding remarks.

We build a

By projecting the bipartite network on the firms’ layer, we produce a projected graph which is employed to compute the statistics described in Table

By projecting the bipartite network on the investors’ layer, we produce a second projected graph. In this graph, two investors are linked if they have invested in the same firm in the same financing round. To also use investors’ centrality measures as potential predictors of firms’ success, we compute the maximum, the minimum and the median of the centrality distribution of the ‘early’ investors in each firm, i.e. those who participated in the firm’s first recorded funding round.
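The investor-layer projection and the early-investor summaries can be sketched in plain Python. This assumes deal records are available as hypothetical `(firm, round index, investor)` tuples. Degree centrality stands in for any of the measures listed in the table of statistics, and the helper names and toy data are illustrative only.

```python
from collections import defaultdict
from itertools import combinations
from statistics import median

# Hypothetical deal records: (firm, round index, investor); round 0 is the first recorded round.
deals = [
    ("f1", 0, "a"), ("f1", 0, "b"), ("f1", 1, "c"),
    ("f2", 0, "b"), ("f2", 0, "c"),
    ("f3", 0, "d"),
]

def project_investors(deals):
    """Link two investors if they invested in the same firm in the same financing round."""
    syndicates = defaultdict(set)
    for firm, rnd, inv in deals:
        syndicates[(firm, rnd)].add(inv)
    adj = {inv: set() for _, _, inv in deals}  # every investor is a node, even if isolated
    for members in syndicates.values():
        for u, v in combinations(sorted(members), 2):
            adj[u].add(v)
            adj[v].add(u)
    return adj

def degree_centrality(adj):
    """Degree normalized by n - 1 (a stand-in for any centrality measure)."""
    n = len(adj)
    return {u: len(nbrs) / (n - 1) for u, nbrs in adj.items()}

def early_investor_summary(deals, centrality):
    """(max, min, median) centrality of the investors in each firm's first recorded round."""
    first_round = {}
    for firm, rnd, _ in deals:
        first_round[firm] = min(rnd, first_round.get(firm, rnd))
    early = defaultdict(set)
    for firm, rnd, inv in deals:
        if rnd == first_round[firm]:
            early[firm].add(inv)
    summary = {}
    for firm, investors in early.items():
        values = [centrality[i] for i in investors]
        summary[firm] = (max(values), min(values), median(values))
    return summary

adj = project_investors(deals)
summary = early_investor_summary(deals, degree_centrality(adj))
print(summary)  # a and c both funded f1, but in different rounds, so they are not linked
```

Requiring the same round, not just the same firm, is what keeps investors `a` and `c` unlinked in this toy example.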

Statistics computed on the projected graphs of investors and firms.

Covariate | Network interpretation | Note |
---|---|---|
Average neighbor degree | Affinity between neighbor nodes | |
Betweenness centrality | Role within flow of information | |
Closeness centrality | Spreading power (shortest average distance from all other nodes) | |
Clustering coefficient | Tight community | |
Core number | Importance within cluster | Computed only for firms |
Degree centrality | Influence | |
Eigenvector centrality | Influence | |
Harmonic centrality (Marchiori and Latora | Spreading power | |
Newman betweenness centrality (Newman | Role within flow of information | |
Number of investors | | Computed only for firms |
PageRank (Page et al. | Influence | |
VoteRank (Zhang et al. | Best spreading ability | |

Note: to link investors’ centrality measures to firms, we consider the investors involved in the first investment round of a given firm and summarize the distributions of their statistics through the maximum, minimum and median.

Figure

Projection of the bipartite network on the firms’ layer, aggregated by country

We start by assessing whether the centrality measures computed from our networks correlate with firms’ success according to a standard definition, reproducing the exercise presented in Bonaventura et al. (

Bonaventura et al. (

In our analysis, we link firms and investors according to observed investments. We therefore take the perspective of an investor who has already observed the first investment in each firm of interest. Figure

Success rate for different centrality measures adopting the methodology proposed by Bonaventura et al. (

After assessing centrality measures against standard definitions of success, and before turning to regression exercises for alternative and progressively more refined success outcomes, we pre-process our covariates as follows. After log-transforming those that present markedly right-skewed distributions, we scale them all and analyze their correlation structure by building a feature dendrogram (Pearson absolute correlation, complete linkage; see Fig.

Dendrogram of firms’ and investors’ centrality measures (absolute correlation distance, complete linkage). Seven groups of features are highlighted. The first from the bottom contains centrality measures for the smallest and median investors; the second includes investors’ VoteRank statistics; the third contains firms’ and investors’ closeness centrality and average neighbor degree, as well as other centrality measures of firms and big investors; the fourth includes firms’ and big investors’ eigenvector centrality measures, as well as firms’ PageRank; the fifth contains features describing firms as knowledge bridges; the sixth includes harmonic centrality measures; the seventh includes investors’ clustering coefficients
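The pre-processing and grouping steps can be sketched as follows. The sketch assumes the covariates sit in a NumPy array with one column per feature. The skewness threshold that decides which columns are log-transformed, the use of `log1p`, and the toy data are illustrative assumptions. Complete linkage is implemented directly rather than through a clustering library. Merging stops once the requested number of groups remains, which matches cutting the complete-linkage dendrogram at that number of groups when there are no ties.

```python
import numpy as np

def preprocess(X, skew_threshold=1.0):
    """Log-transform markedly right-skewed non-negative columns, then standardize every column."""
    X = np.array(X, dtype=float)
    for j in range(X.shape[1]):
        col = X[:, j]
        z = (col - col.mean()) / col.std()
        if np.mean(z ** 3) > skew_threshold and col.min() >= 0:
            X[:, j] = np.log1p(col)
    return (X - X.mean(axis=0)) / X.std(axis=0)

def complete_linkage_groups(X, n_groups):
    """Group columns by agglomerative clustering on 1 - |Pearson r| with complete linkage."""
    D = 1.0 - np.abs(np.corrcoef(X, rowvar=False))
    clusters = [[j] for j in range(X.shape[1])]
    while len(clusters) > n_groups:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: distance between clusters = largest pairwise distance.
                d = D[np.ix_(clusters[a], clusters[b])].max()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters

rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3))
# Columns (0, 1), (2, 3) and (4, 5) share a latent factor.
X = np.repeat(latent, 2, axis=1) + 0.1 * rng.normal(size=(500, 6))
groups = complete_linkage_groups(preprocess(X), n_groups=3)
print(sorted(sorted(g) for g in groups))  # [[0, 1], [2, 3], [4, 5]]
```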

We leverage these groups to guide feature selection for our regression exercises. For each response type (binary, scalar or functional, as described below), we reduce the initial set of covariates to seven predictors. We select one predictor per group through an exhaustive search for the combination that optimizes the goodness of fit. We later consider further (sub-optimal) combinations comprising one covariate per group, as a check on the stability of our analysis.
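The exhaustive one-per-group search can be sketched with `itertools.product`. The passage does not name the goodness-of-fit criterion, so here ordinary least squares with adjusted R² serves as a stand-in. The helper names, group lists and toy data are hypothetical.

```python
import itertools
import numpy as np

def adjusted_r2(X, y):
    """Fit OLS with an intercept and return the adjusted R^2."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

def best_one_per_group(X, y, groups):
    """Try every combination taking exactly one column from each group; keep the best fit."""
    best_score, best_combo = -np.inf, None
    for combo in itertools.product(*groups):
        score = adjusted_r2(X[:, list(combo)], y)
        if score > best_score:
            best_score, best_combo = score, combo
    return best_combo, best_score

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
y = 2.0 * X[:, 1] - 1.0 * X[:, 2] + 0.1 * rng.normal(size=300)
combo, score = best_one_per_group(X, y, groups=[[0, 1], [2, 3]])
print(combo)  # (1, 2): the columns that generated y
```

With seven groups, the number of candidate models is the product of the group sizes, so an exhaustive search stays affordable only because each group is small.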

Each firm has its own funding history. After its birth, the firm collects resources over time, thus being characterized by the

Our first definition of success is based on separating firm funding trajectories in two clusters, identifying high (successful) and low investment regimes. Because of the heterogeneity among healthcare sub-sectors, we run a functional
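As a rough illustration of splitting funding trajectories into a high and a low investment regime, the sketch below runs plain k-means with k = 2 on cumulative trajectories sampled on a common yearly grid. This amounts to clustering with the L2 distance between discretized curves. It is a simplified stand-in for a functional clustering procedure, not necessarily the one used in the study. The function name, the initialization and the synthetic trajectories are assumptions.

```python
import numpy as np

def two_regime_kmeans(curves, n_iter=100):
    """k-means with k = 2 on curves sampled on a shared grid (rows = firms, columns = times).

    Returns 1 for firms in the cluster whose mean curve ends higher (high-investment regime).
    """
    curves = np.asarray(curves, dtype=float)
    order = np.argsort(curves.mean(axis=1))
    centers = curves[[order[0], order[-1]]]  # start from the lowest and highest curves
    for _ in range(n_iter):
        # Squared L2 distance between each discretized curve and each center.
        dist = ((curves[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([curves[labels == k].mean(axis=0) for k in range(2)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    high = int(centers[:, -1].argmax())
    return (labels == high).astype(int)

rng = np.random.default_rng(2)
slow = np.cumsum(rng.uniform(0.0, 1.0, size=(20, 11)), axis=1)  # 20 slow-growing trajectories
fast = np.cumsum(rng.uniform(2.0, 4.0, size=(5, 11)), axis=1)   # 5 fast-growing trajectories
labels = two_regime_kmeans(np.vstack([slow, fast]))
print(labels.sum())  # 5
```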

We consider a logistic regression model defined as:

Since the data set is unbalanced, results on the best model configuration, which has a log-likelihood of

Scatter plots of logistic regression coefficient estimates (horizontal) and significance (vertical;
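A minimal sketch of the binary-outcome fit follows. It assumes the class imbalance is handled by randomly undersampling the majority class. The later mention of reduced sample sizes points in this direction, but the exact scheme is an assumption. The sketch fits the standard logit model, logit P(y = 1) = β0 + Σj βj xj, by Newton-Raphson. A statistics package would additionally report the standard errors and p-values shown in the figure. Function names and data are hypothetical.

```python
import numpy as np

def undersample(X, y, rng):
    """Keep all minority-class rows and an equally sized random subset of the majority class."""
    idx0, idx1 = np.flatnonzero(y == 0), np.flatnonzero(y == 1)
    minority, majority = (idx0, idx1) if idx0.size < idx1.size else (idx1, idx0)
    keep = np.concatenate([minority, rng.choice(majority, size=minority.size, replace=False)])
    return X[keep], y[keep]

def fit_logit(X, y, n_iter=25):
    """Maximum-likelihood logistic regression with intercept, fitted by Newton-Raphson."""
    A = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-A @ beta))
        grad = A.T @ (y - p)                             # score vector
        info = A.T @ (A * (p * (1 - p))[:, None])        # observed information matrix
        step = np.linalg.solve(info, grad)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    p = 1.0 / (1.0 + np.exp(-A @ beta))
    loglik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return beta, loglik

rng = np.random.default_rng(3)
X = rng.normal(size=(3000, 2))
y = (rng.uniform(size=3000) < 1 / (1 + np.exp(-(-2.0 + 1.5 * X[:, 0])))).astype(int)
Xb, yb = undersample(X, y, rng)
beta, loglik = fit_logit(Xb, yb)
print(yb.mean(), beta.round(2), round(loglik, 1))
```

Under a logit model, undersampling one class shifts the intercept but leaves the slope estimates consistent, a standard case-control result. The signs and relative sizes of the covariate effects therefore remain interpretable after rebalancing.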

Next, we investigate whether the binary definition of success derived from firms’ funding trajectories is related to a more standard definition, i.e., their eventual exit in IPO, acquisition or merger. This is captured by the confusion matrix in Table

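Confusion matrix for the standard vs the trajectory-based binary definitions of success (rows: standard definition; columns: trajectory-based classes). Accuracy: 0.71; recall: 0.31; precision: 0.57 (positive class: IPO/Acquired/Merged; positive prediction: high-regime class)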

Standard definition | High-regime class | Low-regime class |
---|---|---|
IPO/Acquired/Merged | 294 | 664 |
No IPO/Acquired/Merged | 225 | 1889 |
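The accuracy, recall and precision reported for this table follow from its four counts. Exits (IPO/Acquired/Merged) are taken as the positive class and the high-regime class as the positive prediction. The short check below recomputes them.

```python
# Counts from the confusion matrix: rows = standard definition, columns = trajectory-based class.
tp, fn = 294, 664    # IPO/Acquired/Merged: high-regime, low-regime
fp, tn = 225, 1889   # No IPO/Acquired/Merged: high-regime, low-regime

accuracy = (tp + tn) / (tp + fn + fp + tn)
recall = tp / (tp + fn)      # share of exited firms that fall in the high regime
precision = tp / (tp + fp)   # share of high-regime firms that exited
print(round(accuracy, 2), round(recall, 2), round(precision, 2))  # 0.71 0.31 0.57
```

The low recall reflects the 664 exited firms that the trajectory-based definition places in the low-investment regime.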

The evidence of a relationship between the success of a firm and the network features obtained using our trajectory-based binary response is promising. However, our binary definition of success is very rough, and the imbalance in the data forces us to run the analysis on reduced sample sizes. Moreover, the confusion matrix in Table

We consider a regression model defined as:

Linear regression results (standard errors in parentheses)

Covariate | Agg. money raised (log) | Diff. money raised (log) |
---|---|---|
Pagerank_median (log) | 0.1186 (0.044) | 0.2645 (0.063) |
Voterank_max | -0.0231 (0.065) | 0.1116 (0.094) |
Average_neighbor_degree_max (log) | 0.3084 (0.058) | 0.9215 (0.075) |
Eigenvector_centrality_org | 0.0316 (0.021) | 0.0721 (0.030) |
Clustering_org | -0.0672 (0.030) | -0.2436 (0.041) |
Harmonic_centrality_median | 0.0584 (0.073) | -0.1171 (0.104) |
Clustering_min | 0.0005 (0.035) | -0.0480 (0.050) |
Intercept | 16.2843 (0.176) | 15.7837 (0.252) |
Observations | 1921 | 1917 |
R² | 0.485 | 0.217 |
Adjusted R² | 0.482 | 0.213 |
F Statistic | 149.90 | 48.11 |
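For reference, the fit statistics in the table correspond to the textbook definitions below. Here n is the number of observations, p the number of regressors besides the intercept, RSS the residual sum of squares and TSS the total sum of squares. The F statistic tests the joint null hypothesis that all slope coefficients are zero, under classical (non-robust) assumptions. These are standard formulas, not details taken from the study’s pipeline.

```latex
R^2 = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}}, \qquad
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}, \qquad
F = \frac{R^2 / p}{(1 - R^2)/(n - p - 1)}
```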

Next, as we did for our binary response, we assess whether aggregate money raised is linked to the standard definition of success (exit in IPO, acquisition or merger within 10 years from the first investment). Figure

Boxplots of aggregate money raised (log) for ‘traditionally successful’ (right; 1) and ‘traditionally unsuccessful’ (left; 0) firms

Our scalar outcome measures the aggregate money raised by a firm within a period of 10 years (the selected window size). It does not capture how investments in the firm are distributed across this period, which may be very important in delineating success. Moreover, using aggregate money raised implicitly assumes that the right time to investigate the dependence of success on network features is the end of the period considered.
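To make the distinction concrete, the sketch below turns a firm’s funding rounds into a cumulative investment trajectory on a yearly grid. It then reads off the scalar outcome as the trajectory’s value at the end of the window. The round data, the yearly grid and the helper name are hypothetical.

```python
import numpy as np

def cumulative_trajectory(round_times, round_amounts, window=10, step=1.0):
    """Cumulative money raised at each grid time in [0, window].

    round_times are in years since the firm's first investment; rounds after the window are ignored.
    """
    times = np.asarray(round_times, dtype=float)
    amounts = np.asarray(round_amounts, dtype=float)
    grid = np.arange(0.0, window + step / 2, step)
    traj = np.array([amounts[times <= g].sum() for g in grid])
    return grid, traj

# Hypothetical firms (years since first investment, USD raised per round).
grid, early_traj = cumulative_trajectory([0.0, 1.5, 4.0, 11.0], [2e6, 5e6, 20e6, 50e6])
_, late_traj = cumulative_trajectory([0.0, 9.0], [2e6, 25e6])

aggregate = early_traj[-1]               # scalar outcome: money raised within the 10-year window
print(np.log(aggregate))                 # log scale, as in the aggregate-money-raised regressions
print(early_traj[-1] == late_traj[-1])   # same end point...
print(early_traj[5], late_traj[5])       # ...but very different funding paths by year 5
```

Both hypothetical firms share the same scalar outcome, while their trajectories diverge for most of the window. This is the information that a functional outcome retains.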

We tackle these issues by refining the target outcome and considering the full cumulative investment trajectories rather than their end points. Thus, we run a function-on-scalar regression (Kokoszka and Reimherr

In this model, the regression coefficient of a scalar covariate

Function-on-scalar regression. Blue solid lines represent coefficient curve estimates, surrounded by blue confidence bands (constructed through point-wise estimated standard errors, 95% confidence level). Red dashed lines mark 0. The estimated intercept can be interpreted as the sheer effect of time
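A simple approximation of such a fit assumes all cumulative trajectories are evaluated on a common time grid. It estimates the model Y_i(t) = β_0(t) + Σ_j x_ij β_j(t) + ε_i(t) by ordinary least squares separately at each grid point. This yields coefficient curves with point-wise standard errors and 95% bands of the kind shown in the figure. Dedicated function-on-scalar methods usually also smooth the coefficient curves, which this sketch omits. The function name and synthetic data are hypothetical.

```python
import numpy as np

def pointwise_fosr(X, Y, z=1.96):
    """Function-on-scalar regression by OLS at each grid point.

    X: (n, p) scalar covariates; Y: (n, T) curves on a common grid.
    Returns coefficient curves, point-wise standard errors and lower/upper 95% bands,
    each of shape (p + 1, T), with row 0 holding the intercept curve.
    """
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])
    B, *_ = np.linalg.lstsq(A, Y, rcond=None)          # one solve covers all T grid points
    resid = Y - A @ B
    sigma2 = (resid ** 2).sum(axis=0) / (n - p - 1)    # residual variance at each time point
    diag = np.diag(np.linalg.inv(A.T @ A))[:, None]    # (p + 1, 1)
    se = np.sqrt(diag * sigma2[None, :])
    return B, se, B - z * se, B + z * se

rng = np.random.default_rng(5)
t = np.linspace(0.0, 10.0, 11)
X = rng.normal(size=(400, 2))
# x1 has an effect that grows over time; x2 has no effect.
Y = 1.0 + np.outer(X[:, 0], 0.5 * t) + 0.5 * rng.normal(size=(400, t.size))
B, se, lower, upper = pointwise_fosr(X, Y)
print(B[1].round(2))                                   # close to 0.5 * t
print(((lower[2] <= 0) & (upper[2] >= 0)).mean())      # share of grid points where x2's band covers 0
```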

In order to validate our previous results we extend the analysis in two ways. First, we vary the length of the funding trajectories, re-running our pipelines for all window sizes between 5 and 12 years. Second, we take advantage of the correlation structure characterizing the covariates at our disposal to measure the stability of coefficient estimates with respect to perturbations in model configurations.
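The specification-perturbation check can be sketched as follows. The model is refitted for every one-per-group combination, and the coefficient of each group’s representative is then summarized across all fits. Summarizing per group slot by mean and standard deviation is an assumption, as are the helper names, the grouping and the toy data.

```python
import itertools
import numpy as np

def ols_slopes(X, y):
    """OLS with an intercept; return the slope coefficients only."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta[1:]

def specification_stability(X, y, groups):
    """Refit for every one-per-group combination; mean and std of each group's coefficient."""
    estimates = np.array([ols_slopes(X[:, list(combo)], y)
                          for combo in itertools.product(*groups)])  # (n_combinations, n_groups)
    return estimates.mean(axis=0), estimates.std(axis=0)

rng = np.random.default_rng(6)
f = rng.normal(size=(500, 2))
# Two groups of two highly correlated columns each, mimicking the dendrogram groups.
X = np.column_stack([f[:, 0], f[:, 0] + 0.05 * rng.normal(size=500),
                     f[:, 1], f[:, 1] + 0.05 * rng.normal(size=500)])
y = 1.0 * f[:, 0] - 0.5 * f[:, 1] + 0.5 * rng.normal(size=500)
mean, std = specification_stability(X, y, groups=[[0, 1], [2, 3]])
print(mean.round(2), std.round(3))
```

A small spread across specifications, as in this toy case, is what one would expect when the covariates within a group are near substitutes.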

We note that varying the window size induces a change in the number of firms in our sample (see Additional file

Stability analysis under window size changes; linear regression. Blue dots represent coefficient estimates for each window size, with

Figure

Stability analysis under perturbations of the model specification; linear regression. Left panel: blue dots represent averages of the coefficient estimates obtained changing the covariates considered in each group, with

By exploiting and combining techniques from the fields of network and functional data analysis, we propose progressively more refined definitions of a firm’s success, and associate them with different network features through regression fits.

Logistic regression results for our binary outcome suggest a strong role for centrality measures belonging to groups 3 (firms and big investors), 4 (firms’ eigenvector centrality measures) and 5 (firms as knowledge bridges). They suggest a weaker role for centrality measures in group 6 (harmonic centrality measures). Consider the ‘best’ representatives selected within these groups:

- A firm’s closeness centrality from group 3 reflects the width of its investors’ portfolios: if a firm is part of a big portfolio, its separation from other firms within the network will be lower.
- PageRank from group 4 is a proxy of a firm’s importance and plays a role similar to eigenvector centrality in undirected networks.
- A firm’s clustering coefficient from group 5 reflects tightness in community links.

Concerning the clustering coefficient’s estimated negative impact on success, we note that it is negatively correlated with a firm’s number of investors, for two reasons. First, clustered firms typically belong to portfolios characterized by a high redundancy of investors. Second, our network contains many firms isolated from the giant component whose clustering coefficient equals 1 and which are unlikely to succeed.
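The mechanism can be illustrated on a toy firm-layer projection. Firms that share the same investor become a clique, and every member of a clique has a local clustering coefficient of 1. A firm bridging two investors’ portfolios has more investors and a lower value. The portfolios and helper functions are hypothetical. Nodes with fewer than two neighbors get clustering 0 here, the convention also used by networkx; how the study treats such nodes is an assumption.

```python
from itertools import combinations

# Hypothetical portfolios: investor -> firms it funded.
portfolios = {
    "inv_A": ["f1", "f2", "f3"],   # a small, closed portfolio outside the larger component
    "inv_B": ["f4", "f5", "f6"],
    "inv_C": ["f6", "f7", "f8"],   # f6 bridges the portfolios of inv_B and inv_C
}

def project_firms(portfolios):
    """Link two firms if they share at least one investor."""
    adj = {}
    for firms in portfolios.values():
        for f in firms:
            adj.setdefault(f, set())
        for u, v in combinations(firms, 2):
            adj[u].add(v)
            adj[v].add(u)
    return adj

def local_clustering(adj, node):
    """Fraction of pairs of neighbors that are themselves linked (0 if fewer than two neighbors)."""
    nbrs = list(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))

adj = project_firms(portfolios)
print({f: round(local_clustering(adj, f), 2) for f in sorted(adj)})  # f6 gets 0.33, every other firm 1.0
```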

Linear regression results for our scalar outcome confirm a strong role for groups 1, 3 and 5. The covariates selected from groups 1 and 4 are the median of investors’ PageRank and the firm’s eigenvector centrality, respectively. Both have positive estimated effects, suggesting that influential median investors, as well as a firm’s own influence, may favor its success. The covariate selected from group 3, the maximum of investors’ average neighbor degrees, also has a positive estimated effect. This suggests that, for a firm’s success, forming many connections is less critical than being connected with investors that are themselves strongly connected, as this may increase the level of capitalization at a later stage of the firm’s life. The covariate selected from group 5 is again a firm’s clustering coefficient, with a negative estimated effect on success. Our stability analysis also provides evidence that the effects of covariates from groups 3 and 5 are consistent across model specifications.

Function-on-scalar regression results for our functional outcome also confirm a strong role for groups 1, 3, 4 and 5. Interestingly, this richer analysis profiles the effects over time. The positive effects of the median of investors’ PageRank (group 1) and of the maximum of investors’ average neighbor degrees (group 3) increase early in the life of a firm but then level off. In contrast, the positive effect of a firm’s eigenvector centrality (group 4) and the negative effect of its clustering coefficient (group 5) grow in magnitude throughout the temporal domain. This may mean that being connected to important and well-connected investors matters more early on, i.e. having early investors with a high median PageRank and a large maximum average neighbor degree. Being in a far-reaching portfolio of investors, i.e. having a small clustering coefficient and a high eigenvector centrality, instead has a stronger impact on success later in the life of a firm.

Our analysis can be expanded in several ways. First, we limit our study to the healthcare sector, while it may be interesting to investigate other market sectors and compare the results. Second, meso-scale communities may be analyzed in terms of their longitudinal evolution, so as to characterize successful clusters of firms from a topological point of view. Third, in our analysis we gather information from the first round of funding to predict the future success of the firm. It may be interesting to experiment with different funding rounds, or with models that capture the dynamic evolution of the network, such as tools from topological data analysis (Hensel et al.

We are grateful to Enrico Stivella for useful feedback.

All authors conceived ideas and analysis approaches. C.E., M.G. and L.T. retrieved and processed data, implemented pipelines and performed statistical analyses. All authors interpreted findings. F.C., M.G. and L.T. wrote the manuscript. F.C., G.F., A.M. and G.R. supervised the research. All authors read and approved the final manuscript.

F.C., C.E., G.F., A.M. and L.T. acknowledge support from the Sant’Anna School of Advanced Studies. F.C. acknowledges support from Penn State University. G.R. acknowledges support from the scheme “INFRAIA-01-2018-2019: Research and Innovation action”, Grant Agreement n. 871042, “SoBigData++: European Integrated Infrastructure for Social Mining and Big Data Analytics”.

The data that support the findings of this study are available from CB Insights but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are however available from the authors upon reasonable request and with permission of CB Insights. Code for replication of our study is shared in the GitHub repository

Not applicable

Not applicable

The authors declare that they have no competing interests.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.