
Machine Learning for Credit Default Risk





Shuaib, Adam 


In this thesis, I present three essays examining how machine learning techniques can be used to enhance traditional approaches to credit default forecasting and pricing.

In the first essay, we provide empirical evidence in favour of a widespread non-linear, time-varying relationship between sovereign credit risk and macroeconomic fundamentals across OECD countries. Random forests significantly outperform sparse and dense linear predictive models and explain up to 80% of the out-of-sample variation in CDS spreads by conditioning on macroeconomic fundamentals alone. This suggests that non-linearity may represent a key modelling feature in capturing the cross-country variation in sovereign credit risk. A set of pure out-of-sample implementations also suggests that tree-based methods may enable "shadow" sovereign CDS pricing for countries and periods in which reliable sovereign CDS data might not be available.

In the second essay, we utilise a unique peer-to-peer (P2P) loan dataset to compare different machine learning approaches to predicting loan default over 2017-2021, a period that covers the Covid crisis. We find that P2P loan default factors appear stable over time, with total borrowing and account age the most important predictors across both pre-Covid and Covid sample periods. We subsequently show that the out-of-sample default predictability of short-maturity loans is considerably lower than that of long-maturity loans, particularly during Covid. Higher loan repayment-to-income ratios render short-maturity loans more susceptible to Covid-driven income shocks not captured at loan origination. Furthermore, we document a structural break in the relation between default risk and payment holiday adoption rates for borrowers who are highly uncertain about their ability to repay a loan, consistent with the hypothesis that high degrees of financial uncertainty led to precautionary borrowing and subsequent precautionary payment holiday behaviour during the Covid crisis.

In my final essay, I explore whether loan defaults during Covid were primarily influenced by borrower credit histories or income shocks. Monthly post-origination data captures Covid-driven income shocks unseen in borrower credit histories and results in a significant improvement in the ability to predict defaults relative to credit history data alone. This effect is stronger for shorter default windows and shorter-maturity loans and helps to minimise information asymmetry between borrowers and lenders. Crucially, credit history data explains only 25% of the mean default forecast during Covid. In light of these findings, I am the first to explore the concept of an interest rate "reset clause" for P2P loans. I show that such a reset clause reduces the number of mispriced loans during both Covid and non-Covid periods, resulting in significant cost savings for lenders and borrowers alike.





Rau, Raghavendra


Covid-19, Credit, Finance, Machine Learning, P2P Lending


Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge