
What drives the scatter of local star-forming galaxies in the BPT diagrams? A Machine Learning based analysis

Accepted version






We investigate which physical properties are most predictive of the position of local star-forming galaxies on the BPT diagrams, by means of different Machine Learning (ML) algorithms. Exploiting the large statistics of the Sloan Digital Sky Survey (SDSS), we define a framework in which the deviation of star-forming galaxies from their median sequence can be described in terms of the relative variations in a variety of observational parameters. We train artificial neural networks (ANN) and random forests (RF) to predict whether galaxies are offset above or below the sequence (via classification), and to estimate the magnitude of the offset itself (via regression). We find, with high significance, that parameters primarily associated with variations in the nitrogen-over-oxygen abundance ratio (N/O) are the most predictive for the [N II]-BPT diagram, whereas properties related to star formation (such as variations in SFR or EW[Hα]) perform better in the [S II]-BPT diagram. We interpret the former as a reflection of the N/O-O/H relationship for local galaxies, and the latter as primarily tracing the variation in the effective size of the S+ emitting region, which directly impacts the [S II] emission lines. This analysis paves the way to assessing to what extent the physics shaping local BPT diagrams is also responsible for the offsets seen in high-redshift galaxies or whether, instead, a different framework or even different mechanisms need to be invoked.
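The two targets described in the abstract (the offset from the median sequence, and whether a galaxy lies above or below it) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the synthetic data, the linear toy relation, the equal-width binning, the use of a purely vertical offset, and the function names `median_sequence` and `offsets_and_labels`. It is not the definition of the offset adopted in the paper, and it does not use SDSS data.

```python
import random
import statistics


def median_sequence(x, y, n_bins=10):
    """Median of y in equal-width bins of x (a toy stand-in for the median sequence)."""
    lo, hi = min(x), max(x)
    # Guard against a degenerate sample in which all x values are equal.
    width = (hi - lo) / n_bins if hi > lo else 1.0
    bins = [[] for _ in range(n_bins)]
    for xi, yi in zip(x, y):
        # Clamp the index so that the largest x falls in the last bin.
        idx = min(int((xi - lo) / width), n_bins - 1)
        bins[idx].append(yi)
    medians = [statistics.median(b) if b else None for b in bins]
    return lo, width, medians


def offsets_and_labels(x, y, n_bins=10):
    """Vertical offset of each point from its bin median, plus an above/below label."""
    lo, width, medians = median_sequence(x, y, n_bins)
    offsets, labels = [], []
    for xi, yi in zip(x, y):
        idx = min(int((xi - lo) / width), n_bins - 1)
        delta = yi - medians[idx]             # regression target (assumed: vertical offset)
        offsets.append(delta)
        labels.append(1 if delta > 0 else 0)  # classification target: 1 = above, 0 = not above
    return offsets, labels


# Synthetic, hypothetical data: x and y stand in for the two logarithmic
# line ratios of a BPT diagram; the linear trend and scatter are invented.
rng = random.Random(0)
x = [rng.uniform(-1.5, -0.2) for _ in range(2000)]
y = [-0.5 * xi + rng.gauss(0.0, 0.1) for xi in x]

offsets, labels = offsets_and_labels(x, y)
fraction_above = sum(labels) / len(labels)
```

In a sketch of this kind, `labels` would serve as the target of a classifier and `offsets` as the target of a regressor, with the input features being the variations in the galaxy properties; by construction, roughly half of the points end up labelled as above the sequence.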



galaxies: abundances, galaxies: evolution, galaxies: fundamental parameters, galaxies: ISM

Journal Title

Monthly Notices of the Royal Astronomical Society

Oxford University Press (OUP)
Science and Technology Facilities Council (ST/M001172/1)
European Research Council (695671)
STFC (2120607)
Royal Society (RSRP\R1\211056)
STFC (ST/V000918/1)