Estimating nitrogen and phosphorus concentrations in streams and rivers, within a machine learning framework

Shen, Longzhu Q. 
Sethi, Tushar 
Raymond, Peter 

Abstract: Nitrogen (N) and phosphorus (P) are essential nutrients for life processes in water bodies. In excessive quantities, however, they can be a significant source of aquatic pollution. Eutrophication, which arises from a chemical nutrient imbalance and is largely attributed to anthropogenic activities, has become a widespread issue. In view of this, we present a new geo-dataset that estimates and maps the concentrations of N and P in their various chemical forms at a spatial resolution of 30 arc-seconds (∼1 km) for the conterminous US. The models were built using Random Forest (RF), a machine learning algorithm, to regress the seasonally measured N and P concentrations, collected at 62,495 stations across US streams over 1994–2018, onto a set of 47 in-house-built environmental variables that are available at a near-global extent. The seasonal models were validated through internal and external validation procedures, and their predictive power, measured by the Pearson correlation coefficient, reached approximately 0.66 on average.
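The modelling idea in the abstract (regressing station-level concentrations onto 47 environmental predictors with a Random Forest and scoring predictions with the Pearson correlation coefficient) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic stand-ins, a single random hold-out split replaces the paper's internal and external validation procedures, only one model is fitted rather than one per season, and the names `pearson_r` and `r_holdout` as well as the hyperparameters are hypothetical choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data (assumption): 2,000 "stations" x 47 environmental predictors.
rng = np.random.default_rng(42)
n_stations, n_predictors = 2000, 47
X = rng.normal(size=(n_stations, n_predictors))

# Synthetic response (assumption): a log-concentration driven by a few predictors,
# including one interaction, plus noise. The remaining predictors are uninformative.
y = (0.8 * X[:, 0] - 0.5 * X[:, 1] + 0.3 * X[:, 2] * X[:, 3]
     + rng.normal(scale=0.5, size=n_stations))

# Simple random hold-out split as a stand-in for the paper's validation procedures.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Random Forest regression: an ensemble of bootstrap-trained regression trees
# whose predictions are averaged.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)


def pearson_r(obs, pred):
    """Pearson correlation coefficient between observed and predicted values."""
    return float(np.corrcoef(obs, pred)[0, 1])


r_holdout = pearson_r(y_test, y_pred)
print(f"hold-out Pearson r = {r_holdout:.2f}")
```

Pearson's r measures how well predictions track the observations linearly but is insensitive to a constant bias or scaling, so in practice it is often reported alongside an error metric such as RMSE.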


Funder: University of Cambridge, Department of Zoology

Funder: NASA NNX17AI74G

Data Descriptor, /704/47/4112, /704/242, data-descriptor
Journal Title: Scientific Data
Nature Publishing Group UK
Deutsche Forschungsgemeinschaft (German Research Foundation) (DO 1880/1-1)