End-to-end data-driven weather prediction.
Peer-reviewed
Abstract
Weather prediction is critical for a range of human activities including transportation, agriculture and industry, as well as the safety of the general public. Machine learning is transforming numerical weather prediction (NWP) by replacing the numerical solver with neural networks, improving the speed and accuracy of the forecasting component of the prediction pipeline [1-6]. However, current models rely on numerical systems at initialisation and to produce local forecasts, limiting their achievable gains. Here we show that a single machine learning model can replace the entire NWP pipeline. Aardvark Weather, an end-to-end data-driven weather prediction system, ingests observations and produces global gridded forecasts and local station forecasts. The global forecasts outperform an operational NWP baseline for multiple variables and lead times. The local station forecasts are skillful up to ten days lead time, competing with a post-processed global NWP baseline and a state-of-the-art end-to-end forecasting system with input from human forecasters. End-to-end tuning further improves the accuracy of local forecasts. Our results show that skillful forecasting is possible without relying on NWP at deployment time, which will enable the full speed and accuracy benefits of data-driven models to be realised. We believe Aardvark Weather will be the starting point for a new generation of end-to-end models that will reduce computational costs by orders of magnitude, and enable rapid, affordable creation of customised models for a range of end-users.
Description
Acknowledgements: We acknowledge the agencies whose efforts in collecting, curating and distributing datasets made this study possible. This study stands on the foundation of decades of contributions from the meteorological community and their commitment to sharing data. Specifically, we thank the European Organisation for the Exploitation of Meteorological Satellites, the UK Met Office, the National Environmental Satellite, Data, and Information Service, the National Centers for Environmental Information, the National Oceanic and Atmospheric Administration, the National Climatic Data Center, the NSF National Center for Atmospheric Research and ECMWF. The JASMIN Environmental Data Service and WeatherBench 2 project provided invaluable access to pre-processed data sources. This study was generously supported by The Alan Turing Institute, with funding and access to computational resources. A.A. acknowledges the UKRI Centre for Doctoral Training in the Application of Artificial Intelligence to the study of Environmental Risks (AI4ER), led by the University of Cambridge (EP/S022961/1), and studentship funding from Google DeepMind. S.M. acknowledges funding from the Vice Chancellor’s and George and Marie Vergottis scholarship of the Cambridge Trust and the Qualcomm Innovation Fellowship. W.T. acknowledges funding from Huawei and EPSRC grant EP/W002965/1. J.R. acknowledges funding from the Data Sciences Institute at the University of Toronto. J.S.H. is supported by The Alan Turing Institute’s Turing Research and Innovation Cluster in Digital Twins, the Environment and Sustainability Grand Challenge and EPSRC grant EP/Y028880/1. R.E.T. is supported by an EPSRC Prosperity Partnership grant EP/T005386/1 between the University of Cambridge and Microsoft. We would like to thank T. Lazauskas for cloud engineering support in setting up the compute platform, J. Bronskill for technical advice on both compute and machine learning techniques, P. Dueben for advice on baselines and P. Lean for advice on counting the number of observations input to the IFS.
Journal ISSN
1476-4687

