Enhancing Streamflow Modeling in Data-Scarce Catchments with Similarity-Guided Source Selection and Transfer Learning
Accepted version
Peer-reviewed
Abstract
Accurate streamflow modeling in data-scarce catchments remains a significant challenge because historical records are limited. Transfer Learning (TL), increasingly applied in hydrology, leverages knowledge from data-rich catchments (sources) to improve predictions in data-scarce catchments (targets), opening new possibilities for hydrological prediction. Most existing TL approaches pre-train models on large-scale hydrometeorological datasets and generalize well across multiple target catchments. However, for a specific target catchment, it remains unclear which source catchments contribute most to accurate prediction, and including many irrelevant sources may even degrade model performance. In this study, we investigated how source catchment selection affects TL performance by employing similarity-guided strategies based on three key factors: spatial distance, physical attributes, and flow regime characteristics. Using the CAMELS-GB dataset, we conducted comparative experiments in which networks were pre-trained on differently ranked groups of source catchments and then fine-tuned on three target catchments representing distinct hydrological environments. The results showed that carefully selected small subsets (fewer than 40, or even as few as 10) of highly similar catchments can achieve TL performance comparable to or better than that obtained using all 668 available source catchments. For all three target catchments, source catchments with closer spatial proximity and more consistent flow regimes yielded better Nash-Sutcliffe efficiency (NSE) scores. The TL performance of selection based on physical attribute similarity varied with the attribute combination used, with combinations involving land cover, climate, and soil properties yielding superior performance. These findings highlight the importance of similarity-guided source selection in hydrological TL and demonstrate ways to reduce computational costs while improving modeling accuracy in data-scarce regions.
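To make the idea of similarity-guided source selection concrete, the following is a minimal sketch, not the authors' implementation. It assumes that spatial similarity is measured as the great-circle (haversine) distance between catchment outlets, and that physical-attribute or flow-regime similarity is measured as a Euclidean distance over features standardised by their population standard deviation. The function names (`rank_sources`, `feature_distance`), the dictionary layout, and the placeholder features `aridity` and `baseflow_index` are hypothetical illustrations, and the toy coordinates and values are invented for demonstration. The sketch also includes the standard Nash-Sutcliffe efficiency, NSE = 1 - Σ(obs - sim)² / Σ(obs - mean(obs))², which is the score reported in the abstract.

```python
import math
import statistics
from typing import Dict, List, Optional, Sequence


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two outlet points, in kilometres."""
    r = 6371.0  # mean Earth radius (km)
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def feature_distance(a: dict, b: dict, scale: Dict[str, float]) -> float:
    """Euclidean distance over standardised features (attributes or flow signatures)."""
    return math.sqrt(sum(((a[f] - b[f]) / scale[f]) ** 2 for f in scale))


def rank_sources(target: str, catchments: Dict[str, dict], k: int,
                 features: Optional[List[str]] = None) -> List[str]:
    """Return the k source catchments most similar to `target`.

    With features=None, rank by outlet distance (spatial proximity);
    otherwise rank by standardised distance over the given features.
    """
    t = catchments[target]
    sources = [cid for cid in catchments if cid != target]
    if features is None:
        dist = {cid: haversine_km(t["lat"], t["lon"],
                                  catchments[cid]["lat"], catchments[cid]["lon"])
                for cid in sources}
    else:
        # Standardise each feature by its spread across all catchments so that
        # no single feature dominates; fall back to 1.0 if a feature is constant.
        scale = {f: statistics.pstdev([c[f] for c in catchments.values()]) or 1.0
                 for f in features}
        dist = {cid: feature_distance(t, catchments[cid], scale) for cid in sources}
    return sorted(sources, key=dist.__getitem__)[:k]


def nse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 equals the mean-flow benchmark."""
    mean_obs = statistics.fmean(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


# Toy, hypothetical catchments: "T" is the data-scarce target.
catchments = {
    "T": {"lat": 52.0, "lon": -1.0, "aridity": 0.8, "baseflow_index": 0.5},
    "A": {"lat": 52.1, "lon": -1.1, "aridity": 1.2, "baseflow_index": 0.7},
    "B": {"lat": 56.0, "lon": -4.0, "aridity": 0.8, "baseflow_index": 0.5},
    "C": {"lat": 51.5, "lon": -0.5, "aridity": 0.3, "baseflow_index": 0.2},
}

print(rank_sources("T", catchments, k=2))  # nearest outlets: ['A', 'C']
print(rank_sources("T", catchments, k=1, features=["aridity", "baseflow_index"]))  # ['B']
print(nse([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # 0.0
```

In this sketch, the top-k list from `rank_sources` would define the source group used for pre-training before fine-tuning on the target. Note that spatial and feature-based rankings can disagree: here catchment "B" is the most distant yet has identical placeholder features.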
Journal ISSN
2073-4441
Sponsorship
EPSRC (EP/Y034643/1)

