Can CNN-based species classification generalise across variation in habitat within a camera trap survey?

Published version


Abstract

1. Camera trap surveys are a popular ecological monitoring tool, but they produce vast numbers of images, making annotation extremely time‐consuming. Advances in machine learning, in the form of convolutional neural networks, have demonstrated potential for automated image classification, reducing processing time. These networks often generalise poorly, however, which could impact assessments of species in habitats undergoing change.

2. Here, we (i) compare the performance of three network architectures in identifying species in camera trap images taken from tropical forest of varying disturbance intensities; (ii) explore the impacts of training dataset configuration; (iii) use habitat disturbance categories to investigate network generalisability; and (iv) test whether classification performance and generalisability improve when using images cropped to bounding boxes.

3. Overall accuracy (72.8%) was improved by excluding the rarest species and by adding extra training images (76.3% and 82.8%, respectively). Generalisability to new camera locations within a disturbance level was poor (mean F1‐score: 0.32). Performance across unseen habitat disturbance levels was worse (mean F1‐score: 0.27). Training the network on multiple disturbance levels improved generalisability (mean F1‐score on unseen disturbance levels: 0.41). Cropping images to bounding boxes improved overall performance (F1‐score: 0.77 vs. 0.47) and generalisability (mean F1‐score on unseen disturbance levels: 0.73), but at the cost of losing images containing animals that the detector failed to detect.

4. These results suggest that researchers should consider using an object detector before passing images to a classifier, and that classification might improve if labelled images from other studies are added to the training data. Composition of training data was shown to be influential, but including rarer classes did not compromise performance on common classes, providing support for the inclusion of rare species to inform conservation efforts. These findings have important implications for the use of these methods in long‐term monitoring of habitats undergoing change, as they highlight the potential for misclassifications caused by poor generalisability to impact subsequent ecological analyses. These methods therefore need to be treated as dynamic: changes to the study site would need to be reflected in updated training of the network.
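The mean F1‐scores reported above can be illustrated with a minimal sketch. It assumes, since the abstract does not say so, that "mean F1‐score" is the unweighted average of per‐species F1 values, computed separately for each disturbance level. The function names, species labels and disturbance‐level names are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def per_class_f1(y_true, y_pred):
    """Return {label: F1} for every label seen in y_true or y_pred."""
    tp = defaultdict(int)  # true positives per label
    fp = defaultdict(int)  # false positives per label
    fn = defaultdict(int)  # false negatives per label
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1
            fn[truth] += 1
    scores = {}
    for label in set(y_true) | set(y_pred):
        # F1 = 2TP / (2TP + FP + FN)
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def mean_f1(y_true, y_pred):
    """Unweighted (macro) mean of the per-class F1 scores (an assumption)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def mean_f1_by_group(records):
    """records: iterable of (group, true_label, predicted_label) tuples."""
    grouped = defaultdict(lambda: ([], []))
    for group, truth, pred in records:
        grouped[group][0].append(truth)
        grouped[group][1].append(pred)
    return {g: mean_f1(ts, ps) for g, (ts, ps) in grouped.items()}


# Hypothetical predictions on images from two held-out disturbance levels.
records = [
    ("logged", "deer", "deer"),
    ("logged", "deer", "pig"),
    ("logged", "pig", "pig"),
    ("primary", "deer", "deer"),
]
result = mean_f1_by_group(records)
```

Because every species counts equally in an unweighted mean, rare species that the network misses pull the score down sharply, which is one reason a mean F1‐score can sit well below overall accuracy.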


Funder: AXA Research Fund

Funder: Sime Darby Plantation Bhd


Keywords

camera trap, convolutional neural network, deep learning, disturbance, generalisability, image classification, object detection

Journal Title

Methods in Ecology and Evolution

Natural Environment Research Council (NE/P012345/1)