Can CNN-based species classification generalise across variation in habitat within a camera trap survey?

Camera trap surveys are a popular ecological monitoring tool that produce vast numbers of images making their annotation extremely time‐consuming. Advances in machine learning, in the form of convolutional neural networks, have demonstrated potential for automated image classification, reducing processing time. These networks often have a poor ability to generalise, however, which could impact assessments of species in habitats undergoing change.

Here, we (i) compare the performance of three network architectures in identifying species in camera trap images taken from tropical forest of varying disturbance intensities; (ii) explore the impacts of training dataset configuration; (iii) use habitat disturbance categories to investigate network generalisability and (iv) test whether classification performance and generalisability improve when using images cropped to bounding boxes.

Overall accuracy (72.8%) was improved by excluding the rarest species and by adding extra training images (76.3% and 82.8%, respectively). Generalisability to new camera locations within a disturbance level was poor (mean F1‐score: 0.32). Performance across unseen habitat disturbance levels was worse (mean F1‐score: 0.27). Training the network on multiple disturbance levels improved generalisability (mean F1‐score on unseen disturbance levels: 0.41). Cropping images to bounding boxes improved overall performance (F1‐score: 0.77 vs. 0.47) and generalisability (mean F1‐score on unseen disturbance levels: 0.73), but at a cost of losing images that contained animals which the detector failed to detect.
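For readers unfamiliar with the metric, the sketch below shows one common way a mean F1‐score can be computed: the per‐class F1 is calculated for each species and then averaged with equal weight. The abstract does not state which averaging scheme the authors used, so the unweighted (macro) mean here is an assumption, and the function names and toy species labels are purely illustrative.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} for every label seen in the truth or the predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def mean_f1(y_true, y_pred):
    """Unweighted (macro) mean of per-class F1 scores (assumed averaging scheme)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Toy example with hypothetical labels, not data from the study
y_true = ["deer", "deer", "pig", "civet"]
y_pred = ["deer", "pig", "pig", "pig"]
print(per_class_f1(y_true, y_pred))
print(round(mean_f1(y_true, y_pred), 3))
```

Because every class contributes equally under this scheme, a rare species that is never predicted correctly lowers the mean as much as a common one would, which is one reason a mean F1‐score can sit well below overall accuracy.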

These results suggest researchers should consider using an object detector before passing images to a classifier, and an improvement in classification might be seen if labelled images from other studies are added to their training data. Composition of training data was shown to be influential, but including rarer classes did not compromise performance on common classes, providing support for the inclusion of rare species to inform conservation efforts. These findings have important implications for use of these methods for long‐term monitoring of habitats undergoing change, as they highlight the potential for misclassifications due to poor generalisability to impact subsequent ecological analyses. These methods therefore need to be considered as dynamic, in that changes to the study site would need to be reflected in the updated training of the network.
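The detector‐then‐classifier workflow recommended above can be sketched as a cropping step applied before classification. This is a minimal illustration, not the authors' pipeline: the detection format (a dict with a hypothetical `conf` score and a `bbox` given as fractional `x, y, width, height` from the top‐left corner), the `threshold` value and the function names are all assumptions, and the image is represented as a plain list of pixel rows to keep the example self‐contained.

```python
def crop_to_box(image, bbox):
    """Crop a row-major image (list of rows) to a fractional (x, y, w, h) box."""
    height, width = len(image), len(image[0])
    x, y, w, h = bbox
    # Convert fractions to pixel indices and clamp to the image bounds
    left = max(0, round(x * width))
    top = max(0, round(y * height))
    right = min(width, round((x + w) * width))
    bottom = min(height, round((y + h) * height))
    return [row[left:right] for row in image[top:bottom]]


def crops_for_classifier(image, detections, threshold=0.5):
    """Return one crop per detection at or above the confidence threshold.

    An empty list means the image never reaches the classifier, which is
    the cost noted in the abstract when the detector misses an animal.
    """
    return [
        crop_to_box(image, det["bbox"])
        for det in detections
        if det["conf"] >= threshold
    ]


# Toy 4x4 "image" whose pixel values are their own flat index
image = [[r * 4 + c for c in range(4)] for r in range(4)]
detections = [
    {"conf": 0.9, "bbox": (0.25, 0.25, 0.5, 0.5)},  # kept
    {"conf": 0.2, "bbox": (0.0, 0.0, 0.5, 0.5)},    # below threshold, dropped
]
print(crops_for_classifier(image, detections))
```

The threshold controls the trade‐off described in the results: a higher value passes cleaner crops to the classifier but discards more images in which the animal was detected only weakly or not at all.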

Description

Funder: AXA Research Fund; Id: http://dx.doi.org/10.13039/501100001961

Funder: Sime Darby Plantation Bhd; Id: http://dx.doi.org/10.13039/501100009548

Keywords

camera trap, convolutional neural network, deep learning, disturbance, generalisability, image classification, object detection

Journal Title

Methods in Ecology and Evolution

Journal ISSN

2041-210X

Publisher

Wiley

Publisher DOI

https://doi.org/10.1111/2041-210X.14031

Rights

Attribution 4.0 International

Sponsorship

Natural Environment Research Council (NE/P012345/1)

Collections

Jisc Publications Router