Monitoring whales in remote areas is important for their conservation; however, using traditional survey platforms (boat and plane) in such regions is logistically difficult. The use of very high-resolution satellite imagery to survey whales, particularly in remote locations, is gaining interest and momentum. However, the development of this emerging technology relies on accurate automated systems to detect whales, which are currently lacking. Such detection systems require access to an open source library containing examples of whales annotated in satellite images to train and test automatic detection systems. Here we present a dataset of 633 annotated whale objects, created by surveying 6,300 km2 of satellite imagery captured by various very high-resolution satellites (i.e. WorldView-3, WorldView-2, GeoEye-1 and Quickbird-2) in various regions across the globe (e.g. Argentina, New Zealand, South Africa, United States, Mexico). The dataset covers four different species: southern right whale (
Measurement(s): Whale detections in very high-resolution satellite imagery
Technology Type(s): very high-resolution satellites and GIS software
Sample Characteristic - Organism: Megaptera novaeangliae • Balaenoptera physalus • Eschrichtius robustus • Eubalaena australis
Sample Characteristic - Location: Maui, United States • Peninsula Valdes, Argentina • Pelagos Sanctuary • Auckland Islands, New Zealand • Laguna San Ignacio, Mexico • Witsand, South Africa
Very high-resolution (VHR) satellite imagery allows us to regularly survey large and remote areas of the ocean that are difficult to access by boat or plane. Interest in using VHR satellite imagery to study great whales (including sperm whales and baleen whales) has grown in recent years
Detecting whales in the imagery is either conducted manually
In machine learning an algorithm learns how to identify features by repeatedly testing different search parameters against a training dataset
Creating a dataset large enough to train algorithms to detect whales in VHR satellite imagery will require the various research groups analysing such imagery to openly share examples of whale and non-whale objects, which could be facilitated by uploading such data to a central open source repository, similar to the GenBank
Here we present a database of whale objects found in VHR satellite imagery. It represents four different species of whales (i.e. southern right whale,

Database of annotated whales detected in satellite imagery covering different species and areas. Humpback whales were detected in Maui Nui, US (
Twelve satellite images were used to build the database. They were acquired by different very high-resolution satellites owned by Maxar Technologies, formerly known as DigitalGlobe (Table

Characteristics of the satellite imagery analysed for the presence of whales.

| Location | Target Species | Satellite | Catalogue ID | Product Type and Level | Date (DD/MM/YYYY) | Max Ground Sample Distance | Bands | Area (km2) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Auckland Islands, New Zealand | Southern right whale | QuickBird-2 | 1010010005232700 | Standard 2A | 12/08/2006 | 0.65 m | 4xMULs PAN | 70 |
| Auckland Islands, New Zealand | Southern right whale | WorldView-2 | 103001000D6D1000 | Standard 2A | 27/08/2011 | 0.48 m | 8xMULs PAN | 70 |
| Laguna San Ignacio, Mexico | Grey whale | WorldView-3 | 104001002959ED00 | Standard 2A | 20/02/2017 | 0.39 m | 8xMULs PAN | 350 |
| Maui Nui, US | Humpback whale | WorldView-3 | 1040010006C2B700 | Standard 2A | 09/01/2015 | 0.36 m | 8xMULs PAN | 570 |
| Pelagos, Ligurian Sea | Fin whale | WorldView-3 | 104001001E19F000; 104001001E7B8900; 104001001E020000; 104001001D325700 | Standard 2A | 19/06/2016; 19/06/2016; 19/06/2016; 26/06/2016 | 0.33 m; 0.37 m; 0.39 m; 0.34 m | 8xMULs PAN | 4,230 |
| Península Valdés, Argentina | Southern right whale | WorldView-2 | 103001001C8C0300 | Standard 2A | 19/09/2012 | 0.56 m | 4xMULs PAN | 120 |
| Península Valdés, Argentina | Southern right whale | WorldView-3 | 10400100032A3700 | Standard 2A | 16/10/2014 | 0.37 m | 8xMULs PAN | 560 |
| Península Valdés, Argentina | Southern right whale | WorldView-2 | 103001005CBC0A00 | Stereo 1B | 23/09/2016 | 0.55 m | 8xMULs PAN | 270 |
| Witsand, South Africa | Southern right whale | GeoEye-1 | 1050410001D94500 | Standard 2A | 09/08/2009 | 0.44 m | 4xMULs PAN | 60 |

MUL refers to multispectral imagery, which is composed of various colour bands (e.g. four or eight). PAN refers to panchromatic, which is always composed of one greyscale band.
The criteria for selecting the imagery were: 1) less than 20% cloud cover, 2) a calm sea state (i.e. no white caps and low swell), and 3) a location where only one species was known to be present at the time of image acquisition. The percentage of cloud cover was assessed by the satellite imagery provider. We visually assessed the sea state for the presence of white caps and the level of swell. As it is currently unknown whether species can be differentiated in VHR satellite images, we selected well-studied locations to ensure that only one great whale species was present in the imagery.
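The three selection criteria can be expressed as a simple filter over candidate images. The following is only a minimal sketch: the `CandidateImage` record, its field names and the example catalogue entries are hypothetical and are not part of the dataset. In this study the sea state was assessed visually, so the two sea-state flags stand in for an analyst's judgement rather than an automated measurement.

```python
from dataclasses import dataclass


@dataclass
class CandidateImage:
    # All fields are hypothetical names used for illustration only.
    catalogue_id: str
    cloud_cover_pct: float   # percentage reported by the imagery provider
    white_caps: bool         # analyst's visual assessment
    low_swell: bool          # analyst's visual assessment
    species_expected: int    # great whale species expected at that place and date


def meets_selection_criteria(img: CandidateImage) -> bool:
    """Apply the three criteria: cloud cover, calm sea state, single species."""
    return (
        img.cloud_cover_pct < 20.0
        and not img.white_caps
        and img.low_swell
        and img.species_expected == 1
    )


# Made-up candidates, one failing each criterion in turn.
candidates = [
    CandidateImage("EXAMPLE-A", 5.0, False, True, 1),
    CandidateImage("EXAMPLE-B", 35.0, False, True, 1),   # too cloudy
    CandidateImage("EXAMPLE-C", 10.0, True, True, 1),    # white caps present
    CandidateImage("EXAMPLE-D", 0.0, False, True, 2),    # two species expected
]

selected = [c.catalogue_id for c in candidates if meets_selection_criteria(c)]
print(selected)
```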
The satellite images were manually scanned for the presence of whales using ArcGIS 10.4 ESRI 2017, following Cubaynes
Whale objects were marked with a point and were subsequently assigned a level of confidence as explained below in the “Technical Validation” section.
For each detected whale, a point was placed on it with associated metadata (see Data description). Boxes were created around each point indicating a whale object using ArcGIS 10.4 ESRI 2017, following the workflow illustrated in Fig.

Workflow presenting the various steps to create the Whales from Space database, using ArcGIS 10.4 ESRI 2017. The multispectral image is outlined by large black dashes, the panchromatic by small black dashes and the pansharpened by a full black line. Satellite images © 2022 Maxar Technologies.

List of shapefiles included in the dataset that represent examples of whale objects in VHR satellite imagery.

| File Name | Description |
| --- | --- |
| Box_Auckland2006_Whales_PS.shp; Point_Auckland2006_Whales_PS.shp | Southern right whale; 1010010005232700; Auckland Islands, New Zealand |
| Box_Witsand2009_Whales_PS.shp; Point_Witsand2009_Whales_PS.shp | Southern right whale; 1050410001D94500; Witsand, South Africa |
| Box_Auckland2011_Whales_PS.shp; Point_Auckland2011_Whales_PS.shp | Southern right whale; 103001000D6D1000; Auckland Islands, New Zealand |
| Box_Valdes2012_Whales_PS.shp; Point_Valdes2012_Whales_PS.shp | Southern right whale; 103001001C8C0300; Península Valdés, Argentina |
| Box_Valdes2014_Whales_PS.shp; Point_Valdes2014_Whales_PS.shp | Southern right whale; 10400100032A3700; Península Valdés, Argentina |
| Box_Maui2015_Whales_PS.shp; Point_Maui2015_Whales_PS.shp | Humpback whale; 1040010006C2B700; Maui Nui, US |
| Box_Pelagos2016_Whales_PS.shp; Point_Pelagos2016_Whales_PS.shp | Fin whale; 104001001E19F000; 104001001E7B8900; 104001001E020000; 104001001D325700; Pelagos Sanctuary, Ligurian Sea |
| Box_Valdes2016_Whales_PS.shp; Point_Valdes2016_Whales_PS.shp | Southern right whale; 103001005CBC0A00; Península Valdés, Argentina |
| Box_Ignacio2017_Whales_PS.shp; Point_Ignacio2017_Whales_PS.shp | Grey whale; 104001002959ED00; Laguna San Ignacio, Mexico |
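The geometric idea behind turning a point annotation into a box can be sketched outside ArcGIS. The snippet below is an illustration only and not the workflow used to build the dataset: it assumes the point is in a projected coordinate system with units of metres, and the box side length `side_m` is a hypothetical parameter, since the box dimensions are not stated here.

```python
def box_around_point(x, y, side_m):
    """Return the closed ring of a square box centred on (x, y).

    Coordinates are assumed to be projected, in metres (an assumption).
    The ring starts at the lower-left corner, runs counter-clockwise,
    and repeats the first vertex at the end, as polygon rings usually do.
    """
    half = side_m / 2.0
    return [
        (x - half, y - half),  # lower-left
        (x + half, y - half),  # lower-right
        (x + half, y + half),  # upper-right
        (x - half, y + half),  # upper-left
        (x - half, y - half),  # close the ring
    ]


# Hypothetical whale point and a hypothetical 50 m box.
ring = box_around_point(100.0, 200.0, 50.0)
print(ring)
```

A fixed box size keeps every example at the same spatial extent, which is often convenient for training detectors; whether that matches the boxes in the shapefiles should be checked against the files themselves.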
Image chips were created using the boxes created in the above section, following the workflow presented in Fig.

Workflow presenting the steps to create the image chips using ArcGIS 10.4 ESRI 2017 and the pansharpened image and boxes created in Fig.
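Clipping an image chip amounts to converting the box bounds from map coordinates to pixel indices and slicing the raster. The sketch below illustrates that step in plain Python under stated assumptions: the image is north-up, stored as a list of rows, with a known top-left origin and square pixels. The function names and the toy 6 x 6 raster are hypothetical; the dataset itself was produced with ArcGIS.

```python
def map_to_pixel(x, y, origin_x, origin_y, pixel_size):
    """Convert map coordinates to (row, col) for a north-up raster.

    (origin_x, origin_y) is the map position of the raster's top-left corner.
    """
    col = int((x - origin_x) // pixel_size)
    row = int((origin_y - y) // pixel_size)
    return row, col


def clip_chip(image, bounds, origin_x, origin_y, pixel_size):
    """Slice the pixels covered by bounds = (min_x, min_y, max_x, max_y).

    No bounds checking is done: the box is assumed to lie inside the image.
    """
    min_x, min_y, max_x, max_y = bounds
    top, left = map_to_pixel(min_x, max_y, origin_x, origin_y, pixel_size)
    bottom, right = map_to_pixel(max_x, min_y, origin_x, origin_y, pixel_size)
    return [row[left:right] for row in image[top:bottom]]


# Toy single-band raster: the value encodes position as row * 10 + col.
image = [[r * 10 + c for c in range(6)] for r in range(6)]

# Top-left corner at map position (0, 6), 1 m pixels (made-up values).
chip = clip_chip(image, (1.0, 2.0, 4.0, 5.0), 0.0, 6.0, 1.0)
for row in chip:
    print(row)
```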
As we acquire and analyse more satellite imagery, we aim to annually update the Whales from Space dataset. The updates will be available under the Whales from Space dataset deposited on the NERC Polar Data Centre repository
The “Whales from space dataset” is available on the NERC UK Polar Data Centre repository and is separated into two sub-datasets: a dataset that contains the whale annotations (box and point shapefiles with associated csv files) named “Whales from space dataset: Box and point shapefiles”

Summary of the number of whale objects counted in the imagery.

| Location and year | Definite whales | Probable whales | Possible whales | Total number of whales |
| --- | --- | --- | --- | --- |
| Auckland 2006 | 6 | 28 | 35 | 69 |
| Witsand 2009 | 71 | 7 | 11 | 88 |
| Auckland 2011 | 1 | 7 | 26 | 34 |
| Valdés 2012 | 15 | 32 | 37 | 84 |
| Valdés 2014 | 23 | 12 | 24 | 59 |
| Maui 2015 | 20 | 11 | 25 | 56 |
| Pelagos 2016 | 26 | 3 | 5 | 34 |
| Valdés 2016 | 32 | 26 | 71 | 129 |
| Ignacio 2017 | 34 | 28 | 18 | 80 |

See Table

Proportion of whale objects included in the database per species (top to bottom: southern right whale, humpback whale, fin whale and grey whale) and per certainty categories (“definite”, “probable”, and “possible”). The proportion is given separately for each satellite image analysed in this study (Table
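The per-image proportions of certainty categories can be recomputed from the summary counts. The sketch below copies the counts as listed in the summary table; the helper name `certainty_proportions` is hypothetical. As an assumption of this sketch, each proportion uses the sum of the three category counts in that row as its denominator, rather than the listed total.

```python
# (definite, probable, possible, listed total), copied from the summary table.
# For Witsand 2009 the three categories as listed sum to 89 while the listed
# total is 88; the values are kept exactly as given.
counts = {
    "Auckland 2006": (6, 28, 35, 69),
    "Witsand 2009": (71, 7, 11, 88),
    "Auckland 2011": (1, 7, 26, 34),
    "Valdés 2012": (15, 32, 37, 84),
    "Valdés 2014": (23, 12, 24, 59),
    "Maui 2015": (20, 11, 25, 56),
    "Pelagos 2016": (26, 3, 5, 34),
    "Valdés 2016": (32, 26, 71, 129),
    "Ignacio 2017": (34, 28, 18, 80),
}


def certainty_proportions(definite, probable, possible):
    """Share of each certainty category among the objects in one image."""
    n = definite + probable + possible
    return {
        "definite": definite / n,
        "probable": probable / n,
        "possible": possible / n,
    }


for name, (definite, probable, possible, _listed) in counts.items():
    props = certainty_proportions(definite, probable, possible)
    summary = ", ".join(f"{k} {v:.0%}" for k, v in props.items())
    print(f"{name}: {summary}")

# Sum of the listed totals, for comparison with the 633 objects in the text.
total_objects = sum(row[3] for row in counts.values())
print(total_objects)
```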
The “Whales from space dataset: Image chip” comprises the 633 annotated whale objects as image chips. To fulfil the End User Licence Agreement with Maxar Technologies
Each box and point has metadata associated with it, which is included in the attribute table of the corresponding shapefile. It contains information about the detected whale: certainty level (i.e. “definite”, “probable”, “possible”) derived from the classification score, which is assessed against various criteria (i.e. body length, body width, body shape, body colour, flukeprint, blow, contour, wake, after-breach, defecation, other disturbance, fluke, flipper, head callosities and mudtrail) following Cubaynes
Ground truthing, the process of verifying on the ground what is observed in a satellite image
As species differentiation has not been tested when analysing satellite images, we reference the most likely species in this database. The most likely species was assigned based on the scientific literature, hence our decision to acquire images of specific areas when only one large whale species was expected to be present
Anyone using any of the image chips is required to attribute them as follows: “Satellite image © 2022 Maxar Technologies”.
All the satellite images that we used to build the dataset were provided by Maxar Technologies (formerly DigitalGlobe). We recommend contacting the national office of Maxar Technologies to enquire about acquisition and cost, as pricing is determined on a case-by-case basis. To ensure you acquire the same satellite images that we created the boxes for, we have provided the Catalogue ID in Table
This work was supported by an Innovation Voucher from the British Antarctic Survey and a grant from NC-International NERC (NE/T012439/1). We are thankful to Ellen Bowler for her advice on the best format of the boxes for this database to be useful for machine learning. We are also grateful for the insightful knowledge of the teams of machine learning experts from the GAIA (Geospatial Artificial Intelligence for Animals) and the GSTS smartWhales projects, and of the Cambridge Image Analysis and the AI for the study of Environmental Risk research groups from the Department of Applied Mathematics and Theoretical Physics at the University of Cambridge, which used these datasets and confirmed their applicability to machine learning.
Conceptualisation: H.C.C., P.T.F.; Methodology: H.C.C., P.T.F.; Database creation: H.C.C.; Writing: H.C.C., P.T.F.
We used ArcGIS 10.4 ESRI 2017 to analyse the satellite images and create the boxes. ArcGIS 10.6 ESRI 2017 can also be used. Various pansharpening algorithms exist
The authors declare no competing interests.