Protected areas have a mixed impact on waterbirds, but management helps

International policy is focused on increasing the proportion of the Earth’s surface that is protected for nature1,2. Although studies show that protected areas prevent habitat loss3–6, there is a lack of evidence for their effect on species’ populations: existing studies are at local scale or use simple designs that lack appropriate controls7–13. Here we explore how 1,506 protected areas have affected the trajectories of 27,055 waterbird populations across the globe using a robust before–after control–intervention study design, which compares protected and unprotected populations in the years before and after protection. We show that the simpler study designs typically used to assess protected area effectiveness (before–after or control–intervention) incorrectly estimate effects for 37–50% of populations—for instance misclassifying positively impacted populations as negatively impacted, and vice versa. Using our robust study design, we find that protected areas have a mixed impact on waterbirds, with a strong signal that areas managed for waterbirds or their habitat are more likely to benefit populations, and a weak signal that larger areas are more beneficial than smaller ones. Calls to conserve 30% of the Earth’s surface by 2030 are gathering pace14, but we show that protection alone does not guarantee good biodiversity outcomes. As countries gather to agree the new Global Biodiversity Framework, targets must focus on creating and supporting well-managed protected and conserved areas that measurably benefit populations. Using a combined before–after control–impact approach shows that existing studies using either before–after or control–intervention methods incorrectly estimate the effectiveness of protected areas in maintaining waterbird populations.

Protected areas have been the cornerstone of conservation practice for more than a century. Nearly 16% of land and 7% of the ocean are now designated as protected areas 15 , and there are prominent calls for the Convention on Biological Diversity to set an area-based target of 30% coverage for protected areas and other effective area-based conservation measures by 2030 2 . Given the importance to humanity of addressing biodiversity loss 16 , it is crucial that the next decade's biodiversity conservation targets are informed by evidence of the most effective conservation strategies and actions 3, 17 .
Optimizing where protected areas are placed to most efficiently conserve species and their habitats has been a major research theme in conservation science for decades 18 . However, until recently, robust attempts (those making an explicit effort to account for confounding factors) to evaluate the performance of protected areas have been lacking 19,20 . A number of studies have shown that protected areas slow habitat loss, particularly in forests 3-6 , however intact habitat does not guarantee the health of populations 21 . Studies attempting to address this problem by quantifying the impact of protected areas on population health and persistence have suffered from a lack of suitable controls 19 . To accurately estimate the impact of a protected area, it is necessary to understand what would have happened in the absence of protection 22 and most studies do this by using either before-after or control-intervention study designs. Before-after studies compare populations pre-and post-protected area designation 7,13 , but cannot ascertain whether the observed difference was caused by the protected area or other factors that changed in the same time period. Controlintervention studies compare populations between protected and unprotected sites [8][9][10][11][12] , but cannot ascertain whether the observed difference was due to the effectiveness of the protected area, or because it was placed where populations were performing well to begin with.
Combining these designs into a before-after control-intervention (BACI) framework-where populations in protected and unprotected sites are compared before and after the date of protected area designation-can overcome these limitations 23 , and even approximate causality 24 . The recent emergence of large biodiversity databases in ecology provides an opportunity to test the effects of protected areas on populations under a BACI framework.
Here, using one of the largest global datasets of bird population counts, compiled from citizen science initiatives and non-governmental organization (NGO) and government-led monitoring programmes in 68 Article countries, we present the first robust, global-scale assessment of protected area impact on populations. We examined how 1,506 protected areas have impacted the population trajectories of 27,055 waterbird populations, where 'population' is defined as a particular species at a particular site (Fig. 1). Waterbirds are an appropriate taxonomic group with which to explore impact, given their broad distribution and ability to respond rapidly to changes in site quality 25 . We asked three questions: (1) how much do the study designs typically used to assess protected area effectiveness cause misleading conclusions, compared with a BACI study design?; (2) what is the impact of protected areas on waterbird populations?; and (3) what factors contribute to protected area impact?
We estimated impact using before-after, control-intervention and BACI study designs. For BACI and control-intervention analyses, we matched protected populations to similar unprotected populations using a combination of exact matching and Mahalanobis distance matching (see Methods). We considered the wide range of ways in which populations may respond to protection by counting cases where local immigrations or extinctions had occurred and using generalized linear models to assess both immediate changes in population numbers and longer-term changes in population trend (an extension of the traditional BACI study design that considers only average change in population size 24 ). We used these measures to classify populations into three broad groups: positive, negative, or no impact from protection (the full range of population responses and how they were classified are described in Fig. 3, Extended Data Figs. 3, 4).
To explore the sensitivity of our results to different parameter decisions (such as years of sampling required, the maximum geographical distance between sites, or the strictness of Mahalanobis matching), we ran our entire analysis 21 times: one 'focal analysis' using our best guess parameter estimates, and 20 analyses using estimates sampled from a plausible range for each parameter (full parameter analyses) (Methods, Extended Data Table 1).

Before-after and control-intervention estimates
Estimates of protected area effectiveness varied markedly on the basis of study design, and studies using before-after or control-intervention designs can lead to highly misleading conclusions. In our focal analysis, 37% of populations analysed using a before-after design, and 50% of populations analysed using a control-intervention design, had different outcomes to those in the BACI analysis (Fig. 2). These changes were not simply a result of BACI detecting positive or negative signals where other designs could not: 41% (before-after) and 57% (control-intervention) of populations that were apparently positively impacted were shown to be not impacted, or even negatively impacted under a BACI analysis (Fig. 2). Changes to negative impacts were even more substantial, with 63% (before-after) and 92% (control-intervention) of apparently negatively impacted populations shown to be not impacted or positively impacted by protection under a BACI analysis (Fig. 2). The findings from our full parameter analyses were similar (Extended Data Fig. 1). Before-after models were also heavily impacted by regression to the mean (Supplementary Information 5), an additional reason to consider them unreliable. These results show that relying on before-after or control-intervention studies can distort the picture of a protected area's impact.

BACI estimates of protected area impact
We found a mixed impact of protected areas on populations when using a BACI approach. Within nearly all sites, populations showed a range of responses from positive to negative (in the focal analysis the proportion of positively impacted populations within a site was 0.25 ± 0.21 (mean ± s.d.) with a range of values from 0 to 1; Fig. 3a). Impacts on populations were similarly variable when grouped by species (in the focal analysis the proportion of positively impacted populations within a species was 0.36 ± 0.17 with a range from 0 to 1; Fig. 3b). In our focal analysis, 27% of all populations were positively impacted by protected areas (blues), 21% were negatively impacted (reds) and for 48% we could detect no impact of protection (greys, white and yellows) (our full parameter analyses produced similar results; Extended Data Fig. 2). Four per cent of populations were excluded because of model failure. Of the 48% of cases where we could not detect any difference between protected and unprotected populations, 85% (41% of all populations; whites and greys, Fig. 3) were increasing, or had no trend. These cases are difficult to define as a success or failure as, while the protected area did not have a demonstrably positive impact compared to an unprotected area, the protected population appears to be healthy.
Regardless, over a quarter of populations showed a negative response (Fig. 3). These are formed from two groups: (1) negatively impacted populations, that is, those that perform worse in protected areas relative to matched controls (21%, reds) and (2)   The change in protected area impact when estimated under a before-after (BA) versus BACI framework (a) or a control-intervention (CI) versus BACI framework (b). y-axes show the proportion of populations in each category under before-after or control-intervention on the left, and BACI on the right. The shift of each colour shows how our estimate of the impact of protected areas on populations change between study designs. Note that these figures only contain populations where we could obtain both before-after and BACI (n = 6,006) or control-intervention and BACI (n = 3,609) estimates of protected area effectiveness. This figure is based on our focal analysis, Extended Data Fig. 1 shows changes in outcome across all full parameter analyses.
positive or negative signal of protection and which were either declining in protected areas at a similar rate to unprotected populations, or where both protected and unprotected populations went locally extinct (7%, yellows). Of note, half of these negative responses (14% of populations overall), do not occur in sites designated for waterbirds or their habitat (that is, Ramsar Sites 26 or Special Protected Area-Birds Directive 27 sites) and so we might not necessarily expect a positive impact in these cases and thus should not consider these to be cases where protected areas have not worked.
We consider protected area impact exclusively in the context of how protected areas support the persistence of populations, which ignores the potential benefit of protection on the maintenance of the habitats in which these populations occur. Our dataset was restricted to sites where monitoring occurred: if habitat change meant that waterbirds were no longer found at a site, monitoring would likely cease 28 . Thus, we could not consider such sites as counterfactuals, and so could not account for protected areas having prevented complete habitat conversion. We also do not consider the potential for protected areas to defend against future threats, for instance, protecting a future climate refuge. In sum, it is important to remember that the results presented here about the impact of protected areas on populations are above and beyond these already-known benefits 3-5,29,30 .
Our results are also likely to underestimate the positive impact of protection as we were restricted to species for which we were able to obtain adequate matches between protected and unprotected populations, resulting in a bias towards common species (Supplementary Information 10). Common species tend to have more generalist habitat requirements 31 and so may fare better in degraded sites than rarer species. They are also less likely to be the target of specific interventions, which in some cases could actively impede them; for instance, water could be kept at levels appropriate for rare waders, but not for common ducks. To explore whether this affected our results, we assessed whether outcomes varied between regionally threatened and non-threatened species in Europe (Supplementary Information 11; a global analysis was not possible due to data restrictions). We did not find any differences in the impact of protected areas between these groups, possibly because there was only a small set of threatened species in our data, though a recent study 32 similarly found no difference between rare and common species when studying population trends.

Predictors of protected area impact
We show that the mere designation of a protected area does not necessarily bring benefits to populations. Given this, we used cumulative link mixed models, where the response variable was the impact (positive, no or negative), to investigate which species and protected area characteristics predict outcomes for populations, on the basis of our BACI framework (see Fig. 4  Article country, site, species and spatial grid cell. Our explanatory variables included a management variable, which broadly categorized sites as either 'waterbird-managed' (Ramsar or Special Protected Area-Birds Directive sites), or 'mixed-management' (sites either not designated for waterbirds or their habitat, or of unknown management status). Management for waterbirds was consistently positively correlated with protected area success (Fig. 4). Larger protected areas were also almost always positively correlated with success, although significantly so in only a few analyses (Fig. 4). No other site or species-based characteristic was consistently positively or negatively associated with success ( Fig. 4; Extended Data Fig. 5). Depending on the analysis, a large, waterbird-managed area could increase the likelihood of a positive impact on a population anywhere from 1 to 25 percentage points (mean weighted by model confidence = 9 percentage points; see Supplementary Information 13) compared to a small, mixed-management area.
These values are likely to underestimate the positive impact of management. Our classification of sites into waterbird-managed sites and mixed-management sites is a simple metric of diverse on-the-ground practices (a more nuanced classification was not possible at the global scale) and, inevitably, some mixed-management sites are likely to be managed for waterbirds, while management quality will vary within waterbird-managed sites 33,34 . Both these factors would reduce the observed difference between the two management classifications, meaning the true difference is likely higher. That waterbird-managed sites perform better emphasizes the need for effective management to avoid negative outcomes, and suggests that policy needs to focus on setting and adhering to ambitious management targets.
The weak positive association between protected area size and impact adds a new element to the 'single large or several small' protected area debate that considers which is better for conserving biodiversity. Studies have agreed that several smaller protected areas typically provide higher species richness than a few large areas 35 , but that larger areas are critical for persistence of larger species 36 . Our results demonstrate some importance of larger protected areas for supporting populations of waterbirds through time. This is concerning given many protected areas across the world are small and many are currently being downsized 37 .
While our analysis included data from 68 countries across 6 continents, the data are biased towards Europe, North America and East Asia; a common problem in large-scale ecological studies 38 . There are a number of initiatives in less-studied areas of the world to increase the supply and quality of ecological data 39 (for example, https://africanconservation.org, https://www.avesargentinas.org.ar, https://www. birdscaribbean.org, and https://www.amazonteam.org/brazil/); supporting and incorporating efforts such as these will be vital to informing truly global evaluations of conservation effectiveness.
Our results show a mixed impact of protected areas, supporting concerns raised over protected area efficacy in recent years 40,41 . We had expected that, given their ability to move between sites 25 , waterbirds would show a more immediate and positive signal of protection than other non-mobile taxa, such as reptiles, where positive signals might not be apparent until multiple generation cycles of improved breeding rate had occurred. The lack of signal could be due to poor or limited management of many protected areas, or it could be due to forces that cannot be controlled within the borders of a protected area. Waterbirds rely on water, and threats such as pollution, upstream dam installation and sea level rise cannot be managed by a protected area, and can have devastating consequences [42][43][44] . Terrestrial taxa will be less impacted by such threats and therefore may experience more positive responses to protection 45 , although beyond border threats are not limited to those affecting water: climate change, air pollution and disease have the potential to impact all species 45 . Finding solutions to conserving species in the face of these more ubiquitous threats is a key conservation challenge.

Conclusions
The parties to the UN Convention on Biological Diversity will soon decide on the post-2020 Global Biodiversity Framework, which will set nature conservation policy for the decade ahead. It is likely to include a commitment to protect and conserve 30% of Earth protected by 2030 (and there are growing calls for this to reach 50% by 2050 14 ). Researchers have warned that such calls must consider the social and political context in which conservation operates, or risk undermining conservation support 46 . Our results raise additional concerns about the '30 by 30' approach by showing that protection alone does not guarantee optimal biodiversity outcomes. Halting biodiversity loss requires improvements to the performance of existing protected areas, and action to address ubiquitous threats beyond area borders. Ever-increasing area-based targets must be accompanied by equally ambitious targets that ensure protected area effectiveness.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41586-022-04617-0.  Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Methods
We published a pre-analysis plan for this paper laying out our planned analysis before we looked in detail at the data 47 . Pre-analysis plans are useful to reduce the risk of cherry picking or hypothesizing after results are known, which has led to a replication crisis in science 48 . As much as possible, we have followed the methods we set out, however we discovered a number of factors we had not considered (for instance, the potential for immigrations and extinctions and the fact that both trend and immediate change must be considered 24 ). The conceptual basis of our revised methodology is described in detail elsewhere 24 , and Supplementary Information 7 describes the choices we have made that deviate from the pre-analysis plan and why.

Overview
A brief summary of our workflow is as follows: we took yearly counts of 749 waterbird species at 45,745 sites across the world from the International Water Census and Christmas Bird Count. Of these, we wanted to find populations, here defined as a certain species at a certain site, that occurred in a protected area and where yearly counts had begun before the protected area was designated. For our before-after (BA) analysis, we then assessed how each population at each of those sites changed from before to after the protected area was designated. For control-intervention (hereafter CI) and BACI analysis, we matched each of these protected populations to unprotected populations surveyed over the same period, that were similar based on a number of site and species characteristics. For CI, we compared populations in the years after the protected area was designated between unprotected and protected population pairs. For BACI, we compared change in protected populations from before to after protected area designation, and then compared this to the BA change in matched unprotected populations over the same period. Whether BA, CI or BACI, we then classified the impact to the population as positive, negative or no impact from protection. Next, we looked to see whether our conclusions about impact varied when we analysed a population in a BA, CI or BACI framework. We found BA and CI analyses to be unreliable, so discarded them at this point. Next, we looked to see whether there were correlates that predicted protected area impact, by running cumulative link models on BACI data. These correlated outcome (Positive, Negative or No impact) to a range of site and species level predictors such as protected area size, species body size, land use type and whether the site was managed for waterbirds. Finally, we ran sensitivity tests varying a range of parameters that were used to make analytical decisions to test the robustness of conclusions.
All analysis was completed using R v4.0.3 49 and QGIS v3.10 50 , data figures and base maps were produced using the R package ggplot2 51 , impact legends were produced using Inkscape.

Time-series preparation
We took site-specific annual counts from two long term surveys: the International Waterbird Census (IWC), coordinated by Wetlands International, and the Christmas Bird Count (CBC), run by the National Audubon Society. We used Wetland International's definition of waterbird, and took any species from the corresponding families (list of families in Supplementary Information 2). Our initial dataset consisted of 749 species at 45,475 sites, spanning 1940 to 2018. We then restricted our data to only sites surveyed in December to February. We imputed zeroes, by taking any site where a species has been observed, and recording any years where the species was not mentioned as '0' years.
As CBC data is not standardized for effort, we required that these species showed a log-linear relationship with effort (that is, the rate of new individuals detected in a search slows with increased effort), in order to be able to include effort as a term in our models. For each species, we ran a simple negative binomial generalized linear model in R, using the glm.nb function from package MASS 52 , using all available CBC data for that species: Where Count is all counts of a species and h i is the number of survey hours for each count. We retained CBC data for all species where there was a significant positive relationship between count and effort.

Protected and unprotected area data
We first created a dataset of counts at protected sites. We took our protected area data from the World Database on Protected Areas ('Protected Planet') 53 , downloading the full dataset of all protected areas globally, and overlaying our sites to determine which fell in protected areas. Some coastal site coordinates fell just outside the land cover layer that protected areas are aligned to, so we snapped all sites to the base terrestrial layer used by Protected Planet 54 , but by no more than 10 km. We removed any sites where the designation status was proposed, and any United Nations Educational, Scientific and Cultural Organization (UNESCO) biosphere reserves as these are often not afforded formal protection 55 . We next removed any sites where there was no information about designation date. In some cases, there were multiple protected area data entries for a site, in these cases we took the earliest designation year given. Finally, we reduced the count dataset to only the 10 years before and after the designation date of whichever protected area the survey site fell within, requiring that at least 7 years before and after were surveyed (we tested the number of years restricted from 5-15 years, and number of years measured from 4-13; Extended Data Table 1a, b, respectively). We next created a dataset of counts at unprotected sites for CI and BACI analysis. For Christmas Bird Count data, surveying is conducted in a circle with a radius of 12.07 km. If there is a protected site in this circle, we cannot be sure that the counts are not being biased by protection. Therefore, we only counted sites as unprotected if no protected area occurred in the entire circle. For IWC data, we included sites that were at least 1 km from a protected area, to avoid any confounding of results from spill-over effects 56 (we sensitivity tested this threshold from 500 m to 5 km; Extended Data Table 1c). We consider sites to be unprotected until the point in time when a protected area was designated at that site. For instance, a site, A, could be designated as a protected area in the year 2000, but this would mean that counts before this point, say, from the 1980s, would be of waterbirds at a site not experiencing any benefit of protection. We could therefore match a protected site from the 1980s to Site A's counts in the 1980s, and treat A's counts as unprotected at this time.

BA, CI and BACI datasets
In all cases, we defined the 'after' period as being the years after, but not including, the designation date of the protected area. We also defined cases of 'all zeros' to account for local immigrations and extinctions. Waterbirds are highly mobile and can quickly immigrate to, or emigrate from a site. In these cases we cannot assess a change in trend between, for instance, a before period where there are individuals absent and an after period when they have immigrated to the site (for a detailed explanation of why immigrations and extinctions pose a problem for trend analysis, see ref. 24 ). Theoretically, we should only consider those cases where there are zero counts in all before or after years as 'all zero' local immigrations or extinctions, but because waterbirds are able to appear as vagrants at a site, we chose to classify cases where at least 70% of years were zero counts as all zeroes. We tested this threshold from 60-80% (Extended Data Table 1d). Of note, any sites where the species had never occurred would not be included in the dataset, so even in cases of all zeroes the species is known to be able to occur at the site.
To create the BA dataset, we took all protected populations where there were cases of counts (as opposed to all zeroes) in either the before period, after period or both. We subset the BA dataset to only protected populations that also occur in the BACI dataset.
To create the CI dataset, we took all protected populations with counts (as opposed to all zeroes) in the after period, and matched these to unprotected populations also with counts over the same time period (see matching below). We subset the CI dataset to only protected populations that also occur in the BACI dataset.
To create the BACI dataset, we matched protected and unprotected populations, requiring that at least one period (either protected before, protected after, unprotected before, or unprotected after) had counts (as opposed to all zeroes).

Matching
Data preparation. We developed a statistical matching method to achieve matching of BACI and CI analyses. The covariates we used for matching, how we prepared them and justification for their use are given in Extended Data Table 2, broadly they encompass variables related to climate, land use and human impact. We removed highly correlated variables by first calculating the variance inflation factor (using the VIF function from the usdm package in R 57 ) of all covariates, and iteratively removing variables with a VIF greater than four until none were over four 58 . We next removed variables with a Pearson's correlation coefficient of over 0.7.
For BACI, we matched only on covariates in the years prior to designation (as protected and unprotected sites might be expected to differ in the years after protected area designation, especially on covariates related to human impact). For CI, we matched on covariates only in the years after designation, as we choose to be blind to the 'before' period in this analysis.
We then proceeded with matching, separately for each species. The following describes the procedure for one species.
Mahalanobis distances. We used Mahalanobis distance matching to evaluate how similar protected and unprotected sites were. Though Mahalanobis distance has been criticized in the past for performing poorly when matching on many covariates 59,60 , recent criticisms of the most commonly used matching method, Propensity Score Matching 61 , meant we were interested to test other options and found Mahalanobis distance matching to perform markedly better in comparisons (Supplementary Information 9).
Mahalanobis Distance (md) computes the distance between points in multivariate space. The Mahalanobis distance between two sets of points is calculated as follows: x y ( , ) T −1 x y x y Where x and y are vectors containing values for each covariate (in our case, therefore, the list of covariate values for sites x and y) and S is the covariance matrix of the covariates.
This formula requires each site to have one value for each covariate, so we took means of the values for the years pre-(BACI) or post-(CI) designation.
For each species, we created a large matrix with protected sites in columns and unprotected sites in rows, with Mahalanobis distance values populating the rows. Because we wanted to match exactly on the years only prior to protected area designation, we first created separate matrices (using function mahal from R package DOS 62 ), each containing only protected areas designated in a certain year (see Extended Data Fig. 6a, b for an example). Mahalanobis distance requires at least two protected sites to work (to be able to calculate the covariance matrix), and so we could not build Mahalanobis distance matrices for years where only one protected area in our dataset was designated. This resulted in a minimal loss of sites.
These Mahalanobis distance matrices were then combined into the larger distance matrix containing all the sites across all designation years that the species occurred in (Extended Data Fig. 6c).
Exact matching. We required that sites were exactly matched on a number of criteria; where sites failed they were excluded from the Mahalanobis distance matrix (Extended Data Fig. 6d). For each protected site, we removed unprotected sites not of the same anthrome category, continent, and migratory status. We also removed any sites greater than 500 km from the protected area (we tested this value from 100 km to 2,500 km; Extended Data Table 1e).
For BACI analysis, we needed to satisfy the parallel trends assumption 24,63 , which specifies that the trends of control and intervention populations in the 'before' period must be parallel. To test this, we modelled the difference in trends between each protected and potential unprotected matched population. We used a negative binomial glm (glm.nb, R package MASS 52 ),: Where the count of the population in year i at site j is predicted by the Year (Y), a binary term that is one for the protected site and zero for the unprotected site (CI) and the interaction between the two. Log of effort is included as an offset for CBC data (effort is held at one for IWC data). We also checked for temporal autocorrelation and adjusted the model if it was present (see 'Temporal autocorrelation' below). If the interaction coefficient (β 3 ) was significant (P < 0.05), then there was a significant difference between the trend of the two populations, and the unprotected population was discarded.
If no unprotected sites met the exact match criteria, the protected site did not have a match and was excluded (for example, Extended Data Fig. 6d, site E).
Picking matches. Next, we ran an optimized greedy nearest-neighbour algorithm to select, from the Mahalanobis distance matrix (with any sites not satisfying exact match criteria excluded), the unprotected site with the smallest Mahalanobis distance. We ran this without replacement, meaning each protected site could be matched to only one unprotected site, to ensure no pseudoreplication. A greedy algorithm works through the dataset, picking the best match for each successive protected site and removing the matched unprotected site from the potential matching pool as it goes. However, greedy algorithms have a tendency to get stuck in local optima 64 , so to account for this, we ran the greedy algorithm 1,000 times, each time randomizing the order of protected sites that the greedy algorithm would work through. We found the global distance for each iteration and used the set with the smallest global distance (Extended Data Fig. 6e, for example, with randomizations in the figure a smaller global distance would be detected).
Evaluating match quality. Once we had our matched sets for each species, we needed to ensure that the matches were of a high enough quality to be used. This was done by assessing the covariate balance between matched and unmatched sites for each species using the 'standardized difference in means' (SDiM), which is calculated using the following formula 65 :

T C T C
Where T cov is the values of covariate cov for protected sites (mean from the years before and equal to designation), C cov is the same for unprotected sites, var is the variance of each of these and d cov is the standardized mean difference between protected and unprotected sites. We assessed the SDiMs to determine whether they were below 0.25 for all covariates 60,66 (we sensitivity tested this threshold from 0.1 to 0.25; Extended Data Table 1f). If they were not, the matched pair with the greatest distance was removed and the SDiM checked again. Once all covariates had a SDiM of <0.25 (or the relevant sensitivity value), the remaining matched pairs were considered the 'final' matched dataset for that species (Extended Data Fig. 6f). If fewer than 80% of the sites that a species occurred in were remaining, we discarded the species, to ensure that the matched set was not biased to a certain subset of all sites for that species (we sensitivity tested this value from 50-90%; Extended Data Table 1g).

Assessing protected area impact
Following the framework set out in ref. 24 , we defined a number of ways that a population could respond to protection. Broadly, populations can respond to a protected area by immigrating to the area, going locally extinct from the area, showing a change in trend over time, or by showing an immediate change, that is, an immediate increase or decrease in the number of individuals (see legends of Fig. 3, Extended Data Figs. 3, 4).
For comparing BA, a population could show an immediate change or change in trend, or the population could immigrate to the site or go locally extinct at the site (Extended Data Fig. 3). For comparing BACI, the BA changes were compared between protected and unprotected sites. For example, a population could be stable in the period before protection, and declining in the period after -this would be a negative BA trend change (Extended Data Fig. 3). But if a matched unprotected population was also stable in the before period, but declining at a faster rate in the after period, then the BACI trend change would be positive (Fig. 3), as the protected area had slowed the decline of the protected species, even if it hadn't halted it. If the unprotected population was declining at a similar rate to the protected population in the after period, this would be a case of no impact under a BACI framework (Fig. 3). For comparing CI, only the difference in trend between protected and unprotected populations was considered (Extended Data Fig. 4).
All BA, CI or BACI time periods with all zeroes were categorized as immigrations or extinctions, for instance, in BACI analysis, if a protected population had no counts in the before period, but did in the after period, while the matched unprotected population had no counts in the before and after period, this would be classified as a local immigration (and a positive impact of the protected area).
For time periods with all counts, we ran the following models. In all cases Y represents the year, centred around the year of protected area designation so that year of designation equals zero. BA is a binary term that is zero in the years before protected area designation, and one in the years after; note that this isn't included in the CI model as only 'after' years are used. CI is a binary term that is zero for the unprotected population and one for the protected population; note that the CI term is not included in the BA model as this model does not include unprotected populations. Finally, each model includes an offset term for effort (h), to account for variable effort in CBC data. For IWC data, effort is always set to one and so does not contribute to the model. All models were negative binomial glms, run using R package MASS 52 ; see ref. 24 for a more detailed explanation of these models.
For BA: β 1 gives the immediate change and β 3 gives the trend change 24 . For CI: β 3 gives the difference in trend between protected and unprotected sites.
For BACI: Β 4 gives the immediate change and β 7 gives the trend change 24 . We excluded any cases where β 6 was significant as this indicates a significant difference between protected and unprotected trends in the before period, meaning the parallel trends assumption is not satisfied. Though we checked for this in matching, running a full model containing 'after' data (compared with only 'before' data, as in matching) meant that very occasionally this term became significant, presumably because of an increase in power.
In a small proportion of populations, models failed to converge. In these cases, we removed the population from analysis.
Temporal autocorrelation. Time-series data are vulnerable to the effects of temporal autocorrelation, where counts in one year are impacted by counts in the years before, and as a result are not independent, as models assume. Being mobile, we expect less temporal autocorrelation in waterbird data than for sessile species (waterbird population numbers can change markedly at a site year to year), but nevertheless we checked for, and accounted for, temporal autocorrelation in our data. For each population model (whether BA, CI or BACI; and also for the models used to check for parallel trends in the matching stage), we checked for temporal autocorrelation using three implementations of the Durbin-Watson test in R: durbinWatsontest from package car 67 , testTemporalAutocorrelation from package DHARMa 68 , and dwtest from package lmtest 69 . Though each of these implementations performs the same test, variations in methodology meant we found some population models had significant temporal autocorrelation under one, but not another. To be conservative, we decided that if a population had significant autocorrelation under any of the three tests, we considered there to be temporal autocorrelation. If this was the case, we re-ran the population model as a negative binomial generalized linear mixed model (using glmer.nb from package lme4 70 ) including a random intercept for year for BA analyses, and site:year for CI and BACI analyses, to account for the autocorrelation.
Classifying outcomes. We then classified outcomes. We aimed to be generous for assigned positive outcomes, and so for BA and BACI, a significantly (P < 0.05) positive immediate or trend change (even if the other was significantly negative) meant that the protected area was classed as having had a positive impact on the population. If both immediate and trend were insignificant, then the protected area was classed as having had no impact. And if either was negative and the other insignificant, or if both were significantly negative, the protected area was classed as having had a negative impact. We conducted a supplementary analysis to see whether relaxing this P-value would result in detecting more positive impacts (see Supplementary Information 12), results did not affect our conclusions.
For CI, a significantly positive difference between protected and unprotected trends was classed as a positive impact, significantly negative was a negative impact, and an insignificant difference no impact.

Drivers of change
To explore the predictors of protected area effectiveness, we considered body mass, species migratory status, taxonomic order, the broad anthrome category (i.e. land use type) of the protected area, protected area size, and country governance 11 . See Extended Data Table 3 for details of how each covariate was obtained.
To test how these covariates might correlate to protected area effectiveness, we ran cumulative link mixed effects models that allow for ordinal predictors and random factors, with the response term being a three-level factor: negative impact, no impact, or positive impact. To account for spatial autocorrelation, we included a random intercept for 'grid cell', with sites each assigned to a gridcell of size 2 × the maximum distance between protected and unprotected sites, depending on the focal/full parameter analysis (Extended Data Table 1e). In this way errors are grouped by sites that are closer together. In some of the 21 analyses, typically those with smaller sample sizes, including both country and grid cell as random factors meant the model could not converge; in these cases we retained only country as a random factor. We used the clmm function from R package ordinal 71  ∼ Where i, j, k and m are species, site, country and gridcell, respectively. In some sensitivity tests some covariates did not have sufficient populations to be able to test them, in these cases certain levels of the covariate were removed (for example, if there were not enough populations of a particular taxonomic order) or in some cases the entire covariate was removed. Not all protected areas have area data reported, and so we had to run models only on the subset of data where area data was available. To ensure this reduced set was not misrepresenting the full dataset, we also ran models without the 'protected area size' covariate and on the full dataset; results were broadly similar ( Supplementary  Information 8), and in the case of BACI, waterbird-managed sites were more strongly positively associated with outcomes.
We estimated the effect size of management and protected area size using the function ggpredict from R package ggeffects 72 , which returns odds ratios from the cumulative link mixed models. We estimated effect size for water-bird managed vs mixed-managed sites, and for 5 quintiles of log(protected area size): 0.05, 0.25, 0.5, 0.75 and 0.95. For the effect size reported in the manuscript, we compared the chance of a positive impact on a population in a mixed-management site in the 0.05th size quintile to the chance of a positive impact on a population in a waterbird-managed site in the 0.95th quintile.
Finally, some covariates violated the proportional odds ratio assumption upon which cumulative link models rest. To check for the impact of this we ran individual binomial generalized linear mixed-effects models (using function glmer from R package lme4 70 ) to conduct pairwise comparisons of outcome levels. These models supported the general conclusions made in this paper (see Supplementary Information 13 for further details).

Full parameter analyses
The focal analysis inevitably is based on somewhat arbitrary modelling choices. We therefore ran our models an additional 20 times with a range of parameter values for decisions such as: the number of years of counts required before and after protection, the threshold at which we classify 'all zeroes', the maximum distance between protected and unprotected sites for an acceptable match and how similar we required matched sites to be (Extended Data Table 1). Testing all parameter combinations was computationally impractical so we used a latin hypercube sampling method 73 . This is a way to adequately sample a high dimensional parameter space when random sampling is prohibitively inefficient; it creates multiple combinations of covariates that together evenly sample the entire n dimensional sample space. We randomly created 20 parameter combinations (using function randomLHS from the R package 'lhs' 74 ), which are displayed in Extended Data Table 1. We call these analyses our 'full parameter' analyses (www.govindicators.org).

Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this paper.

Data availability
The waterbird count data used in this study are collated and managed by Wetlands International and the National Audubon Society, and are available on request (http://iwc.wetlands.org/index.php/ and http:// netapp.audubon.org/cbcobservation/, respectively). We requested all data from both providers for the years 1900-2018, for all waterbird families (see Supplementary Information 2), and for sites in all available countries (though data from Russia was excluded as permissions were not given). All the data that pertain to explanatory variables are freely available, as specified in Extended Data Tables 2, 3.

Code availability
The code used to produce all analysis and figures are archived on Zenodo at https://doi.org/10.5281/zenodo.5794511. Code are also available on GitHub at https://github.com/hannahwauchope/PAImpact; this is the recommended mode of access as it will contain any updates or clarifications. Fig. 4 | Estimates of protected area impact under a CI study design. Proportion of populations (n = 3783) showing various responses to protection, per site (a; n = 698) and per species (b; n = 32), when response to protection is calculated in a CI framework. Each species/site is one bar, with the proportion of their populations in each category shown on the y axis. Bar width is scaled to the number of populations of that species/site in the dataset, log scaled in the case of species, with a wider bar meaning the species/site has more populations. Each colour represents a different way a population can respond to protection, and an example of each is shown at the bottom. This figure is based on our focal analysis; Extended Data Fig. 2c shows the proportion of populations within each broad outcome category across all full parameter analyses. Fig. 5 | Predictors of protected area impact, with odds ratios and confidence intervals. Odds ratios for covariates predicting protected area (PA) effectiveness under a BACI framework. Estimated using cumulative link mixed models, points show model estimates, tails show 95% confidence intervals, and significance is indicated by bold colours (P < 0.05). Dashed line given at an odds ratio of one (ratios above one indicate a positive relationship, and below one a negative relationship). Y axis shows all analyses (20 full parameter analyses, plus one focal analysis, with the focal analysis given in the first row). Colours show covariate grouping. Orders are measured relative to Anseriformes, and Anthromes relative to Urban. Note that we expect continuous variables (PA Area, Body Size, Governance) to have smaller coefficients as they express odds ratios per unit increment.