ARDC Research Link Australia

ORCID Profile
0000-0002-8387-5739

Current Organisation
Massey University

Research Topics

In Research Link Australia (RLA), "Research Topics" refer to ANZSRC FOR and SEO codes. These topics are either sourced from ANZSRC FOR and SEO codes listed in researchers' related grants or generated by a large language model (LLM) based on their publications.

ANZSRC Field of Research (FoR)

ANZSRC Socio-Economic Objective (SEO)

Ecosystem Assessment and Management at Regional or Larger Scales | Flora, Fauna and Biodiversity at Regional or Larger Scales | Ecosystem Adaptation to Climate Change | Expanding Knowledge in the Environmental Sciences

Publications

Publication

blockCV: an R package for generating spatially or environmentally separated folds for k-fold cross-validation of species distribution models

Publisher: Cold Spring Harbor Laboratory

Date: 28-06-2018

DOI: 10.1101/357798

Abstract: When applied to structured data, conventional random cross-validation techniques can lead to underestimation of prediction error, and may result in inappropriate model selection. We present the R package blockCV, a new toolbox for cross-validation of species distribution modelling. The package can generate spatially or environmentally separated folds. It includes tools to measure spatial autocorrelation ranges in candidate covariates, providing the user with insights into the spatial structure in these data. It also offers interactive graphical capabilities for creating spatial blocks and exploring data folds. Package blockCV enables modellers to more easily implement a range of evaluation approaches. It will help the modelling community learn more about the impacts of evaluation approaches on our understanding of predictive performance of species distribution models.
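
The idea of spatially separated folds can be illustrated with a minimal sketch. This is not the blockCV API (blockCV is an R package); it is a plain-Python illustration under stated assumptions: the function name `spatial_block_folds`, the square grid of side `block_size`, and the random dealing of blocks to folds are all hypothetical choices made for this example.

```python
import math
import random


def spatial_block_folds(points, block_size, k, seed=0):
    """Assign each (x, y) point to one of k folds so that all points
    falling in the same square block share a fold (illustrative only)."""
    # Map every point to the grid cell (block) that contains it.
    block_of = [
        (math.floor(x / block_size), math.floor(y / block_size))
        for x, y in points
    ]
    # Shuffle the distinct blocks, then deal them out to folds in turn.
    blocks = sorted(set(block_of))
    random.Random(seed).shuffle(blocks)
    fold_of_block = {block: i % k for i, block in enumerate(blocks)}
    return [fold_of_block[block] for block in block_of]


# Hypothetical coordinates: the first two points share a block,
# as do the third and fifth.
points = [(0.5, 0.5), (0.7, 0.2), (5.1, 5.3), (9.9, 0.1), (5.4, 5.9)]
folds = spatial_block_folds(points, block_size=2.0, k=2)
```

Because whole blocks are held out together, nearby points never end up split between training and testing data, which is the property that random partitioning lacks.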

Publication

Ignoring Imperfect Detection in Biological Surveys Is Dangerous: A Response to ‘Fitting and Interpreting Occupancy Models'

Publisher: Public Library of Science (PLoS)

Date: 30-07-2014

DOI: 10.1371/JOURNAL.PONE.0099571

Publication

A review of evidence about use and performance of species distribution modelling ensembles like BIOMOD

Publisher: Wiley

Date: 22-01-2019

DOI: 10.1111/DDI.12892

Publication

Efficient effort allocation in line‐transect distance sampling of high‐density species: When to walk further, measure less‐often and gain precision

Publisher: Wiley

Date: 02-04-2021

DOI: 10.1111/2041-210X.13589

Abstract: Line‐transect distance sampling is widely used to estimate population densities using distances of observed targets from transect lines to model detectability. When the target taxa are high density, the frequent measuring of distances may make the method seem impractical. We present a method that improves the efficiency of distance sampling when the target species occurs at high density. Only a proportion of targets are measured to model the detection function, and the time saved on the survey is then used to cover a longer total length of transect and accrue a larger ‘count only’ sample. This approach can improve the precision of the population density estimate when the cost of measuring the distance to a detected target is more than half the cost of walking to the next target. We find the optimal proportion of distances to measure that minimises the variance of the density estimate for a fixed survey budget. We quantify how much this optimised strategy increases the precision of the density estimate compared with conventional line‐transect distance sampling. We then use simulated distance sampling data to test our expressions, and illustrate circumstances under which the optimised approach would be beneficial using distance sampling data on high‐density plants. The simulations indicate that the optimised method delivers benefits in precision, but the magnitude of the benefit is lower than predicted from our expressions, which are based on an asymptotic approximation of the variance. We apply an adjustment to the predicted benefit equation to account for this difference, and show that, in all three plant case studies, the optimised approach could improve the precision gained from a distance sampling survey between 20% and 50%. This new approach could broaden the ecological contexts in which distance sampling is applied, to include estimation of densities of abundant taxa where plots are conventionally used. The method may have interesting applications for other survey types, including multispecies surveys or those using cues or signs that occur at high density.

Publication

Threatened species impact assessments: survey effort requirements based on criteria for cumulative impacts

Publisher: Wiley

Date: 22-02-2015

DOI: 10.1111/DDI.12311

Publication

Maxent is not a presence–absence method: a comment on Thibaud et al.

Publisher: Wiley

Date: 17-09-2014

DOI: 10.1111/2041-210X.12252

Publication

Imperfect detection impacts the performance of species distribution models

Publisher: Wiley

Date: 29-11-2013

DOI: 10.1111/GEB.12138

Abstract: Species often remain undetected at sites where they are present. However, the impact of imperfect detection on species distribution models (SDMs) is not fully appreciated. In this paper we evaluate the influence of imperfect detection on the calibration and discrimination capacity of SDMs. We compare the performance of three types of SDMs: (1) a technique based on presence–absence data, (2) a technique based on presence–background data, and (3) a technique based on detection/non‐detection data that accounts for imperfect detection. We use simulations to evaluate the impacts of imperfect detection in SDMs. This allows us to assess model performance with respect to the true objective of the models: the estimation of species distributions. We study a range of scenarios of occupancy and detection based on ecologically plausible environmental relationships and identify the circumstances in which imperfect detection affects model calibration and discrimination. We show that imperfect detection can substantially reduce the inferential and predictive accuracy of presence–absence and presence–background methods that do not account for detectability. While calibration is always affected, the influence on discrimination depends on the relationship of detectability and environmental variables. The performance of a model should be assessed with respect to its objectives. Comparative studies that intend to assess the performance of an SDM by evaluating its ability to predict detections rather than presences fail to reveal the benefits of accounting for detectability. Disregarding imperfect detection can have severe consequences for SDM performance, and hence for the estimation of species distributions. To date, this issue has been largely ignored in the SDM literature. Simultaneously modelling occupancy and detection does not necessarily require a greater sampling effort, but rather that data are collected so that they are informative about detectability. We recommend that consideration of imperfect detection become standard practice for species distribution modelling.
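
Why ignoring detectability biases occupancy estimates can be shown with a short calculation. The sketch below is not the paper's simulation; it assumes a constant per-visit detection probability `p`, independent repeat visits, and no false positives, and the function names are hypothetical.

```python
def detection_probability(p, visits):
    """Probability of detecting a species at least once at an occupied
    site, given per-visit detection probability p and independent visits."""
    return 1.0 - (1.0 - p) ** visits


def naive_occupancy(psi, p, visits):
    """Expected fraction of sites with at least one detection when the
    true occupancy is psi. An analysis that treats non-detection as
    absence estimates this quantity rather than psi itself."""
    return psi * detection_probability(p, visits)


# Illustrative values only: true occupancy 0.6, detection 0.3, 3 visits.
apparent = naive_occupancy(psi=0.6, p=0.3, visits=3)
```

Under these assumptions the apparent occupancy is always at or below the true value, and the gap shrinks only as `p` or the number of visits grows.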

Publication

Can dynamic occupancy models improve predictions of species' range dynamics? A test using Swiss birds

Publisher: Wiley

Date: 19-06-2021

DOI: 10.1111/GCB.15723

Abstract: Predictions of species' current and future ranges are needed to effectively manage species under environmental change. Species ranges are typically estimated using correlative species distribution models (SDMs), which have been criticized for their static nature. In contrast, dynamic occupancy models (DOMs) explicitly describe temporal changes in species’ occupancy via colonization and local extinction probabilities, estimated from time series of occurrence data. Yet, tests of whether these models improve predictive accuracy under current or future conditions are rare. Using a long‐term data set on 69 Swiss birds, we tested whether DOMs improve the predictions of distribution changes over time compared to SDMs. We evaluated the accuracy of spatial predictions and their ability to detect population trends. We also explored how predictions differed when we accounted for imperfect detection and parameterized models using calibration data sets of different time series lengths. All model types had high spatial predictive performance when assessed across all sites (mean AUC 0.8), with flexible machine learning SDM algorithms outperforming parametric static and DOMs. However, none of the models performed well at identifying sites where range changes are likely to occur. In terms of estimating population trends, DOMs performed best, particularly for species with strong population changes and when fit with sufficient data, while static SDMs performed very poorly. Overall, our study highlights the importance of considering what aspects of performance matter most when selecting a modelling method for a particular application and the need for further research to improve model utility. While DOMs show promise for capturing range dynamics and inferring population trends when fitted with sufficient data, computational constraints on variable selection and model fitting can lead to reduced spatial accuracy of predictions, an area warranting more attention.

Publication

Defining and evaluating predictions of joint species distribution models

Publisher: Wiley

Date: 08-11-2021

DOI: 10.1111/2041-210X.13518

Publication

Predictive performance of presence‐only species distribution models: a benchmark study with reproducible code

Publisher: Wiley

Date: 16-11-2021

DOI: 10.1002/ECM.1486

Abstract: Species distribution modeling (SDM) is widely used in ecology and conservation. Currently, the most available data for SDM are species presence‐only records (available through digital databases). There have been many studies comparing the performance of alternative algorithms for modeling presence‐only data. Among these, a 2006 paper from Elith and colleagues has been particularly influential in the field, partly because they used several novel methods (at the time) on a global data set that included independent presence–absence records for model evaluation. Since its publication, some of the algorithms have been further developed and new ones have emerged. In this paper, we explore patterns in predictive performance across methods, by reanalyzing the same data set (225 species from six different regions) using updated modeling knowledge and practices. We apply well‐established methods such as generalized additive models and MaxEnt, alongside others that have received attention more recently, including regularized regressions, point‐process weighted regressions, random forests, XGBoost, support vector machines, and the ensemble modeling framework biomod. All the methods we use include background samples (a sample of environments in the landscape) for model fitting. We explore impacts of using weights on the presence and background points in model fitting. We introduce new ways of evaluating models fitted to these data, using the area under the precision‐recall gain curve, and focusing on the rank of results. We find that the way models are fitted matters. The top method was an ensemble of tuned individual models. In contrast, ensembles built using the biomod framework with default parameters performed no better than single moderate performing models. Similarly, the second top performing method was a random forest parameterized to deal with many background samples (contrasted to relatively few presence records), which substantially outperformed other random forest implementations. We find that, in general, nonparametric techniques with the capability of controlling for model complexity outperformed traditional regression methods, with MaxEnt and boosted regression trees still among the top performing models. All the data and code with working examples are provided to make this study fully reproducible.

Publication

Modelling species presence‐only data with random forests

Publisher: Wiley

Date: 27-10-2021

DOI: 10.1111/ECOG.05615

Abstract: The random forest (RF) algorithm is an ensemble of classification or regression trees and is widely used, including for species distribution modelling (SDM). Many researchers use implementations of RF in the R programming language with default parameters to analyse species presence‐only data together with ‘background' samples. However, there is good evidence that RF with default parameters does not perform well for such ‘presence‐background' modelling. This is often attributed to the disparity between the number of presence and background samples, also known as 'class imbalance', and several solutions have been proposed. Here, we first set the context: the background sample should be large enough to represent all environments in the region. We then aim to understand the drivers of poor performance of RF when models are fitted to presence‐only species data alongside background samples. We show that 'class overlap' (where both classes occur in the same environment) is an important driver of poor performance, alongside class imbalance. Class overlap can even degrade performance for presence–absence data. We explain, test and evaluate suggested solutions. Using simulated and real presence‐background data, we compare performance of default RF with other weighting and sampling approaches. Our results demonstrate clear evidence of improvement in the performance of RFs when techniques that explicitly manage imbalance are used. We show that these either limit or enforce tree depth. Without compromising the environmental representativeness of the sampled background, we identify approaches to fitting RF that ameliorate the effects of imbalance and overlap and allow excellent predictive performance. Understanding the problems of RF in presence‐background modelling allows new insights into how best to fit models, and should guide future efforts to best deal with such data.
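
One generic way to manage class imbalance by sampling is to give each tree a balanced training set. The sketch below illustrates that general idea only; whether it matches the approaches evaluated in the paper is an assumption, and the function name `balanced_sample` and the label coding (1 for presence, 0 for background) are hypothetical.

```python
import random


def balanced_sample(presences, background, rng):
    """Build one balanced training sample: a bootstrap of the presence
    records plus an equally sized random subset of background points."""
    n = len(presences)
    # Bootstrap (sample with replacement) from the presence records.
    boot_presences = [rng.choice(presences) for _ in range(n)]
    # Down-sample the background without replacement to the same size;
    # this requires at least as many background points as presences.
    sub_background = rng.sample(background, n)
    return [(x, 1) for x in boot_presences] + [(x, 0) for x in sub_background]


# Hypothetical record identifiers: 5 presences against 1000 background points.
rng = random.Random(42)
sample = balanced_sample(list(range(5)), list(range(100, 1100)), rng)
```

Drawing a fresh subset for every tree means the full background still informs the forest as a whole, while no single tree sees the extreme imbalance.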

Publication

Modelling species presence-only data with random forests

Publisher: Cold Spring Harbor Laboratory

Date: 17-11-2020

DOI: 10.1101/2020.11.16.384164

Abstract: The Random Forest (RF) algorithm is an ensemble of classification or regression trees, and is a widely used and high-performing machine learning technique. It is increasingly used for species distribution modelling (SDM). Many researchers use implementations of RF in the R programming language with default parameters to analyse species presence-only data together with background samples. However, there is good evidence that RF with default parameters does not perform well with such species “presence-background” data. This is often attributed to the typical disparity between the number of presence and background samples, also known as class imbalance, and several solutions have been proposed. Here, we first set the context: the background sample should be large enough to represent all environments in the region. We then aim to understand the drivers of poor performance of RF with presence-background data, and explain, test and evaluate suggested solutions. Using simulated and real species data, we compare performance of default RF with other weighting and sampling approaches. We show that class overlap is an important driver of poor performance, alongside class imbalance. The results demonstrate clear evidence of improvement in the performance of RFs when class imbalance is explicitly managed by sampling methods or when the overfitting commonly associated with overlapping classes is avoided by forcing shallow trees. Presence-background data is a particular version of class imbalance in which class overlap is highly likely and extreme imbalance exists. Without compromising the environmental representativeness of the sampled background, we show several approaches to fitting RF that ameliorate the effects of imbalance and overlap, and allow excellent predictive performance. Understanding the problems of RF in presence-background data allows new insights into how best to fit models, and should guide future efforts to best deal with such data.

Publication

Using occupancy as a state variable for monitoring the Critically Endangered Alaotran gentle lemur Hapalemur alaotrensis

Publisher: Inter-Research Science Center

Date: 16-04-2010

DOI: 10.3354/ESR00274

Publication

Conservation in the maelstrom of Covid‐19 – a call to action to solve the challenges, exploit opportunities and prepare for the next pandemic

Publisher: Wiley

Date: 27-05-2020

DOI: 10.1111/ACV.12601

Publication

Assessing the accuracy of density‐independent demographic models for predicting species ranges

Publisher: Wiley

Date: 02-12-2020

DOI: 10.1111/ECOG.05250

Abstract: Accurately predicting species ranges is a primary goal of ecology. Demographic distribution models (DDMs), which correlate underlying vital rates (e.g. survival and reproduction) with environmental conditions, can potentially predict species ranges through time and space. However, tests of DDM accuracy across wide ranges of species' life histories are surprisingly lacking. Using simulations of 1.5 million hypothetical species' range dynamics, we evaluated when DDMs accurately predicted future ranges, to provide clear guidelines for the use of this emerging approach. We limited our study to deterministic demographic models ignoring density dependence, since these models are the most commonly used in the literature. We found that density‐independent DDMs overpredicted extinction if populations were near carrying capacity in the locations where demographic data were available. However, DDMs accurately predicted species ranges if demographic data were limited to sites with mean initial abundance less than one half of carrying capacity. Additionally, the DDMs required demographic data from at least 25 sites, over a short time‐interval ( 10 time‐steps), as populations initially below carrying capacity can saturate in long‐term studies. For species with demographic data from many low density sites, DDMs predicted occurrence more accurately than correlative species distribution models (SDMs) in locations where the species eventually persisted, but not where the species went extinct. These results were insensitive to differences in simulated dispersal, levels of environmental stochasticity, the effects of the environmental variables and the functional forms of density dependence. Our findings suggest that deterministic, density‐independent DDMs are appropriate for applications where locating all possible sites the species might occur in is prioritized over reducing false presence predictions in absent sites. 
This makes DDMs a promising tool for mapping invasion risk. However, demographic data are often collected at sites where a species is abundant. Density‐independent DDMs are inappropriate in this case.

Publication

Optimal surveillance strategy for invasive species management when surveys stop after detection

Publisher: Wiley

Date: 11-04-2014

DOI: 10.1002/ECE3.1056

Publication

The predictive performance of process‐explicit range change models remains largely untested

Publisher: Wiley

Date: 18-08-2023

DOI: 10.1111/ECOG.06048

Abstract: Ecological models used to forecast range change (range change models; RCMs) have recently diversified to account for a greater number of ecological and observational processes in pursuit of more accurate and realistic predictions. Theory suggests that process‐explicit RCMs should generate more robust forecasts, particularly under novel environmental conditions. RCMs accounting for processes are generally more complex and data hungry, and so, require extra effort to build. Thus, it is necessary to understand when the effort of building a more realistic model is likely to generate more reliable forecasts. Here, we review the literature to explore whether process‐explicit models have been tested through benchmarking their temporal predictive performance (i.e. their predictive performance when transferred in time) and model transferability (i.e. their ability to keep their predictive performance when transferred to generate predictions into a different time) against simpler models, and highlight the gaps between the rapid development of process‐explicit RCMs and the testing of their potential improvements. We found that, out of five ecological processes (dispersal, demography, physiology, evolution, species interactions) and two observational processes (sampling bias, imperfect detection) that may influence reliability of forecasts, only the effects of dispersal, demography and imperfect detection have been benchmarked using temporally‐independent datasets. Only nine out of twenty‐nine process‐explicit model types have been tested to assess whether accounting for processes improves temporal predictive performance. We found no benchmarks assessing model transferability. We discuss potential reasons for the lack of empirical validation of process‐explicit models. Considering these findings, we propose an expanded research agenda to properly test the performance of process‐explicit RCMs, and highlight some opportunities to fill the gaps by suggesting models to be benchmarked using existing historical datasets.

Publication

A standard protocol for reporting species distribution models

Publisher: Wiley

Date: 06-2020

DOI: 10.1111/ECOG.04960

Publication

“My Uni Experience Wasn’t Completely Ruined”: The Impacts of COVID-19 on the First-Year Experience

Publisher: Queensland University of Technology

Date: 16-08-2021

DOI: 10.5204/SSJ.1762

Abstract: The first year at university is always challenging, but particularly in 2020 when COVID-19 triggered lockdowns and a rapid shift to online learning. This mixed methods study tracked the wellbeing and engagement of 60 new students in an undergraduate teacher education program at an Australian university throughout the first trimester of 2020. Follow-up focus groups with 14 students used interview and photo elicitation to explore how COVID-19 influenced wellbeing and engagement. Quantitative results demonstrate both student wellbeing and student engagement dipped strongly at the start of lockdown but recovered towards the end of the trimester. Focus group findings illustrate the diversity of experience in terms of student access to time and space to study, their ability to sustain relationships online, and the cumulative stress of COVID-19. The findings lead to recommendations for supporting this cohort and for future research.

Publication

Accounting for detectability when surveying for rare or declining reptiles: Turning rocks to find the Grassland Earless Dragon in Australia

Publisher: Elsevier BV

Date: 02-2015

DOI: 10.1016/J.BIOCON.2014.11.028

Publication

Is my species distribution model fit for purpose? Matching data and models to applications

Publisher: Wiley

Date: 08-01-2015

DOI: 10.1111/GEB.12268

Publication

Cryptic mammals caught on camera: Assessing the utility of range wide camera trap data for conserving the endangered Asian tapir

Publisher: Elsevier BV

Date: 06-2013

DOI: 10.1016/J.BIOCON.2013.03.028

Publication

blockCV: An r package for generating spatially or environmentally separated folds for k‐fold cross‐validation of species distribution models

Publisher: Wiley

Date: 08-11-2018

DOI: 10.1111/2041-210X.13107

Publication

Flexible species distribution modelling methods perform well on spatially separated testing data

Publisher: Wiley

Date: 27-01-2023

DOI: 10.1111/GEB.13639

Abstract: Aim: To assess whether flexible species distribution models that perform well at nearby testing locations still perform strongly when evaluated on spatially separated testing data. Location: Australian Wet Tropics (AWT), Ontario, Canada (CAN), north‐east New South Wales, Australia (NSW), New Zealand (NZ), five countries of South America (SA), and Switzerland (SWI). Time period: Most species data were collected between 1950 and 2000. Major taxa studied: Birds, mammals, plants and reptiles. Methods: We compared 10 species distribution modelling methods with varying flexibility in terms of the allowed complexity of their fitted functions [boosted regression trees (BRT), generalized additive model (GAM), multivariate adaptive regression splines (MARS), maximum entropy (MaxEnt), support vector machine (SVM), variants of generalized linear model (GLM) and random forest (RF), and an Ensemble model]. We used established practices for model selection to avoid overfitting, including parameter tuning in learning methods. Models were trained on presence–background data for 171 species and tested on presence–absence data. Training and testing data were separated using both random and spatial partitioning, the latter based on 75‐km blocks. We calculated the average performance and mean rank of the methods (focussing on the area under the receiver operating characteristic and precision‐recall gain curves, and correlation) and assessed the statistical significance of the differences between them. Results: The ranking of methods did not change when evaluated on spatially separated testing data. Methods with the strongest predictive performance were nonparametric methods known to be flexible. An ensemble formed by averaging predictions of five pre‐selected modelling methods was the best model in both random and spatial partitioning, followed by MaxEnt and a variant of random forest. Main conclusions: Whilst some modellers expect methods limited to simple smooth functions to predict better spatially separated data, we found no evidence of that using blocks of 75 km. We conclude that flexible models that are tuned well enough to avoid overfitting are effective at predicting to spatially distinct areas.

Publication

A comparison of joint species distribution models for presence–absence data

Publisher: Wiley

Date: 03-11-2019

DOI: 10.1111/2041-210X.13106

Publication

Satellite imagery as a single source of predictor variables for habitat suitability modelling: how Landsat can inform the conservation of a critically endangered lemur

Publisher: Wiley

Date: 16-08-2010

DOI: 10.1111/J.1365-2664.2010.01854.X

Publication

Traits explain invasion of alien plants into tropical rainforests

Publisher: Wiley

Date: 25-03-2021

DOI: 10.1002/ECE3.7206

Publication

Testing whether ensemble modelling is advantageous for maximising predictive performance of species distribution models

Publisher: Wiley

Date: 27-01-2020

DOI: 10.1111/ECOG.04890

Related Organisations

Organisation

Massey University

Location: New Zealand

Related Funding Activities

Grant

Discovery Early Career Researcher Award - Grant ID: DE160100904

Start Date: 02-2016

End Date: 12-2019

Amount: $360,000.00

Funder: Australian Research Council

Grant

Discovery Projects - Grant ID: DP180101852

Start Date: 01-2018

End Date: 12-2021

Amount: $396,250.00

Funder: Australian Research Council

Grant

Linkage Projects - Grant ID: LP170100305

Start Date: 2019

End Date: 12-2024

Amount: $490,233.00

Funder: Australian Research Council

Grant

Discovery Projects - Grant ID: DP230101907

Start Date: 05-2023

End Date: 05-2026

Amount: $654,671.00

Funder: Australian Research Council

Gurutzeta Guillera-Arroita

Researcher
