ORCID Profile
0000-0003-0765-3533
Current Organisation
Australian National University
In Research Link Australia (RLA), "Research Topics" refer to ANZSRC FOR and SEO codes. These topics are either sourced from ANZSRC FOR and SEO codes listed in researchers' related grants or generated by a large language model (LLM) based on their publications.
Applied Statistics | Statistical Theory | Statistics
Expanding Knowledge in the Mathematical Sciences | Effects of Climate Change and Variability on Antarctic and Sub-Antarctic Environments (excl. Social Impacts) | Specific Population Health (excl. Indigenous Health) not elsewhere classified | Expanding Knowledge in the Chemical Sciences | Mental Health
Publisher: Cold Spring Harbor Laboratory
Date: 29-03-2021
DOI: 10.1101/2021.03.28.437086
Abstract: Visualising data is a vital part of analysis, allowing researchers to find patterns, and assess and communicate the results of statistical modeling. In ecology, visualisation is often challenging when there are many variables (often for different species or other taxonomic groups) and they are not normally distributed (often counts or presence-absence data). Ordination is a common and powerful way to overcome this hurdle by reducing data from many response variables to just two or three, to be easily plotted. Ordination is traditionally done using dissimilarity-based methods, most commonly non-metric multidimensional scaling (nMDS). In the last decade, however, model-based methods for unconstrained ordination have gained popularity. These are primarily based on latent variable models, with latent variables estimating the underlying, unobserved ecological gradients. Despite some major benefits, a major drawback of model-based ordination methods is their speed, as they typically take much longer to return a result than dissimilarity-based methods, especially for large sample sizes. We introduce copula ordination, a new, scalable model-based approach to unconstrained ordination. This method has all the desirable properties of model-based ordination methods, with the added advantage that it is computationally far more efficient. In particular, simulations show copula ordination is an order of magnitude faster than current model-based methods, and can even be faster than nMDS for large sample sizes, while being able to produce similar ordination plots and trends as these methods.
Publisher: Oxford University Press (OUP)
Date: 12-06-2015
Publisher: Wiley
Date: 18-08-2014
Publisher: Wiley
Date: 24-07-2019
Abstract: Ecologists often investigate co‐occurrence patterns in multi‐species data in order to gain insight into the ecological causes of observed co‐occurrences. Apart from direct associations between the two species of interest, they may co‐occur because of indirect effects, where both species respond to another variable, whether environmental or biotic (e.g. a mediator species). A wide variety of methods are now available for modelling how environmental filtering drives species distributions. In contrast, methods for studying other causes of co‐occurrence are much more limited. “Graphical” methods, which can be used to study how mediator species impact co‐occurrence patterns, have recently been proposed for use in ecology. However, available methods are limited to presence/absence data or methods assuming multivariate normality, which is problematic when analysing abundances. We propose Gaussian copula graphical models (GCGMs) for studying the effect of mediator species on co‐occurrence patterns. GCGMs are a flexible type of graphical model which naturally accommodates all data types, for example binary (presence/absence), counts, as well as ordinal data and biomass, in a unified framework. Simulations demonstrate that GCGMs can be applied to a much broader range of data types than the methods currently used in ecology, and perform as well as or better than existing methods in many settings. We apply GCGMs to counts of hunting spiders, in order to visualise associations between species. We also analyse abundance data of New Zealand native forest cover (on an ordinal scale) to show how GCGMs can be used to analyse large and complex datasets. In these data, we were able to reproduce known species relationships as well as generate new ecological hypotheses about species associations.
Publisher: Cold Spring Harbor Laboratory
Date: 19-11-2018
DOI: 10.1101/470161
Abstract: Ecologists often investigate co-occurrence patterns in multi-species data in order to gain insight into the ecological causes of observed co-occurrences. Apart from direct associations between two species, two species may co-occur because they both respond in similar ways to environmental variables, or due to the presence of other (mediator) species. A wide variety of methods are now available for modelling how environmental filtering drives species distributions. In contrast, methods for studying other causes of co-occurrence are much more limited. “Graphical” methods, which can be used to study how mediator species impact co-occurrence patterns, have recently been proposed for use in ecology. However, available methods are limited to presence/absence data and methods assuming multivariate normality, which is problematic when analysing abundances. We propose Gaussian copula graphical models (GCGMs) for studying the effect of mediator species on co-occurrence patterns. GCGMs are a flexible type of graphical model which naturally accommodates all data types, including binary (presence/absence), counts, as well as ordinal data and biomass, in a unified framework. Simulations for count data demonstrate that GCGMs are better able to distinguish effects of mediator species from direct associations than existing methods designed for multivariate normal data. We apply GCGMs to counts of hunting spiders, in order to visualise associations between species. We then analyze abundance data of New Zealand native forest cover (on an ordinal scale) to show how GCGMs can be used to analyze large and complex datasets. In these data, we were able to reproduce known species relationships as well as generate new ecological hypotheses about species associations.
Publisher: Informa UK Limited
Date: 17-11-2021
Publisher: Oxford University Press (OUP)
Date: 17-10-2014
DOI: 10.1111/BIJ.12402
Publisher: Wiley
Date: 17-07-2018
Abstract: Delineating naturally occurring and self-sustaining subpopulations (stocks) of a species is an important task, especially for species harvested from the wild. Despite its central importance to natural resource management, analytical methods used to delineate stocks are often, and increasingly, borrowed from superficially similar analytical tasks in human genetics even though models specifically for stock identification have been previously developed. Unfortunately, the analytical tasks in resource management and human genetics are not identical: questions about humans are typically aimed at inferring ancestry (often referred to as "admixture") rather than breeding stocks. In this article, we argue, and show through simulation experiments and an analysis of yellowfin tuna data, that ancestral analysis methods are not always appropriate for stock delineation. In this work, we advocate a variant of a previously introduced and simpler model that identifies stocks directly. We also highlight that the computational aspects of the analysis, irrespective of the model, are difficult. We introduce some alternative computational methods and quantitatively compare these methods to each other and to established methods. We also present a method for quantifying uncertainty in model parameters and in assignment probabilities. In doing so, we demonstrate that point estimates can be misleading. One of the computational strategies presented here, based on an expectation-maximization algorithm with judiciously chosen starting values, is robust and has a modest computational cost.
Publisher: IGI Global
Date: 2009
DOI: 10.4018/978-1-60566-026-4.CH002
Abstract: Actionable knowledge discovery is selected as one of the greatest challenges (Ankerst, 2002; Fayyad, Shapiro, & Uthurusamy, 2003) of next-generation knowledge discovery in database (KDD) studies (Han & Kamber, 2006). In existing data mining, mined patterns are often nonactionable with respect to real user needs. To enhance knowledge actionability, domain-related social intelligence is substantially essential (Cao et al., 2006b). The involvement of domain-related social intelligence in data mining leads to domain-driven data mining (Cao & Zhang, 2006a, 2007a), which complements traditional data-centered mining methodology. Domain-related social intelligence consists of intelligence of human, domain, environment, society and cyberspace, which complements data intelligence. The extension of KDD toward domain-driven data mining involves many challenging but promising research and development issues in KDD. Studies in regard to these issues may promote the paradigm shift of KDD from data-centered interesting pattern mining to domain-driven actionable knowledge discovery, and the deployment shift from simulated data set-based to real-life data and business environment-oriented, as widely predicted.
Publisher: Springer Science and Business Media LLC
Date: 16-05-2013
Publisher: Wiley
Date: 25-10-2017
DOI: 10.1002/ECE3.3496
Publisher: Cold Spring Harbor Laboratory
Date: 14-05-2017
DOI: 10.1101/137943
Abstract: In addition to the processes structuring free-living communities, host-associated microbiota are directly or indirectly shaped by the host. Therefore, microbiota data have a hierarchical structure where samples are nested under one or several variables representing host-specific factors, often spanning multiple levels of biological organization. Current statistical methods do not accommodate this hierarchical data structure, and therefore cannot explicitly account for the effect of the host in structuring the microbiota. We introduce a novel extension of joint species distribution models (JSDMs) which can straightforwardly accommodate and discern between effects such as host phylogeny and traits, recorded covariates like diet and collection sites, among other ecological processes. Our proposed methodology includes powerful yet familiar outputs seen in community ecology overall, including: (i) model-based ordination to visualize and quantify the main patterns in the data; (ii) variance partitioning to assess how influential the included host-specific factors are in structuring the microbiota; and (iii) co-occurrence networks to visualize microbe-to-microbe associations.
Publisher: Wiley
Date: 06-2021
DOI: 10.1111/ANZS.12323
Abstract: The analysis of longitudinal income data is often made challenging for several reasons. For example, in a national Australian survey on income over time, a non‐negligible proportion of responses are missing, and it is believed the missingness mechanism is non‐ignorable. Also, there are a large number of reported zero incomes, some of which may be true zeros (corresponding to individuals who legitimately do not earn an income), while some may be false zeros (corresponding to individuals choosing to round their income to zero). We propose a new shared parameter mixture (SPM) model for analysing semi‐continuous longitudinal income data, which addresses the two challenges of income non‐response and zero rounding. This is accomplished by jointly modelling an individual's underlying income together with the probability of missingness and rounding to zero, where both probabilities are permitted to vary in a smooth manner with their underlying non‐zero income. Applying the SPM model to the Australian income survey reveals that on average, older female individuals and individuals with a long‐term health condition are considerably less likely to earn an income, while income tended to be highest for male individuals on fixed‐term/permanent job contracts between ages 50 and 60. Furthermore, there is evidence of both zero rounding, and conditional on the assumed missingness mechanism, individuals with incomes at the higher and lower ends are more likely to not report their income.
Publisher: Wiley
Date: 15-04-2020
DOI: 10.1111/INSR.12378
Abstract: There has been considerable and controversial research over the past two decades into how successfully random effects misspecification in mixed models (i.e. assuming normality for the random effects when the true distribution is non‐normal) can be diagnosed and what its impacts are on estimation and inference. However, much of this research has focused on fixed effects inference in generalised linear mixed models. In this article, motivated by the increasing number of applications of mixed models where interest is on the variance components, we study the effects of random effects misspecification on random effects inference in linear mixed models, for which there is considerably less literature. Our findings are surprising and contrary to general belief: for point estimation, maximum likelihood estimation of the variance components under misspecification is consistent, although in finite samples, both the bias and mean squared error can be substantial. For inference, we show through theory and simulation that under misspecification, standard likelihood ratio tests of truly non‐zero variance components can suffer from severely inflated type I errors, and confidence intervals for the variance components can exhibit considerable undercoverage. Furthermore, neither of these problems vanishes asymptotically with increasing the number of clusters or cluster size. These results have major implications for random effects inference, especially if the true random effects distribution is heavier tailed than the normal. Fortunately, simple graphical and goodness‐of‐fit measures of the random effects predictions appear to have reasonable power at detecting misspecification. We apply linear mixed models to a survey of more than 4000 high school students within 100 schools and analyse how mathematics achievement scores vary with student attributes and across different schools. The application demonstrates the sensitivity of mixed model inference to the true but unknown random effects distribution.
Publisher: Wiley
Date: 30-05-2019
DOI: 10.1002/ECY.2754
Abstract: Spatiotemporal patterns in biological communities are typically driven by environmental factors and species interactions. Spatial data from communities are naturally described by stacking models for all species in the community. Two important considerations in such multispecies or joint species distribution models (JSDMs) are measurement errors and correlations between species. Up to now, virtually all JSDMs have included either one or the other, but not both features simultaneously, even though both measurement errors and species correlations may be essential for achieving unbiased inferences about the distribution of communities and species co-occurrence patterns. We developed two presence-absence JSDMs for modeling pairwise species correlations while accommodating imperfect detection: one using a latent variable and the other using a multivariate probit approach. We conducted three simulation studies to assess the performance of our new models and to compare them to earlier latent variable JSDMs that did not consider imperfect detection. We illustrate our models with a large Atlas data set of 62 passerine bird species in Switzerland. Under a wide range of conditions, our new latent variable JSDM with imperfect detection and species correlations yielded estimates with little or no bias for occupancy, occupancy regression coefficients, and the species correlation matrix. In contrast, with the multivariate probit model we saw convergence issues with large data sets (many species and sites) resulting in very long run times and larger errors. A latent variable model that ignores imperfect detection produced correlation estimates that were consistently negatively biased, that is, underestimated. We found that the number of latent variables required to represent the species correlation matrix adequately may be much greater than previously suggested, namely around n/2, where n is community size. The analysis of the Swiss passerine data set exemplifies how not accounting for imperfect detection will lead to negative bias in occupancy estimates and to attenuation in the estimated covariate coefficients in a JSDM. Furthermore, spatial heterogeneity in detection may cause spurious patterns in the estimated species correlation matrix if not accounted for. Our new JSDMs represent an important extension of current approaches to community modeling to the common case where species presence-absence cannot be detected with certainty.
Publisher: Elsevier BV
Date: 09-2020
Publisher: Inter-Research Science Center
Date: 09-01-2020
DOI: 10.3354/MEPS13177
Publisher: Wiley
Date: 06-02-2022
DOI: 10.1111/SJOS.12569
Abstract: We consider a new approach for estimating non‐Gaussian undirected graphical models. Specifically, we model continuous data from a class of multivariate skewed distributions, whose conditional dependence structure depends on both a precision matrix and a shape vector. To estimate the graph, we propose a novel estimation method based on nodewise regression: we first fit a linear model, and then fit a one‐component projection pursuit regression model to the residuals obtained from the linear model, and finally threshold appropriate quantities. Theoretically, we establish error bounds for each nodewise regression and prove the consistency of the estimated graph when the number of variables diverges with the sample size. Simulation results demonstrate the strong finite sample performance of our new method over existing methods for estimating Gaussian and non‐Gaussian graphical models. Finally, we demonstrate an application of the proposed method on observations of physicochemical properties of wine.
Publisher: Cambridge University Press (CUP)
Date: 09-2022
DOI: 10.1017/ASB.2022.18
Abstract: Customer churn, which insurance companies use to describe the non-renewal of existing customers, is a widespread and expensive problem in general insurance, particularly because contracts are usually short-term and are renewed periodically. Traditionally, customer churn analyses have employed models which utilise only a binary outcome (churn or not churn) in one period. However, real business relationships are multi-period, and policyholders may reside in and transition between a wider range of states beyond simple churn/not churn throughout this relationship. To better encapsulate the richness of policyholder behaviours through time, we propose multi-state customer churn analysis, which aims to model behaviour over a larger number of states (defined by different combinations of insurance coverage taken) and across multiple periods (thereby making use of readily available longitudinal data). Using multinomial logistic regression (MLR) with a second-order Markov assumption, we demonstrate how multi-state customer churn analysis offers deeper insights into how a policyholder’s transition history is associated with their decision making, whether that be to retain the current set of policies, churn, or add/drop a coverage. Applying this model to commercial insurance data from the Wisconsin Local Government Property Insurance Fund, we illustrate how transition probabilities between states are affected by differing sets of explanatory variables and that a multi-state analysis can potentially offer stronger predictive performance and more accurate calculations of customer lifetime value (say), compared to the traditional customer churn analysis techniques.
Publisher: Institute of Mathematical Statistics
Date: 06-2015
DOI: 10.1214/15-AOAS813
Publisher: MDPI AG
Date: 05-06-2023
DOI: 10.3390/F14061166
Abstract: A fundamental requirement of sustainable forest management is that stands are adequately regenerated after harvesting. To date, most research has focused on the regeneration of the dominant timber species and to a lesser degree on plant communities. Few studies have explored the impact of the regeneration success of dominant tree species on plant community composition and diversity. In this study, we quantified the influence of variability in tree density and climatic and edaphic factors on plant species diversity in montane regrowth forests dominated by Eucalyptus regnans in the Central Highlands of Victoria in southeastern Australia. We found that Acacia density shaped plant biodiversity more than Eucalyptus density. Edaphic factors, particularly soil nutrition and moisture availability, played a significant role in shaping species turnover and occurrence. Our findings suggest that the density of Acacia is a key biotic filter that influences the occurrence of many understorey plant species and shapes plant community turnover. This should be considered when assessing the impacts of both natural and anthropogenic disturbances on plant biodiversity in the montane forests of southeastern Australia.
Publisher: Wiley
Date: 04-05-2021
Abstract: It is common practice for ecologists to examine species niches in the study of community composition. The response curve of a species in the fundamental niche is usually assumed to be quadratic. The centre of a quadratic curve represents a species' optimal environmental conditions, and the width its ability to tolerate deviations from the optimum. Most multivariate methods assume species respond linearly to niche axes, or with a quadratic curve that is of equal width for all species. However, it is widely understood that some species have the ability to better tolerate deviations from their optimal environment (generalists) compared to other (specialist) species. Rare species often tolerate a smaller range of environments than more common species, corresponding to a narrow niche. We propose a new method, for ordination and fitting Joint Species Distribution Models, based on Generalized Linear Mixed‐effects Models, which relaxes the assumptions of equal tolerances. By explicitly estimating species maxima, and species optima and tolerances per ecological gradient, we can better explore how species relate to each other.
Publisher: Cambridge University Press (CUP)
Date: 12-11-2015
Publisher: MDPI AG
Date: 21-04-2022
DOI: 10.3390/D14050320
Abstract: Negative binomial modelling is one of the most commonly used statistical tools for analysing count data in ecology and biodiversity research. This is not surprising given the prevalence of overdispersion (i.e., evidence that the variance is greater than the mean) in many biological and ecological studies. Indeed, overdispersion is often indicative of some form of biological aggregation process (e.g., when species or communities cluster in groups). If overdispersion is ignored, the precision of model parameters can be severely overestimated and can result in misleading statistical inference. In this article, we offer some insight as to why the negative binomial distribution is becoming, and arguably should become, the default starting distribution (as opposed to assuming Poisson counts) for analysing count data in ecology and biodiversity research. We begin with an overview of traditional uses of negative binomial modelling, before examining several modern applications and opportunities in modern ecology/biodiversity where negative binomial modelling is playing a critical role, from generalisations based on exploiting its Poisson-gamma mixture formulation in species distribution models and occurrence data analysis, to estimating animal abundance in negative binomial N-mixture models, and biodiversity measures via rank abundance distributions. Comparisons to other common models for handling overdispersion on real data are provided. We also address the important issue of software, and conclude with a discussion of future directions for analysing ecological and biological data with negative binomial models. In summary, we hope this overview will stimulate the use of negative binomial modelling as a starting point for the analysis of count data in ecology and biodiversity studies.
Publisher: Wiley
Date: 15-06-2015
Publisher: Wiley
Date: 03-2021
DOI: 10.1111/ANZS.12337
Abstract: Point process models are a natural approach for modelling data that arise as point events. In the case of Poisson counts, these may be fitted easily as a weighted Poisson regression. Point processes lack the notion of sample size. This is problematic for model selection, because various classical criteria such as the Bayesian information criterion (BIC) are a function of the sample size, n, and are derived in an asymptotic framework where n tends to infinity. In this paper, we develop an asymptotic result for Poisson point process models in which the observed number of point events, m, plays the role that sample size does in the classical regression context. Following from this result, we derive a version of BIC for point process models, and when fitted via penalised likelihood, conditions for the LASSO penalty that ensure consistency in estimation and the oracle property. We discuss challenges extending these results to the wider class of Gibbs models, of which the Poisson point process model is a special case.
Publisher: Oxford University Press (OUP)
Date: 30-07-2019
Abstract: Identifying the role that environmental factors and biotic interactions play in species distribution can be essential to better understand and predict how ecosystems will respond to changing environmental conditions. This study aimed at disentangling the assemblage of the pelagic predator–prey community by identifying interspecific associations and their main drivers. For this purpose, we applied the joint species distribution modelling approach, JSDM, to the co-occurrence patterns of both prey and top predator communities obtained from JUVENA surveys during 2013–2016 in the Bay of Biscay. Results showed that the co-occurrence patterns of top predators and prey were driven by a combination of environmental and biotic factors, which highlighted the importance of considering both components to fully understand the community structure. In addition, results also revealed that many biotic interactions, such as schooling in prey (e.g. anchovy–sardine), local enhancement/facilitation in predators (e.g. Cory’s shearwater–fin whale), and predation between predator–prey species (e.g. northern gannet–horse mackerel), were led by positive associations, although predator avoidance behaviour was also suggested between negatively associated species (e.g. striped dolphin–blue whiting). The identification of interspecific associations can therefore provide insights on the functioning of predator–prey networks and help advance towards ecosystem-based management.
Publisher: Cold Spring Harbor Laboratory
Date: 07-10-2020
DOI: 10.1101/2020.10.05.326199
Abstract: It is common practice for ecologists to examine species niches in the study of community composition. The response curve of a species in the fundamental niche is usually assumed to be quadratic. The center of a quadratic curve represents a species’ optimal environmental conditions, and the width its ability to tolerate deviations from the optimum. Most multivariate methods assume species respond linearly to the environment of the niche, or with a quadratic curve that is of equal width and height for all species. However, it is widely understood that some species are generalists who tolerate deviations from their optimal environment better than others. Rare species often tolerate a smaller range of environments than more common species, corresponding to a narrow niche. We propose a new method, for ordination and fitting Joint Species Distribution Models, based on Generalized Linear Mixed-Effects Models, which relaxes the assumptions of equal tolerances and equal maxima. By explicitly estimating species optima, tolerances, and maxima, per ecological gradient, we can better predict change in species communities, and understand how species relate to each other.
Publisher: Wiley
Date: 03-2019
DOI: 10.1111/ANZS.12256
Publisher: Wiley
Date: 20-12-2019
DOI: 10.1002/ECY.2920
Abstract: Social information obtained from heterospecifics can enhance individual fitness by reducing environmental uncertainty, making it an important driver of mixed-species grouping behavior. Heterospecific groups are well documented among fishes, yet are notably more prevalent among juveniles than more advanced life stages, implying that the adaptive value of joining other species is greater during this developmental period. We propose this phenomenon can be explained by the heightened ecological relevance of heterospecifically produced cues pertaining to predation risk and/or resources, as body-size uniformity inherent in early ontogeny yields greater overlap in predator and prey guild membership across juveniles of disparate taxa. To evaluate the putative role of information in shaping juvenile fish assemblages, we employed a joint species distribution model (JSDM), identifying nonrandom relationships among fishes collected in 785 seine hauls within the shallow littoral zones of a subtropical island. After accounting for species-environment relationships, which explained 39% of observed covariation in the abundance of 11 taxa, we detected high rates of positive association (84% of significant correlations) predominantly between mutual foraging guild members, consistent with assemblage patterns predicted to evolve under widespread interspecific information use. Affiliations occurred primarily between species characterized by neutral (i.e., noninteracting) or negative (i.e., predator-prey) relationships in later life stages, supporting the notion that heightened niche overlap due to body size homogeneity acted to increase the pertinence of information among juveniles. Taxa exerted varying degrees of influence on assemblage structure; however, Eucinostomus spp., a gregarious generalist with exceptional information-production potential, had an effect several times that of all other species combined, further evidencing the likely role of information in motivating observed relationships. Co-occurrence and qualitative behavioral data inferred from remote underwater video surveys reinforced these conclusions. Collectively, these results suggest that positive interactions linked to information exchange can be among the principal factors organizing juvenile fish assemblages at local scales, highlighting the role of ontogeny in mediating the relevance and exploitation of information across species.
Publisher: Informa UK Limited
Date: 02-01-2017
Publisher: Elsevier BV
Date: 2023
Publisher: Elsevier BV
Date: 2017
Publisher: Informa UK Limited
Date: 04-01-2022
Publisher: Wiley
Date: 22-10-2019
Publisher: Informa UK Limited
Date: 02-01-2015
Publisher: Springer Singapore
Date: 2019
Publisher: Wiley
Date: 09-2013
DOI: 10.1890/12-1322.1
Abstract: Species distribution models (SDMs) are an important tool for studying the patterns of species across environmental and geographic space. For community data, a common approach involves fitting an SDM to each species separately, although the large number of models makes interpretation difficult and fails to exploit any similarities between individual species responses. A recently proposed alternative that can potentially overcome these difficulties is species archetype models (SAMs), a model-based approach that clusters species based on their environmental response. In this paper, we compare the predictive performance of SAMs against separate SDMs using a number of multi-species data sets. Results show that SAMs improve model accuracy and discriminatory capacity compared to separate SDMs. This is achieved by borrowing strength from common species having higher information content. Moreover, the improvement increases as the species become rarer.
Publisher: Cold Spring Harbor Laboratory
Date: 06-03-2020
DOI: 10.1101/2020.03.05.980060
Abstract: Recently, there has been an increasing interest in model-based approaches for the statistical modelling of the joint distribution of multi-species abundances. The Dirichlet-multinomial distribution has been proposed as a suitable candidate distribution for the joint species distribution of pin-point plant cover data and is here applied in a model-based ordination framework. Unlike most model-based ordination methods, both fixed and random effects are in our proposed model structured as p-dimensional vectors and added to the latent variables before the inner product with the species-specific coefficients. This changes the interpretation of the parameters, so that the fixed and random effects now measure the relative displacement of the vegetation by the fixed and random factors in the p-dimensional latent variable space. This parameterization allows statistical inference of the effect of fixed and random factors in vector space, and makes it easier for practitioners to perform inferences on species composition in a multivariate setting. The method was applied to plant pin-point cover data from dry heathlands that had received different management treatments (burned, grazed, harvested, unmanaged), and it was found that treatment had a significant effect on heathland vegetation both when considering plant functional groups and when the taxonomic resolution was at the species level.
Publisher: Wiley
Date: 08-12-2022
Abstract: In community ecology, unconstrained ordination can be used to indirectly explore drivers of community composition, while constrained ordination can be used to directly relate predictors to an ecological community. However, existing constrained ordination methods do not explicitly account for community composition that cannot be explained by the predictors, so that they have the potential to misrepresent community composition if not all predictors are available in the data. We propose and develop a set of new methods for ordination and joint species distribution modelling (JSDM) as part of the generalized linear latent variable model (GLLVM) framework, that incorporate predictors directly into an ordination. This includes a new ordination method that we refer to as concurrent ordination, as it simultaneously constructs unconstrained and constrained latent variables. Both unmeasured residual covariation and predictors are incorporated into the ordination by simultaneously imposing reduced rank structures on the residual covariance matrix and on fixed‐effects. We evaluate the method with a simulation study, and show that the proposed developments outperform canonical correspondence analysis (CCA) for Poisson and Bernoulli responses, and perform similar to redundancy analysis (RDA) for normally distributed responses, the two most popular methods for constrained ordination in community ecology. Two examples with real data further demonstrate the benefits of concurrent ordination, and the need to account for residual covariation in the analysis of multivariate data. This article contextualizes the role of constrained ordination in the GLLVM and JSDM frameworks, while developing a new ordination method that incorporates the best of unconstrained and constrained ordination, and which overcomes some of the deficiencies of existing classical ordination methods.
Publisher: Elsevier BV
Date: 11-2012
Publisher: Wiley
Date: 25-05-2021
DOI: 10.1002/ENV.2683
Abstract: In ecological community studies it is often of interest to study the effect of species related trait variables on abundances or presence‐absences. Specifically, the interest may lie in the interactions between environmental and trait variables. An increasingly popular approach for studying such interactions is to use the so‐called fourth‐corner model, which explicitly posits a regression model where the mean response of each species is a function of interactions between covariate and trait predictors (among other terms). On the other hand, many of the fourth‐corner models currently applied in the literature are too simplistic to properly account for variation in environmental and trait response and any residual covariation between species. To overcome this problem, we propose a fourth‐corner latent variable model which combines the following three features: latent variables to capture the correlation between species, fourth‐corner terms to account for environment‐trait interactions, and species‐specific random slopes for modeling excess heterogeneity between species in their environmental response. We perform an extensive numerical study comparing a variety of fourth‐corner models available in the literature which account for the aforementioned sources of variation to varying degrees. Simulation results demonstrate that the proposed fourth‐corner latent variable models performed well when testing for the fourth‐corner (interaction) coefficients, across both Type I error and power. By comparison, some models that do not fully account for all relevant sources of variation suffer from inflated Type I error leading to potentially misleading inference. The proposed method is illustrated by an example on ground beetle data.
Publisher: Elsevier BV
Date: 10-2016
Publisher: Elsevier BV
Date: 05-2018
Publisher: Wiley
Date: 26-07-2017
Publisher: Wiley
Date: 18-11-2021
Abstract: Visualising data is a key step in data analysis, allowing researchers to find patterns, and assess and communicate the results of statistical modelling. In ecology, visualisation is often challenging when there are many variables (often for different species or other taxonomic groups) and they are not normally distributed (often counts or presence–absence data). Ordination is a common and powerful way to overcome this hurdle by reducing data from many response variables to just two or three, to be easily plotted. Ordination is traditionally done using dissimilarity‐based methods, most commonly non‐metric multidimensional scaling (nMDS). In the last decade, however, model‐based methods for unconstrained ordination have gained popularity. These are primarily based on latent variable models, with latent variables estimating the underlying, unobserved ecological gradients. Despite some major benefits, a drawback of model‐based ordination methods is their speed, as they typically take much longer to return a result than dissimilarity‐based methods, especially for large sample sizes. We introduce copula ordination, a new, scalable model‐based approach to unconstrained ordination. This method has all the desirable properties of model‐based ordination methods, with the added advantage that it is computationally far more efficient. In particular, simulations show copula ordination is an order of magnitude faster than current model‐based methods, and can even be faster than nMDS for large sample sizes, while being able to produce similar ordination plots and trends as these methods.
Publisher: Cold Spring Harbor Laboratory
Date: 12-10-2021
DOI: 10.1101/2021.10.11.463884
Abstract: In community ecology, unconstrained ordination can be used to indirectly explore drivers of community composition, while constrained ordination can be used to directly relate predictors to an ecological community. However, existing constrained ordination methods do not explicitly account for community composition that cannot be explained by the predictors, so that they have the potential to misrepresent community composition if not all predictors are available in the data. We propose and develop a set of new methods for ordination and Joint Species Distribution Modelling (JSDM) as part of the Generalized Linear Latent Variable Model (GLLVM) framework, that incorporate predictors directly into an ordination. This includes a new ordination method that we refer to as concurrent ordination, as it simultaneously constructs unconstrained and constrained latent variables. Both unmeasured residual covariation and predictors are incorporated into the ordination by simultaneously imposing reduced rank structures on the residual covariance matrix and on fixed-effects. We evaluate the method with a simulation study, and show that the proposed developments outperform Canonical Correspondence Analysis (CCA) for Poisson and Bernoulli responses, and perform similar to Redundancy Analysis (RDA) for normally distributed responses, the two most popular methods for constrained ordination in community ecology. Two examples with real data further demonstrate the benefits of concurrent ordination, and the need to account for residual covariation in the analysis of multivariate data. This article contextualizes the role of constrained ordination in the GLLVM and JSDM frameworks, while developing a new ordination method that incorporates the best of unconstrained and constrained ordination, and which overcomes some of the deficiencies of existing classical ordination methods.
Publisher: Springer Science and Business Media LLC
Date: 24-08-2017
Publisher: Wiley
Date: 22-01-2022
DOI: 10.1111/ANZS.12349
Abstract: Sufficient dimension reduction (SDR) is an attractive approach to regression modelling. However, despite its rich literature and growing popularity in application, surprisingly little research has been done on how to perform SDR for clustered data, for example as commonly arises in longitudinal studies. Indeed, current popular SDR methods have been mostly based on a marginal estimating equation approach. In this article, we propose a new approach to SDR for clustered data based on a combination of finite mixture modelling and mixed effects regression. Finite mixture models offer a flexible means of estimating the fixed effects central subspace, based on slicing the space up and probabilistically clustering observations to each slice (mixture component). Dimension reduction is achieved by having the mixing proportions vary only through the sufficient fixed effect predictors. We then incorporate random effects as a natural means of accounting for correlations within clusters. We employ a Monte Carlo expectation–maximisation algorithm to estimate the model parameters and fixed effects central subspace, and discuss methods for associated uncertainty quantification and prediction. Simulation studies demonstrate that our approach performs strongly against both estimating equation methods for estimating the fixed effects central subspace, and SDR methods which do not account for within‐cluster correlation. Finally, we apply the proposed approach to a data set on air pollutant monitoring across 13 stations in the Eastern United States.
Publisher: Informa UK Limited
Date: 13-06-2017
Publisher: Wiley
Date: 09-2020
DOI: 10.1111/ANZS.12303
Publisher: Public Library of Science (PLoS)
Date: 05-2019
Publisher: Informa UK Limited
Date: 27-02-2019
Publisher: Wiley
Date: 06-2018
DOI: 10.1111/MEC.14718
Publisher: Wiley
Date: 10-07-2023
Abstract: We introduce community‐level basis function models (CBFMs) as an approach for spatiotemporal joint distribution modelling. CBFMs can be viewed as related to spatiotemporal latent variable models, where the latent variables are replaced by a set of pre‐specified spatiotemporal basis functions which are common across species. In a CBFM, the coefficients that link the basis functions to each species are treated as random slopes. As such, the CBFM can be formulated to have a similar structure to a generalised additive model. This allows us to adapt existing techniques to fit CBFMs efficiently. CBFMs can be used for a variety of reasons, such as inferring patterns of habitat use in space and time, understanding how residual covariation between species varies spatially and/or temporally, and spatiotemporal predictions of species‐ and community‐level quantities. A simulation study and an application to data from a bottom trawl survey conducted across the U.S. Northeast shelf show that CBFMs can achieve similar and sometimes better predictive performance compared to existing approaches for spatiotemporal joint species distribution modelling, while being computationally more scalable.
Publisher: Wiley
Date: 2011
DOI: 10.1890/10-0340.1
Abstract: The arcsine square root transformation has long been standard procedure when analyzing proportional data in ecology, with applications in data sets containing binomial and non-binomial response variables. Here, we argue that the arcsine transform should not be used in either circumstance. For binomial data, logistic regression has greater interpretability and higher power than analyses of transformed data. However, it is important to check the data for additional unexplained variation, i.e., overdispersion, and to account for it via the inclusion of random effects in the model if found. For non-binomial data, the arcsine transform is undesirable on the grounds of interpretability, and because it can produce nonsensical predictions. The logit transformation is proposed as an alternative approach to address these issues. Examples are presented in both cases to illustrate these advantages, comparing various methods of analyzing proportions including untransformed, arcsine- and logit-transformed linear models and logistic regression (with or without random effects). Simulations demonstrate that logistic regression usually provides a gain in power over other methods.
Publisher: Public Library of Science (PLoS)
Date: 24-05-2016
Publisher: Statistica Sinica (Institute of Statistical Science)
Date: 2017
Publisher: Wiley
Date: 16-01-2023
DOI: 10.1111/JBI.14565
Abstract: Freshwater mussels are among the most threatened taxa in the world, partially due to the dependence on fish hosts to complete their life cycle. Knowledge about the role of environmental and biotic drivers in determining mussels' distribution is currently lacking. We aimed to assess the role of environmental and biotic drivers in determining the distribution of mussels and their fish hosts and to test if co‐occurrence patterns were able to identify mussel‐host interactions. Douro River basin (Iberian Peninsula). Four freshwater mussels and ten fish hosts. Joint species distribution models (JSDMs) were fitted to presence‐absence records for mussel and fish assemblages. Variance partitioning among environmental variables and latent variables was conducted to determine the environmental versus biotic drivers of species distributions. Resulting matrices of pairwise species co‐occurrences were used to identify co‐occurrence patterns. The distribution of host generalist mussel species was mainly explained by environmental variables related to climate and topography. The distribution of the host specialist Margaritifera margaritifera was mainly explained by land use. Strong positive correlations between mussels and the most relevant fish hosts were consistently captured by JSDMs. Co‐occurrence patterns were mainly explained by residual factors, indicating the potential role of biotic interactions. Biotic interactions were expected to play an important role in explaining mussels' distribution, but the contribution of this factor was only meaningful for the host specialist M. margaritifera . Correlations between mussels and suitable hosts allowed to infer important fish hosts for freshwater mussels in the Douro River basin from distributional data alone. 
By finding similarities between the ecological requirements of co‐occurring species, conservation measures can be oriented towards several species, which brings a more holistic perspective to the protection of biodiversity.
Publisher: Informa UK Limited
Date: 19-06-2018
Publisher: Springer Science and Business Media LLC
Date: 23-12-2015
DOI: 10.1038/NATURE16476
Abstract: Phenotypic traits and their associated trade-offs have been shown to have globally consistent effects on individual plant physiological functions, but how these effects scale up to influence competition, a key driver of community assembly in terrestrial vegetation, has remained unclear. Here we use growth data from more than 3 million trees in over 140,000 plots across the world to show how three key functional traits--wood density, specific leaf area and maximum height--consistently influence competitive interactions. Fast maximum growth of a species was correlated negatively with its wood density in all biomes, and positively with its specific leaf area in most biomes. Low wood density was also correlated with a low ability to tolerate competition and a low competitive effect on neighbours, while high specific leaf area was correlated with a low competitive effect. Thus, traits generate trade-offs between performance with competition versus performance without competition, a fundamental ingredient in the classical hypothesis that the coexistence of plant species is enabled via differentiation in their successional strategies. Competition within species was stronger than between species, but an increase in trait dissimilarity between species had little influence in weakening competition. No benefit of dissimilarity was detected for specific leaf area or wood density, and only a weak benefit for maximum height. Our trait-based approach to modelling competition makes generalization possible across the forest ecosystems of the world and their highly diverse species composition.
Publisher: Wiley
Date: 12-06-2019
DOI: 10.1002/ECM.1370
Publisher: Elsevier BV
Date: 11-2020
Publisher: Informa UK Limited
Date: 28-04-2022
Publisher: Wiley
Date: 05-01-2016
Abstract: Model‐based methods have emerged as a powerful approach for analysing multivariate abundance data in community ecology. Key applications include model‐based ordination, modelling the various sources of correlations across species, and making inferences while accounting for these between species correlations. boral (version 0.9.1, licence GPL‐2) is an R package available on CRAN for model‐based analysis of multivariate abundance data, with estimation performed using Bayesian Markov chain Monte Carlo methods. A key feature of the boral package is the ability to incorporate latent variables as a parsimonious method of modelling between species correlation. Pure latent variable models offer a model‐based approach to unconstrained ordination, for visualizing sites and the indicator species characterizing them on a low‐dimensional plot. Correlated response models consist of fitting generalized linear models to each species, while including latent variables to account for residual correlation between species, for example, due to unmeasured covariates.
Publisher: Wiley
Date: 11-05-2018
DOI: 10.1111/BIOM.12888
Abstract: Generalized linear latent variable models (GLLVMs) offer a general framework for flexibly analyzing data involving multiple responses. When fitting such models, two of the major challenges are selecting the order, that is, the number of factors, and an appropriate structure for the loading matrix, typically a sparse structure. Motivated by the application of GLLVMs to study marine species assemblages in the Southern Ocean, we propose the Ordered Factor LASSO or OFAL penalty for order selection and achieving sparsity in GLLVMs. The OFAL penalty is the first penalty developed specifically for order selection in latent variable models, and achieves this by using a hierarchically structured group LASSO type penalty to shrink entire columns of the loading matrix to zero, while ensuring that non-zero loadings are concentrated on the lower-order factors. Simultaneously, individual element sparsity is achieved through the use of an adaptive LASSO. In conjunction with using an information criterion which promotes aggressive shrinkage, simulation shows that the OFAL penalty performs strongly compared with standard methods and penalties for order selection, achieving sparsity, and prediction in GLLVMs. Applying the OFAL penalty to the Southern Ocean marine species dataset suggests the available environmental predictors explain roughly half of the total covariation between species, thus leading to a smaller number of latent variables and increased sparsity in the loading matrix compared to a model without any covariates.
Publisher: Wiley
Date: 06-01-2021
DOI: 10.1111/BIOM.13416
Abstract: Multivariate spatial data, where multiple responses are simultaneously recorded across spatially indexed observational units, are routinely collected in a wide variety of disciplines. For example, the Southern Ocean Continuous Plankton Recorder survey collects records of zooplankton communities in the Indian sector of the Southern Ocean, with the aim of identifying and quantifying spatial patterns in biodiversity in response to environmental change. One increasingly popular method for modeling such data is spatial generalized linear latent variable models (GLLVMs), where the correlation across sites is captured by a spatial covariance function in the latent variables. However, little is known about the impact of misspecifying the latent variable correlation structure on inference of various parameters in such models. To address this gap in the literature, we investigate how misspecifying and assuming independence for the latent variables' correlation structure impacts estimation and inference in spatial GLLVMs. Through both theory and numerical studies, we show that performance of maximum likelihood estimation and inference on regression coefficients under misspecification depends on a combination of the response type, the magnitude of true regression coefficient, and the corresponding loadings, and, most importantly, whether the corresponding covariate is (also) spatially correlated. On the other hand, estimation and inference of truly nonzero loadings and prediction of latent variables is consistently not robust to misspecification of the latent variable correlation structure.
Publisher: Elsevier BV
Date: 10-2021
Publisher: Elsevier BV
Date: 12-2015
DOI: 10.1016/J.TREE.2015.09.007
Abstract: Technological advances have enabled a new class of multivariate models for ecology, with the potential now to specify a statistical model for abundances jointly across many taxa, to simultaneously explore interactions across taxa and the response of abundance to environmental variables. Joint models can be used for several purposes of interest to ecologists, including estimating patterns of residual correlation across taxa, ordination, multivariate inference about environmental effects and environment-by-trait interactions, accounting for missing predictors, and improving predictions in situations where one can leverage knowledge of some species to predict others. We demonstrate this by example and discuss recent computation tools and future directions.
Publisher: Elsevier BV
Date: 12-2013
Publisher: Elsevier BV
Date: 07-2023
Start Date: 2023
End Date: 12-2025
Amount: $388,000.00
Funder: Australian Research Council
Start Date: 04-2018
End Date: 12-2021
Amount: $359,083.00
Funder: Australian Research Council
Start Date: 01-2020
End Date: 01-2024
Amount: $365,039.00
Funder: Australian Research Council