ORCID Profile
0000-0002-9958-432X
Current Organisations
La Trobe University, University of Queensland
In Research Link Australia (RLA), "Research Topics" refer to ANZSRC FOR and SEO codes. These topics are either sourced from ANZSRC FOR and SEO codes listed in researchers' related grants or generated by a large language model (LLM) based on their publications.
Applied Statistics | Statistical Theory | Statistics | Statistical data science | Optimisation | Computational statistics | Pattern Recognition and Data Mining
Expanding Knowledge in the Mathematical Sciences | Expanding Knowledge in the Information and Computing Sciences
Publisher: Elsevier BV
Date: 06-2018
Publisher: PeerJ
Date: 02-06-2021
DOI: 10.7717/PEERJ-CS.582
Abstract: Shapley values have become increasingly popular in the machine learning literature, thanks to their attractive axiomatisation, flexibility, and uniqueness in satisfying certain notions of ‘fairness’. The flexibility arises from the myriad potential forms of the Shapley value game formulation. Amongst the consequences of this flexibility is that there are now many types of Shapley values being discussed, with such variety being a source of potential misunderstanding. To the best of our knowledge, all existing game formulations in the machine learning and statistics literature fall into a category, which we name the model-dependent category of game formulations. In this work, we consider an alternative and novel formulation which leads to the first instance of what we call model-independent Shapley values. These Shapley values use a measure of non-linear dependence as the characteristic function. The strength of these Shapley values is in their ability to uncover and attribute non-linear dependencies amongst features. We introduce and demonstrate the use of the energy distance correlations, affine-invariant distance correlation, and Hilbert–Schmidt independence criterion as Shapley value characteristic functions. In particular, we demonstrate their potential value for exploratory data analysis and model diagnostics. We conclude with an interesting expository application to a medical survey data set.
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2021
Publisher: MIT Press - Journals
Date: 12-2016
DOI: 10.1162/NECO_A_00892
Abstract: The mixture-of-experts (MoE) model is a popular neural network architecture for nonlinear regression and classification. The class of MoE mean functions is known to be uniformly convergent to any unknown target function, assuming that the target function is from a Sobolev space that is sufficiently differentiable and that the domain of estimation is a compact unit hypercube. We provide an alternative result, which shows that the class of MoE mean functions is dense in the class of all continuous functions over arbitrary compact domains of estimation. Our result can be viewed as a universal approximation theorem for MoE models. The theorem we present allows MoE users to be confident in applying such models for estimation when data arise from nonlinear and nondifferentiable generative processes.
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2020
Publisher: Wiley
Date: 06-12-2017
DOI: 10.1002/SAM.11366
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 04-2016
Publisher: Springer Singapore
Date: 2018
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 08-2016
Publisher: Springer Science and Business Media LLC
Date: 05-08-2019
Publisher: Hindawi Limited
Date: 26-02-2022
DOI: 10.1155/2022/3473397
Abstract: Walking, cycling, and feeder bus/tram for first- and last-mile (FLM) train access are often considered to have better health benefits, lower cost, and less environmental impacts than driving. However, little is known about the road safety impacts of these FLM access modes, particularly at a network level. This paper aims to investigate the impacts of train commuters’ access modes on road safety in Victoria, Australia. Macroscopic analyses of crash outcomes in each zone (i.e., Statistical Area Level 1) were performed using negative binomial (NB) and spatially lagged X negative binomial (SLXNB), accounting for potential indirect effects of mode shares in adjacent zones. This macroscopic analysis approach enabled the consideration of the safety effects across the network. The results showed that the SLXNB models outperformed the NB models. Commuting by train, either with walking or car as FLM access mode, was negatively associated with both total and severe crashes. In addition, commuting by train with feeder bus/tram access mode was negatively associated with severe crashes. Interestingly, commuting by train with cycling access mode was negatively associated with total crashes, with a larger effect when compared to walking and car access modes. Overall, the results suggested promoting active transport as FLM train access mode would lead to an improvement in road safety.
Publisher: Wiley
Date: 12-10-2016
DOI: 10.1111/STAN.12093
Publisher: Springer Science and Business Media LLC
Date: 03-2018
Publisher: Wiley
Date: 09-02-2017
DOI: 10.1002/WIDM.1198
Publisher: Elsevier BV
Date: 11-2019
Publisher: Informa UK Limited
Date: 20-11-2017
Publisher: Wiley
Date: 19-04-2023
DOI: 10.1111/SJOS.12643
Abstract: In recent years, empirical Bayesian (EB) inference has become an attractive approach for estimation in parametric models arising in a variety of real‐life problems, especially in complex and high‐dimensional scientific applications. However, compared to the relative abundance of available general methods for computing point estimators in the EB framework, the construction of confidence sets and hypothesis tests with good theoretical properties remains difficult and problem specific. Motivated by the Universal Inference framework, we propose a general and universal method, based on holdout likelihood ratios, and utilizing the hierarchical structure of the specified Bayesian model for constructing confidence sets and hypothesis tests that are finite-sample valid. We illustrate our method through a range of numerical studies and real data applications, which demonstrate that the approach is able to generate useful and meaningful inferential statements in the relevant contexts.
Publisher: Elsevier BV
Date: 12-2014
Publisher: American Chemical Society (ACS)
Date: 30-07-2014
DOI: 10.1021/PR500525E
Abstract: The utility of high-throughput quantitative proteomics to identify differentially abundant proteins en-masse relies on suitable and accessible statistical methodology, which remains mostly an unmet need. We present a free web-based tool, called Quantitative Proteomics p-value Calculator (QPPC), designed for accessibility and usability by proteomics scientists and biologists. Being an online tool, there is no requirement for software installation. Furthermore, QPPC accepts generic peptide ratio data generated by any mass spectrometer and database search engine. Importantly, QPPC utilizes the permutation test that we recently found to be superior to other methods for analysis of peptide ratios because it does not assume normal distributions.1 QPPC assists the user in selecting significantly altered proteins based on numerical fold change, or standard deviation from the mean or median, together with the permutation p-value. Output is in the form of comma separated values files, along with graphical visualization using volcano plots and histograms. We evaluate the optimal parameters for use of QPPC, including the permutation level and the effect of outlier and contaminant peptides on p-value variability. The optimal parameters defined are deployed as default for the web-tool at qppc.di.uq.edu.au/ .
Publisher: Elsevier BV
Date: 12-2020
Publisher: Informa UK Limited
Date: 28-11-2016
Publisher: Springer New York
Date: 15-12-2016
DOI: 10.1007/978-1-4939-6740-7_9
Abstract: Comparative profiling proteomics experiments are important tools in biological research. In such experiments, tens to hundreds of thousands of peptides are measured simultaneously, with the goal of inferring protein abundance levels. Statistical evaluation of these datasets is required to determine proteins that are differentially abundant between the test samples. Previously we have reported the non-normal distribution of SILAC datasets, and demonstrated the permutation test to be a superior method for the statistical evaluation of non-normal peptide ratios. This chapter outlines the steps and the R scripts that can be used for performing permutation analysis with false discovery rate control via the Benjamini-Yekutieli method.
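The false discovery rate control named in the abstract above can be illustrated with a minimal sketch of the Benjamini-Yekutieli adjustment. This is not the chapter's R code: the function name `benjamini_yekutieli` and the example p-values are assumptions made for illustration, and the permutation step that would produce the p-values is not shown.

```python
def benjamini_yekutieli(p_values):
    """Return Benjamini-Yekutieli adjusted p-values, in the input order."""
    m = len(p_values)
    if m == 0:
        return []
    # Harmonic correction factor c(m) = 1 + 1/2 + ... + 1/m.
    c_m = sum(1.0 / k for k in range(1, m + 1))
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m * c_m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical permutation p-values for three proteins (illustration only).
example_adjusted = benjamini_yekutieli([0.01, 0.02, 0.03])
```

A protein would then be called differentially abundant when its adjusted p-value falls below the chosen false discovery rate threshold.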
Publisher: Springer Singapore
Date: 2019
Publisher: Cold Spring Harbor Laboratory
Date: 23-05-2017
DOI: 10.1101/141192
Abstract: Our objective was to assess the generalizability, across sites and cognitive contexts, of schizophrenia classification based on functional brain connectivity. We tested different training-test scenarios combining fMRI data from 191 schizophrenia patients and 191 matched healthy controls obtained at 6 scanning sites and under different task conditions. Diagnosis classification accuracy generalized well to a novel site and cognitive context provided data from multiple sites were used for classifier training. By contrast, lower classification accuracy was achieved when data from a single distinct site was used for training. These findings indicate that it is beneficial to use multisite data to train fMRI-based classifiers intended for large-scale use in the clinical realm.
Publisher: Wiley
Date: 06-2022
DOI: 10.1111/ANZS.12372
Abstract: This article introduces a special issue of the Australian and New Zealand Journal of Statistics, dedicated as a Festschrift for Geoff McLachlan on the occasion of his 75th birthday.
Publisher: Elsevier BV
Date: 2016
Publisher: Elsevier BV
Date: 03-2016
Publisher: EDP Sciences
Date: 2020
DOI: 10.1051/PS/2019018
Abstract: We investigate the sub-Gaussian property for almost surely bounded random variables. If sub-Gaussianity per se is de facto ensured by the bounded support of said random variables, then exciting research avenues remain open. Among these questions is how to characterize the optimal sub-Gaussian proxy variance? Another question is how to characterize strict sub-Gaussianity, defined by a proxy variance equal to the (standard) variance? We address the questions in proposing conditions based on the study of functions variations. A particular focus is given to the relationship between strict sub-Gaussianity and symmetry of the distribution. In particular, we demonstrate that symmetry is neither sufficient nor necessary for strict sub-Gaussianity. In contrast, simple necessary conditions on the one hand, and simple sufficient conditions on the other hand, for strict sub-Gaussianity are provided. These results are illustrated via various applications to a number of bounded random variables, including Bernoulli, beta, binomial, Kumaraswamy, triangular, and uniform distributions.
Publisher: eLife Sciences Publications, Ltd
Date: 30-08-2022
Publisher: Wiley
Date: 2020
DOI: 10.1002/STA4.318
Publisher: Informa UK Limited
Date: 26-05-2022
Publisher: Elsevier BV
Date: 2017
DOI: 10.2139/SSRN.2946964
Publisher: Elsevier BV
Date: 10-2018
Publisher: Cold Spring Harbor Laboratory
Date: 15-04-2020
DOI: 10.1101/2020.04.14.040576
Abstract: Our understanding of the changes in functional brain organization in autism is hampered by the extensive heterogeneity that characterizes this neurodevelopmental disorder. Data driven clustering offers a straightforward way to decompose this heterogeneity into subtypes of distinguishable connectivity types and promises an unbiased framework to investigate behavioural symptoms and causative genetic factors. Yet the robustness and generalizability of these imaging subtypes is unknown. Here, we show that unsupervised functional connectivity subtypes are moderately associated with the clinical diagnosis of autism, and that these associations generalize to independent replication data. We found that subtypes identified robust patterns of functional connectivity, but that a discrete assignment of individuals to these subtypes was not supported by the data. Our results support the use of data driven subtyping as a data dimensionality reduction technique, rather than to establish clinical categories.
Publisher: IEEE
Date: 12-2020
Publisher: The Open Journal
Date: 21-08-2019
DOI: 10.21105/JOSS.01587
Publisher: Springer Singapore
Date: 2019
Publisher: IEEE
Date: 06-2012
Publisher: Cold Spring Harbor Laboratory
Date: 02-06-2020
DOI: 10.1101/2020.06.01.127688
Abstract: Functional connectivity (FC) analyses of individuals with autism spectrum disorder (ASD) have established robust alterations of brain connectivity at the group level. Yet, the translation of these imaging findings into robust markers of individual risk is hampered by the extensive heterogeneity among ASD individuals. Here, we report an FC endophenotype that confers a greater than 7-fold risk increase of ASD diagnosis, yet is still identified in an estimated 1 in 200 individuals in the general population. By focusing on a subset of individuals with ASD and highly predictive FC alterations, we achieved a greater than 3-fold increase in risk over previous predictive models. The identified FC risk endophenotype was characterized by underconnectivity of transmodal brain networks and generalized to independent data. Our results demonstrate the ability of a highly targeted prediction model to meaningfully decompose part of the heterogeneity of the autism spectrum. The identified FC signature may help better delineate the multitude of etiological pathways and behavioural symptoms that challenge our understanding of the autism spectrum.
Publisher: Wiley
Date: 28-04-2016
DOI: 10.1111/BIOM.12531
Abstract: Understanding how aquatic species grow is fundamental in fisheries because stock assessment often relies on growth dependent statistical models. Length-frequency-based methods become important when more applicable data for growth model estimation are either not available or very expensive. In this article, we develop a new framework for growth estimation from length-frequency data using a generalized von Bertalanffy growth model (VBGM) framework that allows for time-dependent covariates to be incorporated. A finite mixture of normal distributions is used to model the length-frequency cohorts of each month with the means constrained to follow a VBGM. The variances of the finite mixture components are constrained to be a function of mean length, reducing the number of parameters and allowing for an estimate of the variance at any length. To optimize the likelihood, we use a minorization-maximization (MM) algorithm with a Nelder-Mead sub-step. This work was motivated by the decline in catches of the blue swimmer crab (BSC) (Portunus armatus) off the east coast of Queensland, Australia. We test the method with a simulation study and then apply it to the BSC fishery data.
Publisher: Springer Singapore
Date: 2019
Publisher: IEEE
Date: 06-2014
Publisher: Wiley
Date: 2019
DOI: 10.1002/STA4.248
Publisher: Elsevier BV
Date: 02-2018
DOI: 10.1016/J.SCHRES.2017.05.027
Abstract: Our objective was to assess the generalizability, across sites and cognitive contexts, of schizophrenia classification based on functional brain connectivity. We tested different training-test scenarios combining fMRI data from 191 schizophrenia patients and 191 matched healthy controls obtained at 6 scanning sites and under different task conditions. Diagnosis classification accuracy generalized well to a novel site and cognitive context provided data from multiple sites were used for classifier training. By contrast, lower classification accuracy was achieved when data from a single distinct site was used for training. These findings indicate that it is beneficial to use multisite data to train fMRI-based classifiers intended for large-scale use in the clinical realm.
Publisher: Wiley
Date: 18-01-2019
DOI: 10.1002/WIDM.1298
Abstract: Complex data analysis is a central topic of modern statistics and learning systems which is becoming of broader interest with the increasing prevalence of high‐dimensional data. The challenge is to develop statistical models and autonomous algorithms that are able to discern knowledge from raw data, which can be achieved through clustering techniques, or to make predictions of future data via classification techniques. Latent data models, including mixture model‐based approaches, are among the most popular and successful approaches in both supervised and unsupervised learning. Although being traditional tools in multivariate analysis, they are growing in popularity when considered in the framework of functional data analysis (FDA). FDA is the data analysis paradigm in which each datum is a function, rather than a real vector. In many areas of application, including signal and image processing, functional imaging, bioinformatics, etc., the analyzed data are indeed often available in the form of discretized values of functions, curves, or surfaces. This functional aspect of the data adds additional difficulties when compared to classical multivariate data analysis. We review and present approaches for model‐based clustering and classification of functional data. We present well‐grounded statistical models along with efficient algorithmic tools to address problems regarding the clustering and the classification of these functional data, including their heterogeneity, missing information, and dynamical hidden structures. The presented models and algorithms are illustrated via real‐world functional data analysis problems from several areas of application. This article is categorized under: Fundamental Concepts of Data and Knowledge > Data Concepts; Algorithmic Development > Statistics; Technologies > Structure Discovery and Clustering.
Publisher: Frontiers Media SA
Date: 03-12-2020
Abstract: The coefficient of determination, the R², is often used to measure the variance explained by an affine combination of multiple explanatory covariates. An attribution of this explanatory contribution to each of the individual covariates is often sought in order to draw inference regarding the importance of each covariate with respect to the response phenomenon. A recent method for ascertaining such an attribution is via the game theoretic Shapley value decomposition of the coefficient of determination. Such a decomposition has the desirable efficiency, monotonicity, and equal treatment properties. Under a weak assumption that the joint distribution is pseudo-elliptical, we obtain the asymptotic normality of the Shapley values. We then utilize this result in order to construct confidence intervals and hypothesis tests for Shapley values. Monte Carlo studies regarding our results are provided. We found that our asymptotic confidence intervals required less computational time than competing bootstrap methods and are able to exhibit improved coverage, especially on small samples. In an expository application to Australian real estate price modeling, we employ Shapley value confidence intervals to identify significant differences between the explanatory contributions of covariates, between models, which otherwise share approximately the same R² value. These different models are based on real estate data from the same periods in 2019 and 2020, the latter covering the early stages of the arrival of the novel coronavirus, COVID-19.
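The Shapley decomposition of the coefficient of determination described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function `shapley_values`, the covariate names `x1` and `x2`, and the table of subset R² values are all hypothetical, and in practice each entry would come from fitting a regression on that subset of covariates.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values for a characteristic function value(frozenset)."""
    n = len(players)
    result = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of p when it joins coalition s.
                total += weight * (value(s | {p}) - value(s))
        result[p] = total
    return result


# Hypothetical R^2 for each subset of two covariates (illustration only).
r2_table = {
    frozenset(): 0.0,
    frozenset({"x1"}): 0.5,
    frozenset({"x2"}): 0.3,
    frozenset({"x1", "x2"}): 0.6,
}
phi = shapley_values(["x1", "x2"], lambda s: r2_table[s])
```

With these made-up inputs the attributions are 0.4 for `x1` and 0.2 for `x2`, which sum to the full-model R² of 0.6, as the efficiency property requires. The exact enumeration grows exponentially with the number of covariates, so it suits only small models.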
Publisher: IEEE
Date: 11-2013
Publisher: Elsevier BV
Date: 10-2016
Publisher: Elsevier BV
Date: 10-2019
DOI: 10.1016/J.AAP.2019.07.004
Abstract: Pedestrian deaths and injuries are a major health issue in both developed and developing countries. In Vietnam, pedestrians account for about 10-11% of all road traffic deaths, while their travel distance contributes to approximately 2.4% of the total distance travelled by all modes. This paper aims to explore the use of pedestrian overpasses and identify influencing factors, particularly with regards to social and digital distractions, and overpass characteristics. An observational survey was conducted in Hanoi, Vietnam, in March 2017 at ten pedestrian overpasses. Behaviours of 608 pedestrians, including those who used an overpass to cross and those who illegally crossed, were observed. The rates of overpass usage varied significantly, between 35.9% and 96.5%. Modelling results suggest that pedestrians tended to compensate for the risks of illegal crossing by forming groups and avoiding digital and social distractions (i.e., calling, operating a mobile phone's screen, listening to music, or talking to other pedestrians while crossing). In addition, overpass usage decreased with taller overpasses, but increased with wider overpasses. Effects of gender, weather, and illegal crossing speed on overpass use were also discussed.
Publisher: Elsevier BV
Date: 09-2017
Publisher: Canadian Science Publishing
Date: 09-2014
Abstract: We derive a new method for determining size-transition matrices (STMs) that eliminates probabilities of negative growth and accounts for individual variability. STMs are an important part of size-structured models, which are used in the stock assessment of aquatic species. The elements of STMs represent the probability of growth from one size class to another, given a time step. The growth increment over this time step can be modelled with a variety of methods, but when a population construct is assumed for the underlying growth model, the resulting STM may contain entries that predict negative growth. To solve this problem, we use a maximum likelihood method that incorporates individual variability in the asymptotic length, relative age at tagging, and measurement error to obtain von Bertalanffy growth model parameter estimates. The statistical moments for the future length given an individual’s previous length measurement and time at liberty are then derived. We moment match the true conditional distributions with skewed-normal distributions and use these to accurately estimate the elements of the STMs. The method is investigated with simulated tag–recapture data and tag–recapture data gathered from the Australian eastern king prawn (Melicertus plebejus).
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 08-2014
Publisher: Elsevier BV
Date: 11-2016
Publisher: Springer Science and Business Media LLC
Date: 10-01-2020
Publisher: Springer Science and Business Media LLC
Date: 05-06-2015
Publisher: Elsevier BV
Date: 2016
Publisher: Elsevier BV
Date: 02-2012
Publisher: Elsevier BV
Date: 03-2018
Publisher: Elsevier BV
Date: 2018
DOI: 10.2139/SSRN.3287903
Publisher: MIT Press - Journals
Date: 03-2016
DOI: 10.1162/NECO_A_00813
Abstract: Maximum pseudo-likelihood estimation (MPLE) is an attractive method for training fully visible Boltzmann machines (FVBMs) due to its computational scalability and the desirable statistical properties of the MPLE. No published algorithms for MPLE have been proven to be convergent or monotonic. In this note, we present an algorithm for the MPLE of FVBMs based on the block successive lower-bound maximization (BSLM) principle. We show that the BSLM algorithm monotonically increases the pseudo-likelihood values and that the sequence of BSLM estimates converges to the unique global maximizer of the pseudo-likelihood function. The relationship between the BSLM algorithm and the gradient ascent (GA) algorithm for MPLE of FVBMs is also discussed, and a convergence criterion for the GA algorithm is given.
Publisher: MIT Press - Journals
Date: 04-2017
DOI: 10.1162/NECO_A_00938
Abstract: Mixture of autoregressions (MoAR) models provide a model-based approach to the clustering of time series data. The maximum likelihood (ML) estimation of MoAR models requires evaluating products of large numbers of densities of normal random variables. In practical scenarios, these products converge to zero as the length of the time series increases, and thus the ML estimation of MoAR models becomes infeasible without the use of numerical tricks. We propose a maximum pseudolikelihood (MPL) estimation approach as an alternative to the use of numerical tricks. The MPL estimator is proved to be consistent and can be computed with an EM (expectation-maximization) algorithm. Simulations are used to assess the performance of the MPL estimator against that of the ML estimator in cases where the latter was able to be calculated. An application to the clustering of time series data arising from a resting state fMRI experiment is presented as a demonstration of the methodology.
Publisher: Proteomass Scientific Society
Date: 31-12-2012
Publisher: Informa UK Limited
Date: 2020
Publisher: eLife Sciences Publications, Ltd
Date: 29-11-2022
DOI: 10.7554/ELIFE.56257
Abstract: Our understanding of the changes in functional brain organization in autism is hampered by the extensive heterogeneity that characterizes this neurodevelopmental disorder. Data driven clustering offers a straightforward way to decompose autism heterogeneity into subtypes of connectivity and promises an unbiased framework to investigate behavioral symptoms and causative genetic factors. Yet, the robustness and generalizability of functional connectivity subtypes is unknown. Here, we show that a simple hierarchical cluster analysis can robustly relate a given individual and brain network to a connectivity subtype, but that continuous assignments are more robust than discrete ones. We also found that functional connectivity subtypes are moderately associated with the clinical diagnosis of autism, and these associations generalize to independent replication data. We explored systematically 18 different brain networks as we expected them to associate with different behavioral profiles as well as different key regions. Contrary to this prediction, autism functional connectivity subtypes converged on a common topography across different networks, consistent with a compression of the primary gradient of functional brain organization, as previously reported in the literature. Our results support the use of data driven clustering as a reliable data dimensionality reduction technique, where any given dimension only associates moderately with clinical manifestations.
Publisher: Elsevier BV
Date: 10-2022
DOI: 10.1016/J.WATRES.2022.119182
Abstract: Consumption of amphetamine and methamphetamine, two common illicit drugs, has been monitored by wastewater-based epidemiology (WBE) in many countries over the past decade. There is potential for the estimated amount of amphetamine used to be skewed at locations where methamphetamine is also consumed, because amphetamine is also excreted to wastewater following methamphetamine consumption. The present study aims to review the available data in the literature to identify an average ratio of amphetamine/methamphetamine (AMP/METH) that is excreted to wastewater after methamphetamine consumption. This ratio could then be used to refine the estimation of amphetamine consumption in catchments where there is both amphetamine and methamphetamine use. Using data from more than 6000 wastewater samples from Australia where methamphetamine is the dominant illicit amphetamine-type substance on the market, we were able to subtract the contribution of legal sources of amphetamine and obtain the median AMP/METH ratio in wastewater of 0.09. Using this value, the amphetamine derived from methamphetamine consumption can be calculated and subtracted from the total amphetamine mass loads in wastewater samples. Without considering the contribution of amphetamine from methamphetamine use, selected European catchments with comparable consumption of amphetamine and methamphetamine showed up to 83% overestimation of amphetamine use. For catchments with AMP/METH ratio greater than 1.00, the impact of amphetamine from methamphetamine would be negligible; for catchments with AMP/METH ratio in the range of 0.04-0.19, it will be difficult to accurately estimate amphetamine consumption.
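The correction described in the abstract above amounts to simple arithmetic, sketched here under stated assumptions. Only the median ratio of 0.09 comes from the abstract. The function name, the requirement that both loads share the same units, and the clamping of negative results to zero are illustrative assumptions, not the study's procedure.

```python
AMP_METH_RATIO = 0.09  # median AMP/METH ratio reported in the abstract


def corrected_amphetamine_load(total_amp_load, meth_load, ratio=AMP_METH_RATIO):
    """Estimate the amphetamine load not explained by methamphetamine use.

    Both loads are assumed to be in the same units (for example mg/day).
    """
    # Amphetamine expected in wastewater as a by-product of methamphetamine use.
    amp_from_meth = ratio * meth_load
    # Assumption: a negative remainder is treated as zero direct consumption.
    return max(total_amp_load - amp_from_meth, 0.0)


# Hypothetical loads: 10 units of amphetamine and 50 units of methamphetamine.
example_corrected = corrected_amphetamine_load(10.0, 50.0)
```

In this made-up case 4.5 of the 10 units would be attributed to methamphetamine use, leaving about 5.5 units as the corrected amphetamine load.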
Publisher: Wiley
Date: 13-02-2018
DOI: 10.1002/WIDM.1246
Publisher: Springer Science and Business Media LLC
Date: 06-08-2021
DOI: 10.1186/S40488-021-00125-0
Abstract: Mixture of experts (MoE) models are widely applied for conditional probability density estimation problems. We demonstrate the richness of the class of MoE models by proving denseness results in Lebesgue spaces, when inputs and outputs variables are both compactly supported. We further prove an almost uniform convergence result when the input is univariate. Auxiliary lemmas are proved regarding the richness of the soft-max gating function class, and their relationships to the class of Gaussian gating functions.
Publisher: Informa UK Limited
Date: 30-10-2018
Start Date: 04-2017
End Date: 12-2021
Amount: $360,000.00
Funder: Australian Research Council
Start Date: 09-2018
End Date: 09-2022
Amount: $342,194.00
Funder: Australian Research Council
Start Date: 07-2023
End Date: 07-2026
Amount: $360,000.00
Funder: Australian Research Council