ARDC Research Link Australia

Publication

Chunked-and-averaged estimators for vector parameters

Publisher: Elsevier BV

Date: 06-2018

DOI: 10.1016/J.SPL.2018.02.051

Publication

Model independent feature attributions: Shapley values that uncover non-linear dependencies

Publisher: PeerJ

Date: 02-06-2021

DOI: 10.7717/PEERJ-CS.582

Abstract: Shapley values have become increasingly popular in the machine learning literature, thanks to their attractive axiomatisation, flexibility, and uniqueness in satisfying certain notions of ‘fairness’. The flexibility arises from the myriad potential forms of the Shapley value game formulation. Amongst the consequences of this flexibility is that there are now many types of Shapley values being discussed, with such variety being a source of potential misunderstanding. To the best of our knowledge, all existing game formulations in the machine learning and statistics literature fall into a category, which we name the model-dependent category of game formulations. In this work, we consider an alternative and novel formulation which leads to the first instance of what we call model-independent Shapley values. These Shapley values use a measure of non-linear dependence as the characteristic function. The strength of these Shapley values is in their ability to uncover and attribute non-linear dependencies amongst features. We introduce and demonstrate the use of the energy distance correlations, affine-invariant distance correlation, and Hilbert–Schmidt independence criterion as Shapley value characteristic functions. In particular, we demonstrate their potential value for exploratory data analysis and model diagnostics. We conclude with an interesting expository application to a medical survey data set.

Publication

Shapley Values for Feature Selection: The Good, the Bad, and the Axioms

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 2021

DOI: 10.1109/ACCESS.2021.3119110

Publication

A universal approximation theorem for mixture-of-experts models

Publisher: MIT Press - Journals

Date: 12-2016

DOI: 10.1162/NECO_A_00892

Abstract: The mixture-of-experts (MoE) model is a popular neural network architecture for nonlinear regression and classification. The class of MoE mean functions is known to be uniformly convergent to any unknown target function, assuming that the target function is from a Sobolev space that is sufficiently differentiable and that the domain of estimation is a compact unit hypercube. We provide an alternative result, which shows that the class of MoE mean functions is dense in the class of all continuous functions over arbitrary compact domains of estimation. Our result can be viewed as a universal approximation theorem for MoE models. The theorem we present allows MoE users to be confident in applying such models for estimation when data arise from nonlinear and nondifferentiable generative processes.

Publication

Approximate Bayesian Computation Via the Energy Statistic

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 2020

DOI: 10.1109/ACCESS.2020.3009878

Publication

Whole‐volume clustering of time series data from zebrafish brain calcium images via mixture modeling

Publisher: Wiley

Date: 06-12-2017

DOI: 10.1002/SAM.11366

Publication

Asymptotic Normality of the Maximum Pseudolikelihood Estimator for Fully Visible Boltzmann Machines

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 04-2016

DOI: 10.1109/TNNLS.2015.2425898

Publication

A Two-Sample Kolmogorov-Smirnov-Like Test for Big Data

Publisher: Springer Singapore

Date: 2018

DOI: 10.1007/978-981-13-0292-3_6

Publication

A Block Minorization-Maximization Algorithm for Heteroscedastic Regression

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 08-2016

DOI: 10.1109/LSP.2016.2586180

Publication

The fully visible Boltzmann machine and the Senate of the 45th Australian Parliament in 2016

Publisher: Springer Science and Business Media LLC

Date: 05-08-2019

DOI: 10.1007/S42001-019-00055-7

Publication

Modelling the Relationships between Train Commuters’ Access Modes and Traffic Safety

Publisher: Hindawi Limited

Date: 26-02-2022

DOI: 10.1155/2022/3473397

Abstract: Walking, cycling, and feeder bus/tram for first- and last-mile (FLM) train access are often considered to have better health benefits, lower cost, and less environmental impacts than driving. However, little is known about the road safety impacts of these FLM access modes, particularly at a network level. This paper aims to investigate the impacts of train commuters’ access modes on road safety in Victoria, Australia. Macroscopic analyses of crash outcomes in each zone (i.e., Statistical Area Level 1) were performed using negative binomial (NB) and spatially lagged X negative binomial (SLXNB), accounting for potential indirect effects of mode shares in adjacent zones. This macroscopic analysis approach enabled the consideration of the safety effects across the network. The results showed that the SLXNB models outperformed the NB models. Commuting by train, either with walking or car as FLM access mode, was negatively associated with both total and severe crashes. In addition, commuting by train with feeder bus/tram access mode was negatively associated with severe crashes. Interestingly, commuting by train with cycling access mode was negatively associated with total crashes, with a larger effect when compared to walking and car access modes. Overall, the results suggested promoting active transport as FLM train access mode would lead to an improvement in road safety.

Publication

Spatial clustering of time series via mixture of autoregressions models and Markov random fields

Publisher: Wiley

Date: 12-10-2016

DOI: 10.1111/STAN.12093

Publication

Near universal consistency of the maximum pseudolikelihood estimator for discrete models

Publisher: Springer Science and Business Media LLC

Date: 03-2018

DOI: 10.1016/J.JKSS.2017.10.001

Publication

An introduction to Majorization‐Minimization algorithms for machine learning and statistical estimation

Publisher: Wiley

Date: 09-02-2017

DOI: 10.1002/WIDM.1198

Publication

Approximation results regarding the multiple-output Gaussian gated mixture of linear experts model

Publisher: Elsevier BV

Date: 11-2019

DOI: 10.1016/J.NEUCOM.2019.08.014

Publication

Some theoretical results regarding the polygonal distribution

Publisher: Informa UK Limited

Date: 20-11-2017

DOI: 10.1080/03610926.2017.1386312

Publication

Finite sample inference for empirical Bayesian methods

Publisher: Wiley

Date: 19-04-2023

DOI: 10.1111/SJOS.12643

Abstract: In recent years, empirical Bayesian (EB) inference has become an attractive approach for estimation in parametric models arising in a variety of real‐life problems, especially in complex and high‐dimensional scientific applications. However, compared to the relative abundance of available general methods for computing point estimators in the EB framework, the construction of confidence sets and hypothesis tests with good theoretical properties remains difficult and problem specific. Motivated by the Universal Inference framework, we propose a general and universal method, based on holdout likelihood ratios, and utilizing the hierarchical structure of the specified Bayesian model for constructing confidence sets and hypothesis tests that are finite sample valid. We illustrate our method through a range of numerical studies and real data applications, which demonstrate that the approach is able to generate useful and meaningful inferential statements in the relevant contexts.

Publication

New palladium(II) complex of SCN unsymmetric pincer-type ligand via oxidative addition

Publisher: Elsevier BV

Date: 12-2014

DOI: 10.1016/J.JORGANCHEM.2014.09.017

Publication

Online quantitative proteomics p-value calculator for permutation-based statistical testing of peptide ratios

Publisher: American Chemical Society (ACS)

Date: 30-07-2014

DOI: 10.1021/PR500525E

Abstract: The utility of high-throughput quantitative proteomics to identify differentially abundant proteins en-masse relies on suitable and accessible statistical methodology, which remains mostly an unmet need. We present a free web-based tool, called Quantitative Proteomics p-value Calculator (QPPC), designed for accessibility and usability by proteomics scientists and biologists. Being an online tool, there is no requirement for software installation. Furthermore, QPPC accepts generic peptide ratio data generated by any mass spectrometer and database search engine. Importantly, QPPC utilizes the permutation test that we recently found to be superior to other methods for analysis of peptide ratios because it does not assume normal distributions. QPPC assists the user in selecting significantly altered proteins based on numerical fold change, or standard deviation from the mean or median, together with the permutation p-value. Output is in the form of comma separated values files, along with graphical visualization using volcano plots and histograms. We evaluate the optimal parameters for use of QPPC, including the permutation level and the effect of outlier and contaminant peptides on p-value variability. The optimal parameters defined are deployed as default for the web-tool at qppc.di.uq.edu.au/.

Publication

Bayesian clustering of skewed and multimodal data using geometric skewed normal distributions

Publisher: Elsevier BV

Date: 12-2020

DOI: 10.1016/J.CSDA.2020.107040

Publication

Progress on a conjecture regarding the triangular distribution

Publisher: Informa UK Limited

Date: 28-11-2016

DOI: 10.1080/03610926.2016.1263742

Publication

Statistical Evaluation of Labeled Comparative Profiling Proteomics Experiments Using Permutation Test

Publisher: Springer New York

Date: 15-12-2016

DOI: 10.1007/978-1-4939-6740-7_9

Abstract: Comparative profiling proteomics experiments are important tools in biological research. In such experiments, tens to hundreds of thousands of peptides are measured simultaneously, with the goal of inferring protein abundance levels. Statistical evaluation of these datasets is required to determine proteins that are differentially abundant between the test samples. Previously we have reported the non-normal distribution of SILAC datasets, and demonstrated the permutation test to be a superior method for the statistical evaluation of non-normal peptide ratios. This chapter outlines the steps and the R scripts that can be used for performing permutation analysis with false discovery rate control via the Benjamini-Yekutieli method.
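
As a rough illustration of the false discovery rate step named in the abstract above, the following is a minimal Python sketch of the Benjamini-Yekutieli step-up rule applied to a list of p-values. It is not the chapter's R code: the function name `benjamini_yekutieli` and the default `alpha` are assumptions made here, and the p-values are taken as already computed (for example by a permutation test).

```python
def benjamini_yekutieli(p_values, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected.

    Hypothetical helper for illustration: applies the Benjamini-Yekutieli
    step-up rule, which divides the Benjamini-Hochberg thresholds by the
    harmonic sum c(m) = 1 + 1/2 + ... + 1/m.
    """
    m = len(p_values)
    if m == 0:
        return []
    harmonic = sum(1.0 / i for i in range(1, m + 1))
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value is below k * alpha / (m * c(m)).
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / (m * harmonic):
            cutoff_rank = rank
    # Step-up rule: reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected
```

Because the rule is step-up, a p-value that misses its own threshold is still rejected when a larger p-value further down the ranking meets its threshold.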

Publication

An Introduction to Approximate Bayesian Computation

Publisher: Springer Singapore

Date: 2019

DOI: 10.1007/978-981-15-1960-4_7

Publication

Multisite generalizability of schizophrenia diagnosis classification based on functional brain connectivity

Publisher: Cold Spring Harbor Laboratory

Date: 23-05-2017

DOI: 10.1101/141192

Abstract: Our objective was to assess the generalizability, across sites and cognitive contexts, of schizophrenia classification based on functional brain connectivity. We tested different training-test scenarios combining fMRI data from 191 schizophrenia patients and 191 matched healthy controls obtained at 6 scanning sites and under different task conditions. Diagnosis classification accuracy generalized well to a novel site and cognitive context provided data from multiple sites were used for classifier training. By contrast, lower classification accuracy was achieved when data from a single distinct site was used for training. These findings indicate that it is beneficial to use multisite data to train fMRI-based classifiers intended for large-scale use in the clinical realm.

Publication

A Festschrift for Geoff McLachlan

Publisher: Wiley

Date: 06-2022

DOI: 10.1111/ANZS.12372

Abstract: This article introduces a special issue of the Australian and New Zealand Journal of Statistics, dedicated as a Festschrift for Geoff McLachlan on the occasion of his 75th birthday.

Publication

Laplace mixture of linear experts

Publisher: Elsevier BV

Date: 2016

DOI: 10.1016/J.CSDA.2014.10.016

Publication

Laplace mixture autoregressive models

Publisher: Elsevier BV

Date: 03-2016

DOI: 10.1016/J.SPL.2015.11.006

Publication

On strict sub-Gaussianity, optimal proxy variance and symmetry for bounded random variables

Publisher: EDP Sciences

Date: 2020

DOI: 10.1051/PS/2019018

Abstract: We investigate the sub-Gaussian property for almost surely bounded random variables. If sub-Gaussianity per se is de facto ensured by the bounded support of said random variables, then exciting research avenues remain open. Among these questions is how to characterize the optimal sub-Gaussian proxy variance? Another question is how to characterize strict sub-Gaussianity, defined by a proxy variance equal to the (standard) variance? We address the questions in proposing conditions based on the study of functions variations. A particular focus is given to the relationship between strict sub-Gaussianity and symmetry of the distribution. In particular, we demonstrate that symmetry is neither sufficient nor necessary for strict sub-Gaussianity. In contrast, simple necessary conditions on the one hand, and simple sufficient conditions on the other hand, for strict sub-Gaussianity are provided. These results are illustrated via various applications to a number of bounded random variables, including Bernoulli, beta, binomial, Kumaraswamy, triangular, and uniform distributions.

Publication

Author response: Functional connectivity subtypes associate robustly with ASD diagnosis

Publisher: eLife Sciences Publications, Ltd

Date: 30-08-2022

DOI: 10.7554/ELIFE.56257.SA2

Publication

Sub‐Weibull distributions: Generalizing sub‐Gaussian and sub‐Exponential properties to heavier tailed distributions

Publisher: Wiley

Date: 2020

DOI: 10.1002/STA4.318

Publication

Approximation of probability density functions via location-scale finite mixtures in Lebesgue spaces

Publisher: Informa UK Limited

Date: 26-05-2022

DOI: 10.1080/03610926.2021.2002360

Publication

A Universal Approximation Theorem for Gaussian-Gated Mixture of Experts Models

Publisher: Elsevier BV

Date: 2017

DOI: 10.2139/SSRN.2946964

Publication

Randomized mixture models for probability density approximation and estimation

Publisher: Elsevier BV

Date: 10-2018

DOI: 10.1016/J.INS.2018.07.056

Publication

Subtypes of functional connectivity associate robustly with ASD diagnosis

Publisher: Cold Spring Harbor Laboratory

Date: 15-04-2020

DOI: 10.1101/2020.04.14.040576

Abstract: Our understanding of the changes in functional brain organization in autism is hampered by the extensive heterogeneity that characterizes this neurodevelopmental disorder. Data driven clustering offers a straightforward way to decompose this heterogeneity into subtypes of distinguishable connectivity types and promises an unbiased framework to investigate behavioural symptoms and causative genetic factors. Yet the robustness and generalizability of these imaging subtypes is unknown. Here, we show that unsupervised functional connectivity subtypes are moderately associated with the clinical diagnosis of autism, and that these associations generalize to independent replication data. We found that subtypes identified robust patterns of functional connectivity, but that a discrete assignment of individuals to these subtypes was not supported by the data. Our results support the use of data driven subtyping as a data dimensionality reduction technique, rather than to establish clinical categories.

Publication

k-means on Positive Definite Matrices, and an Application to Clustering in Radar Image Sequences

Publisher: IEEE

Date: 12-2020

DOI: 10.1109/SSCI47803.2020.9308185

Publication

studentlife: Tidy Handling and Navigation of a Valuable Mobile-Health Dataset

Publisher: The Open Journal

Date: 21-08-2019

DOI: 10.21105/JOSS.01587

Publication

Regularized estimation and feature selection in mixtures of Gaussian-gated experts models

Publisher: Springer Singapore

Date: 2019

DOI: 10.1007/978-981-15-1960-4_3

Publication

Variable selection in statistical models using population-based incremental learning with applications to genome-wide association studies

Publisher: IEEE

Date: 06-2012

DOI: 10.1109/CEC.2012.6256577

Publication

Reproducible functional connectivity endophenotype confers high risk of ASD diagnosis in a subset of individuals

Publisher: Cold Spring Harbor Laboratory

Date: 02-06-2020

DOI: 10.1101/2020.06.01.127688

Abstract: Functional connectivity (FC) analyses of individuals with autism spectrum disorder (ASD) have established robust alterations of brain connectivity at the group level. Yet, the translation of these imaging findings into robust markers of individual risk is hampered by the extensive heterogeneity among ASD individuals. Here, we report an FC endophenotype that confers a greater than 7-fold risk increase of ASD diagnosis, yet is still identified in an estimated 1 in 200 individuals in the general population. By focusing on a subset of individuals with ASD and highly predictive FC alterations, we achieved a greater than 3-fold increase in risk over previous predictive models. The identified FC risk endophenotype was characterized by underconnectivity of transmodal brain networks and generalized to independent data. Our results demonstrate the ability of a highly targeted prediction model to meaningfully decompose part of the heterogeneity of the autism spectrum. The identified FC signature may help better delineate the multitude of etiological pathways and behavioural symptoms that challenge our understanding of the autism spectrum.

Publication

Mixture of time-dependent growth models with an application to blue swimmer crab length-frequency data

Publisher: Wiley

Date: 28-04-2016

DOI: 10.1111/BIOM.12531

Abstract: Understanding how aquatic species grow is fundamental in fisheries because stock assessment often relies on growth dependent statistical models. Length-frequency-based methods become important when more applicable data for growth model estimation are either not available or very expensive. In this article, we develop a new framework for growth estimation from length-frequency data using a generalized von Bertalanffy growth model (VBGM) framework that allows for time-dependent covariates to be incorporated. A finite mixture of normal distributions is used to model the length-frequency cohorts of each month with the means constrained to follow a VBGM. The variances of the finite mixture components are constrained to be a function of mean length, reducing the number of parameters and allowing for an estimate of the variance at any length. To optimize the likelihood, we use a minorization-maximization (MM) algorithm with a Nelder-Mead sub-step. This work was motivated by the decline in catches of the blue swimmer crab (BSC) (Portunus armatus) off the east coast of Queensland, Australia. We test the method with a simulation study and then apply it to the BSC fishery data.

Publication

Positive Data Kernel Density Estimation via the LogKDE Package for R

Publisher: Springer Singapore

Date: 2019

DOI: 10.1007/978-981-13-6661-1_21

Publication

Asymptotic inference for hidden process regression models

Publisher: IEEE

Date: 06-2014

DOI: 10.1109/SSP.2014.6884624

Publication

Asymptotic normality of the time‐domain generalized least squares estimator for linear regression models

Publisher: Wiley

Date: 2019

DOI: 10.1002/STA4.248

Publication

Multisite generalizability of schizophrenia diagnosis classification based on functional brain connectivity

Publisher: Elsevier BV

Date: 02-2018

DOI: 10.1016/J.SCHRES.2017.05.027

Abstract: Our objective was to assess the generalizability, across sites and cognitive contexts, of schizophrenia classification based on functional brain connectivity. We tested different training-test scenarios combining fMRI data from 191 schizophrenia patients and 191 matched healthy controls obtained at 6 scanning sites and under different task conditions. Diagnosis classification accuracy generalized well to a novel site and cognitive context provided data from multiple sites were used for classifier training. By contrast, lower classification accuracy was achieved when data from a single distinct site was used for training. These findings indicate that it is beneficial to use multisite data to train fMRI-based classifiers intended for large-scale use in the clinical realm.

Publication

Model-based clustering and classification of functional data

Publisher: Wiley

Date: 18-01-2019

DOI: 10.1002/WIDM.1298

Abstract: Complex data analysis is a central topic of modern statistics and learning systems which is becoming of broader interest with the increasing prevalence of high‐dimensional data. The challenge is to develop statistical models and autonomous algorithms that are able to discern knowledge from raw data, which can be achieved through clustering techniques, or to make predictions of future data via classification techniques. Latent data models, including mixture model‐based approaches, are among the most popular and successful approaches in both supervised and unsupervised learning. Although being traditional tools in multivariate analysis, they are growing in popularity when considered in the framework of functional data analysis (FDA). FDA is the data analysis paradigm in which each datum is a function, rather than a real vector. In many areas of application, including signal and image processing, functional imaging, bioinformatics, etc., the analyzed data are indeed often available in the form of discretized values of functions, curves, or surfaces. This functional aspect of the data adds additional difficulties when compared to classical multivariate data analysis. We review and present approaches for model‐based clustering and classification of functional data. We present well‐grounded statistical models along with efficient algorithmic tools to address problems regarding the clustering and the classification of these functional data, including their heterogeneity, missing information, and dynamical hidden structures. The presented models and algorithms are illustrated via real‐world functional data analysis problems from several areas of application. This article is categorized under: Fundamental Concepts of Data and Knowledge > Data Concepts; Algorithmic Development > Statistics; Technologies > Structure Discovery and Clustering.

Publication

Shapley Value Confidence Intervals for Attributing Variance Explained

Publisher: Frontiers Media SA

Date: 03-12-2020

DOI: 10.3389/FAMS.2020.587199

Abstract: The coefficient of determination, the R², is often used to measure the variance explained by an affine combination of multiple explanatory covariates. An attribution of this explanatory contribution to each of the individual covariates is often sought in order to draw inference regarding the importance of each covariate with respect to the response phenomenon. A recent method for ascertaining such an attribution is via the game theoretic Shapley value decomposition of the coefficient of determination. Such a decomposition has the desirable efficiency, monotonicity, and equal treatment properties. Under a weak assumption that the joint distribution is pseudo-elliptical, we obtain the asymptotic normality of the Shapley values. We then utilize this result in order to construct confidence intervals and hypothesis tests for Shapley values. Monte Carlo studies regarding our results are provided. We found that our asymptotic confidence intervals required less computational time than competing bootstrap methods and are able to exhibit improved coverage, especially on small samples. In an expository application to Australian real estate price modeling, we employ Shapley value confidence intervals to identify significant differences between the explanatory contributions of covariates, between models, which otherwise share approximately the same R² value. These different models are based on real estate data from the same periods in 2019 and 2020, the latter covering the early stages of the arrival of the novel coronavirus, COVID-19.
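
The Shapley value decomposition of the coefficient of determination described in the abstract above can be sketched in a few lines of Python. This is an illustrative sketch only, assuming NumPy is available and using ordinary least squares with an intercept; the function names `r_squared` and `shapley_r_squared` are hypothetical, and the confidence intervals from the paper are not reproduced. The brute-force enumeration visits every subset of covariates, so it is only practical for a small number of columns.

```python
from itertools import combinations
from math import factorial

import numpy as np


def r_squared(X, y, cols):
    """R^2 of a least-squares fit (with intercept) of y on the columns in cols."""
    if not cols:
        return 0.0  # the intercept-only model explains no variance
    A = np.column_stack([np.ones(len(y)), X[:, list(cols)]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    centred = y - y.mean()
    return 1.0 - (resid @ resid) / (centred @ centred)


def shapley_r_squared(X, y):
    """Attribute the full-model R^2 to each column of X via Shapley values."""
    d = X.shape[1]
    values = np.zeros(d)
    for j in range(d):
        others = [k for k in range(d) if k != j]
        for size in range(d):
            # Shapley weight for coalitions of this size that exclude column j.
            weight = factorial(size) * factorial(d - size - 1) / factorial(d)
            for subset in combinations(others, size):
                gain = r_squared(X, y, subset + (j,)) - r_squared(X, y, subset)
                values[j] += weight * gain
    return values
```

By the efficiency property mentioned in the abstract, the returned values sum to the R² of the model that uses all covariates.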

Publication

Spatial false discovery rate control for magnetic resonance imaging studies

Publisher: IEEE

Date: 11-2013

DOI: 10.1109/DICTA.2013.6691531

Publication

Maximum likelihood estimation of triangular and polygonal distributions

Publisher: Elsevier BV

Date: 10-2016

DOI: 10.1016/J.CSDA.2016.04.003

Publication

Pedestrian overpass use and its relationships with digital and social distractions, and overpass characteristics

Publisher: Elsevier BV

Date: 10-2019

DOI: 10.1016/J.AAP.2019.07.004

Abstract: Pedestrian deaths and injuries are a major health issue in both developed and developing countries. In Vietnam, pedestrians account for about 10-11% of all road traffic deaths, while their travel distance contributes to approximately 2.4% of the total distance travelled by all modes. This paper aims to explore the use of pedestrian overpasses and identify influencing factors, particularly with regards to social and digital distractions, and overpass characteristics. An observational survey was conducted in Hanoi, Vietnam, in March 2017 at ten pedestrian overpasses. Behaviours of 608 pedestrians, including those who used an overpass to cross and those who illegally crossed, were observed. The rates of overpass usage varied significantly, between 35.9% and 96.5%. Modelling results suggest that pedestrians tended to compensate for the risks of illegal crossing by forming groups and avoiding digital and social distractions (i.e., calling, operating a mobile phone's screen, listening to music, or talking to other pedestrians while crossing). In addition, overpass usage decreased with taller overpasses, but increased with wider overpasses. Effects of gender, weather, and illegal crossing speed on overpass use were also discussed.

Publication

Response functions

Publisher: Elsevier BV

Date: 09-2017

DOI: 10.1016/J.EUROECOREV.2017.06.011

Publication

Improved estimation of size-transition matrices using tag-recapture data

Publisher: Canadian Science Publishing

Date: 09-2014

DOI: 10.1139/CJFAS-2014-0080

Abstract: We derive a new method for determining size-transition matrices (STMs) that eliminates probabilities of negative growth and accounts for individual variability. STMs are an important part of size-structured models, which are used in the stock assessment of aquatic species. The elements of STMs represent the probability of growth from one size class to another, given a time step. The growth increment over this time step can be modelled with a variety of methods, but when a population construct is assumed for the underlying growth model, the resulting STM may contain entries that predict negative growth. To solve this problem, we use a maximum likelihood method that incorporates individual variability in the asymptotic length, relative age at tagging, and measurement error to obtain von Bertalanffy growth model parameter estimates. The statistical moments for the future length given an individual’s previous length measurement and time at liberty are then derived. We moment match the true conditional distributions with skewed-normal distributions and use these to accurately estimate the elements of the STMs. The method is investigated with simulated tag–recapture data and tag–recapture data gathered from the Australian eastern king prawn (Melicertus plebejus).

Publication

False discovery rate control in magnetic resonance imaging studies via Markov random fields

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 08-2014

DOI: 10.1109/TMI.2014.2322369

Publication

Linear mixed models with marginally symmetric nonparametric random effects

Publisher: Elsevier BV

Date: 11-2016

DOI: 10.1016/J.CSDA.2016.05.005

Publication

Mini-batch learning of exponential family finite mixture models

Publisher: Springer Science and Business Media LLC

Date: 10-01-2020

DOI: 10.1007/S11222-019-09919-4

Publication

Maximum likelihood estimation of Gaussian mixture models without matrix operations

Publisher: Springer Science and Business Media LLC

Date: 05-06-2015

DOI: 10.1007/S11634-015-0209-7

Publication

Mixtures of spatial spline regressions for clustering and classification

Publisher: Elsevier BV

Date: 2016

DOI: 10.1016/J.CSDA.2014.01.011

Publication

Expression of PTRF in PC-3 cells modulates cholesterol dynamics and the actin cytoskeleton impacting secretion pathways

Publisher: Elsevier BV

Date: 02-2012

DOI: 10.1074/MCP.M111.012245

Publication

A globally convergent algorithm for lasso-penalized mixture of linear regression models

Publisher: Elsevier BV

Date: 03-2018

DOI: 10.1016/J.CSDA.2017.09.003

Publication

The Fully-Visible Boltzmann Machine and the Senate of the 45th Australian Parliament in 2016

Publisher: Elsevier BV

Date: 2018

DOI: 10.2139/SSRN.3287903

Publication

A block successive lower-bound maximization algorithm for the maximum pseudo-likelihood estimation of fully visible Boltzmann machines

Publisher: MIT Press - Journals

Date: 03-2016

DOI: 10.1162/NECO_A_00813

Abstract: Maximum pseudo-likelihood estimation (MPLE) is an attractive method for training fully visible Boltzmann machines (FVBMs) due to its computational scalability and the desirable statistical properties of the MPLE. No published algorithms for MPLE have been proven to be convergent or monotonic. In this note, we present an algorithm for the MPLE of FVBMs based on the block successive lower-bound maximization (BSLM) principle. We show that the BSLM algorithm monotonically increases the pseudo-likelihood values and that the sequence of BSLM estimates converges to the unique global maximizer of the pseudo-likelihood function. The relationship between the BSLM algorithm and the gradient ascent (GA) algorithm for MPLE of FVBMs is also discussed, and a convergence criterion for the GA algorithm is given.

Publication

Maximum Pseudolikelihood Estimation for Model-Based Clustering of Time Series Data

Publisher: MIT Press - Journals

Date: 04-2017

DOI: 10.1162/NECO_A_00938

Abstract: Mixture of autoregressions (MoAR) models provide a model-based approach to the clustering of time series data. The maximum likelihood (ML) estimation of MoAR models requires evaluating products of large numbers of densities of normal random variables. In practical scenarios, these products converge to zero as the length of the time series increases, and thus the ML estimation of MoAR models becomes infeasible without the use of numerical tricks. We propose a maximum pseudolikelihood (MPL) estimation approach as an alternative to the use of numerical tricks. The MPL estimator is proved to be consistent and can be computed with an EM (expectation-maximization) algorithm. Simulations are used to assess the performance of the MPL estimator against that of the ML estimator in cases where the latter was able to be calculated. An application to the clustering of time series data arising from a resting state fMRI experiment is presented as a demonstration of the methodology.

Publication

A robust permutation test for quantitative SILAC proteomics experiments

Publisher: Proteomass Scientific Society

Date: 31-12-2012

DOI: 10.5584/JIOMICS.V2I2.109

Publication

Approximation by finite mixtures of continuous density functions that vanish at infinity

Publisher: Informa UK Limited

Date: 2020

DOI: 10.1080/25742558.2020.1750861

Publication

Functional connectivity subtypes associate robustly with ASD diagnosis

Publisher: eLife Sciences Publications, Ltd

Date: 29-11-2022

DOI: 10.7554/ELIFE.56257

Abstract: Our understanding of the changes in functional brain organization in autism is hampered by the extensive heterogeneity that characterizes this neurodevelopmental disorder. Data driven clustering offers a straightforward way to decompose autism heterogeneity into subtypes of connectivity and promises an unbiased framework to investigate behavioral symptoms and causative genetic factors. Yet, the robustness and generalizability of functional connectivity subtypes is unknown. Here, we show that a simple hierarchical cluster analysis can robustly relate a given individual and brain network to a connectivity subtype, but that continuous assignments are more robust than discrete ones. We also found that functional connectivity subtypes are moderately associated with the clinical diagnosis of autism, and these associations generalize to independent replication data. We explored systematically 18 different brain networks as we expected them to associate with different behavioral profiles as well as different key regions. Contrary to this prediction, autism functional connectivity subtypes converged on a common topography across different networks, consistent with a compression of the primary gradient of functional brain organization, as previously reported in the literature. Our results support the use of data driven clustering as a reliable data dimensionality reduction technique, where any given dimension only associates moderately with clinical manifestations.

Publication

Refining the estimation of amphetamine consumption by wastewater-based epidemiology

Publisher: Elsevier BV

Date: 10-2022

DOI: 10.1016/J.WATRES.2022.119182

Abstract: Consumption of amphetamine and methamphetamine, two common illicit drugs, has been monitored by wastewater-based epidemiology (WBE) in many countries over the past decade. There is potential for the estimated amount of amphetamine used to be skewed at locations where methamphetamine is also consumed, because amphetamine is also excreted to wastewater following methamphetamine consumption. The present study aims to review the available data in the literature to identify an average ratio of amphetamine/methamphetamine (AMP/METH) that is excreted to wastewater after methamphetamine consumption. This ratio could then be used to refine the estimation of amphetamine consumption in catchments where there is both amphetamine and methamphetamine use. Using data from more than 6000 wastewater samples from Australia where methamphetamine is the dominant illicit amphetamine-type substance on the market, we were able to subtract the contribution of legal sources of amphetamine contribution and obtain the median AMP/METH ratio in wastewater of 0.09. Using this value, the amphetamine derived from methamphetamine consumption can be calculated and subtracted from the total amphetamine mass loads in wastewater samples. Without considering the contribution of amphetamine from methamphetamine use, selected European catchments with comparable consumption of amphetamine and methamphetamine showed up to 83% overestimation of amphetamine use. For catchments with AMP/METH ratio greater than 1.00, the impact of amphetamine from methamphetamine would be negligible; for catchments with AMP/METH ratio in the range of 0.04-0.19, it will be difficult to accurately estimate amphetamine consumption.

Publication

Practical and theoretical aspects of mixture-of-experts modeling: An overview

Publisher: Wiley

Date: 13-02-2018

DOI: 10.1002/WIDM.1246

Publication

Approximations of conditional probability density functions in Lebesgue spaces via mixture of experts models

Publisher: Springer Science and Business Media LLC

Date: 06-08-2021

DOI: 10.1186/S40488-021-00125-0

Abstract: Mixture of experts (MoE) models are widely applied for conditional probability density estimation problems. We demonstrate the richness of the class of MoE models by proving denseness results in Lebesgue spaces, when input and output variables are both compactly supported. We further prove an almost uniform convergence result when the input is univariate. Auxiliary lemmas are proved regarding the richness of the soft-max gating function class, and their relationships to the class of Gaussian gating functions.

Publication

On approximations via convolution-defined mixture models

Publisher: Informa UK Limited

Date: 30-10-2018

DOI: 10.1080/03610926.2018.1487069

Hien Nguyen

Researcher

Related Links

Publications

Chunked-and-averaged estimators for vector parameters

Model independent feature attributions: Shapley values that uncover non-linear dependencies

Shapley Values for Feature Selection: The Good, the Bad, and the Axioms

A universal approximation theorem for mixture-of-experts models

Approximate Bayesian Computation Via the Energy Statistic

Whole‐volume clustering of time series data from zebrafish brain calcium images via mixture modeling

Asymptotic Normality of the Maximum Pseudolikelihood Estimator for Fully Visible Boltzmann Machines

A Two-Sample Kolmogorov-Smirnov-Like Test for Big Data

A Block Minorization-Maximization Algorithm for Heteroscedastic Regression

The fully visible Boltzmann machine and the Senate of the 45th Australian Parliament in 2016

Modelling the Relationships between Train Commuters’ Access Modes and Traffic Safety

Spatial clustering of time series via mixture of autoregressions models and Markov random fields

Near universal consistency of the maximum pseudolikelihood estimator for discrete models

An introduction to Majorization‐Minimization algorithms for machine learning and statistical estimation

Approximation results regarding the multiple-output Gaussian gated mixture of linear experts model

Some theoretical results regarding the polygonal distribution

Finite sample inference for empirical Bayesian methods

New palladium(II) complex of SCN unsymmetric pincer-type ligand via oxidative addition

Online quantitative proteomics p-value calculator for permutation-based statistical testing of peptide ratios

Bayesian clustering of skewed and multimodal data using geometric skewed normal distributions

Progress on a conjecture regarding the triangular distribution

Statistical Evaluation of Labeled Comparative Profiling Proteomics Experiments Using Permutation Test

An Introduction to Approximate Bayesian Computation

Multisite generalizability of schizophrenia diagnosis classification based on functional brain connectivity

A Festschrift for Geoff McLachlan

Laplace mixture of linear experts

Laplace mixture autoregressive models

On strict sub-Gaussianity, optimal proxy variance and symmetry for bounded random variables

Author response: Functional connectivity subtypes associate robustly with ASD diagnosis

Sub‐Weibull distributions: Generalizing sub‐Gaussian and sub‐Exponential properties to heavier tailed distributions

Approximation of probability density functions via location-scale finite mixtures in Lebesgue spaces

A Universal Approximation Theorem for Gaussian-Gated Mixture of Experts Models

Randomized mixture models for probability density approximation and estimation

Subtypes of functional connectivity associate robustly with ASD diagnosis

k-means on Positive Definite Matrices, and an Application to Clustering in Radar Image Sequences

studentlife: Tidy Handling and Navigation of a Valuable Mobile-Health Dataset

Regularized estimation and feature selection in mixtures of Gaussian-gated experts models

Variable selection in statistical models using population-based incremental learning with applications to genome-wide association studies

Reproducible functional connectivity endophenotype confers high risk of ASD diagnosis in a subset of individuals

Mixture of time-dependent growth models with an application to blue swimmer crab length-frequency data

Positive Data Kernel Density Estimation via the LogKDE Package for R

Asymptotic inference for hidden process regression models

Asymptotic normality of the time‐domain generalized least squares estimator for linear regression models

Multisite generalizability of schizophrenia diagnosis classification based on functional brain connectivity

Model-based clustering and classification of functional data

Shapley Value Confidence Intervals for Attributing Variance Explained

Spatial false discovery rate control for magnetic resonance imaging studies

Maximum likelihood estimation of triangular and polygonal distributions

Pedestrian overpass use and its relationships with digital and social distractions, and overpass characteristics

Response functions

Improved estimation of size-transition matrices using tag-recapture data

False discovery rate control in magnetic resonance imaging studies via Markov random fields

Linear mixed models with marginally symmetric nonparametric random effects

Mini-batch learning of exponential family finite mixture models

Maximum likelihood estimation of Gaussian mixture models without matrix operations

Mixtures of spatial spline regressions for clustering and classification

Expression of PTRF in PC-3 cells modulates cholesterol dynamics and the actin cytoskeleton impacting secretion pathways

A globally convergent algorithm for lasso-penalized mixture of linear regression models

The Fully-Visible Boltzmann Machine and the Senate of the 45th Australian Parliament in 2016

A block successive lower-bound maximization algorithm for the maximum pseudo-likelihood estimation of fully visible Boltzmann machines

Maximum Pseudolikelihood Estimation for Model-Based Clustering of Time Series Data

A robust permutation test for quantitative SILAC proteomics experiments

Approximation by finite mixtures of continuous density functions that vanish at infinity

Functional connectivity subtypes associate robustly with ASD diagnosis

Refining the estimation of amphetamine consumption by wastewater-based epidemiology

Practical and theoretical aspects of mixture-of-experts modeling: An overview

Approximations of conditional probability density functions in Lebesgue spaces via mixture of experts models

On approximations via convolution-defined mixture models

Related Organisations

La Trobe University

University of Queensland

Related Funding Activities