ORCID Profile
0000-0002-8136-2294
Current Organisation
Macquarie University
In Research Link Australia (RLA), "Research Topics" refer to ANZSRC FOR and SEO codes. These topics are either sourced from ANZSRC FOR and SEO codes listed in researchers' related grants or generated by a large language model (LLM) based on their publications.
Statistics | Applied Statistics | Natural Resource Management | Environmental Monitoring | Biostatistics | Clinical Sciences not elsewhere classified |
Application Software Packages (excl. Computer Games) | Physical and Chemical Conditions of Water in Fresh, Ground and Surface Water Environments (excl. Urban and Industrial Use) | Expanding Knowledge in the Mathematical Sciences | Expanding Knowledge in the Medical and Health Sciences | Rural Water Evaluation (incl. Water Quality)
Publisher: Institute of Mathematical Statistics
Date: 12-2017
DOI: 10.1214/17-BA1081
Publisher: Springer Science and Business Media LLC
Date: 10-2021
Publisher: Springer Science and Business Media LLC
Date: 11-2021
DOI: 10.1038/S41592-021-01309-X
Abstract: Glycoproteomics is a powerful yet analytically challenging research tool. Software packages aiding the interpretation of complex glycopeptide tandem mass spectra have appeared, but their relative performance remains untested. Conducted through the HUPO Human Glycoproteomics Initiative, this community study, comprising both developers and users of glycoproteomics software, evaluates solutions for system-wide glycopeptide analysis. The same mass spectrometry-based glycoproteomics datasets from human serum were shared with participants and the relative team performance for N- and O-glycopeptide data analysis was comprehensively established by orthogonal performance tests. Although the results were variable, several high-performance glycoproteomics informatics strategies were identified. Deep analysis of the data revealed key performance-associated search parameters and led to recommendations for improved ‘high-coverage’ and ‘high-accuracy’ glycoproteomics search solutions. This study concludes that diverse software packages for comprehensive glycopeptide data analysis exist, points to several high-performance search strategies and specifies key variables that will guide future software developments and assist informatics decision-making in glycoproteomics.
Publisher: Frontiers Media SA
Date: 17-12-2021
Abstract: High rates of biodiversity loss caused by human-induced changes in the environment require new methods for large scale fauna monitoring and data analysis. While ecoacoustic monitoring is increasingly being used and shows promise, analysis and interpretation of the big data produced remains a challenge. Computer-generated acoustic indices potentially provide a biologically meaningful summary of sound, however, temporal autocorrelation, difficulties in statistical analysis of multi-index data and lack of consistency or transferability in different terrestrial environments have hindered the application of those indices in different contexts. To address these issues we investigate the use of time-series motif discovery and random forest classification of multi-indices through two case studies. We use a semi-automated workflow combining time-series motif discovery and random forest classification of multi-index (acoustic complexity, temporal entropy, and events per second) data to categorize sounds in unfiltered recordings according to the main source of sound present (birds, insects, geophony). Our approach showed more than 70% accuracy in label assignment in both datasets. The categories assigned were broad, but we believe this is a great improvement on traditional single index analysis of environmental recordings, as we can now give ecological meaning to recordings in a semi-automated way that does not require expert knowledge, and manual validation is only necessary for a small subset of the data. Furthermore, temporal autocorrelation, which is largely ignored by researchers, has been effectively eliminated through the time-series motif discovery technique applied here for the first time to ecoacoustic data.
We expect that our approach will greatly assist researchers in the future as it will allow large datasets to be rapidly processed and labeled, enabling the screening of recordings for undesired sounds, such as wind, or target biophony (insects and birds) for biodiversity monitoring or bioacoustics research.
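The motif-discovery step at the heart of this workflow can be sketched with a brute-force Python toy (illustrative only; the paper's pipeline, data and variable names are not reproduced here):

```python
import numpy as np

def find_motif(series, m):
    """Brute-force time-series motif discovery: return the start indices
    of the pair of non-overlapping length-m subsequences that are closest
    in Euclidean distance. A toy stand-in for the motif-discovery step
    applied to acoustic-index series such as events per second."""
    windows = np.lib.stride_tricks.sliding_window_view(series, m)
    n = len(windows)
    best, best_pair = np.inf, None
    for i in range(n):
        for j in range(i + m, n):          # enforce non-overlapping pairs
            d = np.linalg.norm(windows[i] - windows[j])
            if d < best:
                best, best_pair = d, (i, j)
    return best_pair

# Plant an exact repeat of one pattern in a noisy index series
rng = np.random.default_rng(1)
x = rng.normal(size=20)
x[12:15] = x[2:5]                          # the motif: positions 2 and 12
```

At scale, a matrix-profile-style algorithm would replace the quadratic loop, but the object being searched for is the same.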
Publisher: Zenodo
Date: 2022
Publisher: Wiley
Date: 23-02-2016
DOI: 10.1002/SIM.6909
Abstract: Multiple endpoints are increasingly used in clinical trials. The significance of some of these clinical trials is established if at least r null hypotheses are rejected among m that are simultaneously tested. The usual approach in multiple hypothesis testing is to control the family-wise error rate, which is defined as the probability that at least one type-I error is made. More recently, the q-generalized family-wise error rate has been introduced to control the probability of making at least q false rejections. For procedures controlling this global type-I error rate, we define a type-II r-generalized family-wise error rate, which is directly related to the r-power defined as the probability of rejecting at least r false null hypotheses. We obtain very general power formulas that can be used to compute the sample size for single-step and step-wise procedures. These are implemented in our R package rPowerSampleSize available on the CRAN, making them directly available to end users. Complexities of the formulas are presented to gain insight into computation time issues. Comparison with Monte Carlo strategy is also presented. We compute sample sizes for two clinical trials involving multiple endpoints: one designed to investigate the effectiveness of a drug against acute heart failure and the other for the immunogenicity of a vaccine strategy against pneumococcus. Copyright © 2016 John Wiley & Sons, Ltd.
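The r-power notion can be illustrated with a small Monte Carlo sketch (a hedged illustration of the concept only, not the rPowerSampleSize implementation, which is an R package; the test setup and function name here are assumptions):

```python
import numpy as np
from statistics import NormalDist

def r_power_mc(n, m, effect, r, alpha=0.05, n_sim=20000, seed=0):
    """Monte Carlo estimate of the r-power: the probability of rejecting
    at least r of m false null hypotheses, here for independent one-sided
    z-tests with a single-step Bonferroni-adjusted level alpha/m."""
    rng = np.random.default_rng(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / m)
    # each row simulates the m test statistics of one trial
    z = rng.normal(loc=effect * np.sqrt(n), scale=1.0, size=(n_sim, m))
    n_rejected = (z > z_crit).sum(axis=1)
    return float((n_rejected >= r).mean())
```

Increasing n until the estimated r-power exceeds a target gives a crude simulation-based sample-size calculation; the paper's power formulas avoid the simulation altogether.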
Publisher: Springer Science and Business Media LLC
Date: 04-02-2014
Publisher: Walter de Gruyter GmbH
Date: 2015
Abstract: Selection of estimators is an essential task in modeling. A general framework is that the estimators of a distribution are obtained by minimizing a function (the estimating function) and assessed using another function (the assessment function). A classical case is that both functions estimate an information risk (specifically cross-entropy); this corresponds to using maximum likelihood estimators and assessing them by the Akaike information criterion (AIC). In more general cases, the assessment risk can be estimated by leave-one-out cross-validation. Since leave-one-out cross-validation is computationally very demanding, we propose in this paper a universal approximate cross-validation criterion under regularity conditions (UACVR). This criterion can be adapted to different types of estimators, including penalized likelihood and maximum a posteriori estimators, and also to different assessment risk functions, including information risk functions and the continuous ranked probability score (CRPS). UACVR reduces to the Takeuchi information criterion (TIC) when cross-entropy is the risk for both estimation and assessment. We provide the asymptotic distributions of UACVR and of a difference of UACVR values for two estimators. We validate UACVR using simulations and provide an illustration on real data in a psychometric context, where estimators of the distributions of ordered categorical data derived from threshold models are compared with estimators based on continuous approximations.
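The leave-one-out cross-validation that UACVR approximates can be written down directly for a toy Gaussian model (an illustrative sketch of the target quantity, not the paper's estimator):

```python
import numpy as np
from statistics import NormalDist

def loo_cv_risk(x):
    """Exact leave-one-out cross-validated risk (minus the mean predictive
    log-density) for a Gaussian model fitted by maximum likelihood.
    Criteria such as TIC/UACVR approximate this quantity without
    refitting the model n times."""
    scores = []
    for i in range(len(x)):
        train = np.delete(x, i)
        mu, sd = train.mean(), train.std()   # MLEs on the training fold
        scores.append(np.log(NormalDist(mu, sd).pdf(x[i])))
    return -float(np.mean(scores))

rng = np.random.default_rng(0)
# for standard normal data the true cross-entropy is 0.5*log(2*pi*e) ≈ 1.419
risk = loo_cv_risk(rng.normal(size=500))
```

The n refits in the loop are exactly the computational cost that an approximate criterion is designed to remove.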
Publisher: Informa UK Limited
Date: 28-09-2017
Publisher: Wiley
Date: 26-04-2018
DOI: 10.1002/IJC.31536
Publisher: Springer Science and Business Media LLC
Date: 16-04-2019
Publisher: Wiley
Date: 2008
DOI: 10.1002/SIM.3161
Abstract: In a meta-analysis combining survival data from different clinical trials, an important issue is the possible heterogeneity between trials. Such intertrial variation can not only be explained by heterogeneity of treatment effects across trials but also by heterogeneity of their baseline risk. In addition, one might examine the relationship between magnitude of the treatment effect and the underlying risk of the patients in the different trials. Such a scenario can be accounted for by using additive random effects in the Cox model, with a random trial effect and a random treatment-by-trial interaction. We propose to use this kind of model with a general correlation structure for the random effects and to estimate parameters and hazard function using a semi-parametric penalized marginal likelihood method (maximum penalized likelihood estimators). This approach gives smoothed estimates of the hazard function, which represents incidence in epidemiology. The idea for the approach in this paper comes from the study of heterogeneity in a large meta-analysis of randomized trials in patients with head and neck cancers (meta-analysis of chemotherapy in head and neck cancers) and the effect of adding chemotherapy to locoregional treatment. The simulation study and the application demonstrate that the proposed approach yields satisfactory results and they illustrate the need to use a flexible variance-covariance structure for the random effects.
Publisher: Public Library of Science (PLoS)
Date: 29-06-2023
DOI: 10.1371/JOURNAL.PONE.0287705
Abstract: Compositional data are a special kind of data, represented as a proportion carrying relative information. Although this type of data is widely spread, no solution exists to deal with the cases where the classes are not well balanced. After describing compositional data imbalance, this paper proposes an adaptation of the original Synthetic Minority Oversampling TEchnique (SMOTE) to deal with compositional data imbalance. The new approach, called SMOTE for Compositional Data (SMOTE-CD), generates synthetic examples by computing a linear combination of selected existing data points, using compositional data operations. The performance of the SMOTE-CD is tested with three different regressors (Gradient Boosting tree, Neural Networks, Dirichlet regressor) applied to two real datasets and to synthetic generated data, and the performance is evaluated using accuracy, cross-entropy, F1-score, R2 score and RMSE. The results show improvements across all metrics, but the impact of oversampling on performance varies depending on the model and the data. In some cases, oversampling may lead to a decrease in performance for the majority class. However, for the real data, the best performance across all models is achieved when oversampling is used. Notably, the F1-score is consistently increased with oversampling. Unlike the original technique, the performance is not improved when combining oversampling of the minority classes and undersampling of the majority class. The Python package smote-cd implements the method and is available online.
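The geometric idea behind SMOTE-CD — interpolating between compositions with simplex (Aitchison) operations rather than Euclidean ones — can be sketched as follows (a hypothetical helper for illustration, not the smote-cd package API):

```python
import numpy as np

def smote_cd_point(x, y, t=None, rng=None):
    """One synthetic composition on the Aitchison segment between
    compositions x and y: componentwise geometric interpolation
    (perturbation/powering in the simplex) followed by closure back
    to proportions. Toy sketch of the SMOTE-CD interpolation step."""
    if t is None:
        t = (rng or np.random.default_rng(0)).uniform()
    z = x ** (1 - t) * y ** t       # power/perturbation in the simplex
    return z / z.sum()              # closure: components sum to one

x = np.array([0.7, 0.2, 0.1])
y = np.array([0.1, 0.3, 0.6])
z = smote_cd_point(x, y, t=0.5)     # a valid composition between x and y
```

Unlike a Euclidean average, the result is guaranteed to stay strictly inside the simplex, which is what makes the interpolation safe for proportions.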
Publisher: Wiley
Date: 31-07-2012
Abstract: Attributable risk has become an important concept in clinical epidemiology. In this paper, we suggest estimating the attributable risk of nosocomial infections using a multistate approach. Recently, a multistate model (called the progressive disability model in the literature) has been developed in order to take into consideration both the time-dependency of the risk factor (e.g., nosocomial infections) and the presence of competing risks (e.g., death and discharge) at each time point. However, this approach does not take into account the possible heterogeneity of the study population. In this paper, we investigate an extension of this model and suggest an adjusted disability multistate model including covariates in each transition. This new multistate model has led us to define the concepts of overall and profiled attributable risk. We use a classical semiparametric approach to estimate the model and the new attributable risk. A simulation study is presented and we show, in particular, that neglecting the presence of covariates when estimating the model can lead to an important bias. The methodology developed in this paper is applied to data on ventilator-associated pneumonia in 12 French intensive care units.
Publisher: Wiley
Date: 27-12-2020
DOI: 10.1002/SIM.8855
Publisher: Springer Science and Business Media LLC
Date: 12-2012
Publisher: Wiley
Date: 11-05-2012
DOI: 10.1111/J.1541-0420.2012.01753.X
Abstract: Prognostic estimators for a clinical event may use repeated measurements of markers in addition to fixed covariates. These measurements can be linked to the clinical event by joint models that involve latent features. When the objective is to choose between different prognosis estimators based on joint models, the conventional Akaike information criterion is not well adapted and decision should be based on predictive accuracy. We define an adapted risk function called expected prognostic cross-entropy. We define another risk function for the case of right-censored observations, the expected prognostic observed cross-entropy (EPOCE). These risks can be estimated by leave-one-out cross-validation, for which we give approximate formulas and asymptotic distributions. The approximated cross-validated estimator CVPOL (a) of EPOCE is studied in simulation and applied to the comparison of several joint latent class models for prognosis of recurrence of prostate cancer using prostate-specific antigen measurements.
Publisher: Wiley
Date: 10-02-2023
DOI: 10.1111/JBI.14570
Abstract: Optimum shifts in species–environment relationships are intensively studied in a wide range of ecological topics, including climate change and species invasion. Numerous statistical methods are used to study optimum shifts, but, to our knowledge, none explicitly estimate it. We extended an existing model to explicitly estimate optimum shifts for multiple species having symmetrical response curves. We called this new Bayesian hierarchical model the Explicit Hierarchical Model of Optimum Shifts (EHMOS). In a simulation study, we compared the accuracy of EHMOS to a mean comparison method and a Bayesian generalized linear mixed model (GLMM). Specifically, we tested if the accuracy of the methods was sensitive to (1) sampling design, (2) species optimum position and (3) species ecological specialization. In addition, we compared the three methods using a real dataset investigating optimum shifts in 24 Orthopteran species between two time periods along an elevation gradient. Of all the simulated scenarios, EHMOS was the most accurate method. GLMM was the most sensitive method to species optimum position, providing unreliable estimates in the presence of marginal species, that is, species with an optimum close to a sampling boundary. The mean comparison method was also sensitive to species optimum position and ecological specialization, especially in an unbalanced sampling design, with high negative bias and low interval coverage compared to EHMOS. The case study results obtained with EHMOS were consistent with what is expected considering ongoing climate change, with mostly upward shifts, which further improved confidence in the accuracy of the EHMOS method. The Explicit Hierarchical Model of Optimum Shifts could be used for a wide range of topics and extended to produce new insights, especially in climate change studies.
Explicit estimation of optimum shifts notably allows investigation of ecological assumptions that could explain interspecific variability of these shifts.
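For a single species with a symmetric (Gaussian-shaped) response curve, the optimum that EHMOS models hierarchically can be estimated by a simple quadratic fit on the log scale (a non-hierarchical toy illustration, not the EHMOS model itself):

```python
import numpy as np

def estimate_optimum(x, abundance):
    """Estimate the optimum of a symmetric (Gaussian-shaped) species
    response curve by fitting a quadratic to log-abundance: for
    log y = a + b*x + c*x**2 with c < 0, the optimum is -b / (2c).
    Simplified single-species stand-in for the hierarchical model."""
    c, b, a = np.polyfit(x, np.log(abundance), 2)
    return -b / (2 * c)

# Noise-free Gaussian response centred at elevation 1200 m
x = np.linspace(500, 2000, 40)
y = np.exp(-((x - 1200.0) / 300.0) ** 2)
```

An optimum shift is then the difference between such optima estimated for two time periods; EHMOS estimates that shift directly, pooling information across species.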
Publisher: Wiley
Date: 21-10-2015
DOI: 10.1111/ACEL.12406
Publisher: Coastal Education and Research Foundation
Date: 05-2018
DOI: 10.2112/SI85-161.1
Publisher: Elsevier BV
Date: 03-2021
Publisher: Springer Science and Business Media LLC
Date: 12-2004
DOI: 10.1007/S10985-004-4772-Z
Abstract: A criterion for choosing an estimator in a family of semi-parametric estimators from incomplete data is proposed. This criterion is the expected observed log-likelihood (ELL). Adapted versions of this criterion in case of censored data and in presence of explanatory variables are exhibited. We show that likelihood cross-validation (LCV) is an estimator of ELL and we exhibit three bootstrap estimators. A simulation study considering both families of kernel and penalized likelihood estimators of the hazard function (indexed on a smoothing parameter) demonstrates good results of LCV and a bootstrap estimator called ELL(bboot). We apply the ELL(bboot) criterion to compare the kernel and penalized likelihood estimators to estimate the risk of developing dementia for women using data from a large cohort study.
Publisher: Informa UK Limited
Date: 04-03-2014
DOI: 10.1080/10543406.2013.860156
Abstract: The use of two or more primary correlated endpoints is becoming increasingly common. A mandatory approach when analyzing data from such clinical trials is to control the family-wise error rate (FWER). In this context, we provide formulas for computation of sample size and for data analysis. Two approaches are discussed: an individual method based on a union-intersection procedure and a global procedure, based on a multivariate model that can take into account adjustment variables. These methods are illustrated with simulation studies and applications. An R package known as rPowerSampleSize is also available.
Publisher: Wiley
Date: 27-11-2007
Publisher: BMJ
Date: 12-2020
DOI: 10.1136/BMJOPEN-2020-041417
Abstract: There is a paucity of data that can be used to guide the management of critically ill patients with COVID-19. In response, a research and data-sharing collaborative—The COVID-19 Critical Care Consortium—has been assembled to harness the cumulative experience of intensive care units (ICUs) worldwide. The resulting observational study provides a platform to rapidly disseminate detailed data and insights crucial to improving outcomes. This is an international, multicentre, observational study of patients with confirmed or suspected SARS-CoV-2 infection admitted to ICUs. This is an evolving, open-ended study that commenced on 1 January 2020 and currently includes sites in over 48 countries. The study enrols patients at the time of ICU admission and follows them to the time of death, hospital discharge or 28 days post-ICU admission, whichever occurs last. Key data, collected via an electronic case report form devised in collaboration with the International Severe Acute Respiratory and Emerging Infection Consortium/Short Period Incidence Study of Severe Acute Respiratory Illness networks, include: patient demographic data and risk factors, clinical features, severity of illness and respiratory failure, need for non-invasive and/or mechanical ventilation and/or extracorporeal membrane oxygenation and associated complications, as well as data on adjunctive therapies. Local principal investigators will ensure that the study adheres to all relevant national regulations, and that the necessary approvals are in place before a site may contribute data. In jurisdictions where a waiver of consent is deemed insufficient, prospective, representative or retrospective consent will be obtained, as appropriate. A web-based dashboard has been developed to provide relevant data and descriptive statistics to international collaborators in real-time. It is anticipated that, following study completion, all de-identified data will be made open access. 
ACTRN12620000421932 ( anzctr.org.au/ACTRN12620000421932.aspx ).
Publisher: BMJ
Date: 21-03-2018
Abstract: Epidemiological studies provide evidence that environmental exposures may affect health through complex mixtures. Formal investigation of the effect of exposure mixtures is usually achieved by modelling interactions, which relies on strong assumptions relating to the identity and the number of the exposures involved in such interactions, and on the order and parametric form of these interactions. These hypotheses become difficult to formulate and justify in an exposome context, where influential exposures are numerous and heterogeneous. To capture both the complexity of the exposome and its possibly pleiotropic effects, models handling multivariate predictors and responses, such as partial least squares (PLS) algorithms, can prove useful. As an illustrative example, we applied PLS models to data from a study investigating the inflammatory response (blood concentration of 13 immune markers) to the exposure to four disinfection by-products (one brominated and three chlorinated compounds), while swimming in a pool. To accommodate the multiple observations per participant (n=60 before and after the swim), we adopted a multilevel extension of PLS algorithms, including sparse PLS models shrinking loadings coefficients of unimportant predictors (exposures) and/or responses (protein levels). Despite the strong correlation among co-occurring exposures, our approach identified a subset of exposures (n=3/4) affecting the exhaled levels of 8 (out of 13) immune markers. PLS algorithms can easily scale to high-dimensional exposures and responses, and prove useful for exposome research to identify sparse sets of exposures jointly affecting a set of (selected) biological markers. Our descriptive work may guide these extensions for higher dimensional data.
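The core PLS step — finding exposure and response weight vectors whose scores have maximal covariance — can be sketched in a few lines (a single-level, non-sparse simplification of the multilevel sparse PLS models used here; the simulated data are illustrative):

```python
import numpy as np

def pls_first_component(X, Y):
    """First PLS weight vectors: the pair (u, v) maximising the
    covariance between the scores Xu and Yv, obtained as the leading
    singular vectors of the cross-covariance matrix X^T Y.
    Sketch of the core PLS step (no sparsity, no multilevel structure)."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc.T @ Yc, full_matrices=False)
    return U[:, 0], Vt[0]

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))                      # e.g. 4 exposures
Y = X[:, :1] @ np.ones((1, 5)) + 0.1 * rng.normal(size=(60, 5))
u, v = pls_first_component(X, Y)                  # u loads on exposure 0
```

Sparse PLS variants additionally shrink small entries of u and v to exactly zero, which is what selects the subset of exposures and markers.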
Publisher: Foundation for Open Access Statistic
Date: 2016
Publisher: Institute of Mathematical Statistics
Date: 2022
DOI: 10.1214/22-SS138
Publisher: Elsevier BV
Date: 08-2022
Publisher: Wiley
Date: 03-2008
DOI: 10.1111/J.1460-9568.2008.06123.X
Abstract: A central question in chemical senses is the way that odorant molecules are represented in the brain. To date, many studies, when taken together, suggest that structural features of the molecules are represented through a spatio-temporal pattern of activation in the olfactory bulb (OB), in both glomerular and mitral cell layers. Mitral/tufted cells interact with a large population of inhibitory interneurons resulting in a temporal patterning of bulbar local field potential (LFP) activity. We investigated the possibility that molecular features could determine the temporal pattern of LFP oscillatory activity in the OB. For this purpose, we recorded the LFPs in the OB of urethane-anesthetized, freely breathing rats in response to series of aliphatic odorants varying subtly in carbon-chain length or functional group. In concordance with our previous reports, we found that odors evoked oscillatory activity in the LFP signal in both the beta and gamma frequency bands. Analysis of LFP oscillations revealed that, although molecular features have almost no influence on the intrinsic characteristics of LFP oscillations, they influence the temporal patterning of bulbar oscillations. Alcohol family odors rarely evoke gamma oscillations, whereas ester family odors tend to induce oscillatory patterns showing beta/gamma alternation. Moreover, for molecules with the same functional group, the probability of gamma occurrence is correlated to the vapor pressure of the odor. The significance of the relation between odorant features and oscillatory regimes, along with its functional relevance, is discussed.
Publisher: Springer Science and Business Media LLC
Date: 14-08-2007
Publisher: Elsevier BV
Date: 09-2011
Publisher: Wiley
Date: 03-2003
Abstract: Ishiguro, Sakamoto, and Kitagawa (1997, Annals of the Institute of Statistical Mathematics 49, 411-434) proposed EIC as an extension of the Akaike information criterion (AIC); the idea leading to EIC is to correct the bias of the log-likelihood, considered as an estimator of the Kullback-Leibler information, using the bootstrap. We develop this criterion for its use in multivariate semiparametric situations, and argue that it can be used for choosing among parametric and semiparametric estimators. A simulation study based on a regression model shows that EIC is better than its competitors, although likelihood cross-validation performs nearly as well except for small sample sizes. Its use is illustrated by estimating the mean evolution of viral RNA levels in a group of infants infected by HIV.
Publisher: Springer Science and Business Media LLC
Date: 11-01-2012
Publisher: Wiley
Date: 2001
DOI: 10.1002/SIM.916
Abstract: We propose a method and a program to determine a significance level for a series of codings of an explanatory variable in logistic regression. Dichotomous and Box-Cox transformations are considered. Three methods of correcting the significance level are studied: the Bonferroni method; Efron's method, which uses the correlation between successive tests; and the exact calculation by numerical integration using all correlations. A simulation study has led to a strategy for the choice and number of the different codings of the variable. This method is illustrated using the data of a study of the relation between cholesterol and dementia.
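The simplest of the three corrections can be written in two lines (an illustrative sketch; the Efron-type and exact corrections studied in the paper sharpen this conservative bound by using the correlations between the coding-specific tests):

```python
def bonferroni_corrected_p(p_values):
    """Bonferroni correction for testing one variable through k codings:
    the corrected p-value is k times the smallest coding-specific
    p-value, capped at 1. Conservative when the codings are correlated."""
    k = len(p_values)
    return min(1.0, k * min(p_values))
```

With three codings yielding p-values 0.01, 0.04 and 0.20, the corrected p-value is 3 × 0.01 = 0.03.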
Publisher: Elsevier BV
Date: 2005
Publisher: MDPI AG
Date: 15-05-2021
DOI: 10.3390/RS13101933
Abstract: Data about storm impacts are essential for the disaster risk reduction process, but unlike data about storm characteristics, they are not routinely collected. In this paper, we demonstrate the high potential of convolutional neural networks to automatically constitute a storm impact database using timestack images provided by coastal video monitoring stations. Several convolutional neural network architectures and methods to deal with class imbalance were tested on two sites (Biarritz and Zarautz) to find the best practices for this classification task. This study shows that convolutional neural networks are well adapted for the classification of timestack images into storm impact regimes. Overall, the most complex and deepest architectures yield better results. Indeed, the best performances are obtained with the VGG16 architecture for both sites, with F-scores of 0.866 for Biarritz and 0.858 for Zarautz. For the class imbalance problem, the method of oversampling shows the best classification accuracy, with F-scores on average 30% higher than the ones obtained with cost-sensitive learning. The transferability of the learning method between sites is also investigated and shows conclusive results. This study highlights the high potential of convolutional neural networks to enhance the value of coastal video monitoring data that are routinely recorded on many coastal sites. Furthermore, it shows that this type of deep neural network can significantly contribute to the setting up of risk databases necessary for the determination of storm risk indicators and, more broadly, for the optimization of risk-mitigation measures.
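The oversampling strategy compared against cost-sensitive learning can be sketched in a few lines of Python (random duplication of minority-class rows; illustrative only, not the paper's training pipeline):

```python
import numpy as np

def oversample_minority(X, y, rng=None):
    """Random oversampling: duplicate minority-class rows (sampled with
    replacement) until every class reaches the size of the largest one.
    Sketch of the class-imbalance handling compared in the paper."""
    rng = rng or np.random.default_rng(0)
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    idx = []
    for c, n in zip(classes, counts):
        ci = np.flatnonzero(y == c)
        extra = rng.choice(ci, size=n_max - n, replace=True)
        idx.append(np.concatenate([ci, extra]))
    idx = np.concatenate(idx)
    return X[idx], y[idx]

X = np.arange(10).reshape(5, 2)      # 5 toy "images" with 2 features
y = np.array([0, 0, 0, 1, 1])        # imbalanced impact-regime labels
Xb, yb = oversample_minority(X, y)
```

Applied before training, this leaves the loss function unchanged, whereas cost-sensitive learning instead reweights the loss per class.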
Publisher: Apollo - University of Cambridge Repository
Date: 2018
DOI: 10.17863/CAM.22138
Publisher: Cold Spring Harbor Laboratory
Date: 15-05-2020
DOI: 10.1101/2020.05.15.097774
Abstract: High-dimensional datasets, where the number of variables ‘p’ is much larger than the number of samples ‘n’, are ubiquitous and often render standard classification and regression techniques unreliable due to overfitting. An important research problem is feature selection — ranking candidate variables based on their relevance to the outcome variable and retaining those that satisfy a chosen criterion. In this article, we propose a computationally efficient variable selection method based on principal component analysis. The method is very simple, accessible, and suitable for the analysis of high-dimensional datasets. It allows correction for population structure in genome-wide association studies (GWAS), which would otherwise induce spurious associations, and it is less likely to overfit. We expect our method to accurately identify important features while reducing the False Discovery Rate (FDR) (the expected proportion of erroneously rejected null hypotheses) by accounting for the correlation between variables and by de-noising data in the training phase, which also makes it robust to outliers in the training data. Being almost as fast as univariate filters, our method allows for valid statistical inference. The ability to make such inferences sets this method apart from most of the current multivariate statistical tools designed for today’s high-dimensional data. We demonstrate the superior performance of our method through extensive simulations. A semi-real gene-expression dataset, a challenging childhood acute lymphoblastic leukemia (CALL) gene expression study, and a GWAS that attempts to identify single-nucleotide polymorphisms (SNPs) associated with rice grain length further demonstrate the usefulness of our method in genomic applications.
An integral part of modern statistical research is feature selection, which has underpinned various scientific discoveries, especially in emerging genomics applications such as gene expression and proteomics studies, where data have thousands or tens of thousands of features but a limited number of samples. However, in practice, due to the unavailability of suitable multivariate methods, researchers often resort to univariate filters when dealing with a large number of variables. These univariate filters do not take into account the dependencies between variables because they assess variables one by one. This leads to loss of information, loss of statistical power (the probability of correctly rejecting the null hypothesis) and potentially biased estimates. In our paper, we propose a new variable selection method. Being computationally efficient, our method allows for valid inference. The ability to make such inferences sets this method apart from most of the current multivariate statistical tools designed for today’s high-dimensional data.
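The PCA-based ranking idea can be sketched as follows (a simplified illustration; the proposed method adds the inference and FDR-control machinery described above, which this toy omits):

```python
import numpy as np

def pca_feature_ranking(X, n_components=2):
    """Rank features by their contribution to the top principal
    components, weighted by the explained variance of each component.
    A simplified sketch of PCA-driven feature selection."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = s[:n_components] ** 2
    scores = (var[:, None] * Vt[:n_components] ** 2).sum(axis=0)
    return np.argsort(scores)[::-1]          # most important feature first

# Toy data: feature 3 carries almost all of the variance
rng = np.random.default_rng(0)
signal = rng.normal(size=200)
X = 0.1 * rng.normal(size=(200, 6))
X[:, 3] += signal
```

Because the ranking is computed from the low-rank structure rather than one-at-a-time tests, correlated features are assessed jointly.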
Publisher: American Chemical Society (ACS)
Date: 28-08-2020
Publisher: Springer Science and Business Media LLC
Date: 15-06-2012
Publisher: Wiley
Date: 10-05-2011
Publisher: Public Library of Science (PLoS)
Date: 30-06-2023
DOI: 10.1371/JOURNAL.PONE.0287640
Abstract: Real-time monitoring using in-situ sensors is becoming a common approach for measuring water-quality within watersheds. High-frequency measurements produce big datasets that present opportunities to conduct new analyses for improved understanding of water-quality dynamics and more effective management of rivers and streams. Of primary importance is enhancing knowledge of the relationships between nitrate, one of the most reactive forms of inorganic nitrogen in the aquatic environment, and other water-quality variables. We analysed high-frequency water-quality data from in-situ sensors deployed in three sites from different watersheds and climate zones within the National Ecological Observatory Network, USA. We used generalised additive mixed models to explain the nonlinear relationships at each site between nitrate concentration and conductivity, turbidity, dissolved oxygen, water temperature, and elevation. Temporal auto-correlation was modelled with an auto-regressive moving-average (ARMA) model and we examined the relative importance of the explanatory variables. Total deviance explained by the models was high for all sites (99%). Although variable importance and the smooth regression parameters differed among sites, the models explaining the most variation in nitrate contained the same explanatory variables. This study demonstrates that building a model for nitrate using the same set of explanatory water-quality variables is achievable, even for sites with vastly different environmental and climatic characteristics. Applying such models will assist managers to select cost-effective water-quality variables to monitor when the goals are to gain a spatial and temporal in-depth understanding of nitrate dynamics and adapt management plans accordingly.
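The residual autocorrelation absorbed by the ARMA error term can be illustrated with its AR(1) ingredient (a toy sketch on simulated residuals; the paper fits the full generalised additive mixed model, not this estimator):

```python
import numpy as np

def ar1_coefficient(resid):
    """Lag-1 autocorrelation of model residuals: the AR(1) ingredient of
    an ARMA error structure used to absorb the temporal auto-correlation
    left over after the smooth terms are fitted."""
    r = resid - resid.mean()
    return float((r[1:] * r[:-1]).sum() / (r * r).sum())

# Simulated AR(1) residual series with true coefficient phi = 0.6
rng = np.random.default_rng(0)
phi, n = 0.6, 5000
e = np.zeros(n)
eps = rng.normal(size=n)
for t in range(1, n):
    e[t] = phi * e[t - 1] + eps[t]
```

If this coefficient is clearly non-zero, treating high-frequency observations as independent would understate the uncertainty of the smooth terms.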
Publisher: Walter de Gruyter GmbH
Date: 06-01-2021
Abstract: Semi-Markov models are widely used for survival analysis and reliability analysis. In general, there are two competing parameterizations and each entails its own interpretation and inference properties. On the one hand, a semi-Markov process can be defined based on the distribution of sojourn times, often via hazard rates, together with transition probabilities of an embedded Markov chain. On the other hand, intensity transition functions may be used, often referred to as the hazard rates of the semi-Markov process. We summarize and contrast these two parameterizations both from a probabilistic and an inference perspective, and we highlight relationships between the two approaches. In general, the intensity transition based approach allows the likelihood to be split into likelihoods of two-state models having fewer parameters, allowing efficient computation and usage of many survival analysis tools. Nevertheless, in certain cases the sojourn time based approach is natural and has been exploited extensively in applications. In contrasting the two approaches and contemporary relevant R packages used for inference, we use two real datasets highlighting the probabilistic and inference properties of each approach. This analysis is accompanied by an R vignette.
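The sojourn-time parameterization described here can be made concrete with a small simulator using only the standard library (an illustrative sketch with hypothetical states and distributions; the R packages compared in the paper handle inference, not just simulation):

```python
import random

def simulate_semi_markov(P, sojourn, start, t_max, rng=None):
    """Simulate a semi-Markov trajectory under the sojourn-time
    parameterization: an embedded Markov chain P chooses the next state,
    and sojourn[state] draws how long the process stays there."""
    rng = rng or random.Random(0)
    t, state = 0.0, start
    path = [(t, state)]
    while True:
        t += sojourn[state](rng)            # draw the sojourn time
        if t >= t_max:
            return path
        # embedded chain: pick the next state by transition probability
        state = rng.choices(list(P[state]), weights=list(P[state].values()))[0]
        path.append((t, state))

# Two-state example: exponential sojourns, deterministic switching
P = {"up": {"down": 1.0}, "down": {"up": 1.0}}
sojourn = {"up": lambda rng: rng.expovariate(1.0),
           "down": lambda rng: rng.expovariate(2.0)}
path = simulate_semi_markov(P, sojourn, "up", t_max=50.0)
```

The alternative, intensity-based parameterization would instead specify hazard rates for each transition and would not separate the sojourn draw from the embedded-chain draw.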
Publisher: Informa UK Limited
Date: 25-05-2010
Publisher: Elsevier BV
Date: 09-2015
Publisher: Springer Science and Business Media LLC
Date: 04-03-2019
Publisher: MDPI AG
Date: 08-09-2020
DOI: 10.3390/RS12182908
Abstract: Commonly, when studies deal with the effects of climate change on biodiversity, the mean value is used more than other parameters. However, climate change also leads to greater temperature variability, and many papers have demonstrated its importance in the implementation of biodiversity response strategies. We studied the spatio-temporal variability of activity time and a persistence index, calculated from operative temperatures measured at three sites over three years, for a mountain endemic species. Temperatures were recorded with biomimetic loggers, an original remote sensing technology that has the same advantages as conventional loggers but is suitable for recording data on biological organisms. Among the 42 tests conducted, 71% were significant for spatial variability and 28% for temporal variability. The differences in daily activity times and in persistence indices demonstrated the effects of micro-habitat, habitat, slope, altitude, hydrography, and year. These observations highlight the great variability in the environmental temperatures experienced by lizard populations. Thus, our study underlines the importance of implementing multi-year and multi-site studies to quantify this variability and produce more representative results. Such studies can be facilitated by the use of biomimetic loggers, for which a user guide is provided in the last part of this paper.
Publisher: Elsevier BV
Date: 11-2020
Publisher: Springer Science and Business Media LLC
Date: 17-07-2018
DOI: 10.1038/S41598-018-29041-1
Abstract: Chronic inflammation may be involved in cancer development and progression. Using 28 inflammation-related proteins measured in prospective blood samples from two case-control studies nested in the Italian component of the European Prospective Investigation into Cancer and Nutrition (n = 261) and in the Northern Sweden Health and Disease Study (n = 402), we tested the hypothesis that an inflammatory score is associated with breast cancer (BC) and B-cell Non-Hodgkin Lymphoma (B-cell NHL, including 68 multiple myeloma cases) onset. We modelled the relationship between this inflammatory score and the two cancers studied (BC and B-cell NHL) using generalised linear models, and assessed, through adjustments, the role of behaviours and lifestyle factors. Analyses were performed by cancer type pooling both populations, and stratified by cohort and time to diagnosis. Our results suggested a lower inflammatory score in B-cell NHL cases (β = −1.28, p = 0.012), and, to a lesser extent, in BC (β = −0.96, p = 0.33) compared to controls, mainly driven by cancer cases diagnosed less than 6 years after enrolment. These associations were not affected by subsequent adjustments for potential intermediate confounders, notably behaviours. Sensitivity analyses indicated that our findings were not affected by the way the inflammatory score was calculated. These observations call for further studies involving larger populations, a larger variety of cancer types and repeated measures of a larger panel of inflammatory markers.
Publisher: Wiley
Date: 11-06-2018
DOI: 10.1002/SIM.7821
Abstract: Integrative analysis of high-dimensional omics datasets has been studied by many authors in recent years. By incorporating known relationships among the variables, these analyses have been successful in elucidating the relationships between different sets of omics data. In this article, our goal is to identify important relationships between genomic expression and cytokine data from a human immunodeficiency virus vaccine trial. We propose a flexible partial least squares technique that incorporates group and subgroup structure in the modelling process. Our new method accounts for both the grouping of genetic markers (e.g., gene sets) and temporal effects. The method generalises existing sparse modelling techniques in the partial least squares methodology and establishes theoretical connections to variable selection methods for supervised and unsupervised problems. Simulation studies are performed to investigate the performance of our methods against alternative sparse approaches. Our R package sgspls is available at att-sutton/sgspls.
Publisher: Springer Science and Business Media LLC
Date: 17-02-2012
Publisher: Springer Science and Business Media LLC
Date: 08-06-2013
Publisher: Cold Spring Harbor Laboratory
Date: 15-03-2021
DOI: 10.1101/2021.03.14.435332
Abstract: Glycoproteome profiling (glycoproteomics) is a powerful yet analytically challenging research tool. The complex tandem mass spectra generated from glycopeptide mixtures require sophisticated analysis pipelines for structural determination. Diverse software aiding the process have appeared, but their relative performance remains untested. Conducted through the HUPO Human Proteome Project – Human Glycoproteomics Initiative, this community study, comprising both developers and users of glycoproteomics software, evaluates the performance of informatics solutions for system-wide glycopeptide analysis. Mass spectrometry-based glycoproteomics datasets from human serum were shared with all teams. The relative team performance for N- and O-glycopeptide data analysis was comprehensively established and validated through orthogonal performance tests. Excitingly, several high-performance glycoproteomics informatics solutions were identified. While the study illustrated that significant informatics challenges remain, as indicated by a high discordance between annotated glycopeptides, lists of high-confidence (consensus) glycopeptides were compiled from the standardised team reports. Deep analysis of the performance data revealed key performance-associated search variables and led to recommendations for improved "high coverage" and "high accuracy" glycoproteomics search strategies. This study concludes that diverse software for comprehensive glycopeptide data analysis exists, points to several high-performance search strategies, and specifies key variables that may guide future software developments and assist informatics decision-making in glycoproteomics.
Publisher: Public Library of Science (PLoS)
Date: 02-12-2021
DOI: 10.1371/JOURNAL.PONE.0260717
Abstract: Eye-tracking research has been widely used in radiology applications. Prior studies exclusively analysed either temporal or spatial eye-tracking features, neither of which alone completely characterises the spatiotemporal dynamics of radiologists' gaze. Our research aims to quantify human visual search dynamics in both domains during brain stimuli screening to explore the relationship between reader characteristics and stimuli complexity. The methodology can be used to discover strategies to aid trainee radiologists in identifying pathology, and to select regions of interest for machine vision applications. The study was performed using eye-tracking data 5 seconds in duration from 57 readers (15 Brain-experts, 11 Other-experts, 5 Registrars and 26 Naïves) for 40 neuroradiological images as stimuli (i.e., 20 normal and 20 pathological brain MRIs). The visual scanning patterns were analysed by calculating the fractal dimension (FD) and Hurst exponent (HE) using re-scaled range (R/S) and detrended fluctuation analysis (DFA) methods. The FD was used to measure the spatial geometrical complexity of the gaze patterns, and the HE analysis was used to measure participants' focusing skill. Focusing skill refers to the persistence/anti-persistence of the participants' gaze on the stimulus over time. Pathological and normal stimuli were analysed separately at both the "First Second" and the full "Five Seconds" viewing duration. All experts were more focused and had a higher visual search complexity compared to Registrars and Naïves. This was seen in both the pathological and normal stimuli in the first- and five-second analyses. The Brain-experts subgroup achieved better focusing skill than Other-experts due to their domain-specific expertise. Indeed, the FDs found when viewing pathological stimuli were higher than those for normal ones.
Viewing normal stimuli resulted in an increase in FD in the five-second data, unlike pathological stimuli, for which FD did not change. In contrast to the FDs, the scanpath HEs of pathological and normal stimuli were similar. However, participants' gaze was more focused in the "Five Seconds" than the "First Second" data. The HE analysis of the scanpaths of all experts showed that they have greater focus than Registrars and Naïves. This may be related to their higher visual search complexity compared to non-experts, due to their training and expertise.
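The rescaled-range (R/S) estimate of the Hurst exponent used in the study above can be sketched as follows (white noise stands in for a gaze-coordinate series, and the window sizes are arbitrary choices):

```python
import numpy as np

def hurst_rs(series, window_sizes):
    """Rescaled-range (R/S) estimate of the Hurst exponent: regress
    log(mean R/S) on log(window size); the slope is H."""
    log_n, log_rs = [], []
    for n in window_sizes:
        rs_vals = []
        for start in range(0, len(series) - n + 1, n):
            w = series[start:start + n]
            dev = np.cumsum(w - w.mean())       # cumulative deviations
            r = dev.max() - dev.min()           # range of the partial sums
            s = w.std()                         # window standard deviation
            if s > 0:
                rs_vals.append(r / s)
        log_n.append(np.log(n))
        log_rs.append(np.log(np.mean(rs_vals)))
    slope, _ = np.polyfit(log_n, log_rs, 1)
    return slope

rng = np.random.default_rng(0)
h_noise = hurst_rs(rng.normal(size=4096), [16, 32, 64, 128, 256])
# Theory: white noise gives H near 0.5; persistent (focused) series approach 1.
```

H > 0.5 indicates persistence (the gaze keeps drifting in the same direction, interpreted above as focus), H < 0.5 anti-persistence.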
Publisher: Springer Science and Business Media LLC
Date: 07-01-2022
DOI: 10.1186/S12874-021-01491-8
Abstract: Genome-wide association studies (GWAS) have identified genetic variants associated with multiple complex diseases. We can leverage this phenomenon, known as pleiotropy, to integrate multiple data sources in a joint analysis. Integrating additional information such as gene pathway knowledge can often improve statistical efficiency and biological interpretation. In this article, we propose statistical methods which incorporate both gene pathway and pleiotropy knowledge to increase statistical power and identify important risk variants affecting multiple traits. We propose novel feature selection methods for group variable selection in the multi-task regression problem. We develop penalised likelihood methods exploiting different penalties to induce structured sparsity at the gene (or pathway) and SNP level across all studies. We implement an alternating direction method of multipliers (ADMM) algorithm for our penalised regression methods. The performance of our approaches is compared to a subset-based meta-analysis approach on simulated data sets. A bootstrap sampling strategy is provided to explore the stability of the penalised methods. Our methods are applied to identify potential pleiotropy in an application considering the joint analysis of thyroid and breast cancers. The methods detected eleven potential pleiotropic SNPs and six pathways. A simulation study found that our method was able to detect more true signals than a popular competing method while retaining a similar false discovery rate. We developed feature selection methods for jointly analysing multiple logistic regression tasks where prior grouping knowledge is available. Our method performed well in both simulation studies and when applied to a real data analysis of multiple cancers.
Publisher: Oxford University Press (OUP)
Date: 05-07-2023
Abstract: Cross-phenotype association using gene-set analysis can help to detect pleiotropic genes and inform about common mechanisms between diseases. Although there is an increasing number of statistical methods for exploring pleiotropy, there is a lack of proper pipelines for applying gene-set analysis in this context to genome-scale data in a reasonable running time. We designed a user-friendly pipeline to perform cross-phenotype gene-set analysis between two traits using GCPBayes, a method developed by our team. All analyses can be performed automatically by calling different scripts in a simple way (using a Shiny app, Bash or R script). A Shiny application was also developed to create different plots to visualize outputs from GCPBayes. Finally, a comprehensive, step-by-step tutorial on how to use the pipeline is provided on our group's GitHub page. We illustrated the application on publicly available GWAS (genome-wide association studies) summary statistics data to identify breast cancer and ovarian cancer susceptibility genes. We have shown that the GCPBayes pipeline can extract pleiotropic genes previously mentioned in the literature, while it also provides new pleiotropic genes and regions that are worthwhile for further investigation. We also provide recommendations about parameter selection for decreasing the computational time of GCPBayes on genome-scale data.
Publisher: Informa UK Limited
Date: 09-12-2018
Publisher: Institute of Mathematical Statistics
Date: 2019
DOI: 10.1214/19-SS125
Publisher: Foundation for Open Access Statistic
Date: 2017
Publisher: Elsevier BV
Date: 12-2023
Publisher: Springer Science and Business Media LLC
Date: 24-02-2021
DOI: 10.1186/S12859-021-03968-1
Abstract: The increasing number of genome-wide association studies (GWAS) has revealed several loci that are associated with multiple distinct phenotypes, suggesting the existence of pleiotropic effects. Highlighting these cross-phenotype genetic associations could help to identify and understand common biological mechanisms underlying some diseases. Common approaches test the association between genetic variants and multiple traits at the SNP level. In this paper, we propose a novel gene- and pathway-level approach for the case where several independent GWAS on independent traits are available. The method is based on a generalization of sparse group Partial Least Squares (sgPLS) to take into account groups of variables, and a Lasso penalization that links all independent data sets. This method, called joint-sgPLS, is able to convincingly detect signal at both the variable level and the group level. Our method has the advantage of proposing a globally readable model while coping with the architecture of the data. It can outperform traditional methods and provides wider insight in terms of a priori information. We compared the performance of the proposed method to other benchmark methods on simulated data and gave an example of application on real data with the aim of highlighting common susceptibility variants to breast and thyroid cancers. The joint-sgPLS shows interesting properties for detecting a signal. As an extension of PLS, the method is suited to data with a large number of variables. The choice of Lasso penalization copes with architectures of groups of variables and observation sets. Furthermore, although the method has been applied to a genetic study, its formulation is adapted to any data with a high number of variables and a known a priori architecture in other application fields.
Publisher: MDPI AG
Date: 07-11-2021
DOI: 10.3390/RS13214470
Abstract: Coral reefs are an essential source of marine biodiversity, but they are declining at an alarming rate under the combined effects of global change and human pressure. Precise mapping of coral reef habitat with high spatial and temporal resolution has become a necessary step for monitoring their health and evolution. This mapping can be achieved remotely thanks to satellite imagery coupled with machine-learning algorithms. In this paper, we review the different satellites used in recent literature, as well as the most common and efficient machine-learning methods. To account for the recent explosion of published research on coral reef mapping, we focus especially on papers published between 2018 and 2020. Our review indicates that object-based methods provide more accurate results than pixel-based ones, and that the most accurate methods are Support Vector Machine and Random Forest. We emphasize that the satellites with the highest spatial resolution provide the best images for benthic habitat mapping. We also highlight that preprocessing steps (water column correction, sunglint removal, etc.) and additional inputs (bathymetry data, aerial photographs, etc.) can significantly improve the mapping accuracy.
Publisher: MDPI AG
Date: 04-12-2021
Abstract: In situ sensors that collect high-frequency data are used increasingly to monitor aquatic environments. These sensors are prone to technical errors, resulting in unrecorded observations and/or anomalous values that are subsequently removed and create gaps in time series data. We present a framework based on generalized additive and auto-regressive models to recover these missing data. To mimic sporadically missing (i) single observations and (ii) periods of contiguous observations, we randomly removed (i) point data and (ii) day- and week-long sequences of data from a two-year time series of nitrate concentration data collected from Arikaree River, USA, where synoptically collected water temperature, turbidity, conductance, elevation, and dissolved oxygen data were available. In 72% of cases with missing point data, predicted values were within the sensor precision interval of the original value, although predictive ability declined when sequences of missing data occurred. Precision also depended on the availability of other water quality covariates. When covariates were available, even a sudden, event-based peak in nitrate concentration was reconstructed well. By providing a promising method for accurate prediction of missing data, the utility and confidence in summary statistics and statistical trends will increase, thereby assisting the effective monitoring and management of fresh waters and other at-risk ecosystems.
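A stripped-down version of the recovery idea above, regress the gappy series on synoptic covariates over the observed rows and predict the gaps, can be sketched as follows (plain linear regression on synthetic data stands in for the paper's generalized additive and autoregressive models):

```python
import numpy as np

def fill_gaps(y, X):
    """Recover NaN gaps in y by regressing y on synoptic covariates X
    over the observed rows, then predicting the missing rows. A linear
    sketch of the idea; the paper's models are generalized additive
    with autoregressive errors."""
    obs = ~np.isnan(y)
    A = np.column_stack([np.ones(len(y)), X])            # add intercept
    beta, *_ = np.linalg.lstsq(A[obs], y[obs], rcond=None)
    filled = y.copy()
    filled[~obs] = A[~obs] @ beta                        # predict the gaps
    return filled

# Synthetic series: nitrate driven by temperature and conductance.
rng = np.random.default_rng(1)
temp = rng.normal(15, 3, 300)
cond = rng.normal(500, 50, 300)
nitrate = 0.1 * temp - 0.002 * cond + rng.normal(0, 0.05, 300)
y = nitrate.copy()
y[::10] = np.nan                               # knock out every 10th point
recovered = fill_gaps(y, np.column_stack([temp, cond]))
max_err = np.abs(recovered[::10] - nitrate[::10]).max()
```

As in the study, recovery works only when covariates are available at the missing timestamps; contiguous gaps in all sensors would need the temporal (autoregressive) component as well.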
Publisher: Elsevier BV
Date: 04-2022
Publisher: Springer Science and Business Media LLC
Date: 03-08-2007
Publisher: Springer Science and Business Media LLC
Date: 27-11-2022
DOI: 10.1007/S00234-021-02845-1
Abstract: To systematically review the literature regarding the application of machine learning (ML) of magnetic resonance imaging (MRI) radiomics in common sellar tumors. To identify future directions for application of ML in sellar tumor MRI. PubMed, Medline, Embase, Google Scholar, Scopus, arXiv, and bioRxiv were searched to identify relevant studies published between 2010 and September 2021. Studies were included if they specifically involved ML of MRI radiomics in the analysis of sellar masses. Risk of bias assessment was performed using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) Tool. Fifty-eight articles were identified for review. All papers utilized retrospective data, and a quantitative systematic review was performed for thirty-one studies utilizing a public dataset which compared pituitary adenomas, meningiomas, and gliomas. One of the analyzed architectures yielded the highest classification accuracy of 0.996. The remaining twenty-seven articles were qualitatively reviewed and showed promising findings in predicting specific tumor characteristics such as tumor consistency, Ki-67 proliferative index, and post-surgical recurrence. This review highlights the potential clinical application of ML using MRI radiomic data of the sellar region in diagnosis and predicting treatment outcomes. We describe future directions for practical application in the clinical care of patients with pituitary neuroendocrine and other sellar tumors.
Publisher: Public Library of Science (PLoS)
Date: 08-08-2013
Publisher: Zenodo
Date: 2022
Publisher: Wiley
Date: 25-03-2020
DOI: 10.1111/GCB.15059
Publisher: Informa UK Limited
Date: 19-05-2008
Publisher: Informa UK Limited
Date: 05-2009
Publisher: Public Library of Science (PLoS)
Date: 03-01-2017
Publisher: Springer Science and Business Media LLC
Date: 26-09-2023
Publisher: Informa UK Limited
Date: 11-2011
Publisher: Foundation for Open Access Statistic
Date: 2017
Publisher: Wiley
Date: 25-08-2020
DOI: 10.1002/SIM.8720
Publisher: Springer Science and Business Media LLC
Date: 09-06-2021
DOI: 10.1186/S13054-021-03518-4
Abstract: Heterogeneous respiratory system static compliance (CRS) values and levels of hypoxemia in patients with novel coronavirus disease (COVID-19) requiring mechanical ventilation have been reported in previous small case series or studies conducted at a national level. We designed a retrospective observational cohort study with rapid data gathering from the international COVID-19 Critical Care Consortium study to comprehensively describe CRS, calculated as tidal volume/[airway plateau pressure − positive end-expiratory pressure (PEEP)], and its association with ventilatory management and outcomes of COVID-19 patients on mechanical ventilation (MV), admitted to intensive care units (ICU) worldwide. We studied 745 patients from 22 countries, who required admission to the ICU and MV from January 14 to December 31, 2020, and presented at least one value of CRS within the first seven days of MV. Median (IQR) age was 62 (52–71), patients were predominantly males (68%) and from Europe/North and South America (88%). CRS, within 48 h from endotracheal intubation, was available in 649 patients and was neither associated with the duration from onset of symptoms to commencement of MV (p = 0.417) nor with PaO2/FiO2 (p = 0.100). Females presented lower CRS than males (95% CI of CRS difference between females and males: −11.8 to −7.4 mL/cmH2O, p < 0.001), and although females presented higher body mass index (BMI), the association of BMI with CRS was marginal (p = 0.139). Ventilatory management varied across the CRS range, resulting in a significant association between CRS and driving pressure (estimated decrease −0.31 cmH2O/L per mL/cmH2O of CRS, 95% CI −0.48 to −0.14, p < 0.001). Overall, 28-day ICU mortality, accounting for the competing risk of being discharged within the period, was 35.6% (SE 1.7). Cox proportional hazard analysis demonstrated that CRS (+10 mL/cmH2O) was only associated with being discharged from the ICU within 28 days (HR 1.14, 95% CI 1.02–1.28, p = 0.018). This multicentre report provides a comprehensive account of CRS in COVID-19 patients on MV. CRS measured within 48 h from commencement of MV has marginal predictive value for 28-day mortality, but was associated with being discharged from ICU within the same period. Trial documentation: Available at tudy. Trial registration: ACTRN12620000421932.
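The compliance definition quoted in the abstract above translates directly into code (the ventilator readings below are hypothetical, for illustration only):

```python
def static_compliance(tidal_volume_ml, plateau_cmh2o, peep_cmh2o):
    """Respiratory-system static compliance as defined in the study:
    CRS = tidal volume / (plateau pressure - PEEP), in mL/cmH2O."""
    return tidal_volume_ml / (plateau_cmh2o - peep_cmh2o)

# Hypothetical readings: 450 mL tidal volume, plateau 24 cmH2O, PEEP 9 cmH2O.
crs = static_compliance(450, 24, 9)   # 450 / 15 = 30.0 mL/cmH2O
```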
Publisher: Foundation for Open Access Statistic
Date: 2012
Publisher: Oxford University Press (OUP)
Date: 22-09-2023
DOI: 10.1093/HMG/DDAD159
Publisher: International Press of Boston
Date: 2016
Publisher: Oxford University Press (OUP)
Date: 10-09-2015
DOI: 10.1093/BIOINFORMATICS/BTV535
Abstract: Motivation: The association between two blocks of 'omics' data brings challenging issues in computational biology due to their size and complexity. Here, we focus on a class of multivariate statistical methods called partial least squares (PLS). The sparse version of PLS (sPLS) integrates two datasets while simultaneously selecting the contributing variables. However, these methods do not take into account important structural or group effects due to the relationships between markers within biological pathways. Considering predefined groups of markers (e.g., gene sets) could therefore improve the relevance and efficacy of the PLS approach. Results: We propose two PLS extensions called group PLS (gPLS) and sparse gPLS (sgPLS). Our algorithm makes it possible to study the relationship between two different types of omics data (e.g., SNP and gene expression) or between an omics dataset and multivariate phenotypes (e.g., cytokine secretion). We demonstrate the good performance of gPLS and sgPLS compared with sPLS in the context of grouped data. These methods are then compared using data from an HIV therapeutic vaccine trial. Our approaches provide parsimonious models that reveal the relationship between gene abundance and the immunological response to the vaccine. Availability and implementation: The approach is implemented in a comprehensive R package called sgPLS available on CRAN. Contact: b.liquet@uq.edu.au Supplementary information: Supplementary data are available at Bioinformatics online.
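The sPLS baseline these group extensions build on can be sketched with a single component: power iteration on the cross-covariance matrix with soft-thresholding of the X-loading vector (synthetic data; the penalty value is an arbitrary assumption, and a real analysis would use the sgPLS package itself):

```python
import numpy as np

def soft_threshold(v, lam):
    """Soft-thresholding operator that zeroes small loading entries."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def sparse_pls_component(X, Y, lam, n_iter=25):
    """First sparse-PLS component: power iteration on M = X'Y with
    soft-thresholding of the X-loading vector u (the gPLS/sgPLS methods
    replace this entry-wise penalty with group-wise ones)."""
    M = X.T @ Y
    u = M[:, 0] / np.linalg.norm(M[:, 0])
    for _ in range(n_iter):
        v = M.T @ u
        v /= np.linalg.norm(v)
        u = soft_threshold(M @ v, lam)
        u /= np.linalg.norm(u)
    return u, v

# Synthetic data: only the first 3 of 20 X-variables drive Y.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 20))
Y = X[:, :3].sum(axis=1, keepdims=True) + 0.1 * rng.normal(size=(400, 1))
u, _ = sparse_pls_component(X, Y, lam=200.0)
selected = np.flatnonzero(np.abs(u) > 1e-8)   # indices of retained variables
```

The group-wise penalties of gPLS/sgPLS would instead shrink whole blocks of u (e.g. a gene set) to zero at once.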
Publisher: Wiley
Date: 24-11-2008
DOI: 10.1111/J.1541-0420.2008.01132_1.X
Abstract: Cook, Gold, and Li (2007, Biometrics 63, 540-549) extended the Kulldorff (1997, Communications in Statistics 26, 1481-1496) scan statistic for spatial cluster detection to survival-type observations. Their approach was based on the score statistic and they proposed a permutation distribution for the maximum of score tests. The score statistic makes it possible to apply the scan statistic idea to models including explanatory variables. However, we show that the permutation distribution requires strong assumptions of independence between potential cluster and both censoring and explanatory variables. In contrast, we present an approach using the asymptotic distribution of the maximum of score statistics in a manner not requiring these assumptions.
Start Date: 11-2019
End Date: 05-2024
Amount: $484,189.00
Funder: Australian Research Council
View Funded Activity
Start Date: 06-2022
End Date: 05-2025
Amount: $405,000.00
Funder: Australian Research Council
View Funded Activity