ORCID Profile
0000-0002-8136-2294
Current Organisation
Macquarie University
In Research Link Australia (RLA), "Research Topics" refer to ANZSRC FOR and SEO codes. These topics are either sourced from ANZSRC FOR and SEO codes listed in researchers' related grants or generated by a large language model (LLM) based on their publications.
Statistics | Applied Statistics | Natural Resource Management | Environmental Monitoring | Biostatistics | Clinical Sciences not elsewhere classified |
Application Software Packages (excl. Computer Games) | Physical and Chemical Conditions of Water in Fresh, Ground and Surface Water Environments (excl. Urban and Industrial Use) | Expanding Knowledge in the Mathematical Sciences | Expanding Knowledge in the Medical and Health Sciences | Rural Water Evaluation (incl. Water Quality)
Publisher: Institute of Mathematical Statistics
Date: 12-2017
DOI: 10.1214/17-BA1081
Publisher: Springer Science and Business Media LLC
Date: 10-2021
Publisher: Springer Science and Business Media LLC
Date: 11-2021
DOI: 10.1038/S41592-021-01309-X
Abstract: Glycoproteomics is a powerful yet analytically challenging research tool. Software packages aiding the interpretation of complex glycopeptide tandem mass spectra have appeared, but their relative performance remains untested. Conducted through the HUPO Human Glycoproteomics Initiative, this community study, comprising both developers and users of glycoproteomics software, evaluates solutions for system-wide glycopeptide analysis. The same mass spectrometry-based glycoproteomics datasets from human serum were shared with participants and the relative team performance for N- and O-glycopeptide data analysis was comprehensively established by orthogonal performance tests. Although the results were variable, several high-performance glycoproteomics informatics strategies were identified. Deep analysis of the data revealed key performance-associated search parameters and led to recommendations for improved ‘high-coverage’ and ‘high-accuracy’ glycoproteomics search solutions. This study concludes that diverse software packages for comprehensive glycopeptide data analysis exist, points to several high-performance search strategies and specifies key variables that will guide future software developments and assist informatics decision-making in glycoproteomics.
Publisher: Frontiers Media SA
Date: 17-12-2021
Abstract: High rates of biodiversity loss caused by human-induced changes in the environment require new methods for large scale fauna monitoring and data analysis. While ecoacoustic monitoring is increasingly being used and shows promise, analysis and interpretation of the big data produced remains a challenge. Computer-generated acoustic indices potentially provide a biologically meaningful summary of sound, however, temporal autocorrelation, difficulties in statistical analysis of multi-index data and lack of consistency or transferability in different terrestrial environments have hindered the application of those indices in different contexts. To address these issues we investigate the use of time-series motif discovery and random forest classification of multi-indices through two case studies. We use a semi-automated workflow combining time-series motif discovery and random forest classification of multi-index (acoustic complexity, temporal entropy, and events per second) data to categorize sounds in unfiltered recordings according to the main source of sound present (birds, insects, geophony). Our approach showed more than 70% accuracy in label assignment in both datasets. The categories assigned were broad, but we believe this is a great improvement on traditional single index analysis of environmental recordings, as we can now give ecological meaning to recordings in a semi-automated way that does not require expert knowledge, and manual validation is only necessary for a small subset of the data. Furthermore, temporal autocorrelation, which is largely ignored by researchers, has been effectively eliminated through the time-series motif discovery technique applied here for the first time to ecoacoustic data.
We expect that our approach will greatly assist researchers in the future as it will allow large datasets to be rapidly processed and labeled, enabling the screening of recordings for undesired sounds, such as wind, or target biophony (insects and birds) for biodiversity monitoring or bioacoustics research.
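The motif-discovery step at the heart of this workflow can be sketched with a brute-force Python toy (illustrative only; the paper's pipeline, data and variable names are not reproduced here):

```python
import numpy as np

def find_motif(series, m):
    """Brute-force time-series motif discovery: return the start indices
    of the pair of non-overlapping length-m subsequences that are closest
    in Euclidean distance. A toy stand-in for the motif-discovery step
    applied to acoustic-index series such as events per second."""
    windows = np.lib.stride_tricks.sliding_window_view(series, m)
    n = len(windows)
    best, best_pair = np.inf, None
    for i in range(n):
        for j in range(i + m, n):          # enforce non-overlapping pairs
            d = np.linalg.norm(windows[i] - windows[j])
            if d < best:
                best, best_pair = d, (i, j)
    return best_pair

# Plant an exact repeat of one pattern in a noisy index series
rng = np.random.default_rng(1)
x = rng.normal(size=20)
x[12:15] = x[2:5]                          # the motif: positions 2 and 12
```

At scale, a matrix-profile-style algorithm would replace the quadratic loop, but the object being searched for is the same.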
Publisher: Zenodo
Date: 2022
Publisher: Wiley
Date: 23-02-2016
DOI: 10.1002/SIM.6909
Abstract: Multiple endpoints are increasingly used in clinical trials. The significance of some of these clinical trials is established if at least r null hypotheses are rejected among m that are simultaneously tested. The usual approach in multiple hypothesis testing is to control the family-wise error rate, which is defined as the probability that at least one type-I error is made. More recently, the q-generalized family-wise error rate has been introduced to control the probability of making at least q false rejections. For procedures controlling this global type-I error rate, we define a type-II r-generalized family-wise error rate, which is directly related to the r-power defined as the probability of rejecting at least r false null hypotheses. We obtain very general power formulas that can be used to compute the sample size for single-step and step-wise procedures. These are implemented in our R package rPowerSampleSize available on the CRAN, making them directly available to end users. Complexities of the formulas are presented to gain insight into computation time issues. Comparison with Monte Carlo strategy is also presented. We compute sample sizes for two clinical trials involving multiple endpoints: one designed to investigate the effectiveness of a drug against acute heart failure and the other for the immunogenicity of a vaccine strategy against pneumococcus. Copyright © 2016 John Wiley & Sons, Ltd.
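The r-power notion can be illustrated with a small Monte Carlo sketch (a hedged illustration of the concept only, not the rPowerSampleSize implementation, which is an R package; the test setup and function name here are assumptions):

```python
import numpy as np
from statistics import NormalDist

def r_power_mc(n, m, effect, r, alpha=0.05, n_sim=20000, seed=0):
    """Monte Carlo estimate of the r-power: the probability of rejecting
    at least r of m false null hypotheses, here for independent one-sided
    z-tests with a single-step Bonferroni-adjusted level alpha/m."""
    rng = np.random.default_rng(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / m)
    # each row simulates the m test statistics of one trial
    z = rng.normal(loc=effect * np.sqrt(n), scale=1.0, size=(n_sim, m))
    n_rejected = (z > z_crit).sum(axis=1)
    return float((n_rejected >= r).mean())
```

Increasing n until the estimated r-power exceeds a target gives a crude simulation-based sample-size calculation; the paper's power formulas avoid the simulation altogether.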
Publisher: Springer Science and Business Media LLC
Date: 04-02-2014
Publisher: Walter de Gruyter GmbH
Date: 2015
Abstract: Selection of estimators is an essential task in modeling. A general framework is that the estimators of a distribution are obtained by minimizing a function (the estimating function) and assessed using another function (the assessment function). A classical case is that both functions estimate an information risk (specifically cross-entropy); this corresponds to using maximum likelihood estimators and assessing them by the Akaike information criterion (AIC). In more general cases, the assessment risk can be estimated by leave-one-out cross-validation. Since leave-one-out cross-validation is computationally very demanding, we propose in this paper a universal approximate cross-validation criterion under regularity conditions (UACVR). This criterion can be adapted to different types of estimators, including penalized likelihood and maximum a posteriori estimators, and also to different assessment risk functions, including information risk functions and the continuous ranked probability score (CRPS). UACVR reduces to the Takeuchi information criterion (TIC) when cross-entropy is the risk for both estimation and assessment. We provide the asymptotic distributions of UACVR and of a difference of UACVR values for two estimators. We validate UACVR using simulations and provide an illustration on real data in a psychometric context, where estimators of the distributions of ordered categorical data derived from threshold models are compared with estimators based on continuous approximations.
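The leave-one-out cross-validation that UACVR approximates can be written down directly for a toy Gaussian model (an illustrative sketch of the target quantity, not the paper's estimator):

```python
import numpy as np
from statistics import NormalDist

def loo_cv_risk(x):
    """Exact leave-one-out cross-validated risk (minus the mean predictive
    log-density) for a Gaussian model fitted by maximum likelihood.
    Criteria such as TIC/UACVR approximate this quantity without
    refitting the model n times."""
    scores = []
    for i in range(len(x)):
        train = np.delete(x, i)
        mu, sd = train.mean(), train.std()   # MLEs on the training fold
        scores.append(np.log(NormalDist(mu, sd).pdf(x[i])))
    return -float(np.mean(scores))

rng = np.random.default_rng(0)
# for standard normal data the true cross-entropy is 0.5*log(2*pi*e) ≈ 1.419
risk = loo_cv_risk(rng.normal(size=500))
```

The n refits in the loop are exactly the computational cost that an approximate criterion is designed to remove.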
Publisher: Informa UK Limited
Date: 28-09-2017
Publisher: Wiley
Date: 26-04-2018
DOI: 10.1002/IJC.31536
Publisher: Springer Science and Business Media LLC
Date: 16-04-2019
Publisher: Wiley
Date: 2008
DOI: 10.1002/SIM.3161
Abstract: In a meta-analysis combining survival data from different clinical trials, an important issue is the possible heterogeneity between trials. Such intertrial variation can not only be explained by heterogeneity of treatment effects across trials but also by heterogeneity of their baseline risk. In addition, one might examine the relationship between magnitude of the treatment effect and the underlying risk of the patients in the different trials. Such a scenario can be accounted for by using additive random effects in the Cox model, with a random trial effect and a random treatment-by-trial interaction. We propose to use this kind of model with a general correlation structure for the random effects and to estimate parameters and hazard function using a semi-parametric penalized marginal likelihood method (maximum penalized likelihood estimators). This approach gives smoothed estimates of the hazard function, which represents incidence in epidemiology. The idea for the approach in this paper comes from the study of heterogeneity in a large meta-analysis of randomized trials in patients with head and neck cancers (meta-analysis of chemotherapy in head and neck cancers) and the effect of adding chemotherapy to locoregional treatment. The simulation study and the application demonstrate that the proposed approach yields satisfactory results and they illustrate the need to use a flexible variance-covariance structure for the random effects.
Publisher: Public Library of Science (PLoS)
Date: 29-06-2023
DOI: 10.1371/JOURNAL.PONE.0287705
Abstract: Compositional data are a special kind of data, represented as a proportion carrying relative information. Although this type of data is widely spread, no solution exists to deal with the cases where the classes are not well balanced. After describing compositional data imbalance, this paper proposes an adaptation of the original Synthetic Minority Oversampling TEchnique (SMOTE) to deal with compositional data imbalance. The new approach, called SMOTE for Compositional Data (SMOTE-CD), generates synthetic examples by computing a linear combination of selected existing data points, using compositional data operations. The performance of the SMOTE-CD is tested with three different regressors (Gradient Boosting tree, Neural Networks, Dirichlet regressor) applied to two real datasets and to synthetic generated data, and the performance is evaluated using accuracy, cross-entropy, F1-score, R2 score and RMSE. The results show improvements across all metrics, but the impact of oversampling on performance varies depending on the model and the data. In some cases, oversampling may lead to a decrease in performance for the majority class. However, for the real data, the best performance across all models is achieved when oversampling is used. Notably, the F1-score is consistently increased with oversampling. Unlike the original technique, the performance is not improved when combining oversampling of the minority classes and undersampling of the majority class. The Python package smote-cd implements the method and is available online.
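The geometric idea behind SMOTE-CD — interpolating between compositions with simplex (Aitchison) operations rather than Euclidean ones — can be sketched as follows (a hypothetical helper for illustration, not the smote-cd package API):

```python
import numpy as np

def smote_cd_point(x, y, t=None, rng=None):
    """One synthetic composition on the Aitchison segment between
    compositions x and y: componentwise geometric interpolation
    (perturbation/powering in the simplex) followed by closure back
    to proportions. Toy sketch of the SMOTE-CD interpolation step."""
    if t is None:
        t = (rng or np.random.default_rng(0)).uniform()
    z = x ** (1 - t) * y ** t       # power/perturbation in the simplex
    return z / z.sum()              # closure: components sum to one

x = np.array([0.7, 0.2, 0.1])
y = np.array([0.1, 0.3, 0.6])
z = smote_cd_point(x, y, t=0.5)     # a valid composition between x and y
```

Unlike a Euclidean average, the result is guaranteed to stay strictly inside the simplex, which is what makes the interpolation safe for proportions.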
Publisher: Wiley
Date: 31-07-2012
Abstract: Attributable risk has become an important concept in clinical epidemiology. In this paper, we suggest estimating the attributable risk of nosocomial infections using a multistate approach. Recently, a multistate model (called the progressive disability model in the literature) has been developed in order to take into consideration both the time-dependency of the risk factor (e.g., nosocomial infections) and the presence of competing risks (e.g., death and discharge) at each time point. However, this approach does not take into account the possible heterogeneity of the study population. In this paper, we investigate an extension of this model and suggest an adjusted disability multistate model including covariates in each transition. This new multistate model has led us to define the concepts of overall and profiled attributable risk. We use a classical semiparametric approach to estimate the model and the new attributable risk. A simulation study is presented and we show, in particular, that neglecting the presence of covariates when estimating the model can lead to an important bias. The methodology developed in this paper is applied to data on ventilator-associated pneumonia in 12 French intensive care units.
Publisher: Wiley
Date: 27-12-2020
DOI: 10.1002/SIM.8855
Publisher: Springer Science and Business Media LLC
Date: 12-2012
Publisher: Wiley
Date: 11-05-2012
DOI: 10.1111/J.1541-0420.2012.01753.X
Abstract: Prognostic estimators for a clinical event may use repeated measurements of markers in addition to fixed covariates. These measurements can be linked to the clinical event by joint models that involve latent features. When the objective is to choose between different prognosis estimators based on joint models, the conventional Akaike information criterion is not well adapted and decision should be based on predictive accuracy. We define an adapted risk function called expected prognostic cross-entropy. We define another risk function for the case of right-censored observations, the expected prognostic observed cross-entropy (EPOCE). These risks can be estimated by leave-one-out cross-validation, for which we give approximate formulas and asymptotic distributions. The approximated cross-validated estimator CVPOL (a) of EPOCE is studied in simulation and applied to the comparison of several joint latent class models for prognosis of recurrence of prostate cancer using prostate-specific antigen measurements.
Publisher: Wiley
Date: 10-02-2023
DOI: 10.1111/JBI.14570
Abstract: Optimum shifts in species–environment relationships are intensively studied in a wide range of ecological topics, including climate change and species invasion. Numerous statistical methods are used to study optimum shifts, but, to our knowledge, none explicitly estimate it. We extended an existing model to explicitly estimate optimum shifts for multiple species having symmetrical response curves. We called this new Bayesian hierarchical model the Explicit Hierarchical Model of Optimum Shifts (EHMOS). In a simulation study, we compared the accuracy of EHMOS to a mean comparison method and a Bayesian generalized linear mixed model (GLMM). Specifically, we tested if the accuracy of the methods was sensitive to (1) sampling design, (2) species optimum position and (3) species ecological specialization. In addition, we compared the three methods using a real dataset investigating optimum shifts in 24 Orthopteran species between two time periods along an elevation gradient. Of all the simulated scenarios, EHMOS was the most accurate method. GLMM was the most sensitive method to species optimum position, providing unreliable estimates in the presence of marginal species, that is, species with an optimum close to a sampling boundary. The mean comparison method was also sensitive to species optimum position and ecological specialization, especially in an unbalanced sampling design, with high negative bias and low interval coverage compared to EHMOS. The case study results obtained with EHMOS were consistent with what is expected considering ongoing climate change, with mostly upward shifts, which further improved confidence in the accuracy of the EHMOS method. The Explicit Hierarchical Model of Optimum Shifts could be used for a wide range of topics and extended to produce new insights, especially in climate change studies.
Explicit estimation of optimum shifts notably allows investigation of ecological assumptions that could explain interspecific variability of these shifts.
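For a single species with a symmetric (Gaussian-shaped) response curve, the optimum that EHMOS models hierarchically can be estimated by a simple quadratic fit on the log scale (a non-hierarchical toy illustration, not the EHMOS model itself):

```python
import numpy as np

def estimate_optimum(x, abundance):
    """Estimate the optimum of a symmetric (Gaussian-shaped) species
    response curve by fitting a quadratic to log-abundance: for
    log y = a + b*x + c*x**2 with c < 0, the optimum is -b / (2c).
    Simplified single-species stand-in for the hierarchical model."""
    c, b, a = np.polyfit(x, np.log(abundance), 2)
    return -b / (2 * c)

# Noise-free Gaussian response centred at elevation 1200 m
x = np.linspace(500, 2000, 40)
y = np.exp(-((x - 1200.0) / 300.0) ** 2)
```

An optimum shift is then the difference between such optima estimated for two time periods; EHMOS estimates that shift directly, pooling information across species.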
Publisher: Wiley
Date: 21-10-2015
DOI: 10.1111/ACEL.12406
Publisher: Coastal Education and Research Foundation
Date: 05-2018
DOI: 10.2112/SI85-161.1
Publisher: Elsevier BV
Date: 03-2021
Publisher: Springer Science and Business Media LLC
Date: 12-2004
DOI: 10.1007/S10985-004-4772-Z
Abstract: A criterion for choosing an estimator in a family of semi-parametric estimators from incomplete data is proposed. This criterion is the expected observed log-likelihood (ELL). Adapted versions of this criterion in case of censored data and in presence of explanatory variables are exhibited. We show that likelihood cross-validation (LCV) is an estimator of ELL and we exhibit three bootstrap estimators. A simulation study considering both families of kernel and penalized likelihood estimators of the hazard function (indexed on a smoothing parameter) demonstrates good results of LCV and a bootstrap estimator called ELL(bboot). We apply the ELL(bboot) criterion to compare the kernel and penalized likelihood estimators to estimate the risk of developing dementia for women using data from a large cohort study.
Publisher: Informa UK Limited
Date: 04-03-2014
DOI: 10.1080/10543406.2013.860156
Abstract: The use of two or more primary correlated endpoints is becoming increasingly common. A mandatory approach when analyzing data from such clinical trials is to control the family-wise error rate (FWER). In this context, we provide formulas for computation of sample size and for data analysis. Two approaches are discussed: an individual method based on a union-intersection procedure and a global procedure, based on a multivariate model that can take into account adjustment variables. These methods are illustrated with simulation studies and applications. An R package known as rPowerSampleSize is also available.
Publisher: Wiley
Date: 27-11-2007
Publisher: BMJ
Date: 12-2020
DOI: 10.1136/BMJOPEN-2020-041417
Abstract: There is a paucity of data that can be used to guide the management of critically ill patients with COVID-19. In response, a research and data-sharing collaborative—The COVID-19 Critical Care Consortium—has been assembled to harness the cumulative experience of intensive care units (ICUs) worldwide. The resulting observational study provides a platform to rapidly disseminate detailed data and insights crucial to improving outcomes. This is an international, multicentre, observational study of patients with confirmed or suspected SARS-CoV-2 infection admitted to ICUs. This is an evolving, open-ended study that commenced on 1 January 2020 and currently includes sites in over 48 countries. The study enrols patients at the time of ICU admission and follows them to the time of death, hospital discharge or 28 days post-ICU admission, whichever occurs last. Key data, collected via an electronic case report form devised in collaboration with the International Severe Acute Respiratory and Emerging Infection Consortium/Short Period Incidence Study of Severe Acute Respiratory Illness networks, include: patient demographic data and risk factors, clinical features, severity of illness and respiratory failure, need for non-invasive and/or mechanical ventilation and/or extracorporeal membrane oxygenation and associated complications, as well as data on adjunctive therapies. Local principal investigators will ensure that the study adheres to all relevant national regulations, and that the necessary approvals are in place before a site may contribute data. In jurisdictions where a waiver of consent is deemed insufficient, prospective, representative or retrospective consent will be obtained, as appropriate. A web-based dashboard has been developed to provide relevant data and descriptive statistics to international collaborators in real-time. It is anticipated that, following study completion, all de-identified data will be made open access. 
ACTRN12620000421932 ( anzctr.org.au/ACTRN12620000421932.aspx ).
Publisher: BMJ
Date: 21-03-2018
Abstract: Epidemiological studies provide evidence that environmental exposures may affect health through complex mixtures. Formal investigation of the effect of exposure mixtures is usually achieved by modelling interactions, which relies on strong assumptions relating to the identity and the number of the exposures involved in such interactions, and on the order and parametric form of these interactions. These hypotheses become difficult to formulate and justify in an exposome context, where influential exposures are numerous and heterogeneous. To capture both the complexity of the exposome and its possibly pleiotropic effects, models handling multivariate predictors and responses, such as partial least squares (PLS) algorithms, can prove useful. As an illustrative example, we applied PLS models to data from a study investigating the inflammatory response (blood concentration of 13 immune markers) to the exposure to four disinfection by-products (one brominated and three chlorinated compounds), while swimming in a pool. To accommodate the multiple observations per participant (n=60 before and after the swim), we adopted a multilevel extension of PLS algorithms, including sparse PLS models shrinking loadings coefficients of unimportant predictors (exposures) and/or responses (protein levels). Despite the strong correlation among co-occurring exposures, our approach identified a subset of exposures (n=3/4) affecting the exhaled levels of 8 (out of 13) immune markers. PLS algorithms can easily scale to high-dimensional exposures and responses, and prove useful for exposome research to identify sparse sets of exposures jointly affecting a set of (selected) biological markers. Our descriptive work may guide these extensions for higher dimensional data.
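The core PLS step — finding exposure and response weight vectors whose scores have maximal covariance — can be sketched in a few lines (a single-level, non-sparse simplification of the multilevel sparse PLS models used here; the simulated data are illustrative):

```python
import numpy as np

def pls_first_component(X, Y):
    """First PLS weight vectors: the pair (u, v) maximising the
    covariance between the scores Xu and Yv, obtained as the leading
    singular vectors of the cross-covariance matrix X^T Y.
    Sketch of the core PLS step (no sparsity, no multilevel structure)."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc.T @ Yc, full_matrices=False)
    return U[:, 0], Vt[0]

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))                      # e.g. 4 exposures
Y = X[:, :1] @ np.ones((1, 5)) + 0.1 * rng.normal(size=(60, 5))
u, v = pls_first_component(X, Y)                  # u loads on exposure 0
```

Sparse PLS variants additionally shrink small entries of u and v to exactly zero, which is what selects the subset of exposures and markers.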
Publisher: Foundation for Open Access Statistic
Date: 2016
Publisher: Institute of Mathematical Statistics
Date: 2022
DOI: 10.1214/22-SS138
Publisher: Elsevier BV
Date: 08-2022
Publisher: Wiley
Date: 03-2008
DOI: 10.1111/J.1460-9568.2008.06123.X
Abstract: A central question in chemical senses is the way that odorant molecules are represented in the brain. To date, many studies, when taken together, suggest that structural features of the molecules are represented through a spatio-temporal pattern of activation in the olfactory bulb (OB), in both glomerular and mitral cell layers. Mitral/tufted cells interact with a large population of inhibitory interneurons resulting in a temporal patterning of bulbar local field potential (LFP) activity. We investigated the possibility that molecular features could determine the temporal pattern of LFP oscillatory activity in the OB. For this purpose, we recorded the LFPs in the OB of urethane-anesthetized, freely breathing rats in response to series of aliphatic odorants varying subtly in carbon-chain length or functional group. In concordance with our previous reports, we found that odors evoked oscillatory activity in the LFP signal in both the beta and gamma frequency bands. Analysis of LFP oscillations revealed that, although molecular features have almost no influence on the intrinsic characteristics of LFP oscillations, they influence the temporal patterning of bulbar oscillations. Alcohol family odors rarely evoke gamma oscillations, whereas ester family odors tend to induce oscillatory patterns showing beta/gamma alternation. Moreover, for molecules with the same functional group, the probability of gamma occurrence is correlated to the vapor pressure of the odor. The significance of the relation between odorant features and oscillatory regimes, along with its functional relevance, is discussed.
Publisher: Springer Science and Business Media LLC
Date: 14-08-2007
Publisher: Elsevier BV
Date: 09-2011
Publisher: Wiley
Date: 03-2003
Abstract: Ishiguro, Sakamoto, and Kitagawa (1997, Annals of the Institute of Statistical Mathematics 49, 411-434) proposed EIC as an extension of the Akaike information criterion (AIC); the idea leading to EIC is to correct the bias of the log-likelihood, considered as an estimator of the Kullback-Leibler information, using the bootstrap. We develop this criterion for its use in multivariate semiparametric situations, and argue that it can be used for choosing among parametric and semiparametric estimators. A simulation study based on a regression model shows that EIC is better than its competitors, although likelihood cross-validation performs nearly as well except for small sample sizes. Its use is illustrated by estimating the mean evolution of viral RNA levels in a group of infants infected by HIV.
Publisher: Springer Science and Business Media LLC
Date: 11-01-2012
Publisher: Wiley
Date: 2001
DOI: 10.1002/SIM.916
Abstract: We propose a method and a program to determine a significance level for a series of codings of an explanatory variable in logistic regression. Dichotomous and Box-Cox transformations are considered. Three methods of correcting the significance level are studied: the Bonferroni method; Efron's method, which uses the correlation between successive tests; and the exact calculation by numerical integration using all correlations. A simulation study has led to a strategy for the choice and number of the different codings of the variable. This method is illustrated using the data of a study of the relation between cholesterol and dementia.
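The simplest of the three corrections can be written in two lines (an illustrative sketch; the Efron-type and exact corrections studied in the paper sharpen this conservative bound by using the correlations between the coding-specific tests):

```python
def bonferroni_corrected_p(p_values):
    """Bonferroni correction for testing one variable through k codings:
    the corrected p-value is k times the smallest coding-specific
    p-value, capped at 1. Conservative when the codings are correlated."""
    k = len(p_values)
    return min(1.0, k * min(p_values))
```

With three codings yielding p-values 0.01, 0.04 and 0.20, the corrected p-value is 3 × 0.01 = 0.03.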
Publisher: Elsevier BV
Date: 2005
Publisher: MDPI AG
Date: 15-05-2021
DOI: 10.3390/RS13101933
Abstract: Data about storm impacts are essential for the disaster risk reduction process, but unlike data about storm characteristics, they are not routinely collected. In this paper, we demonstrate the high potential of convolutional neural networks to automatically constitute a storm impact database using timestack images provided by coastal video monitoring stations. Several convolutional neural network architectures and methods to deal with class imbalance were tested on two sites (Biarritz and Zarautz) to find the best practices for this classification task. This study shows that convolutional neural networks are well adapted for the classification of timestack images into storm impact regimes. Overall, the most complex and deepest architectures yield better results. Indeed, the best performances are obtained with the VGG16 architecture for both sites, with F-scores of 0.866 for Biarritz and 0.858 for Zarautz. For the class imbalance problem, the method of oversampling shows the best classification accuracy, with F-scores on average 30% higher than the ones obtained with cost-sensitive learning. The transferability of the learning method between sites is also investigated and shows conclusive results. This study highlights the high potential of convolutional neural networks to enhance the value of coastal video monitoring data that are routinely recorded on many coastal sites. Furthermore, it shows that this type of deep neural network can significantly contribute to the setting up of risk databases necessary for the determination of storm risk indicators and, more broadly, for the optimization of risk-mitigation measures.
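The oversampling strategy compared against cost-sensitive learning can be sketched in a few lines of Python (random duplication of minority-class rows; illustrative only, not the paper's training pipeline):

```python
import numpy as np

def oversample_minority(X, y, rng=None):
    """Random oversampling: duplicate minority-class rows (sampled with
    replacement) until every class reaches the size of the largest one.
    Sketch of the class-imbalance handling compared in the paper."""
    rng = rng or np.random.default_rng(0)
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    idx = []
    for c, n in zip(classes, counts):
        ci = np.flatnonzero(y == c)
        extra = rng.choice(ci, size=n_max - n, replace=True)
        idx.append(np.concatenate([ci, extra]))
    idx = np.concatenate(idx)
    return X[idx], y[idx]

X = np.arange(10).reshape(5, 2)      # 5 toy "images" with 2 features
y = np.array([0, 0, 0, 1, 1])        # imbalanced impact-regime labels
Xb, yb = oversample_minority(X, y)
```

Applied before training, this leaves the loss function unchanged, whereas cost-sensitive learning instead reweights the loss per class.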
Publisher: Apollo - University of Cambridge Repository
Date: 2018
DOI: 10.17863/CAM.22138
Publisher: Cold Spring Harbor Laboratory
Date: 15-05-2020
DOI: 10.1101/2020.05.15.097774
Abstract: High-dimensional datasets, where the number of variables ‘p’ is much larger than the number of samples ‘n’, are ubiquitous and often render standard classification and regression techniques unreliable due to overfitting. An important research problem is feature selection — ranking candidate variables based on their relevance to the outcome variable and retaining those that satisfy a chosen criterion. In this article, we propose a computationally efficient variable selection method based on principal component analysis. The method is very simple, accessible, and suitable for the analysis of high-dimensional datasets. It allows correction for population structure in genome-wide association studies (GWAS), which would otherwise induce spurious associations, and it is less likely to overfit. We expect our method to accurately identify important features while reducing the False Discovery Rate (FDR) (the expected proportion of erroneously rejected null hypotheses) by accounting for the correlation between variables and by de-noising data in the training phase, which also makes it robust to outliers in the training data. Being almost as fast as univariate filters, our method allows for valid statistical inference. The ability to make such inferences sets this method apart from most of the current multivariate statistical tools designed for today’s high-dimensional data. We demonstrate the superior performance of our method through extensive simulations. A semi-real gene-expression dataset, a challenging childhood acute lymphoblastic leukemia (CALL) gene expression study, and a GWAS that attempts to identify single-nucleotide polymorphisms (SNPs) associated with rice grain length further demonstrate the usefulness of our method in genomic applications.
An integral part of modern statistical research is feature selection, which has underpinned various scientific discoveries, especially in emerging genomics applications such as gene expression and proteomics studies, where data have thousands or tens of thousands of features but a limited number of samples. However, in practice, due to the unavailability of suitable multivariate methods, researchers often resort to univariate filters when dealing with a large number of variables. These univariate filters do not take into account the dependencies between variables because they assess variables one by one. This leads to loss of information, loss of statistical power (the probability of correctly rejecting the null hypothesis) and potentially biased estimates. In our paper, we propose a new variable selection method. Being computationally efficient, our method allows for valid inference. The ability to make such inferences sets this method apart from most of the current multivariate statistical tools designed for today’s high-dimensional data.
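The PCA-based ranking idea can be sketched as follows (a simplified illustration; the proposed method adds the inference and FDR-control machinery described above, which this toy omits):

```python
import numpy as np

def pca_feature_ranking(X, n_components=2):
    """Rank features by their contribution to the top principal
    components, weighted by the explained variance of each component.
    A simplified sketch of PCA-driven feature selection."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = s[:n_components] ** 2
    scores = (var[:, None] * Vt[:n_components] ** 2).sum(axis=0)
    return np.argsort(scores)[::-1]          # most important feature first

# Toy data: feature 3 carries almost all of the variance
rng = np.random.default_rng(0)
signal = rng.normal(size=200)
X = 0.1 * rng.normal(size=(200, 6))
X[:, 3] += signal
```

Because the ranking is computed from the low-rank structure rather than one-at-a-time tests, correlated features are assessed jointly.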
Publisher: American Chemical Society (ACS)
Date: 28-08-2020
Publisher: Springer Science and Business Media LLC
Date: 15-06-2012
Publisher: Wiley
Date: 10-05-2011
Publisher: Public Library of Science (PLoS)
Date: 30-06-2023
DOI: 10.1371/JOURNAL.PONE.0287640
Abstract: Real-time monitoring using in-situ sensors is becoming a common approach for measuring water-quality within watersheds. High-frequency measurements produce big datasets that present opportunities to conduct new analyses for improved understanding of water-quality dynamics and more effective management of rivers and streams. Of primary importance is enhancing knowledge of the relationships between nitrate, one of the most reactive forms of inorganic nitrogen in the aquatic environment, and other water-quality variables. We analysed high-frequency water-quality data from in-situ sensors deployed in three sites from different watersheds and climate zones within the National Ecological Observatory Network, USA. We used generalised additive mixed models to explain the nonlinear relationships at each site between nitrate concentration and conductivity, turbidity, dissolved oxygen, water temperature, and elevation. Temporal auto-correlation was modelled with an auto-regressive moving-average (ARMA) model and we examined the relative importance of the explanatory variables. Total deviance explained by the models was high for all sites (99%). Although variable importance and the smooth regression parameters differed among sites, the models explaining the most variation in nitrate contained the same explanatory variables. This study demonstrates that building a model for nitrate using the same set of explanatory water-quality variables is achievable, even for sites with vastly different environmental and climatic characteristics. Applying such models will assist managers to select cost-effective water-quality variables to monitor when the goals are to gain a spatial and temporal in-depth understanding of nitrate dynamics and adapt management plans accordingly.
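The residual autocorrelation absorbed by the ARMA error term can be illustrated with its AR(1) ingredient (a toy sketch on simulated residuals; the paper fits the full generalised additive mixed model, not this estimator):

```python
import numpy as np

def ar1_coefficient(resid):
    """Lag-1 autocorrelation of model residuals: the AR(1) ingredient of
    an ARMA error structure used to absorb the temporal auto-correlation
    left over after the smooth terms are fitted."""
    r = resid - resid.mean()
    return float((r[1:] * r[:-1]).sum() / (r * r).sum())

# Simulated AR(1) residual series with true coefficient phi = 0.6
rng = np.random.default_rng(0)
phi, n = 0.6, 5000
e = np.zeros(n)
eps = rng.normal(size=n)
for t in range(1, n):
    e[t] = phi * e[t - 1] + eps[t]
```

If this coefficient is clearly non-zero, treating high-frequency observations as independent would understate the uncertainty of the smooth terms.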
Publisher: Walter de Gruyter GmbH
Date: 06-01-2021
Abstract: Semi-Markov models are widely used for survival analysis and reliability analysis. In general, there are two competing parameterizations and each entails its own interpretation and inference properties. On the one hand, a semi-Markov process can be defined based on the distribution of sojourn times, often via hazard rates, together with transition probabilities of an embedded Markov chain. On the other hand, intensity transition functions may be used, often referred to as the hazard rates of the semi-Markov process. We summarize and contrast these two parameterizations both from a probabilistic and an inference perspective, and we highlight relationships between the two approaches. In general, the intensity transition based approach allows the likelihood to be split into likelihoods of two-state models having fewer parameters, allowing efficient computation and usage of many survival analysis tools. Nevertheless, in certain cases the sojourn time based approach is natural and has been exploited extensively in applications. In contrasting the two approaches and contemporary relevant R packages used for inference, we use two real datasets highlighting the probabilistic and inference properties of each approach. This analysis is accompanied by an R vignette.
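The sojourn-time parameterization described here can be made concrete with a small simulator using only the standard library (an illustrative sketch with hypothetical states and distributions; the R packages compared in the paper handle inference, not just simulation):

```python
import random

def simulate_semi_markov(P, sojourn, start, t_max, rng=None):
    """Simulate a semi-Markov trajectory under the sojourn-time
    parameterization: an embedded Markov chain P chooses the next state,
    and sojourn[state] draws how long the process stays there."""
    rng = rng or random.Random(0)
    t, state = 0.0, start
    path = [(t, state)]
    while True:
        t += sojourn[state](rng)            # draw the sojourn time
        if t >= t_max:
            return path
        # embedded chain: pick the next state by transition probability
        state = rng.choices(list(P[state]), weights=list(P[state].values()))[0]
        path.append((t, state))

# Two-state example: exponential sojourns, deterministic switching
P = {"up": {"down": 1.0}, "down": {"up": 1.0}}
sojourn = {"up": lambda rng: rng.expovariate(1.0),
           "down": lambda rng: rng.expovariate(2.0)}
path = simulate_semi_markov(P, sojourn, "up", t_max=50.0)
```

The alternative, intensity-based parameterization would instead specify hazard rates for each transition and would not separate the sojourn draw from the embedded-chain draw.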
Publisher: Informa UK Limited
Date: 25-05-2010
Publisher: Elsevier BV
Date: 09-2015
Publisher: Springer Science and Business Media LLC
Date: 04-03-2019
Publisher: MDPI AG
Date: 08-09-2020
DOI: 10.3390/RS12182908
Abstract: Commonly, when studies deal with the effects of climate change on biodiversity, the mean value is used more than other parameters. However, climate change also leads to greater temperature variability, and many papers have demonstrated its importance in the implementation of biodiversity response strategies. We studied the spatio-temporal variability of activity time and a persistence index, calculated from operative temperatures measured at three sites over three years, for a mountain endemic species. Temperatures were recorded with biomimetic loggers, an original remote sensing technology that has the same advantages as conventional loggers but is suitable for recording data on biological organisms. Among the 42 tests conducted, 71% were significant for spatial variability and 28% for temporal variability. The differences in daily activity times and in persistence indices demonstrated the effects of micro-habitat, habitat, slope, altitude, hydrography, and year. These observations highlight the great variability in the environmental temperatures experienced by lizard populations. Thus, our study underlines the importance of implementing multi-year and multi-site studies to quantify this variability and produce more representative results. Such studies can be facilitated by the use of biomimetic loggers, for which a user guide is provided in the last part of this paper.
Publisher: Elsevier BV
Date: 11-2020
Publisher: Springer Science and Business Media LLC
Date: 17-07-2018
DOI: 10.1038/S41598-018-29041-1
Abstract: Chronic inflammation may be involved in cancer development and progression. Using 28 inflammation-related proteins measured in prospective blood samples from two case-control studies nested in the Italian component of the European Prospective Investigation into Cancer and Nutrition (n = 261) and in the Northern Sweden Health and Disease Study (n = 402), we tested the hypothesis that an inflammatory score is associated with breast cancer (BC) and B-cell Non-Hodgkin Lymphoma (B-cell NHL, including 68 multiple myeloma cases) onset. We modelled the relationship between this inflammatory score and the two cancers studied (BC and B-cell NHL) using generalised linear models, and assessed, through adjustments, the role of behaviours and lifestyle factors. Analyses were performed by cancer type pooling both populations, and stratified by cohort and time to diagnosis. Our results suggested a lower inflammatory score in B-cell NHL cases (β = −1.28, p = 0.012), and, to a lesser extent, in BC (β = −0.96, p = 0.33) compared to controls, mainly driven by cancer cases diagnosed less than 6 years after enrolment. These associations were not affected by subsequent adjustments for potential intermediate confounders, notably behaviours. Sensitivity analyses indicated that our findings were not affected by the way the inflammatory score was calculated. These observations call for further studies involving larger populations, a larger variety of cancer types and repeated measures of a larger panel of inflammatory markers.
Publisher: Wiley
Date: 11-06-2018
DOI: 10.1002/SIM.7821
Abstract: Integrative analysis of high-dimensional omics datasets has been studied by many authors in recent years. By incorporating known relationships among the variables, these analyses have been successful in elucidating the relationships between different sets of omics data. In this article, our goal is to identify important relationships between genomic expression and cytokine data from a human immunodeficiency virus vaccine trial. We propose a flexible partial least squares technique that incorporates group and subgroup structure in the modelling process. Our new method accounts for both the grouping of genetic markers (e.g., gene sets) and temporal effects. The method generalises existing sparse modelling techniques in the partial least squares methodology and establishes theoretical connections to variable selection methods for supervised and unsupervised problems. Simulation studies are performed to investigate the performance of our methods against alternative sparse approaches. Our R package sgspls is available at att-sutton/sgspls.
Publisher: Springer Science and Business Media LLC
Date: 17-02-2012
Publisher: Springer Science and Business Media LLC
Date: 08-06-2013
Publisher: Cold Spring Harbor Laboratory
Date: 15-03-2021
DOI: 10.1101/2021.03.14.435332
Abstract: Glycoproteome profiling (glycoproteomics) is a powerful yet analytically challenging research tool. The complex tandem mass spectra generated from glycopeptide mixtures require sophisticated analysis pipelines for structural determination. Diverse software aiding the process have appeared, but their relative performance remains untested. Conducted through the HUPO Human Proteome Project – Human Glycoproteomics Initiative, this community study, comprising both developers and users of glycoproteomics software, evaluates the performance of informatics solutions for system-wide glycopeptide analysis. Mass spectrometry-based glycoproteomics datasets from human serum were shared with all teams. The relative team performance for N- and O-glycopeptide data analysis was comprehensively established and validated through orthogonal performance tests. Excitingly, several high-performance glycoproteomics informatics solutions were identified. While the study illustrated that significant informatics challenges remain, as indicated by a high discordance between annotated glycopeptides, lists of high-confidence (consensus) glycopeptides were compiled from the standardised team reports. Deep analysis of the performance data revealed key performance-associated search variables and led to recommendations for improved "high coverage" and "high accuracy" glycoproteomics search strategies. This study concludes that diverse software for comprehensive glycopeptide data analysis exists, points to several high-performance search strategies, and specifies key variables that may guide future software developments and assist informatics decision-making in glycoproteomics.
Publisher: Public Library of Science (PLoS)
Date: 02-12-2021
DOI: 10.1371/JOURNAL.PONE.0260717
Abstract: Eye-tracking research has been widely used in radiology applications. Prior studies exclusively analysed either temporal or spatial eye-tracking features, neither of which alone completely characterises the spatiotemporal dynamics of radiologists' gaze. Our research aims to quantify human visual search dynamics in both domains during brain stimuli screening to explore the relationship between reader characteristics and stimuli complexity. The methodology can be used to discover strategies to aid trainee radiologists in identifying pathology, and to select regions of interest for machine vision applications. The study was performed using eye-tracking data 5 seconds in duration from 57 readers (15 Brain-experts, 11 Other-experts, 5 Registrars and 26 Naïves) for 40 neuroradiological images as stimuli (i.e., 20 normal and 20 pathological brain MRIs). The visual scanning patterns were analysed by calculating the fractal dimension (FD) and Hurst exponent (HE) using re-scaled range (R/S) and detrended fluctuation analysis (DFA) methods. The FD was used to measure the spatial geometrical complexity of the gaze patterns, and the HE analysis was used to measure participants' focusing skill. Focusing skill refers to the persistence/anti-persistence of the participants' gaze on the stimulus over time. Pathological and normal stimuli were analysed separately at both the "First Second" and the full "Five Seconds" viewing duration. All experts were more focused and had a higher visual search complexity compared to Registrars and Naïves. This was seen in both the pathological and normal stimuli in the first- and five-second analyses. The Brain-experts subgroup achieved better focusing skill than Other-experts due to their domain-specific expertise. Indeed, the FDs found when viewing pathological stimuli were higher than those for normal ones.
Viewing normal stimuli resulted in an increase in FD in the five-second data, unlike pathological stimuli, for which FD did not change. In contrast to the FDs, the scanpath HEs of pathological and normal stimuli were similar. However, participants' gaze was more focused in the "Five Seconds" than the "First Second" data. The HE analysis of the scanpaths of all experts showed that they have greater focus than Registrars and Naïves. This may be related to their higher visual search complexity compared to non-experts, due to their training and expertise.
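The rescaled-range (R/S) estimate of the Hurst exponent used in the study above can be sketched as follows (white noise stands in for a gaze-coordinate series, and the window sizes are arbitrary choices):

```python
import numpy as np

def hurst_rs(series, window_sizes):
    """Rescaled-range (R/S) estimate of the Hurst exponent: regress
    log(mean R/S) on log(window size); the slope is H."""
    log_n, log_rs = [], []
    for n in window_sizes:
        rs_vals = []
        for start in range(0, len(series) - n + 1, n):
            w = series[start:start + n]
            dev = np.cumsum(w - w.mean())       # cumulative deviations
            r = dev.max() - dev.min()           # range of the partial sums
            s = w.std()                         # window standard deviation
            if s > 0:
                rs_vals.append(r / s)
        log_n.append(np.log(n))
        log_rs.append(np.log(np.mean(rs_vals)))
    slope, _ = np.polyfit(log_n, log_rs, 1)
    return slope

rng = np.random.default_rng(0)
h_noise = hurst_rs(rng.normal(size=4096), [16, 32, 64, 128, 256])
# Theory: white noise gives H near 0.5; persistent (focused) series approach 1.
```

H > 0.5 indicates persistence (the gaze keeps drifting in the same direction, interpreted above as focus), H < 0.5 anti-persistence.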
Publisher: Springer Science and Business Media LLC
Date: 07-01-2022
DOI: 10.1186/S12874-021-01491-8
Abstract: Genome-wide association studies (GWAS) have identified genetic variants associated with multiple complex diseases. We can leverage this phenomenon, known as pleiotropy, to integrate multiple data sources in a joint analysis. Integrating additional information such as gene pathway knowledge can often improve statistical efficiency and biological interpretation. In this article, we propose statistical methods which incorporate both gene pathway and pleiotropy knowledge to increase statistical power and identify important risk variants affecting multiple traits. We propose novel feature selection methods for group variable selection in the multi-task regression problem. We develop penalised likelihood methods exploiting different penalties to induce structured sparsity at the gene (or pathway) and SNP level across all studies. We implement an alternating direction method of multipliers (ADMM) algorithm for our penalised regression methods. The performance of our approaches is compared to a subset-based meta-analysis approach on simulated data sets. A bootstrap sampling strategy is provided to explore the stability of the penalised methods. Our methods are applied to identify potential pleiotropy in an application considering the joint analysis of thyroid and breast cancers. The methods detected eleven potential pleiotropic SNPs and six pathways. A simulation study found that our method was able to detect more true signals than a popular competing method while retaining a similar false discovery rate. We developed feature selection methods for jointly analysing multiple logistic regression tasks where prior grouping knowledge is available. Our method performed well in both simulation studies and when applied to a real data analysis of multiple cancers.
Publisher: Oxford University Press (OUP)
Date: 05-07-2023
Abstract: Cross-phenotype association using gene-set analysis can help to detect pleiotropic genes and inform about common mechanisms between diseases. Although there is an increasing number of statistical methods for exploring pleiotropy, there is a lack of proper pipelines for applying gene-set analysis in this context to genome-scale data in a reasonable running time. We designed a user-friendly pipeline to perform cross-phenotype gene-set analysis between two traits using GCPBayes, a method developed by our team. All analyses can be performed automatically by calling different scripts in a simple way (using a Shiny app, Bash or R script). A Shiny application was also developed to create different plots to visualize outputs from GCPBayes. Finally, a comprehensive, step-by-step tutorial on how to use the pipeline is provided on our group's GitHub page. We illustrated the application on publicly available GWAS (genome-wide association studies) summary statistics data to identify breast cancer and ovarian cancer susceptibility genes. We have shown that the GCPBayes pipeline can extract pleiotropic genes previously mentioned in the literature, while it also provides new pleiotropic genes and regions that are worthwhile for further investigation. We also provide recommendations about parameter selection for decreasing the computational time of GCPBayes on genome-scale data.
Publisher: Informa UK Limited
Date: 09-12-2018
Publisher: Institute of Mathematical Statistics
Date: 2019
DOI: 10.1214/19-SS125
Publisher: Foundation for Open Access Statistic
Date: 2017
Publisher: Elsevier BV
Date: 12-2023
Publisher: Springer Science and Business Media LLC
Date: 24-02-2021
DOI: 10.1186/S12859-021-03968-1
Abstract: The increasing number of genome-wide association studies (GWAS) has revealed several loci that are associated with multiple distinct phenotypes, suggesting the existence of pleiotropic effects. Highlighting these cross-phenotype genetic associations could help to identify and understand common biological mechanisms underlying some diseases. Common approaches test the association between genetic variants and multiple traits at the SNP level. In this paper, we propose a novel gene- and pathway-level approach for the case where several independent GWAS on independent traits are available. The method is based on a generalization of sparse group Partial Least Squares (sgPLS) to take into account groups of variables, and a Lasso penalization that links all independent data sets. This method, called joint-sgPLS, is able to convincingly detect signal at both the variable level and the group level. Our method has the advantage of proposing a globally readable model while coping with the architecture of the data. It can outperform traditional methods and provides wider insight in terms of a priori information. We compared the performance of the proposed method to other benchmark methods on simulated data and gave an example of application on real data with the aim of highlighting common susceptibility variants to breast and thyroid cancers. The joint-sgPLS shows interesting properties for detecting a signal. As an extension of PLS, the method is suited to data with a large number of variables. The choice of Lasso penalization copes with architectures of groups of variables and observation sets. Furthermore, although the method has been applied to a genetic study, its formulation is adapted to any data with a high number of variables and a known a priori architecture in other application fields.
Publisher: MDPI AG
Date: 07-11-2021
DOI: 10.3390/RS13214470
Abstract: Coral reefs are an essential source of marine biodiversity, but they are declining at an alarming rate under the combined effects of global change and human pressure. Precise mapping of coral reef habitat with high spatial and temporal resolution has become a necessary step for monitoring their health and evolution. This mapping can be achieved remotely thanks to satellite imagery coupled with machine-learning algorithms. In this paper, we review the different satellites used in recent literature, as well as the most common and efficient machine-learning methods. To account for the recent explosion of published research on coral reef mapping, we focus especially on papers published between 2018 and 2020. Our review indicates that object-based methods provide more accurate results than pixel-based ones, and that the most accurate methods are Support Vector Machine and Random Forest. We emphasize that the satellites with the highest spatial resolution provide the best images for benthic habitat mapping. We also highlight that preprocessing steps (water column correction, sunglint removal, etc.) and additional inputs (bathymetry data, aerial photographs, etc.) can significantly improve the mapping accuracy.
Publisher: MDPI AG
Date: 04-12-2021
Abstract: In situ sensors that collect high-frequency data are used increasingly to monitor aquatic environments. These sensors are prone to technical errors, resulting in unrecorded observations and/or anomalous values that are subsequently removed and create gaps in time series data. We present a framework based on generalized additive and auto-regressive models to recover these missing data. To mimic sporadically missing (i) single observations and (ii) periods of contiguous observations, we randomly removed (i) point data and (ii) day- and week-long sequences of data from a two-year time series of nitrate concentration data collected from Arikaree River, USA, where synoptically collected water temperature, turbidity, conductance, elevation, and dissolved oxygen data were available. In 72% of cases with missing point data, predicted values were within the sensor precision interval of the original value, although predictive ability declined when sequences of missing data occurred. Precision also depended on the availability of other water quality covariates. When covariates were available, even a sudden, event-based peak in nitrate concentration was reconstructed well. By providing a promising method for accurate prediction of missing data, the utility and confidence in summary statistics and statistical trends will increase, thereby assisting the effective monitoring and management of fresh waters and other at-risk ecosystems.
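A stripped-down version of the recovery idea above, regress the gappy series on synoptic covariates over the observed rows and predict the gaps, can be sketched as follows (plain linear regression on synthetic data stands in for the paper's generalized additive and autoregressive models):

```python
import numpy as np

def fill_gaps(y, X):
    """Recover NaN gaps in y by regressing y on synoptic covariates X
    over the observed rows, then predicting the missing rows. A linear
    sketch of the idea; the paper's models are generalized additive
    with autoregressive errors."""
    obs = ~np.isnan(y)
    A = np.column_stack([np.ones(len(y)), X])            # add intercept
    beta, *_ = np.linalg.lstsq(A[obs], y[obs], rcond=None)
    filled = y.copy()
    filled[~obs] = A[~obs] @ beta                        # predict the gaps
    return filled

# Synthetic series: nitrate driven by temperature and conductance.
rng = np.random.default_rng(1)
temp = rng.normal(15, 3, 300)
cond = rng.normal(500, 50, 300)
nitrate = 0.1 * temp - 0.002 * cond + rng.normal(0, 0.05, 300)
y = nitrate.copy()
y[::10] = np.nan                               # knock out every 10th point
recovered = fill_gaps(y, np.column_stack([temp, cond]))
max_err = np.abs(recovered[::10] - nitrate[::10]).max()
```

As in the study, recovery works only when covariates are available at the missing timestamps; contiguous gaps in all sensors would need the temporal (autoregressive) component as well.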
Publisher: Elsevier BV
Date: 04-2022
Publisher: Springer Science and Business Media LLC
Date: 03-08-2007
Publisher: Springer Science and Business Media LLC
Date: 27-11-2022
DOI: 10.1007/S00234-021-02845-1
Abstract: To systematically review the literature regarding the application of machine learning (ML) of magnetic resonance imaging (MRI) radiomics in common sellar tumors. To identify future directions for application of ML in sellar tumor MRI. PubMed, Medline, Embase, Google Scholar, Scopus, arXiv, and bioRxiv were searched to identify relevant studies published between 2010 and September 2021. Studies were included if they specifically involved ML of MRI radiomics in the analysis of sellar masses. Risk of bias assessment was performed using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) Tool. Fifty-eight articles were identified for review. All papers utilized retrospective data, and a quantitative systematic review was performed for thirty-one studies utilizing a public dataset which compared pituitary adenomas, meningiomas, and gliomas. One of the analyzed architectures yielded the highest classification accuracy of 0.996. The remaining twenty-seven articles were qualitatively reviewed and showed promising findings in predicting specific tumor characteristics such as tumor consistency, Ki-67 proliferative index, and post-surgical recurrence. This review highlights the potential clinical application of ML using MRI radiomic data of the sellar region in diagnosis and predicting treatment outcomes. We describe future directions for practical application in the clinical care of patients with pituitary neuroendocrine and other sellar tumors.
Publisher: Public Library of Science (PLoS)
Date: 08-08-2013
Publisher: Zenodo
Date: 2022
Publisher: Wiley
Date: 25-03-2020
DOI: 10.1111/GCB.15059
Publisher: Informa UK Limited
Date: 19-05-2008
Publisher: Informa UK Limited
Date: 05-2009
Publisher: Public Library of Science (PLoS)
Date: 03-01-2017
Publisher: Springer Science and Business Media LLC
Date: 26-09-2023
Publisher: Informa UK Limited
Date: 11-2011
Publisher: Foundation for Open Access Statistic
Date: 2017
Publisher: Wiley
Date: 25-08-2020
DOI: 10.1002/SIM.8720
Publisher: Springer Science and Business Media LLC
Date: 09-06-2021
DOI: 10.1186/S13054-021-03518-4
Abstract: Heterogeneous respiratory system static compliance (CRS) values and levels of hypoxemia in patients with novel coronavirus disease (COVID-19) requiring mechanical ventilation have been reported in previous small case series or studies conducted at a national level. We designed a retrospective observational cohort study with rapid data gathering from the international COVID-19 Critical Care Consortium study to comprehensively describe CRS, calculated as tidal volume/[airway plateau pressure − positive end-expiratory pressure (PEEP)], and its association with ventilatory management and outcomes of COVID-19 patients on mechanical ventilation (MV), admitted to intensive care units (ICU) worldwide. We studied 745 patients from 22 countries, who required admission to the ICU and MV from January 14 to December 31, 2020, and presented at least one value of CRS within the first seven days of MV. Median (IQR) age was 62 (52–71), patients were predominantly males (68%) and from Europe/North and South America (88%). CRS, within 48 h from endotracheal intubation, was available in 649 patients and was neither associated with the duration from onset of symptoms to commencement of MV (p = 0.417) nor with PaO2/FiO2 (p = 0.100). Females presented lower CRS than males (95% CI of CRS difference between females and males: −11.8 to −7.4 mL/cmH2O, p < 0.001), and although females presented higher body mass index (BMI), the association of BMI with CRS was marginal (p = 0.139). Ventilatory management varied across the CRS range, resulting in a significant association between CRS and driving pressure (estimated decrease −0.31 cmH2O/L per mL/cmH2O of CRS, 95% CI −0.48 to −0.14, p < 0.001). Overall, 28-day ICU mortality, accounting for the competing risk of being discharged within the period, was 35.6% (SE 1.7). Cox proportional hazard analysis demonstrated that CRS (+10 mL/cmH2O) was only associated with being discharged from the ICU within 28 days (HR 1.14, 95% CI 1.02–1.28, p = 0.018). This multicentre report provides a comprehensive account of CRS in COVID-19 patients on MV. CRS measured within 48 h from commencement of MV has marginal predictive value for 28-day mortality, but was associated with being discharged from ICU within the same period. Trial documentation: Available at tudy. Trial registration: ACTRN12620000421932.
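The compliance definition quoted in the abstract above translates directly into code (the ventilator readings below are hypothetical, for illustration only):

```python
def static_compliance(tidal_volume_ml, plateau_cmh2o, peep_cmh2o):
    """Respiratory-system static compliance as defined in the study:
    CRS = tidal volume / (plateau pressure - PEEP), in mL/cmH2O."""
    return tidal_volume_ml / (plateau_cmh2o - peep_cmh2o)

# Hypothetical readings: 450 mL tidal volume, plateau 24 cmH2O, PEEP 9 cmH2O.
crs = static_compliance(450, 24, 9)   # 450 / 15 = 30.0 mL/cmH2O
```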
Publisher: Foundation for Open Access Statistic
Date: 2012
Publisher: Oxford University Press (OUP)
Date: 22-09-2023
DOI: 10.1093/HMG/DDAD159
Publisher: International Press of Boston
Date: 2016
Publisher: Oxford University Press (OUP)
Date: 10-09-2015
DOI: 10.1093/BIOINFORMATICS/BTV535
Abstract: Motivation: The association between two blocks of 'omics' data brings challenging issues in computational biology due to their size and complexity. Here, we focus on a class of multivariate statistical methods called partial least squares (PLS). The sparse version of PLS (sPLS) integrates two datasets while simultaneously selecting the contributing variables. However, these methods do not take into account important structural or group effects due to the relationships between markers within biological pathways. Considering predefined groups of markers (e.g., gene sets) could therefore improve the relevance and efficacy of the PLS approach. Results: We propose two PLS extensions called group PLS (gPLS) and sparse gPLS (sgPLS). Our algorithm makes it possible to study the relationship between two different types of omics data (e.g., SNP and gene expression) or between an omics dataset and multivariate phenotypes (e.g., cytokine secretion). We demonstrate the good performance of gPLS and sgPLS compared with sPLS in the context of grouped data. These methods are then compared using data from an HIV therapeutic vaccine trial. Our approaches provide parsimonious models that reveal the relationship between gene abundance and the immunological response to the vaccine. Availability and implementation: The approach is implemented in a comprehensive R package called sgPLS available on CRAN. Contact: b.liquet@uq.edu.au Supplementary information: Supplementary data are available at Bioinformatics online.
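The sPLS baseline these group extensions build on can be sketched with a single component: power iteration on the cross-covariance matrix with soft-thresholding of the X-loading vector (synthetic data; the penalty value is an arbitrary assumption, and a real analysis would use the sgPLS package itself):

```python
import numpy as np

def soft_threshold(v, lam):
    """Soft-thresholding operator that zeroes small loading entries."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def sparse_pls_component(X, Y, lam, n_iter=25):
    """First sparse-PLS component: power iteration on M = X'Y with
    soft-thresholding of the X-loading vector u (the gPLS/sgPLS methods
    replace this entry-wise penalty with group-wise ones)."""
    M = X.T @ Y
    u = M[:, 0] / np.linalg.norm(M[:, 0])
    for _ in range(n_iter):
        v = M.T @ u
        v /= np.linalg.norm(v)
        u = soft_threshold(M @ v, lam)
        u /= np.linalg.norm(u)
    return u, v

# Synthetic data: only the first 3 of 20 X-variables drive Y.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 20))
Y = X[:, :3].sum(axis=1, keepdims=True) + 0.1 * rng.normal(size=(400, 1))
u, _ = sparse_pls_component(X, Y, lam=200.0)
selected = np.flatnonzero(np.abs(u) > 1e-8)   # indices of retained variables
```

The group-wise penalties of gPLS/sgPLS would instead shrink whole blocks of u (e.g. a gene set) to zero at once.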
Publisher: Wiley
Date: 24-11-2008
DOI: 10.1111/J.1541-0420.2008.01132_1.X
Abstract: Cook, Gold, and Li (2007, Biometrics 63, 540-549) extended the Kulldorff (1997, Communications in Statistics 26, 1481-1496) scan statistic for spatial cluster detection to survival-type observations. Their approach was based on the score statistic and they proposed a permutation distribution for the maximum of score tests. The score statistic makes it possible to apply the scan statistic idea to models including explanatory variables. However, we show that the permutation distribution requires strong assumptions of independence between potential cluster and both censoring and explanatory variables. In contrast, we present an approach using the asymptotic distribution of the maximum of score statistics in a manner not requiring these assumptions.
Start Date: 11-2019
End Date: 05-2024
Amount: $484,189.00
Funder: Australian Research Council
View Funded Activity
Start Date: 06-2022
End Date: 05-2025
Amount: $405,000.00
Funder: Australian Research Council
View Funded Activity