ARDC Research Link Australia

Publication

Marker genes of incident type 1 diabetes in peripheral blood mononuclear cells of children: A machine learning strategy for large-p, small-n scenarios

Publisher: Cold Spring Harbor Laboratory

Date: 09-02-2022

DOI: 10.1101/2022.02.07.22270652

Abstract: Type 1 diabetes (T1D) is a complex, polygenic disorder, the etiology of which is not fully elucidated. Machine learning (ML) genomics could provide novel insights on disease dynamics, while high-dimensionality remains a challenge. This study aimed to identify marker genes of incident T1D in peripheral blood mononuclear cells (PBMC) of children via a ML strategy attuned to high-dimensionality. Using samples from 105 children (81 with incident T1D and 24 healthy controls), we analyzed microarray transcriptomics via a workflow consisting of three sequential steps: application of dimension reduction strategies on the processed transcriptome, ML on the reduced gene expression matrix, and downstream network analyses to demarcate seed nodes (statistically significant genes) and hub genes. Sixteen dimension-reduction algorithms belonging to three groups (3 tailored, 3 regularizations, 10 classic) were applied. Four ML algorithms (multivariate adaptive regression splines, adaptive boosting, random forests, XGB-DART) were trained on the reduced feature set and internally validated using repeated, 10-fold cross-validation. Marker genes were determined via variable importance metrics. Seed nodes were identified by the ‘OmicsNet’ platform, while nodes having above average betweenness, closeness, and degree in the network were demarcated as hub genes. The processed gene expression matrix comprised 13515 genes, which was reduced to 1003 genes collectively selected by the dimension reduction algorithms. All four ML algorithms on this reduced feature set attained perfect and uniform predictive performance on internal validation. On removal of redundancies, variable importance metrics identified 30 marker genes of incident T1D in this cohort, while Early Growth Response 2 (EGR2) was uniformly selected by all four ML algorithms as the most important marker gene. Network analyses classified all 30 marker genes as seed nodes. Additionally, we identified 14 hub genes, 7 of which were found to be marker genes of incident T1D elucidated by ML. We identified marker genes of incident T1D in PBMC of children via a ML analytic strategy attuned to the high dimensional structure of microarrays, with downstream analyses providing high biological plausibility. The demonstrated ML strategy would be useful in analyzing other high-dimensional biomedical data for biomarker discovery.

Publication

A data-driven biocomputing pipeline with meta-analysis on high throughput transcriptomics to identify genome-wide miRNA markers associated with type 2 diabetes

Publisher: Elsevier BV

Date: 02-2022

DOI: 10.1016/J.HELIYON.2022.E08886

Publication

Use and performance of machine learning models for type 2 diabetes prediction in clinical and community care settings: Protocol for a systematic review and meta-analysis of predictive modeling studies

Publisher: SAGE Publications

Date: 2021

DOI: 10.1177/20552076211047390

Abstract: Machine learning involves the use of algorithms without explicit instructions. Of late, machine learning models have been widely applied for the prediction of type 2 diabetes. However, no evidence synthesis of the performance of these prediction models of type 2 diabetes is available. We aim to identify machine learning prediction models for type 2 diabetes in clinical and community care settings and determine their predictive performance. The systematic review of English language machine learning predictive modeling studies in 12 databases will be conducted. Studies predicting type 2 diabetes in predefined clinical or community settings are eligible. Standard CHARMS and TRIPOD guidelines will guide data extraction. Methodological quality will be assessed using a predefined risk of bias assessment tool. The extent of validation will be categorized by Reilly–Evans levels. Primary outcomes include model performance metrics of discrimination ability, calibration, and classification accuracy. Secondary outcomes include candidate predictors, algorithms used, level of validation, and intended use of models. The random-effects meta-analysis of c-indices will be performed to evaluate discrimination abilities. The c-indices will be pooled per prediction model, per model type, and per algorithm. Publication bias will be assessed through funnel plots and regression tests. Sensitivity analysis will be conducted to estimate the effects of study quality and missing data on primary outcome. The sources of heterogeneity will be assessed through meta-regression. Subgroup analyses will be performed for primary outcomes. No ethics approval is required, as no primary or personal data are collected. Findings will be disseminated through scientific sessions and peer-reviewed journals. CRD42019130886
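
The random-effects pooling of c-indices described in this protocol can be illustrated with a minimal sketch. It assumes the DerSimonian-Laird estimator of between-study variance, which the abstract does not name, and the function name and input values are hypothetical. In practice c-indices are often logit-transformed before pooling; that step is omitted here for brevity.

```python
import math


def pool_random_effects(estimates, variances):
    """Pool study-level estimates with a DerSimonian-Laird random-effects model.

    estimates: per-study effect estimates (e.g. c-indices); hypothetical inputs.
    variances: per-study sampling variances (squared standard errors).
    Returns (pooled estimate, standard error, between-study variance tau^2).
    """
    k = len(estimates)
    weights = [1.0 / v for v in variances]  # fixed-effect weights
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, estimates)) / sum_w

    # Cochran's Q measures dispersion of the estimates around the fixed-effect mean.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, estimates))
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each study's variance.
    re_weights = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, estimates)) / sum(re_weights)
    se = math.sqrt(1.0 / sum(re_weights))
    return pooled, se, tau2


# Hypothetical c-indices from two studies with equal sampling variance.
pooled, se, tau2 = pool_random_effects([0.6, 0.8], [0.01, 0.01])
```

A 95% confidence interval would follow as `pooled ± 1.96 * se` under a normal approximation.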

Publication

Use and performance of machine learning models for type 2 diabetes prediction in community settings: A systematic review and meta-analysis

Publisher: Elsevier BV

Date: 11-2020

DOI: 10.1016/J.IJMEDINF.2020.104268

Publication

Both general- and central- obesity are causally associated with polycystic ovarian syndrome: Findings of a Mendelian randomization study

Publisher: Cold Spring Harbor Laboratory

Date: 09-02-2022

DOI: 10.1101/2022.02.07.22270650

Abstract: Obesity is observed in a majority of women with polycystic ovarian syndrome (PCOS). Using body mass index (BMI) as a proxy, previous Mendelian randomization studies revealed general obesity potentially causes PCOS. Central obesity frequently demonstrates a stronger association with PCOS, although evidence on its causality is sparse. We aimed to investigate causal effects of both central- and general- obesity on the development of PCOS via two-sample Mendelian randomization (2SMR). Summary GWAS data of female-only, large-sample cohorts of European ancestry were retrieved for anthropometric markers of central obesity (waist circumference (WC), hip circumference (HC), waist-to-hip ratio (WHR)) and general obesity (BMI and its constituent variables – weight and height), from the IEU Open GWAS Project. As the outcome data, we acquired summary data from a large-sample GWAS (96391 samples; 219 cases and 96172 controls) from the FinnGen cohort. Four 2SMR methods were applied: inverse variance weighted (IVW), MR Egger (MRE), weighted median (WME), and weighted mode (WMO). Single SNP-, leave-one-out-, heterogeneity-, horizontal pleiotropy- and outlier- analyses were conducted. Genetic architectures underlying causal associations were explored. All SNPs selected as instrumental variables demonstrated no weak instrument bias (F > 10). Three anthropometric exposures, namely, BMI (OR: 5.55 – 7.24), WC (OR: 6.79 – 24.56), and HC (OR: 6.78 – 24.56), were significantly causally associated with PCOS as per IVW, WME, and WMO models. Single SNP- and leave-one-out- sensitivity analysis results were indicative of robust causal estimates. No significant heterogeneity, horizontal pleiotropy, or outliers were observed. We observed a considerable degree of overlap (7 SNPs; 17 genes) across significant causal findings as well as a number of SNPs and genes that were not shared between causal associations. This study revealed that both general- and central obesity potentially cause PCOS. Findings underscore the importance of addressing obesity and adiposity for the prevention and management of PCOS.
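
As an illustration of the inverse variance weighted (IVW) method named above, the following is a minimal sketch of a fixed-effect IVW estimate from per-SNP summary statistics. The function names and the numbers are hypothetical, and the per-SNP F-statistic uses the common approximation (beta / SE)^2, which is an assumption rather than the authors' stated formula.

```python
import math


def ivw_estimate(snps):
    """Fixed-effect IVW causal estimate from GWAS summary statistics.

    snps: list of (beta_exposure, beta_outcome, se_outcome) per instrument.
    Returns (causal effect on the log-odds scale, standard error).
    """
    numerator = sum(bx * by / se ** 2 for bx, by, se in snps)
    denominator = sum(bx ** 2 / se ** 2 for bx, by, se in snps)
    beta = numerator / denominator
    se_beta = math.sqrt(1.0 / denominator)
    return beta, se_beta


def f_statistic(beta_exposure, se_exposure):
    """Approximate per-SNP instrument strength; values above 10 are
    conventionally taken to indicate no weak instrument bias."""
    return (beta_exposure / se_exposure) ** 2


# Hypothetical instruments: each SNP has a Wald ratio (beta_out / beta_exp) of 2.
example_snps = [(0.1, 0.2, 0.05), (0.2, 0.4, 0.05)]
beta_ivw, se_ivw = ivw_estimate(example_snps)
odds_ratio = math.exp(beta_ivw)  # effect expressed as an odds ratio
```

Odds ratios such as those reported in the abstract correspond to the exponentiated causal estimate for a binary outcome.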

Publication

External validation and updating of a prediction model for the diagnosis of gestational diabetes mellitus

Publisher: Cold Spring Harbor Laboratory

Date: 07-12-2021

DOI: 10.1101/2021.12.05.21267329

Abstract: The Monash early pregnancy prediction model calculates risks of developing GDM and is internationally externally validated and implemented in practice; however, some gaps remain. We aimed to validate and update the Monash GDM model by revising ethnicity categorisation and updating to recent diagnostic criteria, to improve performance and generalisability. Routine health data for singleton pregnancies from 2016 to 2018 in Australia included updated GDM diagnostic criteria. The Original Model predictors were included (age, body mass index, ethnicity, diabetes family history, past-history of GDM, past-history of poor obstetric outcomes), with ethnicity revised. Updating model methods were: recalibration-in-the-large (Model A), re-estimation of intercept and slope (Model B), and coefficients revision using logistic regression (Model C1 with original eight ethnicity categories, and Model C2 with updated 6 ethnicity categories). Analysis included ten-fold cross-validation, performance measures (c-statistic, calibration-in-the-large value, calibration slope and expected-observed (E:O) ratio) and closed testing examining log-likelihood scores and AIC to compare models. In 26,474 singleton pregnancies (4,756; 18% with GDM), we showed that temporal validation of the original model was reasonable (c-statistic 0.698) but with suboptimal calibration (E:O of 0.485). Model C2 was preferred because of the high c-statistic (0.732), and it performed significantly better in closed testing compared to other models. Updating of the original model sustains predictive performance in a contemporary population, including ethnicity data, recent diagnostic criteria, and universal screening context. This supports the value of risk prediction models to guide risk-stratified care to women at risk of GDM. This study was registered as part of the PeRSonal GDM study on the Australian and New Zealand Clinical Trials Registry (ACTRN12620000915954) Pre-results.
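
Two of the performance measures named above, the E:O ratio and the c-statistic, can be sketched as follows. This is an illustrative implementation with hypothetical function names and toy data, not the study's code; it assumes binary outcomes coded 0/1 and at least one case and one non-case.

```python
def expected_observed_ratio(probabilities, outcomes):
    """E:O ratio: sum of predicted risks divided by the number of observed events.
    Values below 1 indicate that the model under-predicts risk on average."""
    return sum(probabilities) / sum(outcomes)


def c_statistic(probabilities, outcomes):
    """Concordance (c-statistic): the share of case/non-case pairs in which the
    case received the higher predicted risk; ties count as half."""
    cases = [p for p, y in zip(probabilities, outcomes) if y == 1]
    non_cases = [p for p, y in zip(probabilities, outcomes) if y == 0]
    concordant = 0.0
    for p_case in cases:
        for p_non in non_cases:
            if p_case > p_non:
                concordant += 1.0
            elif p_case == p_non:
                concordant += 0.5
    return concordant / (len(cases) * len(non_cases))


# Toy data: perfect ranking, but predicted risks sum to half the observed events.
toy_probs = [0.1, 0.2, 0.3, 0.4]
toy_outcomes = [0, 0, 1, 1]
eo = expected_observed_ratio(toy_probs, toy_outcomes)
c = c_statistic(toy_probs, toy_outcomes)
```

The toy data show why both measures are reported: a model can discriminate well (high c-statistic) while being poorly calibrated (E:O far from 1).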

Publication

Highly perturbed genes and hub genes associated with type 2 diabetes in different tissues of adult humans: A bioinformatics analytic workflow

Publisher: Cold Spring Harbor Laboratory

Date: 10-02-2022

DOI: 10.1101/2022.02.07.479483

Abstract: Type 2 diabetes (T2D) has a complex etiology which is not fully elucidated. Identification of gene perturbations and hub genes of T2D may assist in personalizing care. We aimed to identify highly perturbed genes and hub genes associated with T2D in different tissues of adult humans via an extensive workflow. The workflow comprised five sequential steps: systematic review of the NCBI GEO database; identification and classification of differentially expressed genes (DEG); identification of highly perturbed genes via meta-analysis; identification of hub genes via network analysis; and downstream analyses. Three meta-analytic strategies were applied: random effects model (REM), vote counting approach (VC), and p-value combining approach (CA). Nodes having above average betweenness, closeness, and degree in the network were defined as hub genes. Downstream analyses included gene ontologies, Kyoto Encyclopedia of Genes and Genomes pathways, metabolomics, COVID-19 related genes, and Genotype-Tissue Expression profiles. Analysis of 27 eligible microarrays identified 6284 DEG (4592 down-regulated and 1692 up-regulated) within four tissue types. Tissue-specific gene expression was significantly greater than tissue non-specific (shared) gene expression. Meta-analysis of DEG identified 49, 27, and 8 highly perturbed genes via REM, VC, and CA, respectively, producing a compiled set of 79 highly perturbed (41 down-regulated and 38 up-regulated) genes. The 28 hub genes comprised 13 up-regulated, 9 down-regulated, and 6 predicted genes. Downstream analyses identified enrichments of: shared genes with other diabetes phenotypes; insulin synthesis and action related pathways and metabolomics; mechanistic associations with apoptosis and immunity-related pathways; COVID-19 related gene sets; and cell types demonstrating over- and under-expression of marker genes of T2D. We identified highly perturbed genes and hub genes of T2D and revealed their associations with other diabetes phenotypes and COVID-19 as well as pathophysiological manifestations such as those related to insulin, immunity, and apoptosis. Broader utility of the proposed pipeline is envisaged.
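
The hub gene rule stated above (above average betweenness, closeness, and degree) can be illustrated with a small, self-contained sketch. It assumes an undirected, unweighted network and computes betweenness with Brandes' algorithm; the function names and the toy network with placeholder node labels are hypothetical and unrelated to the study's data or software.

```python
from collections import deque


def centralities(adj):
    """Degree, closeness and betweenness for an undirected, unweighted graph.

    adj: dict mapping each node to the set of its neighbours.
    """
    nodes = list(adj)
    degree = {v: len(adj[v]) for v in nodes}
    closeness = {}
    betweenness = {v: 0.0 for v in nodes}
    for s in nodes:
        # Breadth-first search from s, counting shortest paths (Brandes).
        dist = {s: 0}
        sigma = {v: 0 for v in nodes}
        sigma[s] = 1
        preds = {v: [] for v in nodes}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        total = sum(dist.values())
        closeness[s] = (len(dist) - 1) / total if total > 0 else 0.0
        # Accumulate pair dependencies in reverse BFS order.
        delta = {v: 0.0 for v in nodes}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                betweenness[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    betweenness = {v: b / 2.0 for v, b in betweenness.items()}
    return degree, closeness, betweenness


def hub_nodes(adj):
    """Nodes scoring above the network mean on all three centralities."""
    def above_mean(scores):
        mean = sum(scores.values()) / len(scores)
        return {v for v, s in scores.items() if s > mean}

    degree, closeness, betweenness = centralities(adj)
    return above_mean(degree) & above_mean(closeness) & above_mean(betweenness)


# Toy star network with placeholder labels: "A" is connected to every other node.
toy_network = {"A": {"B", "C", "D"}, "B": {"A"}, "C": {"A"}, "D": {"A"}}
hubs = hub_nodes(toy_network)
```

Requiring all three criteria at once is stricter than ranking by degree alone, since a node must be well connected, central, and a frequent intermediary on shortest paths.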

Publication

A combined strategy of feature selection and machine learning to identify predictors of prediabetes

Publisher: Oxford University Press (OUP)

Date: 30-12-2020

DOI: 10.1093/JAMIA/OCZ204

Abstract: We aimed to identify predictors of prediabetes using feature selection and machine learning on a nationally representative sample of the US population. We analyzed n = 6346 men and women enrolled in the National Health and Nutrition Examination Survey 2013–2014. Prediabetes was defined using American Diabetes Association guidelines. The sample was randomly partitioned into training (n = 3174) and internal validation (n = 3172) sets. Feature selection algorithms were run on training data containing 156 preselected exposure variables. Four machine learning algorithms were applied on 46 exposure variables in original and resampled training datasets built using 4 resampling methods. Predictive models were tested on internal validation data (n = 3172) and external validation data (n = 3000) prepared from National Health and Nutrition Examination Survey 2011–2012. Model performance was evaluated using area under the receiver operating characteristic curve (AUROC). Predictors were assessed by odds ratios in logistic models and variable importance in others. The Centers for Disease Control (CDC) prediabetes screening tool was the benchmark to compare model performance. Prediabetes prevalence was 23.43%. The CDC prediabetes screening tool produced 64.40% AUROC. Seven optimal (≥ 70% AUROC) models identified 25 predictors including 4 potentially novel associations; 20 by both logistic and other nonlinear/ensemble models and 5 solely by the latter. All optimal models outperformed the CDC prediabetes screening tool (P < 0.05). Combined use of feature selection and machine learning increased predictive performance, outperforming the recommended screening tool. A range of predictors of prediabetes was identified. This work demonstrated the value of combining feature selection with machine learning to identify a wide range of predictors that could enhance prediabetes prediction and clinical decision-making.

Publication

Causality of anthropometric markers associated with polycystic ovarian syndrome: Findings of a Mendelian randomization study

Publisher: Public Library of Science (PLoS)

Date: 09-06-2022

DOI: 10.1371/JOURNAL.PONE.0269191

Abstract: Using body mass index (BMI) as a proxy, previous Mendelian randomization (MR) studies found total causal effects of general obesity on polycystic ovarian syndrome (PCOS). Hitherto, total and direct causal effects of general- and central obesity on PCOS have not been comprehensively analyzed. We aimed to investigate the causality of central- and general obesity on PCOS using surrogate anthropometric markers. Summary GWAS data of female-only, large-sample cohorts of European ancestry were retrieved for anthropometric markers of central obesity (waist circumference (WC), hip circumference (HC), waist-to-hip ratio (WHR)) and general obesity (BMI and its constituent variables – weight and height), from the IEU Open GWAS Project. As the outcome, we acquired summary data from a large-sample GWAS (118870 samples; 642 cases and 118228 controls) within the FinnGen cohort. Total causal effects were assessed via univariable two-sample Mendelian randomization (2SMR). Genetic architectures underlying causal associations were explored. Direct causal effects were analyzed by multivariable MR modelling. Instrumental variables demonstrated no weak instrument bias (F > 10). Four anthropometric exposures, namely, weight (OR: 2.69–77.05), BMI (OR: 2.90–4.06), WC (OR: 6.22–20.27), and HC (OR: 6.22–20.27), demonstrated total causal effects as per univariable 2SMR models. We uncovered shared and non-shared genetic architectures underlying causal associations. Direct causal effects of WC and HC on PCOS were revealed by two multivariable MR models containing exclusively the anthropometric markers of central obesity. Other multivariable MR models containing anthropometric markers of both central- and general obesity showed no direct causal effects on PCOS. Both general- and central obesity yield total causal effects on PCOS. Findings also indicated potential direct causal effects of normal weight-central obesity and more complex causal mechanisms when both central- and general obesity are present. Results underscore the importance of addressing both central- and general obesity for optimizing PCOS care.

Publication

Clinical notes as prognostic markers of mortality associated with diabetes mellitus following critical care: A retrospective cohort analysis using machine learning and unstructured big data

Publisher: Elsevier BV

Date: 05-2021

DOI: 10.1016/J.COMPBIOMED.2021.104305

Publication

Highly perturbed genes and hub genes associated with type 2 diabetes in different tissues of adult humans: a bioinformatics analytic workflow

Publisher: Springer Science and Business Media LLC

Date: 05-07-2022

DOI: 10.1007/S10142-022-00881-5

Abstract: Type 2 diabetes (T2D) has a complex etiology which is not yet fully elucidated. The identification of gene perturbations and hub genes of T2D may deepen our understanding of its genetic basis. We aimed to identify highly perturbed genes and hub genes associated with T2D via an extensive bioinformatics analytic workflow consisting of five steps: systematic review of Gene Expression Omnibus and associated literature; identification and classification of differentially expressed genes (DEGs); identification of highly perturbed genes via meta-analysis; identification of hub genes via network analysis; and downstream analysis of highly perturbed genes and hub genes. Three meta-analytic strategies, random effects model, vote-counting approach, and p value combining approach, were applied. Hub genes were defined as those nodes having above-average betweenness, closeness, and degree in the network. Downstream analyses included gene ontologies, Kyoto Encyclopedia of Genes and Genomes pathways, metabolomics, COVID-19-related gene sets, and Genotype-Tissue Expression profiles. Analysis of 27 eligible microarrays identified 6284 DEGs (4592 downregulated and 1692 upregulated) in four tissue types. Tissue-specific gene expression was significantly greater than tissue non-specific (shared) gene expression. Analyses revealed 79 highly perturbed genes and 28 hub genes. Downstream analyses identified enrichments of shared genes with certain other diabetes phenotypes; insulin synthesis and action-related pathways and metabolomics; mechanistic associations with apoptosis and immunity-related pathways; COVID-19-related gene sets; and cell types demonstrating over- and under-expression of marker genes of T2D. Our approach provided valuable insights on T2D pathogenesis and pathophysiological manifestations. Broader utility of this pipeline beyond T2D is envisaged.

Publication

Nutritional markers of undiagnosed type 2 diabetes in adults: Findings of a machine learning analysis with external validation and benchmarking

Publisher: Public Library of Science (PLoS)

Date: 05-05-2021

DOI: 10.1371/JOURNAL.PONE.0250832

Abstract: Using a nationally-representative, cross-sectional cohort, we examined nutritional markers of undiagnosed type 2 diabetes in adults via machine learning. A total of 16429 men and non-pregnant women ≥ 20 years of age were analysed from five consecutive cycles of the National Health and Nutrition Examination Survey. Cohorts from years 2013–2016 (n = 6673) were used for external validation. Undiagnosed type 2 diabetes was determined by a negative response to the question “Have you ever been told by a doctor that you have diabetes?” and a positive glycaemic response to one or more of the three diagnostic tests (HbA1c 6.4% or FPG mg/dl or 2-hr post-OGTT glucose 200mg/dl). Following a comprehensive literature search, 114 potential nutritional markers were modelled with 13 behavioural and 12 socio-economic variables. We tested three machine learning algorithms on original and resampled training datasets built using three resampling methods. The 12 predictive models derived from this were validated on internal and external validation cohorts. Magnitudes of associations were gauged through odds ratios in logistic models and variable importance in others. Models were benchmarked against the ADA diabetes risk test. The prevalence of undiagnosed type 2 diabetes was 5.26%. Four best-performing models (AUROC range: 74.9%-75.7%) classified 39 markers of undiagnosed type 2 diabetes; 28 via one or more of the three best-performing non-linear/ensemble models and 11 uniquely by the logistic model. They comprised 14 nutrient-based, 12 anthropometry-based, 9 socio-behavioural, and 4 diet-associated markers. The AUROCs of all models were on a par with the ADA diabetes risk test on both internal and external validation cohorts (p > .05). Models performed comparably to the chosen benchmark. Novel behavioural markers such as the number of meals not prepared from home were revealed. This approach may be useful in nutritional epidemiology to unravel new associations with type 2 diabetes.

Publication

Comparison of machine learning and conventional logistic regression-based prediction models for gestational diabetes in an ethnically diverse population; the Monash GDM Machine learning model

Publisher: Elsevier BV

Date: 11-2023

DOI: 10.1016/J.IJMEDINF.2023.105228

Publication

Temporal validation and updating of a prediction model for the diagnosis of gestational diabetes mellitus

Publisher: Elsevier BV

Date: 09-2023

DOI: 10.1016/J.JCLINEPI.2023.08.020

Kushan De Silva

Researcher

Publications

Marker genes of incident type 1 diabetes in peripheral blood mononuclear cells of children: A machine learning strategy for large-p, small-n scenarios

A data-driven biocomputing pipeline with meta-analysis on high throughput transcriptomics to identify genome-wide miRNA markers associated with type 2 diabetes

Use and performance of machine learning models for type 2 diabetes prediction in clinical and community care settings: Protocol for a systematic review and meta-analysis of predictive modeling studies

Use and performance of machine learning models for type 2 diabetes prediction in community settings: A systematic review and meta-analysis

Both general- and central- obesity are causally associated with polycystic ovarian syndrome: Findings of a Mendelian randomization study

External validation and updating of a prediction model for the diagnosis of gestational diabetes mellitus

Highly perturbed genes and hub genes associated with type 2 diabetes in different tissues of adult humans: A bioinformatics analytic workflow

A combined strategy of feature selection and machine learning to identify predictors of prediabetes

Causality of anthropometric markers associated with polycystic ovarian syndrome: Findings of a Mendelian randomization study

Clinical notes as prognostic markers of mortality associated with diabetes mellitus following critical care: A retrospective cohort analysis using machine learning and unstructured big data

Highly perturbed genes and hub genes associated with type 2 diabetes in different tissues of adult humans: a bioinformatics analytic workflow

Nutritional markers of undiagnosed type 2 diabetes in adults: Findings of a machine learning analysis with external validation and benchmarking

Comparison of machine learning and conventional logistic regression-based prediction models for gestational diabetes in an ethnically diverse population; the Monash GDM Machine learning model

Temporal validation and updating of a prediction model for the diagnosis of gestational diabetes mellitus

Related Organisations

Monash University

Umeå University

University Of Peradeniya Faculty Of Dental Sciences

Related Funding Activities
