ORCID Profile
0000-0001-9413-6520
Current Organisations
Laboratoire de Chimie, The Alan Turing Institute, University of Cambridge, Baker Heart and Diabetes Institute
In Research Link Australia (RLA), "Research Topics" refer to ANZSRC FOR and SEO codes. These topics are either sourced from ANZSRC FOR and SEO codes listed in researchers' related grants or generated by a large language model (LLM) based on their publications.
Stochastic Analysis And Modelling | Biochemistry and cell biology | Systems biology | Immunology | Gene Expression | Cellular Immunology
Immune system and allergy | Infectious diseases | Treatments (e.g. chemicals, antibiotics)
Publisher: Springer Science and Business Media LLC
Date: 04-01-2021
DOI: 10.1038/S41467-020-20434-3
Abstract: The effective storage of lipids in white adipose tissue (WAT) critically impacts whole body energy homeostasis. Many genes have been implicated in WAT lipid metabolism, including tripartite motif containing 28 (Trim28), a gene proposed to primarily influence adiposity via epigenetic mechanisms in embryonic development. However, in the current study we demonstrate that mice with deletion of Trim28 specifically in committed adipocytes also develop obesity similar to global Trim28 deletion models, highlighting a post-developmental role for Trim28. These effects were exacerbated in female mice, contributing to the growing notion that Trim28 is a sex-specific regulator of obesity. Mechanistically, this phenotype involves alterations in lipolysis and triglyceride metabolism, explained in part by loss of Klf14 expression, a gene previously demonstrated to modulate adipocyte size and body composition in a sex-specific manner. Thus, these findings provide evidence that Trim28 is a bona fide, sex-specific regulator of post-developmental adiposity and WAT function.
Publisher: EMBO
Date: 2010
DOI: 10.1038/MSB.2010.93
Abstract: Comprehensive characterization of human tissues promises novel insights into the biological architecture of human diseases and traits. We assessed metabonomic, transcriptomic, and genomic variation for a large population‐based cohort from the capital region of Finland. Network analyses identified a set of highly correlated genes, the lipid–leukocyte (LL) module, as having a prominent role in over 80 serum metabolites (of 134 measures quantified), including lipoprotein subclasses, lipids, and amino acids. Concurrent association with immune response markers suggested the LL module as a possible link between inflammation, metabolism, and adiposity. Further, genomic variation was used to generate a directed network and infer the LL module's largely reactive nature to metabolites. Finally, gene co‐expression in circulating leukocytes was shown to be dependent on serum metabolite concentrations, providing evidence for the hypothesis that the coherence of molecular networks themselves is conditional on environmental factors. These findings show the importance and opportunity of systematic molecular investigation of human population samples. To facilitate and encourage this investigation, the metabonomic, transcriptomic, and genomic data used in this study have been made available as a resource for the research community.
Publisher: Elsevier BV
Date: 02-2008
Publisher: Elsevier BV
Date: 10-2015
DOI: 10.1016/J.CELS.2015.09.007
Abstract: The biomarker glycoprotein acetylation (GlycA) has been shown to predict risk of cardiovascular disease and all-cause mortality. Here, we characterize biological processes associated with GlycA by leveraging population-based omics data and health records from >10,000 individuals. Our analyses show that GlycA levels are chronic within individuals for up to a decade. In apparently healthy individuals, elevated GlycA corresponded to elevation of myriad inflammatory cytokines, as well as a gene coexpression network indicative of increased neutrophil activity, suggesting that individuals with high GlycA may be in a state of chronic inflammatory response. Accordingly, analysis of infection-related hospitalization and death records showed that increased GlycA increased long-term risk of severe non-localized and respiratory infections, particularly septicaemia and pneumonia. In total, our work demonstrates that GlycA is a biomarker for chronic inflammation, neutrophil activity, and risk of future severe infection. It also illustrates the utility of leveraging multi-layered omics data and health records to elucidate the molecular and cellular processes associated with biomarkers.
Publisher: Elsevier BV
Date: 09-2018
Publisher: Springer Science and Business Media LLC
Date: 24-09-2020
DOI: 10.1186/S12864-020-07019-6
Abstract: Horizontal gene transfer contributes to bacterial evolution through mobilising genes across various taxonomical boundaries. It is frequently mediated by mobile genetic elements (MGEs), which may capture, maintain, and rearrange mobile genes and co-mobilise them between bacteria, causing horizontal gene co-transfer (HGcoT). This physical linkage between mobile genes poses a great threat to public health as it facilitates dissemination and co-selection of clinically important genes amongst bacteria. Although rapid accumulation of bacterial whole-genome sequencing (WGS) data since the 2000s enables study of HGcoT at the population level, results based on genetic co-occurrence counts and simple association tests are usually confounded by bacterial population structure when sampled bacteria belong to the same species, leading to spurious conclusions. We have developed a network approach to explore WGS data for evidence of intraspecies HGcoT and have implemented it in the R package GeneMates (anyuac/GeneMates). The package takes as input an allelic presence-absence matrix of genes of interest and a matrix of core-genome single-nucleotide polymorphisms, performs association tests with linear mixed models controlled for population structure, produces a network of significantly associated alleles, and identifies clusters within the network as plausible co-transferred alleles. GeneMates users may choose to score consistency of allelic physical distances measured in genome assemblies using a novel approach we have developed and overlay scores on the network for further evidence of HGcoT.
Validation studies of GeneMates on known acquired antimicrobial resistance genes in Escherichia coli and Salmonella Typhimurium show advantages of our network approach over simple association analysis: (1) distinguishing between allelic co-occurrence driven by HGcoT and that driven by clonal reproduction, (2) evaluating effects of population structure on allelic co-occurrence, and (3) direct links between allele clusters in the network and MGEs when physical distances are incorporated. GeneMates offers an effective approach to detection of intraspecies HGcoT using WGS data.
Publisher: Elsevier BV
Date: 06-2019
Publisher: Cold Spring Harbor Laboratory
Date: 03-09-2021
Abstract: The number of publicly available microbiome samples is continually growing. As data set size increases, bottlenecks arise in standard analytical pipelines. Faith's phylogenetic diversity (Faith's PD) is a highly utilized phylogenetic alpha diversity metric that has thus far failed to effectively scale to trees with millions of vertices. Stacked Faith's phylogenetic diversity (SFPhD) enables calculation of this widely adopted diversity metric at a much larger scale by implementing a computationally efficient algorithm. The algorithm reduces the amount of computational resources required, resulting in more accessible software with a reduced carbon footprint, as compared to previous approaches. The new algorithm produces identical results to the previous method. We further demonstrate that the phylogenetic aspect of Faith's PD provides increased power in detecting diversity differences between younger and older populations in the FINRISK study's metagenomic data.
Publisher: Springer Science and Business Media LLC
Date: 2013
DOI: 10.1186/GM418
Publisher: Public Library of Science (PLoS)
Date: 11-01-2013
Publisher: Cold Spring Harbor Laboratory
Date: 13-09-2020
DOI: 10.1101/2020.09.12.20193045
Abstract: Co-evolution between humans and the microbial communities colonizing them has resulted in an intimate assembly of thousands of microbial species mutualistically living on and in their body and impacting multiple aspects of host physiology and health. Several studies examining whether human genetic variation can affect gut microbiota suggest a complex combination of environmental and host factors. Here, we leverage a single large-scale population-based cohort of 5,959 genotyped individuals with matched gut microbial shotgun metagenomes, dietary information and health records up to 16 years post-sampling, to characterize human genetic variations associated with microbial abundances, and predict possible causal links with various diseases using Mendelian randomization (MR). Genome-wide association study (GWAS) identified 583 independent SNP-taxon associations at genome-wide significance (p<5.0×10^-8), which included notable strong associations with LCT (p=5.02×10^-35), ABO (p=1.1×10^-12), and MED13L (p=1.84×10^-12). A combination of genetics and dietary habits was shown to strongly shape the abundances of certain key bacterial members of the gut microbiota, and explain their genetic association. Genetic effects from the LCT locus on Bifidobacterium and three other associated taxa significantly differed according to dairy intake. Variation in mucin-degrading Faecalicatena lactaris abundances was associated with ABO, highlighting a preferential utilization of secreted A/B/AB-antigens as energy source in the gut, irrespective of fibre intake. Enterococcus faecalis levels showed a robust association with a variant in MED13L, with putative links to colorectal cancer. Finally, we identified putative causal relationships between gut microbes and complex diseases using MR, with a predicted effect of Morganella on major depressive disorder that was consistent with observational incident disease analysis.
Overall, we present striking examples of the intricate relationship between humans and their gut microbial communities, and highlight important health implications.
Publisher: Springer International Publishing
Date: 2018
DOI: 10.1007/978-3-319-77932-4_38
Abstract: Phenotypic sex differences in coronary artery disease (CAD) and its risk factors have been apparent for many decades in basic and clinical research; however, whether these are also present at the gene level and thus influence genome-wide association and genetic risk prediction studies has often been ignored. From fundamental and medical standpoints, this is critically important to assess in order to fully understand the underlying genetic architecture that predisposes to CAD and better predict disease outcomes based on the interaction between genes, sex effects, and environment. In this chapter we aimed to (1) integrate the history and latest research from genome-wide association studies for CAD and clinical and genetic risk scores for prediction of CAD, (2) highlight sex-specific differences in these areas of research, and (3) discuss reasons why sex differences have often not been considered and, where present, why sex differences exist at genetic and phenotypic levels and how important they are for consideration in future research. While we find interesting examples of sex differences in effects of genetic variants on CAD, genome-wide association and genetic risk studies have typically not tested for sex-specific effects despite mounting evidence from diverse fields that these are likely very important to consider at both the genetic and phenotypic levels. In-depth testing for sex effects in large-scale genome-wide association studies that include autosomal and often excluded sex chromosomes alongside parallel improvements in resolution of sex-specific differences for risk factors and disease outcomes for CAD has the potential to substantially improve clinical and genetic risk prediction studies. Developing sex-tailored genetic risk scores, as has been done recently for other disorders, might also be warranted for CAD. In the era of precision medicine, this level of accuracy is essential for such a common and costly disease.
Publisher: Cold Spring Harbor Laboratory
Date: 26-04-2018
DOI: 10.1101/309138
Abstract: Integration of electronic health records with systems-level biomolecular data has led to the discovery that GlycA, a complex nuclear magnetic resonance (NMR) spectroscopy biomarker, predicts long-term risk of disease onset and death from myriad causes. To determine the molecular underpinnings of the disease risk of the heterogeneous GlycA signal, we used machine learning to build imputation models for GlycA’s constituent glycoproteins, then estimated glycoprotein levels in 11,861 adults across two population-based cohorts with long-term follow-up. While alpha-1-acid glycoprotein had the strongest correlation with GlycA, our analysis revealed that alpha-1 antitrypsin (AAT) was the most predictive of morbidity and mortality for the widest range of diseases, including heart failure (HR=1.60 per s.d., P=1×10^-10), influenza and pneumonia (HR=1.37, P=6×10^-10), and liver diseases (HR=1.81, P=1×10^-6). Despite emerging evidence of AAT's role in suppressing inflammation, transcriptional analyses revealed elevated expression of diverse inflammatory immune pathways with elevated AAT levels, suggesting AAT is elevating to compensate for low-grade chronic inflammation. This study clarifies the molecular underpinnings of the GlycA biomarker and its associated disease risk, and indicates a previously unrecognised association between elevated AAT and severe disease onset and mortality.
Publisher: Elsevier BV
Date: 08-2015
DOI: 10.1016/J.GDE.2015.06.005
Abstract: Recent advances in genome-wide association studies have stimulated interest in the genomic prediction of disease risk, potentially enabling individual-level risk estimates for early intervention and improved diagnostic procedures. Here, we review recent findings and approaches to genomic prediction model construction and performance, then contrast the potential benefits of such models in two complex human diseases, aiding diagnosis in celiac disease and prospective risk prediction for cardiovascular disease. Early indications are that optimal application of genomic risk scores will differ substantially for each disease depending on underlying genetic architecture as well as current clinical and public health practice. As costs decline, genomic profiles become common, and popular understanding of risk and its communication improves, genomic risk will become increasingly useful for the individual and the clinician.
Publisher: Springer Science and Business Media LLC
Date: 14-11-2022
Publisher: IEEE
Date: 08-2011
Publisher: European Respiratory Society (ERS)
Date: 16-10-2019
DOI: 10.1183/13993003.00844-2019
Abstract: Asthma is a common condition caused by immune and respiratory dysfunction, and it is often linked to allergy. A systems perspective may prove helpful in unravelling the complexity of asthma and allergy. Our aim is to give an overview of systems biology approaches used in allergy and asthma research. Specifically, we describe recent “omic”-level findings, and examine how these findings have been systematically integrated to generate further insight. Current research suggests that allergy is driven by genetic and epigenetic factors, in concert with environmental factors such as microbiome and diet, leading to early-life disturbance in immunological development and disruption of balance within key immuno-inflammatory pathways. Variation in inherited susceptibility and exposures causes heterogeneity in manifestations of asthma and other allergic diseases. Machine learning approaches are being used to explore this heterogeneity, and to probe the pathophysiological patterns or “endotypes” that correlate with subphenotypes of asthma and allergy. Mathematical models are being built based on genomic, transcriptomic and proteomic data to predict or discriminate disease phenotypes, and to describe the biomolecular networks behind asthma. The use of systems biology in allergy and asthma research is rapidly growing, and has so far yielded fruitful results. However, the scale and multidisciplinary nature of this research means that it is accompanied by new challenges. Ultimately, it is hoped that systems medicine, with its integration of omics data into clinical practice, can pave the way to more precise, personalised and effective management of asthma.
Publisher: Oxford University Press (OUP)
Date: 21-09-2016
Publisher: Public Library of Science (PLoS)
Date: 24-08-2018
Publisher: American Society for Microbiology
Date: 26-04-2022
DOI: 10.1128/MSYSTEMS.00167-22
Abstract: Shotgun metagenomics is a powerful, yet computationally challenging, technique compared to 16S rRNA gene amplicon sequencing for decoding the composition and structure of microbial communities. Current analyses of metagenomic data are primarily based on taxonomic classification, which is limited in feature resolution.
Publisher: Elsevier BV
Date: 07-2022
DOI: 10.1016/J.JACC.2022.05.015
Abstract: Risk factor-based models fail to accurately estimate risk in select populations, in particular younger individuals. A sizable number of people are also classified as being at intermediate risk, for whom the optimal preventive strategy could be more precise. Several personalized risk prediction tools, including coronary artery calcium scoring, polygenic risk scores, and metabolic risk scores may be able to improve risk assessment, pending supportive outcome data from clinical trials. Other tools may well emerge in the near future. A multidimensional approach to risk prediction holds the promise of precise risk prediction. This could allow for targeted prevention minimizing unnecessary costs and risks while maximizing benefits. High-risk individuals could also be identified early in life, creating opportunities to arrest the development of nascent coronary atherosclerosis and prevent future clinical events.
Publisher: Public Library of Science (PLoS)
Date: 20-09-2021
Publisher: BMJ
Date: 08-06-2009
Publisher: Oxford University Press (OUP)
Date: 07-03-2013
DOI: 10.1093/HMG/DDT116
Publisher: Cold Spring Harbor Laboratory
Date: 20-11-2017
DOI: 10.1101/222190
Abstract: Repeated cycles of infection-associated lower airway inflammation drive the pathogenesis of persistent wheezing disease in children. Tracking these events across a birth cohort during their first five years, we demonstrate that % of infectious events indeed involve viral pathogens, but are accompanied by a shift in the nasopharyngeal microbiome (NPM) towards dominance by a small range of pathogenic bacterial genera. Unexpectedly, this change in NPM frequently precedes the appearance of viral pathogens and acute symptoms. In non-sensitized children these events are associated only with “transient wheeze” that resolves after age three. In contrast, in children developing early allergic sensitization, they are associated with ensuing development of persistent wheeze, which is the hallmark of the asthma phenotype. This suggests underlying pathogenic interactions between allergic sensitization and antibacterial mechanisms.
Publisher: Elsevier BV
Date: 05-2022
Publisher: Frontiers Media SA
Date: 23-09-2014
Publisher: Springer Science and Business Media LLC
Date: 07-06-2007
DOI: 10.1038/NATURE05911
Publisher: Ovid Technologies (Wolters Kluwer Health)
Date: 08-2010
DOI: 10.1161/CIRCGENETICS.109.906180
Abstract: Evidence is sparse about the genetic determinants of major lipids in Pakistanis. Variants (n=45 000) across 2000 genes were assessed in 3200 Pakistanis and compared with 2450 Germans using the same gene array and similar lipid assays. We also did a meta-analysis of selected lipid-related variants in Europeans. Pakistani genetic architecture was distinct from that of several ethnic groups represented in international reference samples. Forty-one variants at 14 loci were significantly associated with levels of HDL-C, triglyceride, or LDL-C. The most significant lipid-related variants identified among Pakistanis corresponded to genes previously shown to be relevant to Europeans, such as CETP associated with HDL-C levels (rs711752, P<10^-13), APOA5/ZNF259 (rs651821, P<10^-13) and GCKR (rs1260326, P<10^-13) with triglyceride levels and CELSR2 variants with LDL-C levels (rs646776, P<10^-9). For Pakistanis, these 41 variants explained 6.2%, 7.1%, and 0.9% of the variation in HDL-C, triglyceride, and LDL-C, respectively. Compared with Europeans, the allele frequency of rs662799 in APOA5 among Pakistanis was higher and its impact on triglyceride concentration was greater (P-value for difference <10^-4). Several lipid-related genetic variants are common to Pakistanis and Europeans, though they explain only a modest proportion of population variation in lipid concentration. Allelic frequencies and effect sizes of lipid-related variants can differ between Pakistanis and Europeans.
Publisher: Cold Spring Harbor Laboratory
Date: 08-05-2020
DOI: 10.1101/2020.04.23.20077099
Abstract: Polygenic risk scores (PRS), often aggregating the results from genome-wide association studies, can bridge the gap between the initial variant discovery efforts and disease risk estimation for clinical applications. However, there is remarkable heterogeneity in the reporting of these risk scores due to a lack of adherence to reporting standards and no accepted standards suited for the current state of PRS development and application. This lack of adherence and best practices hinders the translation of PRS into clinical care. The ClinGen Complex Disease Working Group, in a collaboration with the Polygenic Score (PGS) Catalog, have developed a novel PRS Reporting Statement (PRS-RS), updating previous standards to the current state of the field and to enable downstream utility. Drawing upon experts in epidemiology, statistics, disease-specific applications, implementation, and policy, this 23-item reporting framework defines the minimal information needed to interpret and evaluate a PRS, especially with respect to any downstream clinical applications. Items span detailed descriptions of the study population (recruitment method, key demographics, inclusion/exclusion criteria, and phenotype definition), statistical methods for both PRS development and validation, and considerations for potential limitations of the published risk score and downstream clinical utility. Additionally, emphasis has been placed on data availability and transparency to facilitate reproducibility and benchmarking against other PRS, such as deposition in the publicly available PGS Catalog ( www.PGScatalog.org ). By providing these criteria in a structured format that builds upon existing standards and ontologies, the use of this framework in publishing PRS will facilitate translation of PRS into clinical care and progress towards defining best practices. 
In recent years, polygenic risk scores (PRS) have become an increasingly studied tool to capture the genome-wide liability underlying many human traits and diseases, hoping to better inform an individual’s genetic risk. However, a lack of tailored reporting standards has hindered the translation of this important tool into clinical and public health practice with the heterogeneous underreporting of details necessary for benchmarking and reproducibility. To address this gap, the ClinGen Complex Disease Working Group and Polygenic Score (PGS) Catalog have collaborated to develop the 23-item Polygenic Risk Score Reporting Statement (PRS-RS). This framework provides the minimal information expected of authors to promote the validity, transparency, and reproducibility of PRS by requiring authors to detail the study population, statistical methods, and potential clinical utility of a published score. The widespread adoption of this framework will encourage rigorous methodological consideration and facilitate benchmarking to ensure high quality scores are translated into the clinic.
Publisher: Cold Spring Harbor Laboratory
Date: 09-10-2023
Publisher: Research Square Platform LLC
Date: 04-01-2022
DOI: 10.21203/RS.3.RS-1175817/V1
Abstract: Previous genome-wide association studies (GWAS) of stroke, the second leading cause of death, have been conducted in populations of predominantly European ancestry.1,2 We undertook cross-ancestry GWAS meta-analyses of stroke and its subtypes in 110,182 stroke patients (33% non-European) and 1,503,898 control individuals of five ancestries from population- and clinic-based studies, nearly doubling the number of cases in previous stroke GWAS. We identified association signals at 89 independent loci, of which 61 were novel. Effect sizes were overall highly correlated across ancestries. Cross-ancestry fine-mapping, in silico mutagenesis analysis using a novel machine-learning approach,3 transcriptome and proteome-wide association analyses revealed putative causal genes (e.g. SH3PXD2A and FURIN) and variants (e.g. at GRK5 and NOS3). Using a novel three-pronged approach,4 we provided genetic evidence for putative drug effects, highlighting F11, KLKB1, PROC, GP1BA, and VCAM1 as possible targets, with drugs already under investigation for stroke for F11 and PROC. A polygenic score integrating cross-ancestry and ancestry-specific stroke GWAS with vascular risk factor GWAS (iPGS) showed strong prediction of ischemic stroke risk in European and, for the first time, East-Asian populations.5,6 The iPGS performed better than stroke PGS alone and better than previous best iPGS, in Europeans and East-Asians. Transferability of European-specific iPGS to East-Asians was limited. Stroke genetic risk scores were predictive of ischemic stroke independent of clinical risk factors in 52,600 clinical trial participants with cardiometabolic disease and performed considerably better than previous scores, both in Europeans and East-Asians. Altogether our results provide critical insight to inform biology, reveal potential drug targets for intervention, and provide genetic risk prediction tools across ancestries for targeted prevention.
Publisher: Ovid Technologies (Wolters Kluwer Health)
Date: 06-10-2020
Abstract: Epidemiological and animal studies have associated systemic inflammation with blood pressure (BP). However, the mechanistic factors linking inflammation and BP remain unknown. Fatty acid–derived eicosanoids serve as mediators of inflammation and have been suggested to regulate renal vascular tone, peripheral resistance, renin‐angiotensin system, and endothelial function. We hypothesize that specific proinflammatory and anti‐inflammatory eicosanoids are linked with BP. We studied a population sample of 8099 FINRISK 2002 participants randomly drawn from the Finnish population register (53% women; mean age, 48±13 years) and, for external validation, a sample of 2859 FHS (Framingham Heart Study) Offspring study participants (55% women; mean age, 66±9 years). Using nontargeted liquid chromatography–mass spectrometry, we profiled 545 distinct high‐quality eicosanoids and related oxylipin mediators in plasma. Adjusting for conventional hypertension risk factors, we observed 187 (34%) metabolites that were significantly associated with systolic BP (P < Bonferroni‐corrected threshold of 0.05/545). We used forward selection linear regression modeling in FINRISK to define a general formula for individual eicosanoid risk score. Individuals of the top risk score quartile in FINRISK had a 9.0 (95% CI, 8.0–10.1) mm Hg higher systolic BP compared with individuals in the lowest quartile in fully adjusted models. Observed metabolite associations were consistent across FINRISK and FHS. Plasma eicosanoids demonstrate strong associations with BP in the general population. As eicosanoid compounds affect numerous physiological processes that are central to BP regulation, they may offer new insights about the pathogenesis of hypertension, as well as serve as potential targets for therapeutic intervention.
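The Bonferroni-corrected threshold of 0.05/545 quoted in the abstract above can be worked through in a few lines. This is a small sketch of the arithmetic only; the function names and the example p-values are hypothetical and are not taken from the study.

```python
# Bonferroni correction: divide the family-wise alpha by the number of tests.
# 0.05 and 545 come from the abstract; everything else is illustrative.

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def significant(p_values, alpha, n_tests):
    """Return the p-values that fall below the Bonferroni threshold."""
    threshold = bonferroni_threshold(alpha, n_tests)
    return [p for p in p_values if p < threshold]


threshold = bonferroni_threshold(0.05, 545)  # roughly 9.17e-05

# Hypothetical p-values, purely to show how the threshold is applied.
example_p_values = [1e-6, 5e-5, 2e-4, 0.03]
hits = significant(example_p_values, 0.05, 545)  # keeps 1e-6 and 5e-5
```

With 545 metabolites tested, a nominal p-value of 0.03 or even 2e-4 would not count as significant, which is why the abstract reports associations against the corrected threshold rather than 0.05.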
Publisher: Cold Spring Harbor Laboratory
Date: 08-02-2019
DOI: 10.1101/544445
Abstract: Cytokines are essential regulatory components of the immune system and their aberrant levels have been linked to many disease states. Despite increasing evidence that cytokines operate in concert, many of the physiological interactions between cytokines, and the shared genetic architecture that underlie them, remain unknown. Here we aimed to identify and characterise genetic variants with pleiotropic effects on cytokines. To do this we performed a multivariate genome-wide association study on a correlation network of 11 circulating cytokines measured in 9,263 individuals. Meta-analysis identified a total of 8 loci significantly associated with the cytokine network, of which two (PDGFRB and ABO) had not been detected previously. Bayesian colocalisation analysis revealed shared causal variants between the eight cytokine loci and other traits; in particular, cytokine network variants at the ABO, SERPINE2, and ZFPM2 loci showed pleiotropic effects on the production of immune-related proteins; on metabolic traits such as lipoprotein and lipid levels; on blood-cell related traits such as platelet count; and on disease traits such as coronary artery disease and type 2 diabetes.
Publisher: Springer Science and Business Media LLC
Date: 29-11-2022
DOI: 10.1038/S41467-022-35017-7
Abstract: Understanding how genetic variants influence disease risk and complex traits (variant-to-function) is one of the major challenges in human genetics. Here we present a model-driven framework to leverage human genome-scale metabolic networks to define how genetic variants affect biochemical reaction fluxes across major human tissues, including skeletal muscle, adipose, liver, brain and heart. As proof of concept, we build personalised organ-specific metabolic flux models for 524,615 individuals of the INTERVAL and UK Biobank cohorts and perform a fluxome-wide association study (FWAS) to identify 4312 associations between personalised flux values and the concentration of metabolites in blood. Furthermore, we apply FWAS to identify 92 metabolic fluxes associated with the risk of developing coronary artery disease, many of which are linked to processes previously described to play a role in the disease. Our work demonstrates that genetically personalised metabolic models can elucidate the downstream effects of genetic variants on biochemical reactions involved in common human diseases.
Publisher: Springer Science and Business Media LLC
Date: 06-06-2022
DOI: 10.1038/S41467-022-30875-7
Abstract: We integrated lipidomics and genomics to unravel the genetic architecture of lipid metabolism and identify genetic variants associated with lipid species putatively in the mechanistic pathway for coronary artery disease (CAD). We quantified 596 lipid species in serum from 4,492 individuals from the Busselton Health Study. The discovery GWAS identified 3,361 independent lipid-loci associations, involving 667 genomic regions (479 previously unreported), with validation in two independent cohorts. A meta-analysis revealed an additional 70 independent genomic regions associated with lipid species. We identified 134 lipid endophenotypes for CAD associated with 186 genomic loci. Associations between independent lipid-loci with coronary atherosclerosis were assessed in ∼456,000 individuals from the UK Biobank. Of the 53 lipid-loci that showed evidence of association (P < 1 × 10^-3), 43 loci were associated with at least one lipid endophenotype. These findings illustrate the value of integrative biology to investigate the aetiology of atherosclerosis and CAD, with implications for other complex diseases.
Publisher: Oxford University Press (OUP)
Date: 26-04-2016
DOI: 10.1093/IJE/DYW046
Publisher: Public Library of Science (PLoS)
Date: 09-2022
DOI: 10.1371/JOURNAL.PGEN.1010294
Abstract: For Alzheimer’s disease–a leading cause of dementia and global morbidity–improved identification of presymptomatic high-risk individuals and identification of new circulating biomarkers are key public health needs. Here, we tested the hypothesis that a polygenic predictor of risk for Alzheimer’s disease would identify a subset of the population with increased risk of clinically diagnosed dementia, subclinical neurocognitive dysfunction, and a differing circulating proteomic profile. Using summary association statistics from a recent genome-wide association study, we first developed a polygenic predictor of Alzheimer’s disease comprised of 7.1 million common DNA variants. We noted a 7.3-fold (95% CI 4.8 to 11.0; p < 0.001) gradient in risk across deciles of the score among 288,289 middle-aged participants of the UK Biobank study. In cross-sectional analyses stratified by age, minimal differences in risk of Alzheimer’s disease and performance on a digit recall test were present according to polygenic score decile at age 50 years, but significant gradients emerged by age 65. Similarly, among 30,541 participants of the Mass General Brigham Biobank, we again noted no significant differences in Alzheimer’s disease diagnosis at younger ages across deciles of the score, but for those over 65 years we noted an odds ratio of 2.0 (95% CI 1.3 to 3.2; p = 0.002) in the top versus bottom decile of the polygenic score. To understand the proteomic signature of inherited risk, we performed aptamer-based profiling in 636 blood donors (mean age 43 years) with very high or low polygenic scores. In addition to the well-known apolipoprotein E biomarker, this analysis identified 27 additional proteins, several of which have known roles related to disease pathogenesis. Differences in protein concentrations were consistent even among the youngest subset of blood donors (mean age 33 years). Of these 28 proteins, 7 of the 8 proteins with concentrations available were similarly associated with the polygenic score in participants of the Multi-Ethnic Study of Atherosclerosis. These data highlight the potential for a DNA-based score to identify high-risk individuals during the prolonged presymptomatic phase of Alzheimer’s disease and to enable biomarker discovery based on profiling of young individuals in the extremes of the score distribution.
Publisher: Cold Spring Harbor Laboratory
Date: 11-03-2016
DOI: 10.1101/043042
Abstract: Next generation DNA sequencing methods have created an unprecedented leap in sequence data generation, thus novel computational tools and statistical models are required to optimize and assess the resulting data. In this report, we explore underlying causes of error for the Illumina Genome Analyzer (IGA) sequencing technology and attempt to quantify their effects using a human bacterial artificial chromosome sequenced to 60,000 fold coverage. Seven potential error predictors are considered: Phred score, read entropy, tile coordinates, local tile density, base position within read, nucleotide call, and lane. With these parameters, logistic regression and log-linear models are constructed and used to show that each of the potential predictors contributes to error (P x10-4). With this additional information, we apply the logistic model and achieve a 3% improvement in both the sensitivity and specificity to detect IGA errors. Further, we demonstrate that these modeling approaches can be used as a feedback loop to inform laboratory methods and identify specific machine or run bias.
Publisher: Elsevier BV
Date: 07-2017
DOI: 10.1016/J.JACI.2017.05.015
Abstract: Advances in metagenomics, proteomics, metabolomics, and systems biology are providing a new emphasis in research; interdisciplinary work suggests that personalized medicine is on the horizon. These advances are illuminating sophisticated interactions between human-associated microbes and the immune system. The result is a transformed view of future prevention and treatment of chronic noncommunicable diseases, including allergy. Paradigm-shifting gains in scientific knowledge are occurring at a time of rapid global environmental change, urbanization, and biodiversity losses. Multifactorial and multigenerational implications of total environmental exposures, the exposome, require coordinated interdisciplinary efforts. It is clear that the genome alone cannot provide answers to urgent questions. Here we review the historical origins of exposome research and define a new concept, the metaexposome, which considers the bidirectional effect of the environment on human subjects and the human influence on all living systems and their genomes. The latter is essential for human health. We place the metaexposome in the context of early-life immune functioning and describe how various aspects of a changing environment, especially through microbiota exposures, can influence health and disease over the life course.
Publisher: Cold Spring Harbor Laboratory
Date: 18-02-2020
DOI: 10.1101/2020.02.17.952788
Abstract: Polygenic scores (PGSs) for blood cell traits can be constructed using summary statistics from genome-wide association studies. Because the selection of variants and the modelling of their interactions in PGSs may be limited by univariate analysis, such a conventional method may yield sub-optimal performance. This study evaluated the relative effectiveness of four machine learning and deep learning methods, as well as a univariate method, in the construction of PGSs for 26 blood cell traits, using data from UK Biobank (n=~400,000) and INTERVAL (n=~40,000). Our results showed that learning methods can improve PGS construction for nearly every blood cell trait considered, with this superiority explained by the ability of machine learning methods to capture interactions among variants. This study also demonstrated that populations can be well stratified by the PGSs of these blood cell traits, even for traits that exhibit large differences between ages and sexes, suggesting potential for disease prevention. As our study found genetic correlations between the PGSs for blood cell traits and PGSs for several common human diseases (recapitulating well-known associations between the blood cell traits themselves and certain diseases), it suggests that blood cell traits may be indicators and/or mediators for a variety of common disorders via shared genetic variants and functional pathways.
Publisher: Springer Science and Business Media LLC
Date: 28-07-2020
DOI: 10.1038/S41467-020-17477-X
Abstract: Chronic immune-mediated diseases of adulthood often originate in early childhood. To investigate genetic associations between neonatal immunity and disease, we map expression quantitative trait loci (eQTLs) in resting myeloid cells and CD4+ T cells from cord blood samples, as well as in response to lipopolysaccharide (LPS) or phytohemagglutinin (PHA) stimulation, respectively. Cis-eQTLs are largely specific to cell type or stimulation, and 31% and 52% of genes with cis-eQTLs have response eQTLs (reQTLs) in myeloid cells and T cells, respectively. We identified cis regulatory factors acting as mediators of trans effects. There is extensive colocalisation between condition-specific neonatal cis-eQTLs and variants associated with immune-mediated diseases; in particular, CTSH had widespread colocalisation across diseases. Mendelian randomisation shows causal neonatal gene expression effects on disease risk for BTN3A2, HLA-C and others. Our study elucidates the genetics of gene expression in neonatal immune cells, and aetiological origins of autoimmune and allergic diseases.
Publisher: Cold Spring Harbor Laboratory
Date: 23-03-2022
DOI: 10.1101/2022.03.22.22272736
Abstract: The gut-lung axis is generally recognized, but there are few large studies of the gut microbiome and incident respiratory disease in adults. To investigate the associations between gut microbiome and respiratory disease and to construct predictive models from baseline gut microbiome profiles for incident asthma or chronic obstructive pulmonary disease (COPD). Shallow metagenomic sequencing was performed for stool samples from a prospective, population-based cohort (FINRISK02; N=7,115 adults) with linked national administrative health register derived classifications for incident asthma and COPD up to 15 years after baseline. Generalised linear models and Cox regressions were utilised to assess associations of microbial taxa and diversity with disease occurrence. Predictive models were constructed using machine learning with extreme gradient boosting. Models considered taxa abundances individually and in combination with other risk factors, including sex, age, body mass index and smoking status. A total of 695 and 392 significant microbial associations at different taxonomic levels were found with incident asthma and COPD, respectively. Gradient boosting decision trees of baseline gut microbiome predicted incident asthma and COPD with mean area under the curves of 0.608 and 0.780, respectively. For both incident asthma and COPD, the baseline gut microbiome had C-indices of 0.623 for asthma and 0.817 for COPD, which were more predictive than other conventional risk factors. The integration of gut microbiome and conventional risk factors further improved prediction capacities. Subgroup analyses indicated gut microbiome was significantly associated with incident COPD in both current smokers and non-smokers, as well as in individuals who reported never smoking. The gut microbiome is a significant risk factor for incident asthma and incident COPD and is largely independent of conventional risk factors.
Publisher: Oxford University Press (OUP)
Date: 27-07-2017
Abstract: The intestinal microbiota is a key antigenic driver in Crohn's disease [CD]. We aimed to identify changes in the gut microbiome associated with, and predictive of, disease recurrence and remission. A total of 141 mucosal biopsy samples from 34 CD patients were obtained at surgical resection and at colonoscopy 6 and/or 18 months postoperatively; 28 control samples were obtained: 12 from healthy patients [healthy controls] and 16 from hemicolectomy patients [surgical controls]. Bacterial 16S ribosomal profiling was performed using the Illumina MiSeq platform. CD was associated with reduced alpha diversity when compared with healthy controls but not surgical controls [p < 0.001 and p = 0.666, respectively]. Beta diversity [composition] differed significantly between CD and both healthy [p < 0.001] and surgical [p = 0.022] controls, but did not differ significantly between those with and without endoscopic recurrence. There were significant taxonomic differences between recurrence and remission. Patients experiencing recurrence demonstrated elevated Proteus genera [p = 0.008] and reduced Faecalibacterium [p < 0.001]. Active smoking was associated with elevated levels of Proteus [p = 0.013] postoperatively. Low abundance of Faecalibacterium [< 0.1%] and detectable Proteus in the postoperative ileal mucosa was associated with a higher risk of recurrence (odds ratio [OR] 14 [1.7-110], p = 0.013 and 13 [1.1-150], p = 0.039, respectively) when corrected for smoking. A model of recurrence comprising the presence of Proteus, abundance of Faecalibacterium, and smoking status showed moderate accuracy (area under the curve [AUC] 0.740, 95% confidence interval [CI] [0.69-0.79]). CD is associated with a microbial signature distinct from health. Microbial factors and smoking independently influence postoperative CD recurrence. The genus Proteus may play a role in the development of CD.
Publisher: Ovid Technologies (Wolters Kluwer Health)
Date: 11-2010
DOI: 10.1161/ATVBAHA.109.201020
Abstract: Objective— Genetic studies might provide new insights into the biological mechanisms underlying lipid metabolism and risk of CAD. We therefore conducted a genome-wide association study to identify novel genetic determinants of low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C), and triglycerides. Methods and Results— We combined genome-wide association data from 8 studies, comprising up to 17 723 participants with information on circulating lipid concentrations. We did independent replication studies in up to 37 774 participants from 8 populations and also in a population of Indian Asian descent. We also assessed the association between single-nucleotide polymorphisms (SNPs) at lipid loci and risk of CAD in up to 9 633 cases and 38 684 controls. We identified 4 novel genetic loci that showed reproducible associations with lipids (probability values, 1.6×10 −8 to 3.1×10 −10 ). These include a potentially functional SNP in the SLC39A8 gene for HDL-C, an SNP near the MYLIP/GMPR and PPP1R3B genes for LDL-C, and at the AFF1 gene for triglycerides. SNPs showing strong statistical association with 1 or more lipid traits at the CELSR2 , APOB , APOE-C1-C4-C2 cluster, LPL , ZNF259-APOA5-A4-C3-A1 cluster and TRIB1 loci were also associated with CAD risk (probability values, 1.1×10 −3 to 1.2×10 −9 ). Conclusion— We have identified 4 novel loci associated with circulating lipids. We also show that in addition to those that are largely associated with LDL-C, genetic loci mainly associated with circulating triglycerides and HDL-C are also associated with risk of CAD. These findings potentially provide new insights into the biological mechanisms underlying lipid metabolism and CAD risk.
Publisher: Cold Spring Harbor Laboratory
Date: 23-05-2020
DOI: 10.1101/2020.05.20.20108217
Abstract: Polygenic [risk] scores (PGS) can enhance prediction and understanding of common diseases and traits. However, the reproducibility of PGS and their subsequent applications in biological and clinical research have been hindered by several factors, including: inadequate and incomplete reporting of PGS development, heterogeneity in evaluation techniques, and inconsistent access to, and distribution of, the information necessary to calculate the scores themselves. To address this we present the PGS Catalog (www.PGSCatalog.org), an open resource for polygenic scores. The PGS Catalog currently contains 192 published PGS from 78 publications for 86 diverse traits, including diabetes, cardiovascular diseases, neurological disorders, cancers, as well as traits like BMI and blood lipids. Each PGS is annotated with metadata required for reproducibility as well as accurate application in independent studies. Using the PGS Catalog, we demonstrate that multiple PGS can be systematically evaluated to generate comparable performance metrics. The PGS Catalog has capabilities for user deposition, expert curation and programmatic access, thus providing the community with an open platform for polygenic score research and translation.
Publisher: Springer Science and Business Media LLC
Date: 20-11-2014
Publisher: Springer Science and Business Media LLC
Date: 09-08-2022
DOI: 10.1038/S41467-022-32095-5
Abstract: Individuals with South Asian ancestry have a higher risk of heart disease than other groups but have been largely excluded from genetic research. Using data from 22,000 British Pakistani and Bangladeshi individuals with linked electronic health records from the Genes & Health cohort, we conducted genome-wide association studies of coronary artery disease and its key risk factors. Using power-adjusted transferability ratios, we found evidence for transferability for the majority of cardiometabolic loci powered to replicate. The performance of polygenic scores was high for lipids and blood pressure, but lower for BMI and coronary artery disease. Adding a polygenic score for coronary artery disease to clinical risk factors showed significant improvement in reclassification. In Mendelian randomisation using transferable loci as instruments, our findings were consistent with results in European-ancestry individuals. Taken together, trait-specific transferability of trait loci between populations is an important consideration with implications for risk prediction and causal inference.
Publisher: Cold Spring Harbor Laboratory
Date: 09-03-2021
DOI: 10.1101/2021.03.08.434372
Abstract: Bioinformatic research relies on large-scale computational infrastructures which have a non-zero carbon footprint. So far, no study has quantified the environmental costs of bioinformatic tools and commonly run analyses. In this study, we estimate the bioinformatic carbon footprint (in kilograms of CO2 equivalent units, kgCO2e) using the freely available Green Algorithms calculator (www.green-algorithms.org). We assess (i) bioinformatic approaches in genome-wide association studies (GWAS), RNA sequencing, genome assembly, metagenomics, phylogenetics and molecular simulations, as well as (ii) computation strategies, such as parallelisation, CPU (central processing unit) vs GPU (graphics processing unit), cloud vs. local computing infrastructure and geography. In particular, for GWAS, we found that biobank-scale analyses emitted substantial kgCO2e and simple software upgrades could make GWAS greener, e.g. upgrading from BOLT-LMM v1 to v2.3 reduced carbon footprint by 73%. Switching from the average data centre to a more efficient data centre can reduce carbon footprint by ~34%. Memory over-allocation can be a substantial contributor to an algorithm’s carbon footprint. The use of faster processors or greater parallelisation reduces run time but can lead to, sometimes substantially, greater carbon footprint. Finally, we provide guidance on how researchers can reduce power consumption and minimise kgCO2e. Overall, this work elucidates the carbon footprint of common analyses in bioinformatics and provides solutions which empower a move toward greener research.
Publisher: Cold Spring Harbor Laboratory
Date: 28-06-2021
DOI: 10.1101/2021.06.23.21259247
Abstract: The 10-year Atherosclerotic Cardiovascular Disease (ASCVD) risk score is the standard approach to predict risk of incident cardiovascular events and, recently, the addition of CAD polygenic scores (PGS CAD) has been evaluated. Although age and sex strongly predict the risk of CAD, their interaction with genetic risk prediction has not been systematically examined. This study performed an in-depth evaluation of age and sex effects in genetic CAD risk prediction. The population-based Norwegian HUNT2 cohort of 51,036 individuals was used as the primary dataset. Findings were replicated in the UK Biobank (372,410 individuals). Models for 10-year CAD risk were fitted using Cox proportional hazards, and Harrell’s concordance index, sensitivity, and specificity were compared. Inclusion of age and sex interactions of PGS CAD in the prediction models increased C-index and sensitivity, likely countering the observed survival bias in the baseline. The sensitivity for females was lower than for males in all models including genetic information. The two-step approach identified a total of 82.6% of incident CAD cases (74.1% by ASCVD risk score and an additional 8.5% by the PGS CAD interaction model). These findings highlight the importance and complexity of genetic risk in predicting CAD. There is a need for modeling age and sex interaction terms with polygenic scores to optimize detection of individuals at high risk, those who warrant preventive interventions. Sex-specific studies are needed to understand and estimate CAD risk with genetic information. This study used two large population-based longitudinal datasets to evaluate genetic prediction of CAD including age and sex interactions. The model fit and sensitivity of the prediction models increased when including age and sex interaction of PGS CAD in the prediction models, likely countering the observed survival bias in the baseline. The sensitivity for females was lower than for males in all models including genetic information. Our results highlight the importance and complexity of genetic risk and suggest including age and sex interactions with polygenic scores to identify more high-risk individuals for preventive interventions.
Publisher: Elsevier BV
Date: 10-2021
Publisher: Springer Science and Business Media LLC
Date: 11-2021
DOI: 10.1038/S41591-021-01549-6
Abstract: Polygenic risk scores (PRSs) aggregate the many small effects of alleles across the human genome to estimate the risk of a disease or disease-related trait for an individual. The potential benefits of PRSs include cost-effective enhancement of primary disease prevention, more refined diagnoses and improved precision when prescribing medicines. However, these must be weighed against the potential risks, such as uncertainties and biases in PRS performance, as well as potential misunderstanding and misuse of these within medical practice and in wider society. By addressing key issues including gaps in best practices, risk communication and regulatory frameworks, PRSs can be used responsibly to improve human health. Here, the International Common Disease Alliance's PRS Task Force, a multidisciplinary group comprising expertise in genetics, law, ethics, behavioral science and more, highlights recent research to provide a comprehensive summary of the state of polygenic score research, as well as the needs and challenges as PRSs move closer to widespread use in the clinic.
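The aggregation described in the abstract above is commonly computed as a weighted sum of allele dosages. The following is a minimal sketch under that assumption, not the method of any specific paper listed here; the function name, the three variants, and their weights are hypothetical values chosen only for illustration.

```python
def polygenic_score(dosages, weights):
    """Return the weighted sum of allele dosages (a simple additive score).

    dosages: number of effect alleles carried at each variant (0, 1 or 2).
    weights: per-variant effect sizes, e.g. taken from GWAS summary statistics.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical example: three variants with made-up effect sizes.
example_weights = [0.12, -0.05, 0.30]
example_dosages = [2, 1, 0]
example_score = polygenic_score(example_dosages, example_weights)
print(f"score = {example_score:.2f}")
```

In practice such scores are usually standardised against a reference population before being interpreted, a step omitted here for brevity.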
Publisher: Springer Science and Business Media LLC
Date: 30-09-2022
DOI: 10.1038/S41586-022-05165-3
Abstract: Previous genome-wide association studies (GWASs) of stroke — the second leading cause of death worldwide — were conducted predominantly in populations of European ancestry 1,2 . Here, in cross-ancestry GWAS meta-analyses of 110,182 patients who have had a stroke (five ancestries, 33% non-European) and 1,503,898 control individuals, we identify association signals for stroke and its subtypes at 89 (61 new) independent loci: 60 in primary inverse-variance-weighted analyses and 29 in secondary meta-regression and multitrait analyses. On the basis of internal cross-ancestry validation and an independent follow-up in 89,084 additional cases of stroke (30% non-European) and 1,013,843 control individuals, 87% of the primary stroke risk loci and 60% of the secondary stroke risk loci were replicated (P < 0.05). Effect sizes were highly correlated across ancestries. Cross-ancestry fine-mapping, in silico mutagenesis analysis 3 , and transcriptome-wide and proteome-wide association analyses revealed putative causal genes (such as SH3PXD2A and FURIN) and variants (such as at GRK5 and NOS3). Using a three-pronged approach 4 , we provide genetic evidence for putative drug effects, highlighting F11, KLKB1, PROC, GP1BA, LAMC2 and VCAM1 as possible targets, with drugs already under investigation for stroke for F11 and PROC. A polygenic score integrating cross-ancestry and ancestry-specific stroke GWASs with vascular-risk factor GWASs (integrative polygenic scores) strongly predicted ischaemic stroke in populations of European, East Asian and African ancestry 5 . Stroke genetic risk scores were predictive of ischaemic stroke independent of clinical risk factors in 52,600 clinical-trial participants with cardiometabolic disease. Our results provide insights to inform biology, reveal potential drug targets and derive genetic risk prediction tools across ancestries.
Publisher: Springer Science and Business Media LLC
Date: 14-02-2022
Publisher: Springer Science and Business Media LLC
Date: 29-08-2010
DOI: 10.1038/NG.652
Publisher: Oxford University Press (OUP)
Date: 05-09-2018
DOI: 10.1093/NAR/GKY780
Publisher: Springer Science and Business Media LLC
Date: 08-11-2021
Publisher: Cold Spring Harbor Laboratory
Date: 16-09-2022
DOI: 10.1101/2022.09.16.508259
Abstract: RNAseq data can be used to infer genetic variants, yet its use for estimating genetic population structure remains underexplored. Here, we construct a freely available computational tool (RGStraP) to estimate RNAseq-based genetic principal components (RG-PCs) and assess whether RG-PCs can be used to control for population structure in gene expression analyses. Using whole blood samples from understudied Nepalese populations and the Geuvadis study, we show that RG-PCs had comparable results to paired array-based genotypes, with high genotype concordance and high correlations of genetic principal components, capturing subpopulations within the dataset. In differential gene expression analysis, we found that inclusion of RG-PCs as covariates reduced test statistic inflation. Our paper demonstrates that genetic population structure can be directly inferred and controlled for using RNAseq data, thus facilitating improved retrospective and future analyses of transcriptomic data.
Publisher: Cold Spring Harbor Laboratory
Date: 03-03-2020
DOI: 10.1101/2020.02.29.970970
Abstract: Antimicrobial resistance (AMR) in bacteria has been a global threat to public health for decades. A well-known driving force for the emergence, evolution and dissemination of genetic AMR determinants in bacterial populations is horizontal gene transfer, which is frequently mediated by mobile genetic elements (MGEs). Some MGEs can capture, maintain, and rearrange multiple AMR genes in a donor bacterium before moving them into recipients, giving rise to a phenomenon called horizontal gene co-transfer (HGcoT). This physical linkage or co-localisation between mobile AMR genes is of particular concern because it facilitates rapid dissemination of multidrug resistance within and across bacterial populations, providing opportunities for co-selection of AMR genes and limiting our therapeutic options. The study of HGcoT can benefit from large-scale whole-genome sequencing (WGS) data; however, most published studies of HGcoT consider only simple co-occurrence measures, which can be confounded by strong bacterial population structure due to clonal reproduction, leading to spurious associations. To address this issue, we present GeneMates, an R package implementing a network approach to identification of HGcoT using WGS data. The package enables users to test for associations between presence-absence of bacterial genes using univariate linear mixed models controlling for population structure based on core-genome variation. Furthermore, when physical distances between genes of interest are measurable in bacterial genomes, users can evaluate distance consistency to further support their inference of putative horizontally co-transferred genes, whose co-occurrence cannot be completely explained by the population structure. We demonstrate how this package can be used to identify co-transferred AMR genes and recover known MGEs from Escherichia coli and Salmonella Typhimurium WGS data. GeneMates is accessible at anyuac/GeneMates.
Publisher: Cold Spring Harbor Laboratory
Date: 05-11-2022
DOI: 10.1101/2022.11.03.22281872
Abstract: Whole genome sequencing (WGS) and phenotypic drug susceptibility testing was performed on a collection of 2,542 Mycobacterium tuberculosis (Mtb) isolates from tuberculosis (TB) patients recruited in Ho Chi Minh City (HCMC), Vietnam, to investigate Mtb diversity, the prevalence and phylodynamics of drug resistance, and in silico resistance prediction with sequencing data. Amongst isolates tested phenotypically against first-line drugs, we observed high rates of streptomycin [STR, 37.7% (N=573/1,520)] and isoniazid resistance [INH, 25.7% (N=459/1,786)], and lower rates of resistance to rifampicin [RIF, 4.9% (N=87/1,786)] and ethambutol [EMB, 4.2% (N=75/1,785)]. Resistance to STR and INH was predicted moderately well when applying the TB-Profiler algorithm to WGS data (sensitivities of 0.81 and 0.87 respectively), while resistance to RIF and EMB was predicted relatively poorly (sensitivities of 0.70 and 0.44 respectively). Rates of multidrug-resistance [MDR, 3.9% (N=69/1,786)], and resistance to a number of second-line drugs [para-aminosalicylic acid (29.6%, N=79/267), amikacin (15.4%, N=41/267) and moxifloxacin (21.3%, N=57/267)], were found to be high within a global context. Comparing rates of drug resistance among lineages, and exploring the dynamics of resistance acquisition through time, suggest the Beijing lineage (lineage 2.2) acquires de novo resistance mutations at higher rates and suffers no apparent fitness cost acting to impede the transmission of resistance. We infer resistance to INH and STR to have arisen earlier, on average, than resistance to RIF, and to be more widespread across the phylogeny. The high prevalence of ‘background’ INH resistance, combined with high rates of RIF mono-resistance (20.7%, N=18/87) suggests that rapid assays for INH resistance will be valuable in this setting. These tests will allow the detection of INH mono-resistance, and will allow MDR isolates to be distinguished from isolates with RIF mono-resistance.
Publisher: Cold Spring Harbor Laboratory
Date: 08-02-2022
DOI: 10.1101/2022.02.07.479382
Abstract: Protein-protein interactions (PPIs) are essential to understanding biological pathways as well as their roles in development and disease. Computational tools, based on classic machine learning, have been successful at predicting PPIs in silico , but the lack of consistent and reliable frameworks for this task has led to network models that are difficult to compare and discrepancies between algorithms that remain unexplained. To better understand the underlying inference mechanisms that underpin these models, we designed an open-source framework for benchmarking that accounts for a range of biological and statistical pitfalls while facilitating reproducibility. We use it to shed light on the impact of network topology and how different algorithms deal with highly connected proteins. By studying functional genomics-based and sequence-based models on human PPIs, we show their complementarity as the former performs best on lone proteins while the latter specialises in interactions involving hubs. We also show that algorithm design has little impact on performance with functional genomic data. We replicate our results between both human and S. cerevisiae data and demonstrate that models using functional genomics are better suited to PPI prediction across species. With rapidly increasing amounts of sequence and functional genomics data, our study provides a principled foundation for future construction, comparison and application of PPI networks.
Publisher: Springer Science and Business Media LLC
Date: 2013
Publisher: Springer Science and Business Media LLC
Date: 20-02-2020
DOI: 10.1038/S41467-020-14717-Y
Abstract: An amendment to this paper has been published and can be accessed via a link at the top of the paper.
Publisher: Springer Science and Business Media LLC
Date: 18-04-2014
Publisher: Cold Spring Harbor Laboratory
Date: 11-02-2020
DOI: 10.1101/2020.02.08.20021022
Abstract: Epidemiological and animal studies have associated systemic inflammation with blood pressure (BP). However, the mechanistic factors linking inflammation and BP remain unknown. Fatty acid derived eicosanoids serve as mediators of inflammation and have been suggested to also regulate renal vascular tone, peripheral resistance, renin-angiotensin system, and endothelial function. We therefore hypothesize that specific pro- and anti-inflammatory eicosanoids are linked with BP. We studied a population sample of 8099 FINRISK 2002 participants randomly drawn from the Finnish population register (53% women, mean age 48±13 years) and, for external validation, a sample of 2859 Framingham Heart Study (FHS) Offspring study participants (55% women, mean age 66±9 years). Using non-targeted liquid chromatography-mass spectrometry, we profiled 545 distinct high-quality eicosanoids and related oxylipin mediators in plasma. Adjusting for conventional hypertension risk factors, we observed 187 (34%) metabolites that were significantly associated with systolic BP (P < Bonferroni-corrected threshold of 0.05/545). We used forward selection linear regression modeling in FINRISK to define a general formula for individual eicosanoid risk score. Individuals of the top risk score quartile in FINRISK had a 9.0 mmHg (95% CI 8.0-10.1) higher systolic BP compared with individuals in the lowest quartile in fully adjusted models. Observed metabolite associations were consistent across FINRISK and FHS. In conclusion, plasma eicosanoids demonstrate strong associations with BP in the general population. As eicosanoid compounds affect numerous physiological processes that are central to BP regulation, they may offer new insights regarding pathogenesis of hypertension, as well as serve as potential targets for therapeutic intervention.
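The Bonferroni-corrected threshold of 0.05/545 quoted in the abstract above is simple arithmetic, sketched below for readers unfamiliar with the correction. Only the numbers 0.05, 545 and 187 come from the abstract; the function and variable names are illustrative.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Values quoted in the abstract: 545 metabolites tested, 187 significant.
threshold = bonferroni_threshold(0.05, 545)
significant_fraction = 187 / 545

print(f"per-test threshold = {threshold:.2e}")
print(f"significant share  = {significant_fraction:.0%}")
```

The correction is conservative when tests are correlated, as metabolite measurements often are, which is one reason other multiple-testing procedures are sometimes preferred.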
Publisher: Ovid Technologies (Wolters Kluwer Health)
Date: 09-2021
DOI: 10.1161/STROKEAHA.120.032619
Abstract: Early prediction of risk of cardiovascular disease (CVD), including stroke, is a cornerstone of disease prevention. Clinical risk scores have been widely used for predicting CVD risk from known risk factors. Most CVDs have a substantial genetic component, which also has been confirmed for stroke in recent gene discovery efforts. However, the role of genetics in prediction of risk of CVD, including stroke, has been limited to testing for highly penetrant monogenic disorders. In contrast, the importance of polygenic variation, the aggregated effect of many common genetic variants across the genome with individually small effects, has become more apparent in the last 5 to 10 years, and powerful polygenic risk scores for CVD have been developed. Here we review the current state of the field of polygenic risk scores for CVD including stroke, and their potential to improve CVD risk prediction. We present findings and lessons from diseases such as coronary artery disease as these will likely be useful to inform future research in stroke polygenic risk prediction.
Publisher: Oxford University Press (OUP)
Date: 29-08-2019
DOI: 10.1093/BIOINFORMATICS/BTY734
Abstract: A common goal of microbiome studies is the elucidation of community composition and member interactions using counts of taxonomic units extracted from sequence data. Inference of interaction networks from sparse and compositional data requires specialized statistical approaches. A popular solution is SparCC; however, its performance limits the calculation of interaction networks for very high-dimensional datasets. Here we introduce FastSpar, an efficient and parallelizable implementation of the SparCC algorithm which rapidly infers correlation networks and calculates P-values using an unbiased estimator. We further demonstrate that FastSpar reduces network inference wall time by 2–3 orders of magnitude compared to SparCC. FastSpar source code, precompiled binaries and platform packages are freely available on GitHub: cwatts/FastSpar. Supplementary data are available at Bioinformatics online.
Publisher: Ovid Technologies (Wolters Kluwer Health)
Date: 15-11-2022
DOI: 10.1161/CIRCULATIONAHA.122.060700
Abstract: End-stage renal disease is associated with a high risk of cardiovascular events. It is unknown, however, whether mild-to-moderate kidney dysfunction is causally related to coronary heart disease (CHD) and stroke. Observational analyses were conducted using individual-level data from 4 population data sources (Emerging Risk Factors Collaboration, EPIC-CVD [European Prospective Investigation into Cancer and Nutrition–Cardiovascular Disease Study], Million Veteran Program, and UK Biobank), comprising 648 135 participants with no history of cardiovascular disease or diabetes at baseline, yielding 42 858 and 15 693 incident CHD and stroke events, respectively, during 6.8 million person-years of follow-up. Using a genetic risk score of 218 variants for estimated glomerular filtration rate (eGFR), we conducted Mendelian randomization analyses involving 413 718 participants (25 917 CHD and 8622 strokes) in EPIC-CVD, Million Veteran Program, and UK Biobank. There were U-shaped observational associations of creatinine-based eGFR with CHD and stroke, with higher risk in participants with eGFR values <60 or >105 mL·min⁻¹·1.73 m⁻², compared with those with eGFR between 60 and 105 mL·min⁻¹·1.73 m⁻². Mendelian randomization analyses for CHD showed an association among participants with eGFR <60 mL·min⁻¹·1.73 m⁻², with a 14% (95% CI, 3%–27%) higher CHD risk per 5 mL·min⁻¹·1.73 m⁻² lower genetically predicted eGFR, but not for those with eGFR >105 mL·min⁻¹·1.73 m⁻². Results were not materially different after adjustment for factors associated with the eGFR genetic risk score, such as lipoprotein(a), triglycerides, hemoglobin A1c, and blood pressure. Mendelian randomization results for stroke were nonsignificant but broadly similar to those for CHD.
In people without manifest cardiovascular disease or diabetes, mild-to-moderate kidney dysfunction is causally related to risk of CHD, highlighting the potential value of preventive approaches that preserve and modulate kidney function.
Publisher: Springer Science and Business Media LLC
Date: 20-12-2019
DOI: 10.1038/S41467-019-13848-1
Abstract: Recent genome-wide association studies in stroke have enabled the generation of genomic risk scores (GRS) but their predictive power has been modest compared to established stroke risk factors. Here, using a meta-scoring approach, we develop a metaGRS for ischaemic stroke (IS) and analyse this score in the UK Biobank (n = 395,393; 3075 IS events by age 75). The metaGRS hazard ratio for IS (1.26, 95% CI 1.22–1.31 per metaGRS standard deviation) doubles that of a previous GRS, identifying a subset of individuals at monogenic levels of risk: the top 0.25% of metaGRS have three-fold risk of IS. The metaGRS is similarly or more predictive compared to several risk factors, such as family history, blood pressure, body mass index, and smoking. We estimate the reductions needed in modifiable risk factors for individuals with different levels of genomic risk and suggest that, for individuals with high metaGRS, achieving risk factor levels recommended by current guidelines may be insufficient to mitigate risk.
Publisher: Elsevier BV
Date: 07-2016
DOI: 10.1016/J.CELS.2016.06.012
Abstract: Network modules (topologically distinct groups of edges and nodes) that are preserved across datasets can reveal common features of organisms, tissues, cell types, and molecules. Many statistics to identify such modules have been developed, but testing their significance requires heuristics. Here, we demonstrate that current methods for assessing module preservation are systematically biased and produce skewed p values. We introduce NetRep, a rapid and computationally efficient method that uses a permutation approach to score module preservation without assuming data are normally distributed. NetRep produces unbiased p values and can distinguish between true and false positives during multiple hypothesis testing. We use NetRep to quantify preservation of gene coexpression modules across murine brain, liver, adipose, and muscle tissues. Complex patterns of multi-tissue preservation were revealed, including a liver-derived housekeeping module that displayed adipose- and muscle-specific association with body weight. Finally, we demonstrate the broader applicability of NetRep by quantifying preservation of bacterial networks in gut microbiota between men and women.
Publisher: Cold Spring Harbor Laboratory
Date: 09-2010
Abstract: The combining of genome-wide association (GWA) data across populations represents a major challenge for massive global meta-analyses. Genotype imputation using densely genotyped reference samples facilitates the combination of data across different genotyping platforms. HapMap data is typically used as a reference for single nucleotide polymorphism (SNP) imputation and tagging copy number polymorphisms (CNPs). However, the advantage of having population-specific reference panels for founder populations has not been evaluated. We looked at the properties and impact of adding 81 individuals from a founder population to HapMap3 reference data on imputation quality, CNP tagging, and power to detect association in simulations and in an independent cohort of 2138 individuals. The gain in SNP imputation accuracy was highest among low-frequency markers (minor allele frequency [MAF] <5%), for which adding the population-specific samples to the reference set increased the median R² between imputed and genotyped SNPs from 0.90 to 0.94. Accuracy also increased in regions with high recombination rates. Similarly, a reference set with population-specific extension facilitated the identification of better tag-SNPs for a subset of CNPs: for 4% of CNPs the R² between SNP genotypes and CNP intensity in the independent population cohort was at least twice as high as without the extension. We conclude that even a relatively small population-specific reference set yields considerable benefits in SNP imputation, CNP tagging accuracy, and the power to detect associations in founder populations and population isolates in particular.
Publisher: Cold Spring Harbor Laboratory
Date: 26-07-2021
DOI: 10.1101/2021.07.23.21260986
Abstract: Peptide markers of inflammation have been associated with the development of type 2 diabetes. The role of upstream, lipid-derived mediators of inflammation, such as eicosanoids, remains less clear. The aim was to examine whether eicosanoids are associated with incident type 2 diabetes. In the FINRISK 2002 study, a population-based sample of Finnish men and women aged 25-74 years, we used directed, non-targeted liquid chromatography-mass spectrometry to identify 545 eicosanoids and related oxylipins in the participants’ plasma samples (n=8,292). We used multivariable-adjusted Cox regression to examine associations between eicosanoids and incident type 2 diabetes. The findings were replicated in the Framingham Heart Study (FHS, n=2,886) and DILGOM 2007 (n=3,905). Together, these three cohorts had 1070 cases of incident type 2 diabetes. 76 eicosanoids were associated individually with incident type 2 diabetes. We identified three eicosanoids independently associated with incident type 2 diabetes using stepwise Cox regression with forward selection and a Bonferroni-corrected inclusion threshold. A three-eicosanoid risk score produced a hazard ratio (HR) of 1.56 (95% confidence interval 1.41-1.72) per one standard deviation (SD) increment for risk of incident diabetes. The HR for comparing the top quartile to the lowest was 2.80 (2.53-3.07). Meta-analysis of the three cohorts yielded a pooled HR per SD of 1.31 (1.05-1.56). Plasma eicosanoid profiles predict incident type 2 diabetes and the clearest signals replicate in three independent cohorts. Our findings give new information on the biology underlying type 2 diabetes and suggest opportunities for early identification of people at risk.
Publisher: Wiley
Date: 16-08-2011
Abstract: Ground-state disulfide dissociation is a target of prime importance in structural biochemistry. A main difficulty consists in avoiding competition with carbon–sulfur and backbone scission pathways. In tandem mass spectrometry, such selectivity is afforded using transition elements or coinage-metal ions as catalyst. Yet, the underlying gas-phase mechanistic details remain poorly understood. Gold(I)-assisted disulfide cleavage is investigated by means of DFT calculations, to elucidate the highly selective and specific catalytic action of this transition-metal cation, a most promising one in tandem mass spectrometry. The preferential cleavage of sulfur–sulfur versus carbon–sulfur linkages on dimethyldisulfide, taken as a prototypical aliphatic compound, is rationalized on the basis of molecular orbital arguments. Secondly, it is revealed that the disulfide dissociation profile is dramatically impacted by a peptidic environment. Calculations on L,L-cystine derivatives show two main factors: the topological frustration for an embedded -CH₂-S-S-CH₂- motif induces a 5 kcal mol⁻¹ penalty, whereas electrophilic assistance via complexation of nitrogen and oxygen atoms lowers activation barriers by a factor of 3. S-S weakening is both thermodynamically and kinetically driven by the versatile coordination mode of gold(I). The influence of amine-terminus group protonation is finally sketched: it gives rise to an intermediate reactivity. This study sheds light on the key action of the peptidic environment in tuning the dissociation profile in the presence of this transition-metal monocation.
Publisher: Public Library of Science (PLoS)
Date: 14-01-2021
DOI: 10.1371/JOURNAL.PMED.1003498
Abstract: Polygenic risk scores (PRSs) can stratify populations into cardiovascular disease (CVD) risk groups. We aimed to quantify the potential advantage of adding information on PRSs to conventional risk factors in the primary prevention of CVD. Using data from UK Biobank on 306,654 individuals without a history of CVD and not on lipid-lowering treatments (mean age [SD]: 56.0 [8.0] years; females: 57%; median follow-up: 8.1 years), we calculated measures of risk discrimination and reclassification upon addition of PRSs to risk factors in a conventional risk prediction model (i.e., age, sex, systolic blood pressure, smoking status, history of diabetes, and total and high-density lipoprotein cholesterol). We then modelled the implications of initiating guideline-recommended statin therapy in a primary care setting using incidence rates from 2.1 million individuals from the Clinical Practice Research Datalink. The C-index, a measure of risk discrimination, was 0.710 (95% CI 0.703–0.717) for a CVD prediction model containing conventional risk predictors alone. Addition of information on PRSs increased the C-index by 0.012 (95% CI 0.009–0.015), and resulted in continuous net reclassification improvements of about 10% and 12% in cases and non-cases, respectively. If a PRS were assessed in the entire UK primary care population aged 40–75 years, assuming that statin therapy would be initiated in accordance with the UK National Institute for Health and Care Excellence guidelines (i.e., for persons with a predicted risk of ≥10% and for those with certain other risk factors, such as diabetes, irrespective of their 10-year predicted risk), then it could help prevent 1 additional CVD event for approximately every 5,750 individuals screened. By contrast, targeted assessment only among people at intermediate (i.e., 5% to <10%) 10-year CVD risk could help prevent 1 additional CVD event for approximately every 340 individuals screened.
Such a targeted strategy could help prevent 7% more CVD events than conventional risk prediction alone. Potential gains afforded by assessment of PRSs on top of conventional risk factors would be about 1.5-fold greater than those provided by assessment of C-reactive protein, a plasma biomarker included in some risk prediction guidelines. Potential limitations of this study include its restriction to European ancestry participants and a lack of health economic evaluation. Our results suggest that addition of PRSs to conventional risk factors can modestly enhance prediction of first-onset CVD and could translate into population health benefits if used at scale.
Publisher: Elsevier BV
Date: 04-2022
Publisher: Cold Spring Harbor Laboratory
Date: 13-09-2017
DOI: 10.1101/188334
Abstract: Rheumatic heart disease (RHD) following Group A Streptococcus (GAS) infections is heritable and prevalent in Indigenous populations. Molecular mimicry between human and GAS proteins triggers pro-inflammatory cardiac valve-reactive T-cells. Genome-wide genetic analysis was undertaken in 1263 Aboriginal Australians (398 RHD cases; 865 controls). Single nucleotide polymorphisms (SNPs) were genotyped using Illumina HumanCoreExome BeadChips. Direct typing and imputation were used to fine-map the human leukocyte antigen (HLA) region. Epitope binding affinities were mapped for human cross-reactive GAS proteins, including M5 and M6. The strongest genetic association was intronic to HLA-DQA1 (rs9272622; P=1.86×10⁻⁷). Conditional analyses showed rs9272622 and/or DQA1*AA16 account for the HLA signal. HLA-DQA1*0101_DQB1*0503 (OR 1.44, 95%CI 1.09-1.90, P=9.56×10⁻³) and HLA-DQA1*0103_DQB1*0601 (OR 1.27, 95%CI 1.07-1.52, P=7.15×10⁻³) were risk haplotypes; HLA_DQA1*0301-DQB1*0402 (OR 0.30, 95%CI 0.14-0.65, P=2.36×10⁻³) was protective. Human myosin cross-reactive N-terminal and B repeat epitopes of GAS M5/M6 bind with higher affinity to DQA1/DQB1 alpha/beta dimers for the two risk haplotypes than the protective haplotype. Variation at HLA_DQA1-DQB1 is the major genetic risk factor for RHD in Aboriginal Australians studied here. Cross-reactive epitopes bind with higher affinity to alpha/beta dimers formed by risk haplotypes, supporting molecular mimicry as the key mechanism of RHD pathogenesis.
Publisher: Ovid Technologies (Wolters Kluwer Health)
Date: 09-2021
DOI: 10.1161/STROKEAHA.120.033670
Abstract: Polygenic risk scores (PRSs) can be used to predict ischemic stroke (IS). However, further validation of PRS performance is required in independent populations, particularly older adults in whom the majority of strokes occur. We predicted risk of incident IS events in a population of 12 792 healthy older individuals enrolled in the ASPREE trial (Aspirin in Reducing Events in the Elderly). The PRS was calculated using 3.6 million genetic variants. Participants had no previous history of cardiovascular events, dementia, or persistent physical disability at enrollment. The primary outcome was IS over 5 years, with stroke subtypes as secondary outcomes. A multivariable model including conventional risk factors was applied and reevaluated after adding PRS. Area under the curve and net reclassification were evaluated. At baseline, mean population age was 75 years. In total, 173 incident IS events occurred over a median follow-up of 4.7 years. When PRS was added to the multivariable model as a continuous variable, it was independently associated with IS (hazard ratio, 1.41 [95% CI, 1.20–1.65] per SD of the PRS; P<.001). The PRS alone was a better discriminator for IS events than most conventional risk factors. PRS as a categorical variable was a significant predictor in the highest tertile (hazard ratio, 1.74; P=0.004) compared with the lowest. The area under the curve of the conventional model was 66.6% (95% CI, 62.2–71.1) and after inclusion of the PRS, improved to 68.5% (95% CI, 64.0–73.0; P=0.095). In subgroup analysis, the continuous PRS remained an independent predictor for large vessel and cardioembolic stroke subtypes but not for small vessel stroke. Reclassification was improved, as the continuous net reclassification index after adding PRS to the conventional model was 0.25 (95% CI, 0.17–0.43). PRS predicts incident IS in a healthy older population but only moderately improves prediction over conventional risk factors.
URL: www.clinicaltrials.gov; Unique identifier: NCT01038583.
Publisher: Public Library of Science (PLoS)
Date: 09-09-2010
Publisher: Cold Spring Harbor Laboratory
Date: 06-04-2021
DOI: 10.1101/2021.04.04.438427
Abstract: We introduce Operational Genomic Unit (OGU), a metagenome analysis strategy that directly exploits sequence alignment hits to individual reference genomes as the minimum unit for assessing the diversity of microbial communities and their relevance to environmental factors. This approach is independent from taxonomic classification, granting the possibility of maximal resolution of community composition, and organizes features into an accurate hierarchy using a phylogenomic tree. The outputs are suitable for contemporary analytical protocols for community ecology, differential abundance and supervised learning while supporting phylogenetic methods, such as UniFrac and phylofactorization, that are seldom applied to shotgun metagenomics despite being prevalent in 16S rRNA gene amplicon studies. As demonstrated in one synthetic and two real-world case studies, the OGU method produces biologically meaningful patterns from microbiome datasets. Such patterns further remain detectable at very low metagenomic sequencing depths. Compared with taxonomic unit-based analyses implemented in currently adopted metagenomics tools, and the analysis of 16S rRNA gene amplicon sequence variants, this method shows superiority in informing biologically relevant insights, including stronger correlation with body environment and host sex on the Human Microbiome Project dataset, and more accurate prediction of human age by the gut microbiomes in the Finnish population. We provide Woltka, a bioinformatics tool to implement this method, with full integration with the QIIME 2 package and the Qiita web platform, to facilitate OGU adoption in future metagenomics studies. Shotgun metagenomics is a powerful, yet computationally challenging, technique compared to 16S rRNA gene amplicon sequencing for decoding the composition and structure of microbial communities.
However, current analyses of metagenomic data are primarily based on taxonomic classification, which is limited in feature resolution compared to 16S rRNA amplicon sequence variant analysis. To solve these challenges, we introduce Operational Genomic Units (OGUs), which are the individual reference genomes derived from sequence alignment results, without further assigning them taxonomy. The OGU method advances current read-based metagenomics in two dimensions: (i) providing maximal resolution of community composition while (ii) permitting use of phylogeny-aware tools. Our analysis of real-world datasets shows several advantages over currently adopted metagenomic analysis methods and the finest-grained 16S rRNA analysis methods in predicting biological traits. We thus propose the adoption of OGU as standard practice in metagenomic studies.
Publisher: Springer Science and Business Media LLC
Date: 10-05-2012
Publisher: Springer Science and Business Media LLC
Date: 06-04-2008
DOI: 10.1038/NG.121
Publisher: Cold Spring Harbor Laboratory
Date: 25-06-2020
DOI: 10.1101/2020.06.24.20138933
Abstract: Gut microbiome sequencing has shown promise as a predictive biomarker for a wide range of diseases, including classification of liver disease and severity grading. However, the potential of gut microbiota for prospective risk prediction of liver disease has not been assessed. Here, we utilise shallow gut metagenomic sequencing data of a large population-based cohort (N= ,115) and ∼15 years of electronic health register follow-up together with machine-learning to investigate the predictive capacity of gut microbial predictors, individually and in conjunction with conventional risk factors, for incident liver disease and alcoholic liver disease. Separately, conventional and microbiome risk factors showed comparable predictive capacity for incident liver disease. However, microbiome augmentation of conventional risk factor models using gradient boosted classifiers significantly improved performance, with average AUROCs of 0.834 for incident liver disease and 0.956 for alcoholic liver disease (AUPRCs of 0.185 and 0.304, respectively). Disease-free survival analysis showed significantly improved stratification using microbiome-augmented risk models as compared to conventional risk factors alone. Investigation of predictive microbial signatures revealed a wide range of bacterial taxa, including those previously associated with hepatic function and disease. This study supports the potential clinical validity of gut metagenomic sequencing to complement conventional risk factors for risk prediction of liver diseases.
Publisher: Springer Science and Business Media LLC
Date: 29-04-2020
DOI: 10.1038/S41597-020-0463-1
Abstract: Whole exome sequencing (WES) is a popular and successful technology which is widely used in both research and clinical settings. However, there is a paucity of reference data for Aboriginal Australians to underpin the translation of health-based genomic research. Here we provide a catalogue of variants called after sequencing the exomes of 50 Aboriginal individuals from the Northern Territory (NT) of Australia and compare these to 72 previously published exomes from a Western Australian (WA) population of Martu origin. Sequence data for both NT and WA samples were processed using an ‘intersect-then-combine’ (ITC) approach, using GATK and SAMtools to call variants. A total of 289,829 variants were identified in at least one individual in the NT cohort and 248,374 variants in at least one individual in the WA cohort. Of these, 166,719 variants were present in both cohorts, whilst 123,110 variants were private to the NT cohort and 81,655 were private to the WA cohort. Our data set provides a useful reference point for genomic studies on Aboriginal Australians.
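The variant counts quoted in the abstract above are internally consistent: each cohort's total equals the shared variants plus that cohort's private variants. A minimal sketch of that check, using only the figures from the abstract (the variable names are illustrative, not from the study):

```python
# Counts quoted in the abstract above.
total_nt = 289_829    # variants seen in at least one NT individual
total_wa = 248_374    # variants seen in at least one WA individual
shared = 166_719      # variants present in both cohorts
private_nt = 123_110  # variants seen only in the NT cohort
private_wa = 81_655   # variants seen only in the WA cohort

# Each cohort total should be the shared set plus that cohort's private set.
nt_consistent = (shared + private_nt) == total_nt
wa_consistent = (shared + private_wa) == total_wa

print("NT counts consistent:", nt_consistent)
print("WA counts consistent:", wa_consistent)
```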
Publisher: Cold Spring Harbor Laboratory
Date: 06-04-2016
DOI: 10.1101/047217
Abstract: Sparse canonical correlation analysis (SCCA) is a useful approach for correlating one set of measurements, such as single nucleotide polymorphisms (SNPs), with another set of measurements, such as gene expression levels. We present a fast implementation of SCCA, enabling rapid analysis of hundreds of thousands of SNPs together with thousands of phenotypes. Our approach is implemented both as an R package flashpcaR and within the standalone commandline tool flashpca. abraham/flashpca gad.abraham@unimelb.edu.au
Publisher: Cold Spring Harbor Laboratory
Date: 22-08-2019
DOI: 10.1101/744565
Abstract: There is debate about the value of adding information on genetic and other molecular markers to conventional cardiovascular disease (CVD) risk predictors. Using data on 306,654 individuals without a history of CVD from UK Biobank, we calculated measures of risk-discrimination and reclassification upon addition of polygenic risk scores (PRS) and a panel of 27 clinical biochemistry markers to a conventional risk prediction model (i.e., including age, sex, systolic blood pressure, smoking status, history of diabetes, total cholesterol and HDL cholesterol). We then modelled implications of initiating guideline-recommended statin therapy after the assessment of molecular markers for a UK primary-care setting. The C-index was 0.710 (95% CI, 0.703-0.717) for a CVD prediction model containing conventional risk predictors alone. The C-index increased by similar amounts when adding information on PRS or biochemistry markers (0.011 and 0.014, respectively; P<.001), and it increased still further (0.022; P<.001) when information on both was combined. Among cases and controls, continuous net reclassification improvements were about 12% and 19%, respectively, when both PRS and biochemistry markers were added. If PRS and biochemistry markers were to be assessed in the entire primary care population aged 40-75, then it could help prevent one additional CVD event for every 893 individuals screened. By contrast, targeted assessment only among people at intermediate (i.e., 5-10%) 10-year CVD risk could help prevent one additional CVD event for every 233 individuals screened. This targeted strategy could help reclassify 16% of the intermediate-risk group to the high-risk (i.e., ≥10%) category, preventing 11% more CVD events than conventional risk prediction. Adding information on both PRS and selected biochemistry markers moderately enhanced CVD predictive accuracy and could improve primary prevention of CVD.
However, our modelling suggested that targeted assessment of molecular markers among individuals at intermediate-risk would be more efficient than blanket approaches.
Publisher: Cold Spring Harbor Laboratory
Date: 27-06-2019
DOI: 10.1101/683086
Abstract: Chronic immune-mediated diseases of adulthood often originate in early childhood. To investigate genetic associations between neonatal immunity and disease, we collected cord blood samples from a birth cohort and mapped expression quantitative trait loci (eQTLs) in resting monocytes and CD4+ T cells as well as in response to lipopolysaccharide (LPS) or phytohemagglutinin (PHA) stimulation, respectively. Cis-eQTLs were largely specific to cell type or stimulation, and response eQTLs were identified for 31% of genes with cis-eQTLs (eGenes) in monocytes and 52% of eGenes in CD4+ T cells. We identified trans-eQTLs and mapped cis regulatory factors which act as mediators of trans effects. There was extensive colocalisation of causal variants for cell type- and stimulation-specific neonatal cis-eQTLs and those of autoimmune and allergic diseases, in particular CTSH (Cathepsin H) which showed widespread colocalisation across diseases. Mendelian randomisation showed causal neonatal gene transcription effects on disease risk for BTN3A2, HLA-C and many other genes. Our study elucidates the genetics of gene expression in neonatal conditions and cell types as well as the aetiological origins of autoimmune and allergic diseases.
Publisher: Springer Science and Business Media LLC
Date: 08-2017
Publisher: Oxford University Press (OUP)
Date: 13-03-2006
Publisher: Cold Spring Harbor Laboratory
Date: 12-03-2018
DOI: 10.1101/280677
Abstract: Integration of systems-level biomolecular information with electronic health records has led to the discovery of robust blood-based biomarkers predictive of future health and disease. Of recent intense interest is the GlycA biomarker, a complex nuclear magnetic resonance (NMR) spectroscopy signal reflective of acute and chronic inflammation, which predicts long term risk of diverse outcomes including cardiovascular disease, type 2 diabetes, and all-cause mortality. To systematically explore the specificity of the disease burden indicated by GlycA we analysed the risk for 468 common incident hospitalization and mortality outcomes occurring during an 8-year follow-up of 11,861 adults from Finland. Our analyses of GlycA replicated known associations, identified associations with specific cardiovascular disease outcomes, and uncovered new associations with risk of alcoholic liver disease (meta-analysed hazard ratio 2.94 per 1-SD, P=5×10⁻⁶), chronic renal failure (HR=2.47, P=3×10⁻⁶), glomerular diseases (HR=1.95, P=1×10⁻⁶), chronic obstructive pulmonary disease (HR=1.58, P=3×10⁻⁵), inflammatory polyarthropathies (HR=1.46, P=4×10⁻⁸), and hypertension (HR=1.21, P=5×10⁻⁵). We further evaluated GlycA as a biomarker in secondary prevention of 12-year cardiovascular mortality in 900 angiography patients with suspected coronary artery disease. We observed hazard ratios of 4.87 and 5.00 for 12-year mortality in angiography patients in the fourth and fifth quintiles by GlycA levels, demonstrating the prognostic potential of GlycA for identification of high mortality-risk individuals. Both GlycA and C-reactive protein had shared as well as independent contributions to mortality hazard, emphasising the importance of chronic inflammation in secondary prevention of cardiovascular disease.
Publisher: Cold Spring Harbor Laboratory
Date: 24-06-2021
DOI: 10.1101/2021.06.22.21259323
Abstract: In iduals with South Asian ancestry have higher risk of heart disease than other groups in Western countries however, most genetic research has focused on European-ancestry (EUR) in iduals. It is unknown whether reported genetic loci and polygenic scores (PGSs) for cardiometabolic traits are transferable to South Asians, and whether PGSs have utility in clinical settings. Using data from 22,000 British Pakistani and Bangladeshi in iduals with linked electronic health records from the Genes & Health cohort (G& H), we conducted genome-wide association studies (GWAS) and characterised the genetic architecture of coronary artery disease (CAD), body mass index (BMI), lipid biomarkers and blood pressure. We applied a new technique to assess the extent to which loci from GWAS in EUR s les were transferable. We tested how well existing findings from EUR studies performed in genetic risk prediction and Mendelian randomisation in G& H. Trans-ancestry genetic correlations between G& H and EUR s les for the tested traits were not significantly lower than 1, except for BMI (r g =0.85, p=0.02). We found evidence for transferability for the vast majority of loci from EUR discovery studies that were sufficiently powered to replicate in G& H. PGSs showed variable transferability in G& H, with the relative accuracy compared to EUR (ratio of incremental r 2 /AUC) ≥0.95 for HDL-C, triglycerides, and blood pressure, but lower for BMI (0.78) and CAD (0.42). We observed significant improvement in categorical net reclassification in G& H (NRI=3.9% 95% CI 0.9–7.0) when adding a previously developed CAD PGS to clinical risk factors (QRISK3). We used transferable loci as genetic instruments in trans-ancestry Mendelian randomisation and found evidence of an increased CAD risk for higher LDL-C and BMI, and for lower HDL-C in G& H, consistent with our findings for EUR s les. 
The genetic loci for CAD and its risk factors are largely transferable from EUR studies to British Pakistanis and Bangladeshis, whereas the transferability of PGSs varies greatly between traits. Our analyses suggest clinical utility for addition of PGS to existing clinical risk prediction tools for this population. This is the first study to explore the transferability of GWAS findings and PGSs for CAD and related cardiometabolic traits in British Pakistani and Bangladeshi individuals from a cohort with real-world electronic clinical data. We propose a new approach to assessing transferability of GWAS loci between populations, which can serve as a new methodological standard in this developing field. We find evidence of overall high transferability of GWAS loci in British Pakistanis and Bangladeshis. BMI, lipids and blood pressure show the highest transferability of loci, and CAD the lowest. The transferability of PGSs varied between traits, being high for HDL-C, triglycerides and blood pressure but more modest for CAD, BMI and LDL-C. Our results suggest that, for some traits, the use of transferable GWAS loci improves the robustness of Mendelian randomisation estimates in non-Europeans. The polygenic score for CAD derived from genetic studies of European individuals improves reclassification on top of clinical risk factors in British Pakistanis and Bangladeshis. The improvement was driven by identification of more cases in younger individuals (25–54 years old), and of controls in older individuals (55–84 years old). Incorporation of the polygenic score for CAD into risk prediction models is likely to prevent cardiovascular events and deaths in this population.
Publisher: Elsevier BV
Date: 03-2021
Publisher: Cold Spring Harbor Laboratory
Date: 11-11-2021
DOI: 10.1101/2021.11.10.21266163
Abstract: To examine the previously unknown long-term association between gut microbiome composition and incident type 2 diabetes in a representative population cohort. We collected fecal samples of 5 572 Finns (mean age 48.7 years, 54.1% women) in 2002 who were followed up for incident type 2 diabetes until Dec 31st, 2017. The samples were sequenced using shotgun metagenomics. We examined associations between gut microbiome compositions and incident diabetes using multivariable-adjusted Cox regression models. We first used the Eastern Finland sub-population to obtain initial findings and validated these in the Western Finland sub-population. Altogether 432 cases of incident diabetes occurred over the median follow-up of 15.8 years. We detected 4 species and 2 clusters consistently associated with incident diabetes in the validation models. These 4 species were Clostridium citroniae (HR, 1.21; 95% CI, 1.04-1.42), C. bolteae (HR, 1.20; 95% CI, 1.04-1.39), Tyzzerella nexilis (HR, 1.17; 95% CI, 1.01-1.36), and Ruminococcus gnavus (HR, 1.17; 95% CI, 1.01-1.36). The positively associated clusters, cluster 1 (HR, 1.18; 95% CI, 1.02-1.38) and cluster 5 (HR, 1.18; 95% CI, 1.02-1.36), mostly consisted of these same species. We observed robust species-level taxonomic features predictive of incident type 2 diabetes over a long-term follow-up. These findings build on and extend previous mainly cross-sectional evidence and further support links between dietary habits, metabolic diseases, and type 2 diabetes that are modulated by the gut microbiome. The gut microbiome could potentially be used to improve risk prediction and to uncover novel therapeutic targets for diabetes.
Publisher: Springer Science and Business Media LLC
Date: 24-07-2012
Abstract: Multi-locus sequence typing (MLST) has become the gold standard for population analyses of bacterial pathogens. This method focuses on the sequences of a small number of loci (usually seven) to divide the population and is simple, robust and facilitates comparison of results between laboratories and over time. Over the last decade, researchers and population health specialists have invested substantial effort in building up public MLST databases for nearly 100 different bacterial species, and these databases contain a wealth of important information linked to MLST sequence types such as time and place of isolation, host or niche, serotype and even clinical or drug resistance profiles. Recent advances in sequencing technology mean it is increasingly feasible to perform bacterial population analysis at the whole genome level. This offers massive gains in resolving power and genetic profiling compared to MLST, and will eventually replace MLST for bacterial typing and population analysis. However, given the wealth of data currently available in MLST databases, it is crucial to maintain backwards compatibility with MLST schemes so that new genome analyses can be understood in their proper historical context. We present a software tool, SRST, for quick and accurate retrieval of sequence types from short read sets, using inputs easily downloaded from public databases. SRST uses read mapping and an allele assignment score incorporating sequence coverage and variability to determine the most likely allele at each MLST locus. Analysis of over 3,500 loci in more than 500 publicly accessible Illumina read sets showed SRST to be highly accurate at allele assignment. SRST output is compatible with common analysis tools such as eBURST, Clonal Frame or PhyloViz, allowing easy comparison between novel genome data and MLST data. Alignment, fastq and pileup files can also be generated for novel alleles.
SRST is a novel software tool for accurate assignment of sequence types using short read data. Several uses for the tool are demonstrated, including quality control for high-throughput sequencing projects, plasmid MLST and analysis of genomic data during outbreak investigation. SRST is open-source, requires Python, BWA and SamTools, and is available from srst.sourceforge.net.
Publisher: Cold Spring Harbor Laboratory
Date: 23-02-2020
DOI: 10.1101/2020.02.20.20025924
Abstract: Juvenile idiopathic arthritis (JIA) is an autoimmune disease and a common cause of chronic disability in children. Diagnosis of JIA is based purely on clinical symptoms, leading to treatment delays. Despite JIA having substantial heritability, the construction of genomic risk scores (GRSs) to aid or expedite diagnosis has not been assessed. Here, we generate GRSs for JIA and its subtypes and evaluate their performance. We examined three case/control cohorts (UK, US, and Australia) with genome-wide single nucleotide polymorphism (SNP) genotypes. We trained GRSs for JIA and its subtypes using lasso-penalised linear models in cross-validation on the UK cohort, and externally tested in the Australian and US cohorts. The JIA GRS alone achieved cross-validated AUC=0.670 in the UK cohort and externally validated AUCs of 0.657 and 0.671 in US-based and Australian cohorts, respectively. In logistic regression of case/control status, the corresponding odds ratios per standard deviation (s.d.) of GRS were 1.831 [1.685-1.991] and 2.008 [1.731-2.345], and were unattenuated by adjustment for sex or the top 10 genetic principal components. Extending our analysis to JIA subtypes revealed that enthesitis-related JIA had both the longest time-to-referral and the subtype GRS with the strongest predictive capacity overall across datasets: AUCs of 0.80 (UK), 0.83 (Australian) and 0.69 (US-based). The particularly common oligoarthritis JIA subtype also had a subtype GRS that outperformed the GRS for JIA overall, with AUCs of 0.71, 0.75 and 0.77, respectively. A genomic risk score for JIA has potential to augment purely clinical JIA diagnosis protocols, prioritising higher-risk individuals for follow-up and treatment. Consistent with JIA heterogeneity, subtype-specific GRSs showed particularly high performance for enthesitis-related and oligoarthritis JIA.
Publisher: Wiley
Date: 30-11-2012
DOI: 10.1002/GEPI.21698
Abstract: A central goal of medical genetics is to accurately predict complex disease from genotypes. Here, we present a comprehensive analysis of simulated and real data using lasso and elastic-net penalized support-vector machine models, a mixed-effects linear model, a polygenic score, and unpenalized logistic regression. In simulation, the sparse penalized models achieved lower false-positive rates and higher precision than the other methods for detecting causal SNPs. The common practice of prefiltering SNP lists for subsequent penalized modeling was examined and shown to substantially reduce the ability to recover the causal SNPs. Using genome-wide SNP profiles across eight complex diseases within cross-validation, lasso and elastic-net models achieved substantially better predictive ability in celiac disease, type 1 diabetes, and Crohn's disease, and had equivalent predictive ability in the rest, with the results in celiac disease strongly replicating between independent datasets. We investigated the effect of linkage disequilibrium on the predictive models, showing that the penalized methods leverage this information to their advantage, compared with methods that assume SNP independence. Our findings show that sparse penalized approaches are robust across different disease architectures, producing as good as or better phenotype predictions and variance explained. This has fundamental ramifications for the selection and future development of methods to genetically predict human disease.
Publisher: BMJ
Date: 03-2022
DOI: 10.1136/BMJDRC-2021-002519
Abstract: Peptide markers of inflammation have been associated with the development of type 2 diabetes. The role of upstream, lipid-derived mediators of inflammation, such as eicosanoids, remains less clear. The aim of this study was to examine whether eicosanoids are associated with incident type 2 diabetes. In the FINRISK (Finnish Cardiovascular Risk Study) 2002 study, a population-based sample of Finnish men and women aged 25–74 years, we used directed, non-targeted liquid chromatography-mass spectrometry to identify 545 eicosanoids and related oxylipins in the participants’ plasma samples (n=8292). We used multivariable-adjusted Cox regression to examine associations between eicosanoids and incident type 2 diabetes. The significant independent findings were replicated in the Framingham Heart Study (FHS, n=2886) and DIetary, Lifestyle and Genetic determinants of Obesity and Metabolic syndrome (DILGOM) 2007 (n=3905). Together, these three cohorts had 1070 cases of incident type 2 diabetes. In the FINRISK 2002 cohort, 76 eicosanoids were associated individually with incident type 2 diabetes. We identified three eicosanoids independently associated with incident type 2 diabetes using stepwise Cox regression with forward selection and a Bonferroni-corrected inclusion threshold. A three-eicosanoid risk score produced an HR of 1.56 (95% CI 1.41 to 1.72) per 1 SD increment for risk of incident diabetes. The HR for comparing the top quartile with the lowest was 2.80 (95% CI 2.53 to 3.07). In the replication analyses, the three-eicosanoid risk score was significant in FHS (HR 1.24 (95% CI 1.10 to 1.39, p<0.001)) and directionally consistent in DILGOM (HR 1.12 (95% CI 0.99 to 1.27, p=0.07)). Meta-analysis of the three cohorts yielded a pooled HR of 1.31 (95% CI 1.05 to 1.56). Plasma eicosanoid profiles predict incident type 2 diabetes and the clearest signals replicate in three independent cohorts.
Our findings give new information on the biology underlying type 2 diabetes and suggest opportunities for early identification of people at risk.
Publisher: Cold Spring Harbor Laboratory
Date: 25-10-2017
DOI: 10.1101/209171
Abstract: Investigation of the genetic architecture of gene expression traits has aided interpretation of disease and trait-associated genetic variants; however, key aspects of expression quantitative trait loci (eQTL) study design and analysis remain understudied. We used extensive, empirically-driven simulations to explore eQTL study design and the performance of various analysis strategies. Across multiple testing correction methods, false discoveries of genes with eQTLs (eGenes) were substantially inflated when false discovery rate (FDR) control was applied to all tests, and only appropriately controlled using hierarchical procedures. All multiple testing correction procedures had low power and inflated FDR for eGenes whose causal SNPs had small allele frequencies using small sample sizes (e.g. frequency % in 100 samples), indicating that even moderately low frequency eQTL SNPs (eSNPs) in these studies are enriched for false discoveries. In scenarios with ≥80% power, the top eSNP was the true simulated eSNP 90% of the time, but substantially less frequently for very common eSNPs (minor allele frequencies %). Overestimation of eQTL effect sizes, so-called “Winner’s Curse”, was common in low and moderate power settings. To address this, we developed a bootstrap method (BootstrapQTL) which led to more accurate effect size estimation. These insights provide a foundation for future eQTL studies, especially those with sampling constraints and subtly different conditions.
Publisher: Elsevier BV
Date: 08-2020
Publisher: Cold Spring Harbor Laboratory
Date: 27-09-2021
DOI: 10.1101/2021.09.24.21264079
Abstract: Metabolic biomarker data quantified by nuclear magnetic resonance (NMR) spectroscopy has recently become available in UK Biobank. Here, we describe procedures for quality control and removal of technical variation for this biomarker data, comprising 249 circulating metabolites, lipids, and lipoprotein sub-fractions on approximately 121,000 participants. We identify and characterise technical and biological factors associated with individual biomarkers and find that linear effects on individual biomarkers can combine in a non-linear fashion for 61 composite biomarkers and 81 biomarker ratios. We create an R package, ukbnmr, for extracting and normalising the metabolic biomarker data, then use ukbnmr to remove unwanted variation from the UK Biobank data. We make available code for re-deriving the 61 composite biomarkers and 81 ratios, and for further derivation of 76 additional biomarker ratios of potential biological significance. Finally, we demonstrate that removal of technical variation leads to increased signal for genetic and epidemiological studies of the NMR metabolic biomarkers in UK Biobank.
Publisher: Wiley
Date: 02-05-2021
Abstract: Climate change is profoundly affecting nearly all aspects of life on earth, including human societies, economies, and health. Various human activities are responsible for significant greenhouse gas (GHG) emissions, including data centers and other sources of large‐scale computation. Although many important scientific milestones are achieved thanks to the development of high‐performance computing, the resultant environmental impact is underappreciated. In this work, a methodological framework to estimate the carbon footprint of any computational task in a standardized and reliable way is presented and metrics to contextualize GHG emissions are defined. A freely available online tool, Green Algorithms (www.green-algorithms.org), is developed, which enables a user to estimate and report the carbon footprint of their computation. The tool easily integrates with computational processes as it requires minimal information and does not interfere with existing code, while also accounting for a broad range of hardware configurations. Finally, the GHG emissions of algorithms used for particle physics simulations, weather forecasts, and natural language processing are quantified. Taken together, this study develops a simple generalizable framework and freely available tool to quantify the carbon footprint of nearly any computation. Combined with recommendations to minimize unnecessary CO2 emissions, the authors hope to raise awareness and facilitate greener computation.
Publisher: Elsevier BV
Date: 10-2021
DOI: 10.1016/J.HLC.2021.04.023
Abstract: Cardiovascular diseases (CVD) are leading causes of death and morbidity in Australia and worldwide. Despite improvements in treatment, there remain large gaps in our understanding to prevent, treat and manage CVD events and associated morbidities. This article lays out a vision for enhancing CVD research in Australia through the development of a Big Data system, bringing together the multitude of rich administrative and health datasets available. The article describes the different types of Big Data available for CVD research in Australia and presents an overview of the potential benefits of a Big Data system for CVD research and some of the major challenges in establishing the system for Australia. The steps for progressing this vision are outlined.
Publisher: Springer Science and Business Media LLC
Date: 23-12-2022
DOI: 10.1038/S41598-022-26141-X
Abstract: Varying technologies and experimental approaches used in microbiome studies often lead to irreproducible results due to unwanted technical variations. Such variations, often unaccounted for and of unknown source, may interfere with true biological signals, resulting in misleading biological conclusions. In this work, we aim to characterize the major sources of technical variations in microbiome data and demonstrate how in-silico approaches can minimize their impact. We analyzed 184 pig faecal metagenomes encompassing 21 specific combinations of deliberately introduced factors of technical and biological variations. Using the novel Removing Unwanted Variations-III-Negative Binomial (RUV-III-NB), we identified several known experimental factors, specifically storage conditions and freeze–thaw cycles, as likely major sources of unwanted variation in metagenomes. We also observed that these unwanted technical variations do not affect taxa uniformly, with freezing samples affecting taxa of class Bacteroidia the most, for example. Additionally, we benchmarked the performances of different correction methods, including ComBat, ComBat-seq, RUVg, RUVs, and RUV-III-NB. While RUV-III-NB performed consistently robustly across our sensitivity and specificity metrics, most other methods did not remove unwanted variations optimally. Our analyses suggest that a careful consideration of possible technical confounders is critical during experimental design of microbiome studies, and that the inclusion of technical replicates is necessary to efficiently remove unwanted variations computationally.
Publisher: Springer Science and Business Media LLC
Date: 13-05-2015
DOI: 10.1038/NCOMMS7833
Abstract: The avian origin A/H7N9 influenza virus causes high admission rates ( %) and mortality ( %), with ultimately favourable outcomes ranging from rapid recovery to prolonged hospitalization. Using a multicolour assay for monitoring adaptive and innate immunity, here we dissect the kinetic emergence of different effector mechanisms across the spectrum of H7N9 disease and recovery. We find that a diversity of response mechanisms contribute to resolution and survival. Patients discharged within 2–3 weeks have early prominent H7N9-specific CD8+ T-cell responses, while individuals with prolonged hospital stays have late recruitment of CD8+/CD4+ T cells and antibodies simultaneously (recovery by week 4), augmented even later by prominent NK cell responses (recovery days). In contrast, those who succumbed have minimal influenza-specific immunity and little evidence of T-cell activation. Our study illustrates the importance of robust CD8+ T-cell memory for protection against severe influenza disease caused by newly emerging influenza A viruses.
Publisher: Cold Spring Harbor Laboratory
Date: 05-08-2016
DOI: 10.1101/068007
Abstract: Schizophrenia and the affective disorders, here comprising bipolar disorder and major depressive disorder, are psychiatric illnesses that lead to significant morbidity and mortality worldwide. Whilst understanding of their pathobiology remains limited, large case-control studies have recently identified single nucleotide polymorphisms (SNPs) associated with these disorders. However, discerning the functional effects of these SNPs has been difficult as the associated causal genes are unknown. Here we evaluated whether schizophrenia and affective disorder associated-SNPs are correlated with gene expression within human brain tissue. Specifically, to identify expression quantitative trait loci (eQTLs), we leveraged disorder-associated SNPs identified from six Psychiatric Genomics Consortium and CONVERGE Consortium studies with gene expression levels in post-mortem, neurologically-normal tissue from two independent human brain tissue expression datasets (UK Brain Expression Consortium (UKBEC) and Genotype-Tissue Expression (GTEx)). We identified 6 188 and 16 720 cis-acting SNPs exceeding genome-wide significance (p x10 −8 ) in the UKBEC and GTEx datasets, respectively. 1 288 cis-eQTLs were significant in a meta-analysis leveraging overlapping brain regions and were associated with expression of 15 genes, including three non-coding RNAs. One cis-eQTL, rs16969968, results in a functionally disruptive missense mutation in CHRNA5, a schizophrenia-implicated gene. Meta-analysis identified 297 trans-eQTLs associated with 24 genes that were significant in a region-specific manner. Importantly, comparing across tissues, we find that blood eQTLs largely do not capture brain cis-eQTLs. This study identifies putatively causal genes whose expression in region-specific brain tissue may contribute to the risk of schizophrenia and affective disorders.
Publisher: Elsevier BV
Date: 05-2008
Publisher: Springer Science and Business Media LLC
Date: 29-01-2012
DOI: 10.1038/NG.1073
Publisher: Public Library of Science (PLoS)
Date: 09-04-2014
Publisher: Elsevier BV
Date: 06-2022
DOI: 10.1016/J.AHJ.2022.02.007
Abstract: The traditional primary prevention paradigm for coronary artery disease (CAD) centers on population-based algorithms to classify individual risk. However, this approach often misclassifies individuals and leaves many in the 'intermediate' category, for whom there is no clear preferred prevention strategy. Coronary artery calcium (CAC) and polygenic risk scoring (PRS) are 2 contemporary tools for risk prediction to enhance the impact of effective management. To determine how CAC and PRS impact adherence to pharmacotherapy and lifestyle measures in asymptomatic individuals with subclinical atherosclerosis. The CAPAR-CAD study is a multicenter, open, randomized controlled trial in Victoria, Australia. Participants are self-selected individuals aged 40 to 70 years with no prior history of cardiovascular disease (CVD), intermediate 10-year risk for CAD as determined by the pooled cohort equation (PCE), and CAC scores >0. All participants will have a health assessment, a full CT coronary angiogram (CTCA), and PRS calculation. They will then be randomized to receive their risk presented either as PCE and CAC, or PCE and PRS. The intervention includes e-Health coaching focused on risk factor management, health education and pharmacotherapy, and follow-up to augment adherence to a statin medication. The primary endpoint is a change in low-density lipoprotein cholesterol (LDL-C) from baseline to 12 months. The secondary endpoint is between-group differences in behavior modification and adherence to statin pharmacotherapy. As of July 31, 2021, we have screened 1,903 individuals. We present the results of the 574 participants deemed eligible after baseline assessment.
Publisher: Cold Spring Harbor Laboratory
Date: 22-10-2022
DOI: 10.1101/2022.10.20.22281120
Abstract: To provide quantitative evidence of the use of polygenic risk scores (PRS) for systematically identifying individuals for invitation for full formal cardiovascular disease (CVD) risk assessment. 108,685 participants aged 40-69, with measured biomarkers, linked primary care records and genetic data in UK Biobank were used for model derivation and population health modelling. Prioritisation tools using age, PRS for coronary artery disease and stroke, and conventional risk factors for CVD available within longitudinal primary care records were derived using sex-specific Cox models. Rescaling to account for the healthy cohort effect, we modelled the implications of initiating guideline-recommended statin therapy after prioritising individuals for invitation to a formal CVD risk assessment. 1,838 CVD events were observed over median follow up of 8.2 years. If primary care records were used to prioritise individuals for formal risk assessment using age- and sex-specific thresholds corresponding to 5% false negative rates, then we would capture 65% and 43% of events amongst men and women respectively. The numbers of men and women needed to be screened to prevent one CVD event (NNS) are 74 and 140 respectively. In contrast, adding PRS to both prioritisation and formal assessments, and selecting thresholds to capture the same number of events, resulted in a NNS of 60 for men and 90 for women. The use of PRS together with primary care records to prioritise individuals at highest risk of a CVD event for a formal CVD risk assessment can more efficiently prioritise those who need interventions the most than using primary care records alone. This could lead to better allocation of resources by reducing the number of formal risk assessments in primary care while still preventing the same number of CVD events.
Publisher: Springer Science and Business Media LLC
Date: 30-03-2022
Publisher: Springer Science and Business Media LLC
Date: 31-01-2023
DOI: 10.1038/S41597-023-01949-Y
Abstract: Metabolic biomarker data quantified by nuclear magnetic resonance (NMR) spectroscopy in approximately 121,000 UK Biobank participants has recently been released as a community resource, comprising absolute concentrations and ratios of 249 circulating metabolites, lipids, and lipoprotein sub-fractions. Here we identify and characterise additional sources of unwanted technical variation influencing individual biomarkers in the data available to download from UK Biobank. These included sample preparation time, shipping plate well, spectrometer batch effects, drift over time within spectrometer, and outlier shipping plates. We developed a procedure for removing this unwanted technical variation, and demonstrate that it increases signal for genetic and epidemiological studies of the NMR metabolic biomarker data in UK Biobank. We subsequently developed an R package, ukbnmr, which we make available to the wider research community to enhance the utility of the UK Biobank NMR metabolic biomarker data and to facilitate rapid analysis.
Publisher: Elsevier BV
Date: 04-2023
Publisher: Springer Science and Business Media LLC
Date: 11-05-2021
DOI: 10.1038/S41467-021-22962-Y
Abstract: The collection of fecal material and developments in sequencing technologies have enabled standardised and non-invasive gut microbiome profiling. Microbiome composition from several large cohorts has been cross-sectionally linked to various lifestyle factors and diseases. In spite of these advances, prospective associations between microbiome composition and health have remained uncharacterised due to the lack of sufficiently large and representative population cohorts with comprehensive follow-up data. Here, we analyse the long-term association between gut microbiome variation and mortality in a well-phenotyped and representative population cohort from Finland (n = 7211). We report robust taxonomic and functional microbiome signatures related to the Enterobacteriaceae family that are associated with mortality risk during a 15-year follow-up. Our results extend previous cross-sectional studies, and help to establish the basis for examining long-term associations between human gut microbiome composition, incident outcomes, and general health status.
Publisher: Elsevier BV
Date: 05-2021
Publisher: Cold Spring Harbor Laboratory
Date: 16-09-2021
DOI: 10.1101/2021.09.14.458035
Abstract: There remains a clinical need for better approaches to rapid drug susceptibility testing in view of the increasing burden of multidrug resistant tuberculosis. Binary susceptibility phenotypes only capture changes in minimum inhibitory concentration when these cross the critical concentration, even though other changes may be clinically relevant. We developed a machine learning system to predict minimum inhibitory concentration from unassembled whole-genome sequencing data for 13 anti-tuberculosis drugs. We trained, validated and tested the system on 10,859 isolates from the CRyPTIC dataset. Essential agreement rates (predicted MIC within one doubling dilution of observed MIC) were above 92% for first-line drugs, 91% for fluoroquinolones and aminoglycosides, and 90% for new and repurposed drugs, albeit with a significant drop in performance for the very few phenotypically resistant isolates in the latter group. To further validate the model in the absence of external MIC datasets, we predicted MIC and converted values to binary for an external set of 15,239 isolates with binary phenotypes, and compared their performance against a previously validated mutation catalogue, the expected performance of existing molecular assays, and World Health Organization Target Product Profiles. The sensitivity of the model on the external dataset was greater than 90% for all drugs except ethionamide, clofazimine and linezolid. Specificity was greater than 95% for all drugs except ethambutol, ethionamide, bedaquiline, delamanid and clofazimine. The proposed system can provide quantitative susceptibility phenotyping to help guide antimicrobial therapy, although further data collection and validation are required before machine learning can be used clinically for all drugs.
Publisher: Oxford University Press (OUP)
Date: 09-11-2022
DOI: 10.1093/NAR/GKAC1010
Abstract: The NHGRI-EBI GWAS Catalog (www.ebi.ac.uk/gwas) is a FAIR knowledgebase providing detailed, structured, standardised and interoperable genome-wide association study (GWAS) data to & 000 users per year from academic research, healthcare and industry. The Catalog contains variant-trait associations and supporting metadata for & 000 published GWAS across & human traits, and & 000 full P-value summary statistics datasets. Content is curated from publications or acquired via author submission of prepublication summary statistics through a new submission portal and validation tool. GWAS data volume has vastly increased in recent years. We have updated our software to meet this scaling challenge and to enable rapid release of submitted summary statistics. The scope of the repository has expanded to include additional data types of high interest to the community, including sequencing-based GWAS, gene-based analyses and copy number variation analyses. Community outreach has increased the number of shared datasets from under-represented traits, e.g. cancer, and we continue to contribute to awareness of the lack of population diversity in GWAS. Interoperability of the Catalog has been enhanced through links to other resources including the Polygenic Score Catalog and the International Mouse Phenotyping Consortium, refinements to GWAS trait annotation, and the development of a standard format for GWAS data.
Publisher: Ovid Technologies (Wolters Kluwer Health)
Date: 06-11-2017
Abstract: Cardiac hypertrophy increases the risk of developing heart failure and cardiovascular death. The neutrophil inflammatory protein, lipocalin‐2 (LCN2/NGAL), is elevated in certain forms of cardiac hypertrophy and acute heart failure. However, a specific role for LCN2 in predisposition and etiology of hypertrophy and the relevant genetic determinants are unclear. Here, we defined the role of LCN2 in concentric cardiac hypertrophy in terms of pathophysiology, inflammatory expression networks, and genomic determinants. We used 3 experimental models: a polygenic model of cardiac hypertrophy and heart failure, a model of intrauterine growth restriction and Lcn2‐knockout mouse cultured cardiomyocytes and 2 human cohorts: 114 type 2 diabetes mellitus patients and 2064 healthy subjects of the YFS (Young Finns Study). In hypertrophic heart rats, cardiac and circulating Lcn2 was significantly overexpressed before, during, and after development of cardiac hypertrophy and heart failure. Lcn2 expression was increased in hypertrophic hearts in a model of intrauterine growth restriction, whereas Lcn2‐knockout mice had smaller hearts. In cultured cardiomyocytes, Lcn2 activated molecular hypertrophic pathways and increased cell size, but reduced proliferation and cell numbers. Increased LCN2 was associated with cardiac hypertrophy and diastolic dysfunction in diabetes mellitus. In the YFS, LCN2 expression was associated with body mass index and cardiac mass and with levels of inflammatory markers. The single‐nucleotide polymorphism, rs13297295, located near LCN2 defined a significant cis‐eQTL for LCN2 expression. Direct effects of LCN2 on cardiomyocyte size and number and the consistent associations in experimental and human analyses reveal a central role for LCN2 in the ontogeny of cardiac hypertrophy and heart failure.
Publisher: Cold Spring Harbor Laboratory
Date: 22-05-2021
DOI: 10.1101/2021.05.21.445058
Abstract: Varying technologies and experimental approaches used in microbiome studies often lead to irreproducible results due to unwanted technical variations. Such variations, often unaccounted for and of unknown source, may interfere with true biological signals, resulting in misleading biological conclusions. In this work, we aim to characterize the major sources of technical variations in microbiome data and demonstrate how a state-of-the-art approach can minimize their impact on downstream analyses. We analyzed 184 pig faecal metagenomes encompassing 21 specific combinations of deliberately introduced factors of technical and biological variations. We identify several known experimental factors, specifically storage conditions and freeze-thaw cycles, as a likely major source of unwanted variation in metagenomes. We also observed that these unwanted technical variations do not affect taxa uniformly, with freezing samples affecting taxa of class Bacteroidia the most, for example. Additionally, we benchmarked the performance of a novel batch correcting tool used in this study, RUV-III-NB ( imfuxing/ruvIIInb/ ), against other popular batch correction methods, including ComBat, ComBat-seq, RUVg, and RUVs. While RUV-III-NB performed consistently robustly across our sensitivity and specificity metrics, most other methods did not remove unwanted variations optimally, with RUVg even overcorrecting and removing some of the true biological signals from the samples. Our analyses suggest that a careful consideration of possible technical confounders is critical in the experimental design of microbiome studies to ensure accurate biological reading of microbial taxa of interest, and that the inclusion of technical replicates is necessary to efficiently remove unwanted variations computationally.
Publisher: Public Library of Science (PLoS)
Date: 16-08-2012
Publisher: Public Library of Science (PLoS)
Date: 03-04-2009
Publisher: Elsevier BV
Date: 08-2015
DOI: 10.1016/J.AUTREV.2015.04.005
Abstract: There is a pressing need to reduce the high global disease burden of rheumatic heart disease (RHD) and its harbinger, acute rheumatic fever (ARF). ARF is a classical example of an autoimmune syndrome and is of particular immunological interest because it follows a known antecedent infection with group A streptococcus (GAS). However, the poorly understood immunopathology of these post-infectious diseases means that, compared to much progress in other immune-mediated diseases, we still lack useful biomarkers, new therapies or an effective vaccine in ARF and RHD. Here, we summarise recent literature on the complex interaction between GAS and the human host that culminates in ARF and the subsequent development of RHD. We contrast ARF with other post-infectious streptococcal immune syndromes - post-streptococcal glomerulonephritis (PSGN) and the still controversial paediatric autoimmune neuropsychiatric disorders associated with streptococcal infections (PANDAS), in order to highlight the potential significance of variations in the host immune response to GAS. We discuss a model for the pathogenesis of ARF and RHD in terms of current immunological concepts and the potential for application of in depth "omics" technologies to these ancient scourges.
Publisher: Springer Science and Business Media LLC
Date: 27-04-2008
DOI: 10.1038/NG.145
Publisher: Oxford University Press (OUP)
Date: 26-09-2017
Abstract: Rheumatic heart disease (RHD) after group A streptococcus (GAS) infections is heritable and prevalent in Indigenous populations. Molecular mimicry between human and GAS proteins triggers proinflammatory cardiac valve-reactive T cells. Genome-wide genetic analysis was undertaken in 1263 Aboriginal Australians (398 RHD cases; 865 controls). Single-nucleotide polymorphisms were genotyped using Illumina HumanCoreExome BeadChips. Direct typing and imputation was used to fine-map the human leukocyte antigen (HLA) region. Epitope binding affinities were mapped for human cross-reactive GAS proteins, including M5 and M6. The strongest genetic association was intronic to HLA-DQA1 (rs9272622; P = 1.86 × 10⁻⁷). Conditional analyses showed rs9272622 and/or DQA1*AA16 account for the HLA signal. HLA-DQA1*0101_DQB1*0503 (odds ratio [OR], 1.44; 95% confidence interval [CI], 1.09-1.90; P = 9.56 × 10⁻³) and HLA-DQA1*0103_DQB1*0601 (OR, 1.27; 95% CI, 1.07-1.52; P = 7.15 × 10⁻³) were risk haplotypes; HLA_DQA1*0301-DQB1*0402 (OR, 0.30; 95% CI, 0.14-0.65; P = 2.36 × 10⁻³) was protective. Human myosin cross-reactive N-terminal and B repeat epitopes of GAS M5/M6 bind with higher affinity to DQA1/DQB1 alpha/beta dimers for the 2 risk haplotypes than the protective haplotype. Variation at HLA_DQA1-DQB1 is the major genetic risk factor for RHD in Aboriginal Australians studied here. Cross-reactive epitopes bind with higher affinity to alpha/beta dimers formed by risk haplotypes, supporting molecular mimicry as the key mechanism of RHD pathogenesis.
Publisher: Cold Spring Harbor Laboratory
Date: 12-10-2023
Publisher: Cold Spring Harbor Laboratory
Date: 20-12-2017
DOI: 10.1101/237073
Abstract: Events in early life contribute to subsequent risk of asthma; however, the causes and trajectories of childhood wheeze are heterogeneous and do not always result in asthma. Similarly, not all atopic individuals develop wheeze, and vice versa. The reasons for these differences are unclear. Using unsupervised model-based cluster analysis, we identified latent clusters within a prospective birth cohort with deep immunological and respiratory phenotyping. We characterised each cluster in terms of immunological profile and disease risk, and replicated our results in external cohorts from the UK and USA. We discovered three distinct trajectories, one of which is a high-risk “atopic” cluster with increased propensity for allergic diseases throughout childhood. Atopy contributes varyingly to later wheeze depending on cluster membership. Our findings demonstrate the utility of unsupervised analysis in elucidating heterogeneity in asthma pathogenesis and provide a foundation for improving management and prevention of childhood asthma.
Publisher: Springer Science and Business Media LLC
Date: 04-05-2008
DOI: 10.1038/NG.140
Publisher: Cold Spring Harbor Laboratory
Date: 25-10-2022
DOI: 10.1101/2022.10.23.22281420
Abstract: The analysis of longitudinal data from electronic health records (EHR) has potential to improve clinical diagnoses and enable personalised medicine, motivating efforts to identify disease subtypes from age-dependent patient comorbidity information. Here, we introduce an age-dependent topic modelling (ATM) method that provides a low-rank representation of longitudinal records of hundreds of distinct diseases in large EHR data sets. The model learns, and assigns to each individual, topic weights for several disease topics, each of which reflects a set of diseases that tend to co-occur within individuals as a function of age. Simulations show that ATM attains high accuracy in distinguishing distinct age-dependent comorbidity profiles. We applied ATM to 282,957 UK Biobank samples, analysing 1,726,144 disease diagnoses spanning all 348 diseases with ≥1,000 independent occurrences in the Hospital Episode Statistics (HES) data, identifying 10 disease topics under the optimal model fit. Analysis of an independent cohort, All of Us, with 211,908 samples and 3,098,771 disease diagnoses spanning 233 of the 348 UK Biobank diseases produced highly concordant findings. In UK Biobank we identified 52 diseases with heterogeneous comorbidity profiles (≥500 occurrences assigned to each of ≥2 topics), including breast cancer, type 2 diabetes (T2D), hypertension, and hypercholesterolemia. For most of these diseases, topic assignments were highly age-dependent, suggesting differences in disease aetiology for early-onset vs. late-onset disease. We defined subtypes of the 52 heterogeneous diseases based on the topic assignments, and compared genetic risk across subtypes using polygenic risk scores (PRS). We identified 18 disease subtypes whose PRS differed significantly from other subtypes of the same disease, including a subtype of T2D characterised by cardiovascular comorbidities and a subtype of asthma characterised by dermatological comorbidities.
We further identified specific variants underlying these differences, such as a T2D-associated SNP in the HMGA2 locus that has a higher odds ratio in the top quartile of cardiovascular topic weight (1.18±0.02) compared to the bottom quartile (1.00±0.02) (P = 3 × 10⁻⁷ for difference, FDR = 0.0002 < 0.1). In conclusion, ATM identifies disease subtypes with differential genome-wide and locus-specific genetic risk profiles.
Publisher: Springer Science and Business Media LLC
Date: 10-03-2021
Publisher: Informa UK Limited
Date: 2021
Publisher: Springer Science and Business Media LLC
Date: 07-03-2019
DOI: 10.1038/S41598-019-40490-0
Abstract: Active breaks in prolonged sitting have beneficial impacts on cardiometabolic risk biomarkers. The molecular mechanisms include regulation of skeletal muscle gene and protein expression controlling metabolic, inflammatory and cell development pathways. An active communication network exists between adipose and muscle tissue, but the effect of active breaks in prolonged sitting on adipose tissue has not been investigated. This study characterized the acute transcriptional events induced in adipose tissue by regular active breaks during prolonged sitting. We studied 8 overweight/obese adults participating in an acute randomized three-intervention crossover trial. Interventions were performed in the postprandial state and included: (i) prolonged uninterrupted sitting or prolonged sitting interrupted with 2-minute bouts of (ii) light- or (iii) moderate-intensity treadmill walking every 20 minutes. Subcutaneous adipose tissue biopsies were obtained after each condition. Microarrays identified 36 differentially expressed genes between the three conditions (fold change ≥0.5 in either direction; p < 0.05). Pathway analysis indicated that breaking up of prolonged sitting led to differential regulation of adipose tissue metabolic networks and inflammatory pathways, increased insulin signaling, modulation of adipocyte cell cycle, and facilitated cross-talk between adipose tissue and other organs. This study provides preliminary insight into the adipose tissue regulatory systems that may contribute to the physiological effects of interrupting prolonged sitting.
Publisher: Springer Science and Business Media LLC
Date: 04-11-2016
Publisher: Public Library of Science (PLoS)
Date: 10-03-2017
Publisher: Cold Spring Harbor Laboratory
Date: 31-03-2022
DOI: 10.1101/2022.03.25.22272958
Abstract: Understanding how genetic variants influence disease risk and complex traits (variant-to-function) is one of the major challenges in human genetics. Here we present a model-driven framework to leverage human genome-scale metabolic networks to define how genetic variants affect biochemical reaction fluxes across major human tissues, including skeletal muscle, adipose, liver, brain and heart. As proof of concept, we build personalised organ-specific metabolic flux models for 524,615 individuals of the INTERVAL and UK Biobank cohorts and perform a fluxome-wide association study (FWAS) to identify 4,411 associations between personalised flux values and the concentration of metabolites in blood. Furthermore, we apply FWAS to identify 97 metabolic fluxes associated with the risk of developing coronary artery disease, many of which are linked to processes previously described to play a role in the disease. Our work demonstrates that genetically personalised metabolic models can elucidate the downstream effects of genetic variants on biochemical reactions involved in common human diseases.
Publisher: Public Library of Science (PLoS)
Date: 31-07-2014
Publisher: Elsevier BV
Date: 08-2021
DOI: 10.1093/AJCN/NQAB077
Publisher: Springer Science and Business Media LLC
Date: 21-05-2018
Publisher: Cold Spring Harbor Laboratory
Date: 02-01-2020
DOI: 10.1101/2019.12.30.19015842
Abstract: The collection of fecal material and developments in sequencing technologies have enabled cost-efficient, standardized, and non-invasive gut microbiome profiling. As a result, microbiome composition data from several large cohorts have been cross-sectionally linked to various lifestyle factors and diseases [1–5]. In spite of these advances, prospective associations between microbiome composition and health have remained uncharacterized due to the lack of sufficiently large and representative population cohorts with comprehensive follow-up data [6–8]. Here, we analyse the long-term association between gut microbiome variation and mortality in a large, well-phenotyped, and representative population cohort (n = 7211, FINRISK 2002, Finland) [9]. We report specific taxonomic and functional signatures related to the Enterobacteriaceae family in the human gut microbiome that predict mortality during a 15-year follow-up. These associations can be observed both in the Eastern and Western Finns who have differing genetic backgrounds, lifestyles, and mortality rates [10,11]. Our results supplement previously reported cross-sectional associations [1–4,12] and help to establish a methodological and conceptual basis for examining long-term associations between human gut microbiome composition, incident outcomes, and general health status. These findings could serve as a solid framework for microbiome profiling in clinical risk prediction, paving the way towards clinical applications of human microbiome sequencing aimed at prediction, prevention, and treatment of disease.
Publisher: Public Library of Science (PLoS)
Date: 13-02-2014
Publisher: Ovid Technologies (Wolters Kluwer Health)
Date: 11-2018
DOI: 10.1161/CIRCGEN.118.002234
Abstract: Integration of systems-level biomolecular information with electronic health records has led to recent interest in the glycoprotein acetyls (GlycA) biomarker—a serum- or plasma-derived nuclear magnetic resonance spectroscopy signal that represents the abundance of circulating glycated proteins. GlycA predicts risk of diverse outcomes, including cardiovascular disease, type 2 diabetes mellitus, and all-cause mortality; however, the underlying detailed associations of GlycA’s morbidity and mortality risk are currently unknown. We used 2 population-based cohorts totaling 11 861 adults from the Finnish general population to test for an association with 468 common incident hospitalization and mortality outcomes during an 8-year follow-up. Further, we utilized 900 angiography patients to test for GlycA association with mortality risk and potential utility for mortality risk discrimination during 12-year follow-up. New associations with GlycA and incident alcoholic liver disease, chronic renal failure, glomerular diseases, chronic obstructive pulmonary disease, inflammatory polyarthropathies, and hypertension were uncovered, and known incident disease associations were replicated. GlycA associations for incident disease outcomes were in general not attenuated when adjusting for hsCRP (high-sensitivity C-reactive protein). Among 900 patients referred to angiography, GlycA had hazard ratios of 4.87 (95% CI, 2.45–9.65) and 5.00 (95% CI, 2.38–10.48) for 12-year risk of mortality in the fourth and fifth quintiles by GlycA levels, demonstrating its prognostic potential for identification of high-risk individuals. When modeled together, both hsCRP and GlycA were attenuated but remained significant. GlycA was predictive of myriad incident diseases across many major internal organs and stratified mortality risk in angiography patients. Both GlycA and hsCRP had shared and independent contributions to mortality risk, suggesting chronic inflammation as an etiological factor.
GlycA may be useful in improving risk prediction in specific disease settings.
Publisher: Cold Spring Harbor Laboratory
Date: 26-03-2021
DOI: 10.1101/2021.03.23.21254144
Abstract: The use of a polygenic risk score (PRS) to predict coronary heart disease (CHD) events has been demonstrated in the general adult population. However, whether predictive performance extends to older individuals is unclear. We aimed to evaluate the predictive value of a PRS for incident CHD events in a prospective cohort of individuals aged 70 years and older. We used data from 12,792 genotyped participants of the ASPREE trial, a randomized placebo-controlled trial investigating the effect of daily 100 mg aspirin on disability-free survival in healthy older people. Participants had no previous history of diagnosed atherothrombotic cardiovascular events, dementia, or persistent physical disability at enrolment. We calculated a PRS comprising 1.7 million genetic variants (metaGRS). The primary outcome was a composite of incident myocardial infarction or CHD death over 5 years. At baseline, the median population age was 73.9 years and 54.9% were female. In total, 254 incident CHD events occurred. When the PRS was added to conventional risk factors, it was independently associated with CHD (hazard ratio 1.24 [95% confidence interval (CI) 1.08-1.42], p=0.002). The AUC of the conventional model was 70.53 (95% CI 67.00-74.06), and after inclusion of the PRS increased to 71.78 (95% CI 68.32-75.24, p=0.019), demonstrating improved prediction. Reclassification was also improved, as the continuous net reclassification index after adding PRS to the conventional model was 0.25 (95% CI 0.15-0.28). A PRS for CHD performs well in older people, suggesting that the clinical utility of genomic risk prediction for CHD extends to this distinct high-risk subgroup.
Publisher: Cold Spring Harbor Laboratory
Date: 18-08-2020
DOI: 10.1101/2020.08.17.238444
Abstract: The human microbiota has a close relationship with human disease and it remodels components of the glycocalyx including heparan sulfate (HS). Studies of the severe acute respiratory syndrome coronavirus (SARS-CoV-2) spike protein receptor binding domain suggest that infection requires binding to HS and angiotensin converting enzyme 2 (ACE2) in a codependent manner. Here, we show that commensal host bacterial communities can modify HS and thereby modulate SARS-CoV-2 spike protein binding and that these communities change with host age and sex. Common human-associated commensal bacteria whose genomes encode HS-modifying enzymes were identified. The prevalence of these bacteria and the expression of key microbial glycosidases in bronchoalveolar lavage fluid (BALF) was lower in adult COVID-19 patients than in healthy controls. The presence of HS-modifying bacteria decreased with age in two large survey datasets, FINRISK 2002 and American Gut, revealing one possible mechanism for the observed increase in COVID-19 susceptibility with age. In vitro, bacterial glycosidases from unpurified culture media supernatants fully blocked SARS-CoV-2 spike binding to human H1299 protein lung adenocarcinoma cells. HS-modifying bacteria in human microbial communities may regulate viral adhesion, and loss of these commensals could predispose individuals to infection. Understanding the impact of shifts in microbial community composition and bacterial lyases on SARS-CoV-2 infection may lead to new therapeutics and diagnosis of susceptibility. It is well known that host microbes groom the mucosa where they reside. Recent investigations have shown that HS, a major component of mucosal layers, is necessary for SARS-CoV-2 infection. In this study we examine the impact of microbial modification of HS on viral attachment.
Publisher: Public Library of Science (PLoS)
Date: 11-02-2019
Publisher: Springer Science and Business Media LLC
Date: 20-04-2015
DOI: 10.1038/NI.3154
Abstract: When B cells encounter an antigen, they alter their physiological state and anatomical localization and initiate a differentiation process that ultimately produces antibody-secreting cells (ASCs). We have defined the transcriptomes of many mature B cell populations and stages of plasma cell differentiation in mice. We provide a molecular signature of ASCs that highlights the stark transcriptional divide between B cells and plasma cells and enables the demarcation of ASCs on the basis of location and maturity. Changes in gene expression correlated with cell-division history and the acquisition of permissive histone modifications, and they included many regulators that had not been previously implicated in B cell differentiation. These findings both highlight and expand the core program that guides B cell terminal differentiation and the production of antibodies.
Publisher: Springer Science and Business Media LLC
Date: 21-11-2011
Publisher: eLife Sciences Publications, Ltd
Date: 02-08-2018
Publisher: MDPI AG
Date: 30-09-2022
Abstract: Weight loss and increased physical activity may promote beneficial modulation of the metabolome, but limited evidence exists about how very low-level weight loss affects the metabolome in previously non-obese active individuals. Following a weight loss period (21.1 ± 3.1 weeks) leading to substantial fat mass loss of 52% (−7.9 ± 1.5 kg) and low body fat (12.7 ± 4.1%), the liquid chromatography-mass spectrometry-based metabolic signature of 24 previously young, healthy, and normal weight female physique athletes was investigated. We observed uniform increases (FDR < 0.05) in bile acids, very-long-chain free fatty acids (FFA), and oxylipins, together with reductions in unsaturated FFAs after weight loss. These widespread changes, especially in the bile acid profile, were most strongly explained (FDR < 0.05) by changes in android (visceral) fat mass. The reported changes did not persist, as all of them were reversed after the subsequent voluntary weight regain period (18.4 ± 2.9 weeks) and were unchanged in non-dieting controls (n = 16). Overall, we suggest that the reported changes in FFA, bile acid, and oxylipin profiles reflect metabolic adaptation to very low levels of fat mass after prolonged periods of intense exercise and low-energy availability. However, the effects of the aforementioned metabolome subclass alteration on metabolic homeostasis remain controversial, and more studies are warranted to unravel the complex physiology and potentially associated health implications. In the end, our study reinforced the view that transient weight loss seems to have little to no long-lasting molecular and physiological effects.
Publisher: Cold Spring Harbor Laboratory
Date: 04-08-2020
DOI: 10.1101/2020.08.01.20166413
Abstract: Bioactive metabolites are central to numerous pathways and disease pathophysiology, yet many bioactive metabolites are still uncharacterized. Here, we quantified bioactive metabolites using untargeted LC-MS plasma metabolomics in two large cohorts (combined N≈9,300) and utilized genome-wide association analysis and Mendelian randomization to uncover genetic loci with roles in bioactive metabolism and prioritize metabolite features for more in-depth characterization. We identified 118 loci associated with levels of 2,319 distinct metabolite features which replicated across cohorts and reached study-wide significance in meta-analysis. Of these loci, 39 were previously not known to be associated with blood metabolites. Loci harboring SLCO1B1 and UGT1A were highly pleiotropic, accounting for % of all associations. Two-sample Mendelian randomization found 46 causal effects of 31 metabolite features on at least one of five common diseases. Of these, 15, including leukotriene D4, had protective effects on both coronary heart disease and primary sclerosing cholangitis. We further assessed the association between baseline metabolite features and incident coronary heart disease using 16 years of follow-up health records. This study characterizes the genetic landscape of bioactive metabolite features and their putative causal effects on disease.
Publisher: Oxford University Press (OUP)
Date: 05-05-2017
DOI: 10.1093/BIOINFORMATICS/BTX299
Abstract: Principal component analysis (PCA) is a crucial step in quality control of genomic data and a common approach for understanding population genetic structure. With the advent of large genotyping studies involving hundreds of thousands of individuals, standard approaches are no longer feasible. However, when the full decomposition is not required, substantial computational savings can be made. We present FlashPCA2, a tool that can perform partial PCA on 1 million individuals faster than competing approaches, while requiring substantially less memory. abraham/flashpca. Supplementary data are available at Bioinformatics online.
Publisher: Ovid Technologies (Wolters Kluwer Health)
Date: 02-2022
DOI: 10.1161/CIRCGEN.121.003429
Abstract: The use of a polygenic risk score (PRS) to improve risk prediction of coronary heart disease (CHD) events has been demonstrated to have clinical utility in the general adult population. However, the prognostic value of a PRS for CHD has not been examined specifically in older populations of individuals aged ≥70 years, who comprise a distinct high-risk subgroup. The objective of this study was to evaluate the predictive value of a PRS for incident CHD events in a prospective cohort of older individuals without a history of cardiovascular events. We used data from 12 792 genotyped, healthy older individuals enrolled into the ASPREE trial (Aspirin in Reducing Events in the Elderly), a randomized double-blind placebo-controlled clinical trial investigating the effect of daily 100 mg aspirin on disability-free survival. Participants had no previous history of diagnosed atherothrombotic cardiovascular events, dementia, or persistent physical disability at enrollment. We calculated a PRS (meta-genomic risk score) consisting of 1.7 million genetic variants. The primary outcome was a composite of incident myocardial infarction or CHD death over 5 years. At baseline, the median population age was 73.9 years, and 54.9% were female. In total, 254 incident CHD events occurred. When the PRS was added to conventional risk factors, it was independently associated with CHD (hazard ratio, 1.24 [95% CI, 1.08–1.42], P=0.002). The area under the curve of the conventional model was 70.53 (95% CI, 67.00–74.06), and after inclusion of the PRS increased to 71.78 (95% CI, 68.32–75.24, P=0.019), demonstrating improved prediction. Reclassification was also improved, as the continuous net reclassification index after adding PRS to the conventional model was 0.25 (95% CI, 0.15–0.28). A PRS for CHD performs well in older people and improves prediction over conventional cardiovascular risk factors.
Our study provides evidence that genomic risk prediction for CHD has clinical utility in individuals aged 70 years and older. URL: www.clinicaltrials.gov Unique identifier: NCT01038583
Publisher: eLife Sciences Publications, Ltd
Date: 15-10-2018
DOI: 10.7554/ELIFE.35856
Abstract: Events in early life contribute to subsequent risk of asthma; however, the causes and trajectories of childhood wheeze are heterogeneous and do not always result in asthma. Similarly, not all atopic individuals develop wheeze, and vice versa. The reasons for these differences are unclear. Using unsupervised model-based cluster analysis, we identified latent clusters within a prospective birth cohort with deep immunological and respiratory phenotyping. We characterised each cluster in terms of immunological profile and disease risk, and replicated our results in external cohorts from the UK and USA. We discovered three distinct trajectories, one of which is a high-risk ‘atopic’ cluster with increased propensity for allergic diseases throughout childhood. Atopy contributes varyingly to later wheeze depending on cluster membership. Our findings demonstrate the utility of unsupervised analysis in elucidating heterogeneity in asthma pathogenesis and provide a foundation for improving management and prevention of childhood asthma.
Publisher: Cold Spring Harbor Laboratory
Date: 19-01-2018
DOI: 10.1101/250712
Abstract: Coronary artery disease (CAD) has substantial heritability and a polygenic architecture; however, genomic risk scores have not yet leveraged the totality of genetic information available nor been externally tested at population-scale to show potential utility in primary prevention. Using a meta-analytic approach to combine large-scale genome-wide and targeted genetic association data, we developed a new genomic risk score for CAD (metaGRS), consisting of 1.7 million genetic variants. We externally tested metaGRS, individually and in combination with available conventional risk factors, in 22,242 CAD cases and 460,387 non-cases from UK Biobank. In UK Biobank, a standard deviation increase in metaGRS had a hazard ratio (HR) of 1.71 (95% CI 1.68–1.73) for CAD, greater than any other externally tested genetic risk score. Individuals in the top 20% of the metaGRS distribution had a HR of 4.17 (95% CI 3.97–4.38) compared with those in the bottom 20%. The metaGRS had higher C-index (C=0.623, 95% CI 0.615–0.631) for incident CAD than any of four conventional factors (smoking, diabetes, hypertension, and body mass index), and addition of the metaGRS to a model of conventional risk factors increased C-index by 3.7%. In individuals on lipid-lowering or anti-hypertensive medications at recruitment, metaGRS hazard for incident CAD was significantly but only partially attenuated with HR of 2.83 (95% CI 2.61–3.07) between the top and bottom 20% of the metaGRS distribution. Recent genetic association studies have yielded enough information to meaningfully stratify individuals using the metaGRS for CAD risk in both early and later life, thus enabling targeted primary intervention in combination with conventional risk factors. The metaGRS effect was partially attenuated by lipid and blood pressure-lowering medication; however, other prevention strategies will be required to fully benefit from earlier genomic risk stratification.
National Health and Medical Research Council of Australia, British Heart Foundation, Australian Heart Foundation.
Publisher: Cold Spring Harbor Laboratory
Date: 19-12-2019
DOI: 10.1101/2019.12.14.876474
Abstract: Common human diseases are frequently polygenic in architecture, comprising a large number of risk alleles with small effects spread across the genome [1–3]. Polygenic scores (PGSs) aggregate these alleles into a metric which represents an individual’s genetic predisposition to a specific disease. PGSs have shown promise for early risk prediction [4–7], and there is potential to use PGSs to understand disease biology in parallel [8]. Here, we investigate the role plasma protein levels play in cardiometabolic disease risk in a cohort of 3,087 healthy individuals using PGSs. We found PGSs for coronary artery disease (CAD), type 2 diabetes (T2D), chronic kidney disease (CKD), and ischaemic stroke (IS) were associated with levels of 49 plasma proteins. These associations were polygenic in architecture, largely independent of cis protein QTLs, and robust to environmental variation. Over a median 7.7 years follow-up, 28 of these plasma proteins were associated with future myocardial infarction (MI) or T2D events, 16 of which were causal mediators between polygenic risk and incident disease. These protein mediators of polygenic disease risk included targets of approved therapies which may have repurposing potential. Our results demonstrate that PGSs can identify proteins with causal roles in disease, and may have utility in drug development.
Publisher: Elsevier BV
Date: 10-2018
Publisher: Oxford University Press (OUP)
Date: 04-2015
Publisher: Cold Spring Harbor Laboratory
Date: 03-03-2018
DOI: 10.1101/272583
Abstract: A common goal of microbiome studies is the elucidation of community composition and member interactions using counts of taxonomic units extracted from sequence data. Inference of interaction networks from sparse and compositional data requires specialised statistical approaches. A popular solution is SparCC; however, its performance limits the calculation of interaction networks for very high-dimensional datasets. Here we introduce FastSpar, an efficient and parallelisable implementation of the SparCC algorithm which rapidly infers correlation networks and calculates p-values using an unbiased estimator. We further demonstrate that FastSpar reduces network inference wall time by 2-3 orders of magnitude compared to SparCC. FastSpar source code, precompiled binaries, and platform packages are freely available on GitHub: cwatts/FastSpar
Publisher: Springer Science and Business Media LLC
Date: 07-12-2008
DOI: 10.1038/NG.290
Publisher: Cold Spring Harbor Laboratory
Date: 17-12-2016
DOI: 10.1101/094714
Abstract: Principal component analysis (PCA) is a crucial step in quality control of genomic data and a common approach for understanding population genetic structure. With the advent of large genotyping studies involving hundreds of thousands of individuals, standard approaches are no longer computationally feasible. We present FlashPCA2, a tool that can perform PCA on 1 million individuals faster than competing approaches, while requiring substantially less memory. abraham/ashpca gad.abraham@unimelb.edu.au
Publisher: Springer Science and Business Media LLC
Date: 30-09-2021
DOI: 10.1186/S13058-021-01465-0
Abstract: Advancements in cancer therapeutics have resulted in increases in cancer-related survival; however, there is a growing clinical dilemma. The current balancing of survival benefits and future cardiotoxic harms of oncotherapies has resulted in an increased burden of cardiovascular disease in breast cancer survivors. Risk stratification may help address this clinical dilemma. This study is the first to assess the association between a coronary artery disease-specific polygenic risk score and incident coronary artery events in female breast cancer survivors. We utilized the Studies in Epidemiology and Research in Cancer Heredity prospective cohort involving 12,413 women with breast cancer with genotype information and without a baseline history of cardiovascular disease. Cause-specific hazard ratios for association of the polygenic risk score and incident coronary artery disease (CAD) were obtained using left-truncated Cox regression adjusting for age, genotype array, conventional risk factors such as smoking and body mass index, as well as other sociodemographic, lifestyle, and medical variables. Over a median follow-up of 10.3 years (IQR: 16.8), 750 incident fatal or non-fatal coronary artery events were recorded. A 1 standard deviation higher polygenic risk score was associated with an adjusted hazard ratio of 1.33 (95% CI 1.20, 1.47) for incident CAD. This study provides evidence that a coronary artery disease-specific polygenic risk score can risk-stratify breast cancer survivors independently of other established cardiovascular risk factors.
Publisher: Cold Spring Harbor Laboratory
Date: 03-02-2020
DOI: 10.1101/2020.02.02.20020065
Abstract: Blood cells play essential roles in human health, underpinning physiological processes such as immunity, oxygen transport, and clotting, which when perturbed cause a significant health burden. Here we integrate data from UK Biobank and a large-scale international collaborative effort, including 563,946 European ancestry participants, and discover 5,106 new genetic variants independently associated with 29 blood cell phenotypes covering the full allele frequency spectrum of variation impacting hematopoiesis. We holistically characterize the genetic architecture of hematopoiesis, assess the relevance of the omnigenic model to blood cell phenotypes, delineate relevant hematopoietic cell states influenced by regulatory genetic variants and gene networks, identify novel splice-altering variants mediating the associations, and assess the polygenic prediction potential for blood cell traits and clinical disorders at the interface of complex and Mendelian genetics. These results show the power of large-scale blood cell GWAS to interrogate clinically meaningful variants across the full allelic spectrum of human variation.
Publisher: Public Library of Science (PLoS)
Date: 23-10-2019
Publisher: Springer Science and Business Media LLC
Date: 16-07-2015
Publisher: Informa UK Limited
Date: 21-06-2020
Publisher: Springer Science and Business Media LLC
Date: 02-2022
DOI: 10.1038/S41588-021-00991-Z
Abstract: Human genetic variation affects the gut microbiota through a complex combination of environmental and host factors. Here we characterize genetic variations associated with microbial abundances in a single large-scale population-based cohort of 5,959 genotyped individuals with matched gut microbial metagenomes, and dietary and health records (prevalent and follow-up). We identified 567 independent SNP-taxon associations. Variants at the LCT locus associated with Bifidobacterium and other taxa, but they differed according to dairy intake. Furthermore, levels of Faecalicatena lactaris associated with ABO, and suggested preferential utilization of secreted blood antigens as energy source in the gut. Enterococcus faecalis levels associated with variants in the MED13L locus, which has been linked to colorectal cancer. Mendelian randomization analysis indicated a potential causal effect of Morganella on major depressive disorder, consistent with observational incident disease analysis. Overall, we identify and characterize the intricate nature of host-microbiota interactions and their association with disease.
Publisher: Cold Spring Harbor Laboratory
Date: 08-2020
DOI: 10.1101/2020.07.30.20164962
Abstract: Fatty liver disease is the most common liver disease in the world. It is characterized by a buildup of excess fat in the liver that can lead to cirrhosis and liver failure. The link between fatty liver disease and gut microbiome has been known for at least 80 years. However, this association remains mostly unstudied in the general population because of underdiagnosis and small sample sizes. To address this knowledge gap, we studied the link between the Fatty Liver Index (FLI), a well-established proxy for fatty liver disease, and gut microbiome composition in a representative, ethnically homogeneous population sample in Finland. We based our models on biometric covariates and gut microbiome compositions from shallow metagenome sequencing. Our classification models could discriminate between individuals with a high FLI (≥ 60, indicates likely liver steatosis) and low FLI (< 60) in our validation set, consisting of 30% of the data not used in model training, with an average AUC of 0.75. In addition to age and sex, our models included differences in 11 microbial groups from class Clostridia, mostly belonging to orders Lachnospirales and Oscillospirales. Pathway analysis of representative genomes of the FLI-associated taxa in (NCBI) Clostridium subclusters IV and XIVa indicated the presence of, e.g., ethanol fermentation pathways. Through modeling the fatty liver index, our results provide high-resolution associations between gut microbiota composition and fatty liver in a large representative population cohort and support the role of endogenous ethanol producers in the development of fatty liver.
Publisher: Springer Science and Business Media LLC
Date: 02-08-2023
DOI: 10.1038/S42003-023-05171-9
Abstract: RNAseq data can be used to infer genetic variants, yet its use for estimating genetic population structure remains underexplored. Here, we construct a freely available computational tool (RGStraP) to estimate RNAseq-based genetic principal components (RG-PCs) and assess whether RG-PCs can be used to control for population structure in gene expression analyses. Using whole blood samples from understudied Nepalese populations and the Geuvadis study, we show that RG-PCs had comparable results to paired array-based genotypes, with high genotype concordance and high correlations of genetic principal components, capturing subpopulations within the dataset. In differential gene expression analysis, we found that inclusion of RG-PCs as covariates reduced test statistic inflation. Our paper demonstrates that genetic population structure can be directly inferred and controlled for using RNAseq data, thus facilitating improved retrospective and future analyses of transcriptomic data.
Publisher: Cold Spring Harbor Laboratory
Date: 25-08-2021
DOI: 10.1101/2021.08.20.21261814
Abstract: We integrated lipidomics and genomics to unravel the genetic architecture of lipid metabolism and identify genetic variants associated with lipid species that are putatively in the mechanistic pathway to coronary artery disease (CAD). We quantified 596 lipid species in serum from 4,492 phenotyped individuals from the Busselton Health Study. In our discovery GWAS we identified 667 independent loci associations with these lipid species (479 novel), followed by meta-analysis and validation in two independent cohorts. Lipid endophenotypes (134) identified for CAD were associated with variation at 186 genomic loci. Associations between independent lipid-loci with coronary atherosclerosis were assessed in ∼456,000 individuals from the UK Biobank. Of the 53 lipid-loci that showed evidence of association (P ×10 −3 ), 43 loci were associated with at least one of the 134 lipid endophenotypes. The findings of this study illustrate the value of integrative biology to investigate the genetics and lipid metabolism in the aetiology of atherosclerosis and CAD, with implications for other complex diseases.
Publisher: Ovid Technologies (Wolters Kluwer Health)
Date: 04-08-2020
Abstract: Several small‐scale animal studies have suggested that gut microbiota and blood pressure (BP) are linked. However, results from human studies remain scarce and conflicting. We wanted to elucidate the multivariable‐adjusted association between gut metagenome and BP in a large, representative, well‐phenotyped population sample. We performed a focused analysis to examine the previously reported inverse associations between sodium intake and Lactobacillus abundance and between Lactobacillus abundance and BP. We studied a population sample of 6953 Finns aged 25 to 74 years (mean age, 49.2±12.9 years; 54.9% women). The participants underwent a health examination, which included BP measurement, stool collection, and 24‐hour urine sampling (N=829). Gut microbiota was analyzed using shallow shotgun metagenome sequencing. In age‐ and sex‐adjusted models, the α (within‐sample) and β (between‐sample) diversities of taxonomic composition were strongly related to BP indexes (P<.001 for most). In multivariable‐adjusted models, β diversity was only associated with diastolic BP (P=0.032). However, we observed significant, mainly positive, associations between BP indexes and 45 microbial genera (P<.05), of which 27 belong to the phylum Firmicutes. Interestingly, we found mostly negative associations between 19 distinct Lactobacillus species and BP indexes (P<.05). Of these, greater abundance of the known probiotic Lactobacillus paracasei was associated with lower mean arterial pressure and lower dietary sodium intake (P<.001 for both). Although the associations between overall gut taxonomic composition and BP are weak, individuals with hypertension demonstrate changes in several genera. We demonstrate strong negative associations of certain Lactobacillus species with sodium intake and BP, highlighting the need for experimental studies.
Publisher: Cold Spring Harbor Laboratory
Date: 17-04-2022
DOI: 10.1101/2022.04.17.488593
Abstract: Genetically predicted levels of multi-omic traits can uncover the molecular underpinnings of common phenotypes in a highly efficient manner. Here, we utilised a large cohort (INTERVAL; N=50,000 participants) with extensive multi-omic data for plasma proteomics (SomaScan, N=3,175; Olink, N=4,822), plasma metabolomics (Metabolon HD4, N=8,153), serum metabolomics (Nightingale, N=37,359), and whole blood Illumina RNA sequencing (N=4,136). We used machine learning to train genetic scores for 17,227 molecular traits, including 10,521 which reached Bonferroni-adjusted significance. We evaluated genetic score performances in external validation across European, Asian and African American ancestries, and assessed their longitudinal stability within diverse individuals. We demonstrated the utility of these multi-omic genetic scores by quantifying the genetic control of biological pathways and by generating a synthetic multi-omic dataset of UK Biobank to identify disease associations using a phenome-wide scan. Finally, we developed a portal (OmicsPred.org) to facilitate public access to all genetic scores and validation results as well as to serve as a platform for future extensions and enhancements of multi-omic genetic scores.
Publisher: Springer Science and Business Media LLC
Date: 10-03-2021
Publisher: Cold Spring Harbor Laboratory
Date: 23-07-2019
DOI: 10.1101/712166
Abstract: Assessing the taxonomic composition of metagenomic samples is an important first step in understanding the biology and ecology of microbial communities in complex environments. Despite a wealth of algorithms and tools for metagenomic classification, relatively little effort has been put into the critical task of improving the quality of reference indices to which metagenomic reads are assigned. Here, we inferred the taxonomic composition of 404 publicly available metagenomes from human, marine and soil environments, using custom index databases modified according to two factors: the number of reference genomes used to build the databases, and the monophyletic strictness of species definitions. Index databases built following the NCBI taxonomic system were also compared to others using Genome Taxonomy Database (GTDB) taxonomic redefinitions. We observed a considerable increase in the rate of read classification using modified reference index databases as compared to a default NCBI RefSeq database, with up to a 4.4-, 6.4- and 2.2-fold increase in classified reads per sample for human, marine and soil metagenomes, respectively. Importantly, targeted correction for 70 common human pathogens and bacterial genera in the index database increased their specific detection levels in human metagenomes. We also show the choice of index database can influence downstream diversity and distance estimates for microbiome data. Overall, the study shows a large amount of accessible information in metagenomes remains unexploited using current methods, and that the same data analysed using different index databases could potentially lead to different conclusions. These results have implications for the power and design of individual microbiome studies, and for comparison and meta-analysis of microbiome datasets.
Publisher: Elsevier BV
Date: 06-2021
Publisher: Public Library of Science (PLoS)
Date: 22-06-2017
Publisher: Cold Spring Harbor Laboratory
Date: 02-07-2019
DOI: 10.1101/689935
Abstract: Recent genome-wide association studies in stroke have enabled the generation of genomic risk scores (GRS) but the predictive power of these GRS has been modest in comparison to established stroke risk factors. Here, using a meta-scoring approach, we developed a metaGRS for ischaemic stroke (IS) and analysed this score in the UK Biobank (n=395,393; 3075 IS events by age 75). The metaGRS hazard ratio for IS (1.26, 95% CI 1.22-1.31 per standard deviation increase of the score) doubled that of previous GRS, enabling the identification of a subset of individuals at monogenic levels of risk: individuals in the top 0.25% of metaGRS had a three-fold increased risk of IS. The metaGRS was similarly or more predictive when compared to established risk factors, such as family history, blood pressure, body mass index and smoking status. For participants within accepted guideline levels for established stroke risk factors, we found substantial variation in incident stroke rates across genomic risk backgrounds. We further estimated combinations of reductions needed in modifiable risk factors for individuals with different levels of genomic risk and suggest that, for individuals with high metaGRS, achieving currently recommended risk factor levels may be insufficient to mitigate risk.
Publisher: Elsevier BV
Date: 09-2020
Publisher: Microbiology Society
Date: 11-07-2016
Publisher: BMJ
Date: 04-09-2020
DOI: 10.1136/ANNRHEUMDIS-2020-217421
Abstract: Juvenile idiopathic arthritis (JIA) is an autoimmune disease and a common cause of chronic disability in children. Diagnosis of JIA is based purely on clinical symptoms, which can be variable, leading to diagnosis and treatment delays. Despite JIA having substantial heritability, the construction of genomic risk scores (GRSs) to aid or expedite diagnosis has not been assessed. Here, we generate GRSs for JIA and its subtypes and evaluate their performance. We examined three case/control cohorts (UK, US-based and Australia) with genome-wide single nucleotide polymorphism (SNP) genotypes. We trained GRSs for JIA and its subtypes using lasso-penalised linear models in cross-validation on the UK cohort, and externally tested it in the other cohorts. The JIA GRS alone achieved cross-validated area under the receiver operating characteristic curve (AUC)=0.670 in the UK cohort and externally-validated AUCs of 0.657 and 0.671 in the US-based and Australian cohorts, respectively. In logistic regression of case/control status, the corresponding odds ratios (ORs) per standard deviation (SD) of GRS were 1.831 (1.685 to 1.991) and 2.008 (1.731 to 2.345), and were unattenuated by adjustment for sex or the top 10 genetic principal components. Extending our analysis to JIA subtypes revealed that the enthesitis-related JIA had both the longest time-to-referral and the subtype GRS with the strongest predictive capacity overall across data sets: AUCs 0.82 in UK, 0.84 in Australian and 0.70 in US-based. The particularly common oligoarthritis JIA also had a GRS that outperformed those for JIA overall, with AUCs of 0.72, 0.74 and 0.77, respectively. A GRS for JIA has potential to augment clinical JIA diagnosis protocols, prioritising higher-risk individuals for follow-up and treatment. Consistent with JIA heterogeneity, subtype-specific GRSs showed particularly high performance for enthesitis-related and oligoarthritis JIA.
Publisher: Microbiology Society
Date: 08-2017
Publisher: Elsevier BV
Date: 12-2019
Publisher: Rockefeller University Press
Date: 06-10-2014
DOI: 10.1084/JEM.20140425
Abstract: Activated B cells undergo immunoglobulin class-switch recombination (CSR) and differentiate into antibody-secreting plasma cells. The distinct transcriptomes of B cells and plasma cells are maintained by the antagonistic influences of two groups of transcription factors: those that maintain the B cell program, including BCL6 and PAX5, and plasma cell–promoting factors, such as IRF4 and BLIMP-1. We show that the complex of IRF8 and PU.1 controls the propensity of B cells to undergo CSR and plasma cell differentiation by concurrently promoting the expression of BCL6 and PAX5 and repressing AID and BLIMP-1. As the PU.1–IRF8 complex functions in a reciprocal manner to IRF4, we propose that concentration-dependent competition between these factors controls B cell terminal differentiation.
Publisher: Springer Science and Business Media LLC
Date: 29-03-2023
Publisher: Oxford University Press (OUP)
Date: 10-09-2007
DOI: 10.1093/BIOINFORMATICS/BTM443
Abstract: Motivation: Large-scale genotyping relies on the use of unsupervised automated calling algorithms to assign genotypes to hybridization data. A number of such calling algorithms have been recently established for the Affymetrix GeneChip genotyping technology. Here, we present a fast and accurate genotype calling algorithm for the Illumina BeadArray genotyping platforms. As the technology moves towards assaying millions of genetic polymorphisms simultaneously, there is a need for an integrated and easy-to-use software for calling genotypes. Results: We have introduced a model-based genotype calling algorithm which does not rely on having prior training data or require computationally intensive procedures. The algorithm can assign genotypes to hybridization data from thousands of individuals simultaneously and pools information across multiple individuals to improve the calling. The method can accommodate variations in hybridization intensities which result in dramatic shifts of the position of the genotype clouds by identifying the optimal coordinates to initialize the algorithm. By incorporating the process of perturbation analysis, we can obtain a quality metric measuring the stability of the assigned genotype calls. We show that this quality metric can be used to identify SNPs with low call rates and accuracy. Availability: The C++ executable for the algorithm described here is available by request from the authors. Contact: teo@well.ox.ac.uk or tgc@well.ox.ac.uk
Publisher: Elsevier BV
Date: 12-2011
DOI: 10.1016/J.TIG.2011.09.002
Abstract: Following the widespread use of genome-wide association studies to elucidate the genetic architectures of complex phenotypes, there has been a push to augment existing observational studies with additional layers of molecular information. The resulting high-dimensional data have led to the emergence of research in integrative systems biology. Here, we examine recent progress in characterizing biological networks as well as the corresponding conceptual and analytical challenges. Using examples from metabolomics, we contend that integrative systems biology should prompt a re-examination of conventional phenotypic measures where heterogeneous or correlated phenotypes can be fine-mapped. Although still in its infancy, it is apparent that the large-scale characterization of molecular systems will transform our understanding of phenotype, biology and pathogenesis.
Publisher: Springer Science and Business Media LLC
Date: 04-2008
Location: Australia
Location: United Kingdom of Great Britain and Northern Ireland
Location: United Kingdom of Great Britain and Northern Ireland
Location: United Kingdom of Great Britain and Northern Ireland
Location: United States of America
Location: United Kingdom of Great Britain and Northern Ireland
Start Date: 2023
End Date: 12-2025
Amount: $433,078.00
Funder: Australian Research Council
Start Date: 2010
End Date: 12-2014
Amount: $240,546.00
Funder: Australian Research Council