ORCID Profile
0000-0002-4300-455X
Current Organisation
University of Melbourne
In Research Link Australia (RLA), "Research Topics" refer to ANZSRC FOR and SEO codes. These topics are either sourced from ANZSRC FOR and SEO codes listed in researchers' related grants or generated by a large language model (LLM) based on their publications.
Genomics | Genetics | Bioinformatics | Microbial Ecology | Biostatistics | Molecular Evolution | Stochastic Analysis and Modelling | Biochemistry and Cell Biology | Statistics | Environmental Technologies | Environmental Engineering | Public Health and Health Services not elsewhere classified | Cell Development, Proliferation and Death | Systems Biology | Population, Ecological and Evolutionary Genetics
Expanding Knowledge in the Biological Sciences | Expanding Knowledge in the Mathematical Sciences | Flora, Fauna and Biodiversity of environments not elsewhere classified | Urban and Industrial Water Management | Urban Water Evaluation (incl. Water Quality) | Information Processing Services (incl. Data Entry and Capture) | Public Health (excl. Specific Population Health) not elsewhere classified | Disease Distribution and Transmission (incl. Surveillance and Response)
Publisher: IEEE
Date: 11-2011
Publisher: Cold Spring Harbor Laboratory
Date: 29-11-2018
DOI: 10.1101/482661
Abstract: Klebsiella pneumoniae frequently harbour multidrug resistance, and current methodologies are struggling to rapidly discern feasible antibiotics to treat these infections. While rapid DNA sequencing has been proposed for prediction of resistance profile, the role of rapid RNA sequencing has yet to be fully explored. The MinION sequencer can sequence native DNA and RNA in real-time, providing an opportunity to contrast the utility of DNA and RNA for prediction of drug susceptibility. This study interrogated the genome and transcriptome of four extensively drug-resistant (XDR) K. pneumoniae clinical isolates. The majority of acquired resistance (≥75%) resided on plasmids including several megaplasmids (≥100 kbp). DNA sequencing identified most resistance genes (≥70%) within 2 hours of sequencing. Direct RNA sequencing (with a ∼6x slower pore translocation) was able to identify ≥35% of resistance genes, including aminoglycoside, β-lactam, trimethoprim and sulphonamide and also quinolone, rifampicin, fosfomycin and phenicol in some isolates, within 10 hours of sequencing. Polymyxin-resistant isolates showed a heightened transcription of phoPQ (≥2-fold) and the pmrHFIJKLM operon (≥8-fold). Expression levels estimated from direct RNA sequencing displayed strong correlation (Pearson: 0.86) with qRT-PCR across 11 resistance genes. Overall, MinION sequencing rapidly detected the XDR K. pneumoniae resistome and direct RNA sequencing revealed differential expression of these genes.
Publisher: Microbiology Society
Date: 03-2018
Publisher: Springer Science and Business Media LLC
Date: 31-10-2012
DOI: 10.1038/NATURE11632
Publisher: Public Library of Science (PLoS)
Date: 12-03-2013
Publisher: IEEE
Date: 07-2010
DOI: 10.1109/SCC.2010.42
Publisher: Springer Science and Business Media LLC
Date: 11-02-2011
Publisher: Oxford University Press (OUP)
Date: 26-07-2016
Publisher: Oxford University Press (OUP)
Date: 20-04-2010
DOI: 10.1093/BIOINFORMATICS/BTQ157
Abstract: Motivation: Copy number variations (CNVs) are increasingly recognized as a substantial source of individual genetic variation, and hence there is a growing interest in investigating the evolutionary history of CNVs as well as their impact on complex disease susceptibility. CNV/SNP haplotypes are critical for this research, but although many methods have been proposed for inferring integer copy number, few have been designed for inferring CNV haplotypic phase and none of these are applicable at genome-wide scale. Here, we present a method for inferring missing CNV genotypes, predicting CNV allelic configuration and for inferring CNV haplotypic phase from SNP/CNV genotype data. Our method, implemented in the software polyHap v2.0, is based on a hidden Markov model, which models the joint haplotype structure between CNVs and SNPs. Thus, haplotypic phase of CNVs and SNPs is inferred simultaneously. A sampling algorithm is employed to obtain a measure of confidence/credibility of each estimate. Results: We generated diploid phase-known CNV–SNP genotype datasets by pairing male X chromosome CNV–SNP haplotypes. We show that polyHap provides accurate estimates of missing CNV genotypes, allelic configuration and CNV haplotypic phase on these datasets. We applied our method to a non-simulated dataset: a region on Chromosome 2 encompassing a short deletion. The results confirm that polyHap's accuracy extends to real-life datasets. Availability: Our method is implemented in version 2.0 of the polyHap software package and can be downloaded from www.imperial.ac.uk/medicine/people/l.coin. Contact: l.coin@imperial.ac.uk. Supplementary information: Supplementary data are available at Bioinformatics online.
Publisher: Public Library of Science (PLoS)
Date: 29-08-2008
Publisher: Cold Spring Harbor Laboratory
Date: 04-05-2017
DOI: 10.1101/134155
Abstract: The recently introduced barcoding protocol for Oxford Nanopore sequencing has increased the versatility of the technology. Several bioinformatic tools have been developed to demultiplex the barcoded reads, but none of them supports streaming analysis. This limits the use of pooled sequencing in real-time applications, which is one of the main advantages of the technology. We introduced npBarcode, an open source and cross platform tool for barcode demultiplexing in a streaming fashion. npBarcode can be seamlessly integrated into a streaming analysis pipeline. The tool also provides a friendly graphical user interface through npReader, allowing the real-time visual monitoring of the sequencing progress of barcoded samples. We show that npBarcode achieves comparable accuracies to the other alternatives. npBarcode is bundled in Japsa - a Java tools kit for genome analysis, and is freely available at snguyen/npBarcode.
Publisher: Frontiers Media SA
Date: 02-03-2021
DOI: 10.3389/FIMMU.2021.637164
Abstract: Recently, host whole blood gene expression signatures have been identified for diagnosis of tuberculosis (TB). Absolute quantification of the concentrations of signature transcripts in blood has not been reported, but would facilitate diagnostic test development. To identify minimal transcript signatures, we applied a transcript selection procedure to microarray data from African adults comprising 536 patients with TB, other diseases (OD) and latent TB (LTBI), divided into training and test sets. Signatures were further investigated using reverse transcriptase (RT)-digital PCR (dPCR). A four-transcript signature (GBP6, TMCC1, PRDM1, and ARG1) measured using RT-dPCR distinguished TB patients from those with OD (area under the curve (AUC) 93.8%, CI 95%: 82.2–100%). A three-transcript signature (FCGR1A, ZNF296, and C1QB) differentiated TB from LTBI (AUC 97.3%, CI 95%: 93.3–100%), regardless of HIV. These signatures have been validated across platforms and across samples, offering strong, quantitative support for their use as diagnostic biomarkers for TB.
Publisher: Springer Science and Business Media LLC
Date: 2012
Publisher: Elsevier BV
Date: 03-2022
DOI: 10.1016/J.JHAZMAT.2021.127906
Abstract: Improving the recovery of organic matter and phosphorus (P) from hazardous biowastes such as swine manure using acidic substrates (ASs) in conjunction with aerobic composting is of great interest. This work aimed to investigate the effects of ASs on the humification and/or P migration as well as on microbial succession during the swine manure composting, employing multivariate and multiscale approaches. Adding ASs, derived from wood vinegar and humic acid, increased the degree of humification and thermal stability of the compost. The
Publisher: Cold Spring Harbor Laboratory
Date: 07-03-2019
DOI: 10.1101/570614
Abstract: A variety of experimental and computational methods have been developed to demultiplex samples from pooled individuals in a single-cell RNA sequencing (scRNA-Seq) experiment which either require adding information (such as hashtag barcodes) or measuring information (such as genotypes) prior to pooling. We introduce scSplit which utilises genetic differences inferred from scRNA-Seq data alone to demultiplex pooled samples. scSplit also extracts a minimal set of high confidence presence/absence genotypes in each cluster which can be used to map clusters to original samples. Using a range of simulated, merged individual-sample as well as pooled multi-individual scRNA-Seq datasets, we show that scSplit is highly accurate and concordant with demuxlet predictions. Furthermore, scSplit predictions are highly consistent with the known truth in a cell-hashing dataset. We also show that multiplexed-scRNA-Seq can be used to reduce batch effects caused by technical biases. scSplit is ideally suited to samples for which external genome-wide genotype data cannot be obtained (for example non-model organisms), or for which it is impossible to obtain unmixed samples directly, such as mixtures of genetically distinct tumour cells, or mixed infections. scSplit is available at: on-xu/scSplit
Publisher: Cold Spring Harbor Laboratory
Date: 22-12-2020
DOI: 10.1101/2020.12.22.423893
Abstract: SARS-CoV-2 uses subgenomic (sg)RNA to produce viral proteins for replication and immune evasion. We applied long-read RNA and cDNA sequencing to in vitro human and primate infection models to study transcriptional dynamics. Transcription-regulating sequence (TRS)-dependent sgRNA was upregulated earlier in infection than TRS-independent sgRNA. An abundant class of TRS-independent sgRNA consisting of a portion of ORF1ab containing nsp1 joined to ORF10 and 3’UTR was upregulated at 48 hours post infection in human cell lines. We identified double-junction sgRNA containing both TRS-dependent and independent junctions. We found multiple sites at which the SARS-CoV-2 genome is consistently more modified than sgRNA, and that sgRNA modifications are stable across transcript clusters, host cells and time since infection. Our work highlights the dynamic nature of the SARS-CoV-2 transcriptome during its replication cycle. Our results are available via an interactive web-app at coinlab.mdhs.unimelb.edu.au/ .
Publisher: MDPI AG
Date: 12-12-2021
DOI: 10.3390/W13243551
Abstract: With increasing concerns about public health and the development of molecular techniques, new detection tools and the combination of existing approaches have increased the abilities of pathogenic bacteria monitoring by exploring new biomarkers, increasing the sensitivity and accuracy of detection, quantification, and analyzing various genes such as functional genes and antimicrobial resistance genes (ARG). Molecular methods are gradually emerging as the most popular detection approach for pathogens, in addition to the conventional culture-based plate enumeration methods. The analysis of pathogens in wastewater and the back-estimation of infections in the community, also known as wastewater-based epidemiology (WBE), is an emerging methodology and has great potential to supplement current surveillance systems for the monitoring of infectious diseases and the early warning of outbreaks. However, as a complex matrix, wastewater substantially challenges the analytical performance of molecular methods. This review synthesized the literature on typical pathogenic bacteria in wastewater, types of biomarkers, molecular methods for bacterial analysis, and their recent advances in wastewater analysis. The advantages and limitations of these molecular methods were evaluated, and their prospects in WBE were discussed to provide insight for future development.
Publisher: Oxford University Press (OUP)
Date: 26-01-2010
DOI: 10.1093/BIOINFORMATICS/BTQ029
Abstract: Summary: Inversions are a common form of structural variation, which may have a marked effect on the genome and methods to infer quantities of interest such as those relating to population structure and natural selection. However, due to the challenge in detecting inversions, little is presently known about their impact. Software to simulate inversions could be used to provide a better understanding of how to detect and account for them but while there are several software packages for simulating population genetic data, none incorporate inversion polymorphisms. Here, we describe a software package, modified from the forward-in-time simulator FREGENE, which simulates the evolution of an inversion polymorphism, of specified length, location, frequency and age, in a population of sequences. We describe previously unreported signatures of inversions in SNP data observed in invertFREGENE results and a known inversion in humans. Availability: C++ source code and user manual are available for download from www.ebi.ac.uk/projects/BARGEN/ under the GPL licence. Contact: l.coin@ic.ac.uk; c.hoggart@ic.ac.uk. Supplementary information: Supplementary data are available at Bioinformatics online.
Publisher: Springer Science and Business Media LLC
Date: 24-02-2023
DOI: 10.1038/S41467-023-36717-4
Abstract: A new variant of Streptococcus pyogenes serotype M1 (designated ‘M1UK’) has been reported in the United Kingdom, linked with seasonal scarlet fever surges, marked increase in invasive infections, and exhibiting enhanced expression of the superantigen SpeA. The progenitor S. pyogenes ‘M1global’ and M1UK clones can be differentiated by 27 SNPs and 4 indels, yet the mechanism for speA upregulation is unknown. Here we investigate the previously unappreciated expansion of M1UK in Australia, now isolated from the majority of serious infections caused by serotype M1 S. pyogenes. M1UK sub-lineages circulating in Australia also contain a novel toxin repertoire associated with epidemic scarlet fever causing S. pyogenes in Asia. A single SNP in the 5’ transcriptional leader sequence of the transfer-messenger RNA gene ssrA drives enhanced SpeA superantigen expression as a result of ssrA terminator read-through in the M1UK lineage. This represents a previously unappreciated mechanism of toxin expression and urges enhanced international surveillance.
Publisher: Springer International Publishing
Date: 2016
Publisher: Springer International Publishing
Date: 2016
Publisher: IEEE
Date: 10-2017
DOI: 10.1109/LCN.2017.63
Publisher: Oxford University Press (OUP)
Date: 12-2020
DOI: 10.1093/GIGASCIENCE/GIAA146
Abstract: Sequencing technologies have advanced to the point where it is possible to generate high-accuracy, haplotype-resolved, chromosome-scale assemblies. Several long-read sequencing technologies are available, and a growing number of algorithms have been developed to assemble the reads generated by those technologies. When starting a new genome project, it is therefore challenging to select the most cost-effective sequencing technology, as well as the most appropriate software for assembly and polishing. It is thus important to benchmark different approaches applied to the same sample. Here, we report a comparison of 3 long-read sequencing technologies applied to the de novo assembly of a plant genome, Macadamia jansenii. We have generated sequencing data using Pacific Biosciences (Sequel I), Oxford Nanopore Technologies (PromethION), and BGI (single-tube Long Fragment Read) technologies for the same sample. Several assemblers were benchmarked in the assembly of Pacific Biosciences and Nanopore reads. Results obtained from combining long-read technologies or short-read and long-read technologies are also presented. The assemblies were compared for contiguity, base accuracy, and completeness, as well as sequencing costs and DNA material requirements. The 3 long-read technologies produced highly contiguous and complete genome assemblies of M. jansenii. At the time of sequencing, the cost associated with each method was significantly different, but continuous improvements in technologies have resulted in greater accuracy, increased throughput, and reduced costs. We propose updating this comparison regularly with reports on significant iterations of the sequencing technologies.
Publisher: Elsevier BV
Date: 09-2023
Publisher: Cold Spring Harbor Laboratory
Date: 28-03-2018
DOI: 10.1101/290171
Abstract: The accurate detection of copy number alterations from the analysis of circulating cell free tumour DNA (ctDNA) in blood is essential to realising the potential of liquid biopsies. However, currently available approaches require a large number of plasma samples from healthy individuals, sequenced using the same platform and protocols to act as a reference panel. Obtaining this reference panel can be challenging, prohibitively expensive and limits the ability to migrate to improved sequencing platforms and improved protocols. We developed qCNV and sCNA-seq, two distinct tools that together provide a new approach for profiling somatic copy number alterations (sCNA) through the analysis of cell free DNA (cfDNA) without a reference panel. Our approach was designed to identify sCNA from cfDNA through the analysis of a single plasma sample and a matched normal DNA sample, both of which can be obtained from the same blood draw. qCNV is an efficient method for extracting read-depth from BAM files and sCNA-seq is a method that uses a probabilistic model of read depth to infer the copy number segmentation of the tumour. We compared the results from our pipeline to the established copy number profile of a cell-line, as well as the results from the plasma-Seq analysis of cfDNA-like mixtures and real, clinical data-sets. With a single, unmatched, germline reference sample, our pipeline recapitulated the known copy number profile of a cell-line and demonstrated similar results to those obtained from plasma-Seq. With less than 1X genome coverage, our approach identified clinically relevant sCNA in samples with as little as 20% tumour DNA. When applied to plasma samples from cancer patients, our pipeline identified clinically significant mutations. These results show it is possible to identify therapeutically-relevant copy number mutations from plasma samples without the need to generate a reference panel from a large number of healthy individuals. Together with the range of sequencing platforms supported by our qCNV+sCNA-Seq pipeline, as well as the Galaxy implementation of this solution, this pipeline makes cfDNA profiling more accessible and makes it easier to identify sCNA from the plasma of cancer patients.
Publisher: Public Library of Science (PLoS)
Date: 15-11-2017
Publisher: Springer Science and Business Media LLC
Date: 06-05-2019
DOI: 10.1038/S41598-019-43292-6
Abstract: Non-coding genetic variants play an important role in driving susceptibility to complex diseases but their characterization remains challenging. Here, we employed a novel approach to interrogate the genetic risk of such polymorphisms in a more systematic way by targeting specific regulatory regions relevant for the phenotype studied. We applied this method to meningococcal disease susceptibility, using the DNA binding pattern of RELA – a NF-kB subunit, master regulator of the response to infection – under bacterial stimuli in nasopharyngeal epithelial cells. We designed a custom panel to cover these RELA binding sites and used it for targeted sequencing in cases and controls. Variant calling and association analysis were performed followed by validation of candidate polymorphisms by genotyping in three independent cohorts. We identified two new polymorphisms, rs4823231 and rs11913168, showing signs of association with meningococcal disease susceptibility. In addition, using our genomic data as well as publicly available resources, we found evidence that these SNPs have potential regulatory effects on the ATXN10 and LIF genes, respectively. The variants and related candidate genes are relevant for infectious diseases and may make an important contribution to meningococcal disease pathology. Finally, we described a novel genetic association approach that could be applied to other phenotypes.
Publisher: Cold Spring Harbor Laboratory
Date: 23-08-2017
DOI: 10.1101/179531
Abstract: Sequencing by translocating DNA fragments through an array of nanopores is a rapidly maturing technology which offers faster and cheaper sequencing than other approaches. However, accurately deciphering the DNA sequence from the noisy and complex electrical signal is challenging. Here, we report Chiron, the first deep learning model to achieve end-to-end basecalling: directly translating the raw signal to DNA sequence without the error-prone segmentation step. Trained with only a small set of 4000 reads, we show that our model provides state-of-the-art basecalling accuracy even on previously unseen species. Chiron achieves basecalling speeds of over 2000 bases per second using desktop computer graphics processing units.
Publisher: Cold Spring Harbor Laboratory
Date: 28-04-2018
DOI: 10.1101/309674
Abstract: Polymyxin B and E (colistin) have been pivotal in the treatment of extensively drug-resistant (XDR) Gram-negative bacterial infections, with increasing use over the past decade. Unfortunately, resistance to these antibiotics is rapidly emerging. The structurally-related octapeptin C4 (OctC4) has shown significant potency against XDR bacteria, including against polymyxin-resistant (Pmx-R) strains, but its mode of action remains undefined. We sought to compare and contrast the acquisition of XDR Klebsiella pneumoniae (ST258) resistance in vitro with all three lipopeptides to help elucidate the mode of action of the drugs and potential mechanisms of resistance evolution. Strikingly, 20 days of exposure to the polymyxins resulted in a dramatic (1000-fold) increase in the minimum inhibitory concentration (MIC) for the polymyxins, reflecting the evolution of resistance seen in clinical isolates, whereas for OctC4 only a 4-fold increase was witnessed. There was no cross-resistance observed between the polymyxin- and octapeptin-induced resistant strains. Sequencing revealed previously known gene alterations for polymyxin resistance, including crrB, mgrB, pmrB, phoPQ and yciM, and novel mutations in qseC. In contrast, mutations in mlaDF and pqiB, genes related to phospholipid transport, were found in octapeptin-resistant isolates. Mutation effects were validated via complementation assays. These genetic variations were reflected in phenotypic changes to lipid A. Pmx-R isolates increased 4-amino-4-deoxy-arabinose fortification to phosphate groups of lipid A, whereas OctC4-induced strains harbored a higher abundance of hydroxymyristate and palmitoylate. The results reveal a differing mode of action compared to polymyxins, which provides hope for future therapeutics to combat the increasing threat of XDR bacteria.
Publisher: Springer Science and Business Media LLC
Date: 31-08-2011
DOI: 10.1038/NATURE10406
Publisher: The Open Journal
Date: 08-01-2022
DOI: 10.21105/JOSS.02991
Publisher: Oxford University Press (OUP)
Date: 10-04-2018
Publisher: Public Library of Science (PLoS)
Date: 02-05-2012
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 08-2020
Publisher: Oxford University Press (OUP)
Date: 05-05-2011
DOI: 10.1093/BIOINFORMATICS/BTR264
Abstract: Summary: A program package to enable genome-wide association of copy number variants (CNVs) with quantitative phenotypes in families of arbitrary size and complexity. Intensity signals that act as proxies for the number of copies are modeled in a variance component framework and association with traits is assessed through formal likelihood testing. Availability and implementation: The Java package is made available at www.imperial.ac.uk/medicine/people/m.falchi/. Contact: m.falchi@imperial.ac.uk
Publisher: Springer International Publishing
Date: 2016
Publisher: Elsevier BV
Date: 2022
Publisher: Cold Spring Harbor Laboratory
Date: 19-11-2020
DOI: 10.1101/2020.11.17.386839
Abstract: Here, with the example of common copy number variation (CNV) in the TSPAN8 gene, we present an important piece of work in the field of CNV detection, CNV association with complex human traits such as 1H NMR metabolomic phenotypes and an example of functional characterization of CNVs among human induced pluripotent stem cells (HipSci). We report TSPAN8 exon 11 as a new locus associated with metabolomic regulation and show that its biology is associated with several metabolic diseases such as type 2 diabetes (T2D), obesity and cancer. Our results further demonstrate the power of multivariate association models over univariate methods and define new metabolomic signatures for several new genomic loci, which can act as a catalyst for new diagnostics and therapeutic approaches.
Publisher: Oxford University Press (OUP)
Date: 05-07-2021
DOI: 10.1093/BIB/BBAB245
Abstract: Pseudouridine is a ubiquitous RNA modification type present in eukaryotes and prokaryotes, which plays a vital role in various biological processes. Almost all kinds of RNAs are subject to this modification. However, it remains a great challenge to identify pseudouridine sites via experimental approaches, requiring expensive and time-consuming experimental research. Therefore, computational approaches that can be used to perform accurate in silico identification of pseudouridine sites from the large amount of RNA sequence data are highly desirable and can aid in the functional elucidation of this critical modification. Here, we propose a new computational approach, termed Porpoise, to accurately identify pseudouridine sites from RNA sequence data. Porpoise builds upon a comprehensive evaluation of 18 frequently used feature encoding schemes based on the selection of four types of features, including binary features, pseudo k-tuple composition, nucleotide chemical property and position-specific trinucleotide propensity based on single-strand (PSTNPss). The selected features are fed into the stacked ensemble learning framework to enable the construction of an effective stacked model. Both cross-validation tests on the benchmark dataset and independent tests show that Porpoise achieves superior predictive performance compared with several state-of-the-art approaches. The application of model interpretation tools demonstrates the importance of PSTNPs for the performance of the trained models. This new method is anticipated to facilitate community-wide efforts to identify putative pseudouridine sites and formulate novel testable biological hypotheses.
Publisher: Springer Science and Business Media LLC
Date: 24-07-2019
Publisher: Springer International Publishing
Date: 2014
Publisher: Microbiology Society
Date: 02-2018
Publisher: Springer Science and Business Media LLC
Date: 03-2004
DOI: 10.1038/NRC1299
Publisher: Springer Science and Business Media LLC
Date: 31-08-2014
DOI: 10.1038/NN.3801
Publisher: Wiley
Date: 14-03-2011
DOI: 10.1111/J.1469-1809.2011.00641.X
Abstract: We undertook a candidate locus study of the HIN200 gene cluster on 1q21-23 in UK systemic lupus erythematosus (SLE) families. To date, despite mounting evidence demonstrating the importance of these proteins in autoimmune disease, cancer, apoptosis, inflammation, and cell cycle arrest, there has been a dearth of data with respect to the genetic characterisation of the HIN200 locus in SLE or any other disease. We typed 83 single nucleotide polymorphisms (SNPs) across 317 kb of the HIN200 cluster in 428 UK SLE families and sought replication from a European-American lupus cohort. We do not find strong evidence of SNP association in either cohort. Interestingly, we do observe a trend for association with certain HIN200 SNPs and serologic subphenotypes in UK SLE that parallels the association of lupus antibodies with the orthologous murine locus. Furthermore, we find the HIN200 locus to be unexpectedly complex in terms of genetic structural organisation. We have identified a number of copy number variants (CNVs) in this region in healthy French males, HapMap samples, and UK SLE families. In summary, candidate interferon signalling genes show evidence of common CNV in human SLE and healthy subjects. The impact of these CNVs in health and disease remains to be determined.
Publisher: F1000 Research Ltd
Date: 05-09-2018
DOI: 10.12688/GATESOPENRES.12856.1
Abstract: Background: The chloroplast (cp) genome is an important resource for studying plant diversity and phylogeny. Assembly of the cp genomes from next-generation sequencing data is complicated by the presence of two large inverted repeats contained in the cp DNA. Methods: We constructed a complete circular cp genome assembly for the hexaploid sweetpotato using extremely low coverage ( ×) Oxford Nanopore whole-genome sequencing (WGS) data coupled with Illumina sequencing data for polishing. Results: The sweetpotato cp genome of 161,274 bp contains 152 genes, of which there are 96 protein coding genes, 8 rRNA genes and 48 tRNA genes. Using the cp genome assembly as a reference, we constructed complete cp genome assemblies for a further 17 sweetpotato cultivars from East Africa and an I. triloba line using Illumina WGS data. Analysis of the sweetpotato cp genomes demonstrated the presence of two distinct subpopulations in East Africa. Phylogenetic analysis of the cp genomes of the species from the Convolvulaceae Ipomoea section Batatas revealed that the most closely related diploid wild species of the hexaploid sweetpotato is I. trifida. Conclusions: Nanopore long reads are helpful in construction of cp genome assemblies, especially in solving the two long inverted repeats. We are generally able to extract cp sequences from WGS data of sufficiently high coverage for assembly of cp genomes. The cp genomes can be used to investigate the population structure and the phylogenetic relationship for the sweetpotato.
Publisher: F1000 Research Ltd
Date: 21-07-2020
DOI: 10.12688/GATESOPENRES.12856.2
Abstract: Background: The chloroplast (cp) genome is an important resource for studying plant diversity and phylogeny. Assembly of the cp genomes from next-generation sequencing data is complicated by the presence of two large inverted repeats contained in the cp DNA. Methods: We constructed a complete circular cp genome assembly for the hexaploid sweetpotato using extremely low coverage ( ×) Oxford Nanopore whole-genome sequencing (WGS) data coupled with Illumina sequencing data for polishing. Results: The sweetpotato cp genome of 161,274 bp contains 152 genes, of which there are 96 protein coding genes, 8 rRNA genes and 48 tRNA genes. Using the cp genome assembly as a reference, we constructed complete cp genome assemblies for a further 17 sweetpotato cultivars from East Africa and an I. triloba line using Illumina WGS data. Analysis of the sweetpotato cp genomes demonstrated the presence of two distinct subpopulations in East Africa. Phylogenetic analysis of the cp genomes of the species from the Convolvulaceae Ipomoea section Batatas revealed that the most closely related diploid wild species of the hexaploid sweetpotato is I. trifida. Conclusions: Nanopore long reads are helpful in construction of cp genome assemblies, especially in solving the two long inverted repeats. We are generally able to extract cp sequences from WGS data of sufficiently high coverage for assembly of cp genomes. The cp genomes can be used to investigate the population structure and the phylogenetic relationship for the sweetpotato.
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 03-2023
Publisher: Springer Science and Business Media LLC
Date: 02-11-2018
DOI: 10.1038/S41467-018-06983-8
Abstract: Sweetpotato [Ipomoea batatas (L.) Lam.] is a globally important staple food crop, especially for sub-Saharan Africa. Agronomic improvement of sweetpotato has lagged behind other major food crops due to a lack of genomic and genetic resources and inherent challenges in breeding a heterozygous, clonally propagated polyploid. Here, we report the genome sequences of its two diploid relatives, I. trifida and I. triloba, and show that these high-quality genome assemblies are robust references for hexaploid sweetpotato. Comparative and phylogenetic analyses reveal insights into the ancient whole-genome triplication history of Ipomoea and evolutionary relationships within the Batatas complex. Using resequencing data from 16 genotypes widely used in African breeding programs, genes and alleles associated with carotenoid biosynthesis in storage roots are identified, which may enable efficient breeding of varieties with high provitamin A content. These resources will facilitate genome-enabled breeding in this important food security crop.
Publisher: Cold Spring Harbor Laboratory
Date: 07-03-2020
DOI: 10.1101/2020.03.05.976167
Abstract: Fundamental aspects of SARS-CoV-2 biology remain to be described, having the potential to provide insight to the response effort for this high-priority pathogen. Here we describe the first native RNA sequence of SARS-CoV-2, detailing the coronaviral transcriptome and epitranscriptome, and share these data publicly. A data-driven inference of viral genetic features and evolutionary rate is also made. The rapid sharing of sequence information throughout the SARS-CoV-2 pandemic represents an inflection point for public health and genomic epidemiology, providing early insights into the biology and evolution of this emerging pathogen.
Publisher: Springer International Publishing
Date: 2016
Publisher: Oxford University Press (OUP)
Date: 07-07-2022
DOI: 10.1093/BIOINFORMATICS/BTAC454
Abstract: Accurate annotation of different genomic signals and regions (GSRs) from DNA sequences is fundamentally important for understanding gene structure, regulation and function. Numerous efforts have been made to develop machine learning-based predictors for in silico identification of GSRs. However, it remains a great challenge to identify GSRs as the performance of most existing approaches is unsatisfactory. As such, it is highly desirable to develop more accurate computational methods for GSRs prediction. In this study, we propose a general deep learning framework termed DeepGenGrep, a general predictor for the systematic identification of multiple different GSRs from genomic DNA sequences. DeepGenGrep leverages the power of hybrid neural networks comprising a three-layer convolutional neural network and a two-layer long short-term memory to effectively learn useful feature representations from sequences. Benchmarking experiments demonstrate that DeepGenGrep outperforms several state-of-the-art approaches on identifying polyadenylation signals, translation initiation sites and splice sites across four eukaryotic species including Homo sapiens, Mus musculus, Bos taurus and Drosophila melanogaster. Overall, DeepGenGrep represents a useful tool for the high-throughput and cost-effective identification of potential GSRs in eukaryotic genomes. The webserver and source code are freely available at bigdata.biocie.cn/deepgengrep/home and Github (x-cie/DeepGenGrep/). Supplementary data are available at Bioinformatics online.
Publisher: Frontiers Media SA
Date: 23-06-2021
DOI: 10.3389/FCIMB.2021.667680
Abstract: Sepsis contributes significantly to morbidity and mortality globally. In Australia, 20,000 develop sepsis every year, resulting in 5,000 deaths, and more than AUD$846 million in expenditure. Prompt, appropriate antibiotic therapy is effective in improving outcomes in sepsis. Conventional culture-based methods to identify appropriate therapy have limited yield and take days to complete. Recently, nanopore technology has enabled rapid sequencing with real-time analysis of pathogen DNA. We set out to demonstrate the feasibility and diagnostic accuracy of pathogen sequencing direct from clinical samples, and estimate the impact of this approach on time to effective therapy when integrated with personalised software-guided antimicrobial dosing in children and adults on ICU with sepsis. The DIRECT study is a pilot prospective, non-randomized multicentre trial of an integrated diagnostic and therapeutic algorithm combining rapid direct pathogen sequencing and software-guided, personalised antibiotic dosing in children and adults with sepsis on ICU. DIRECT will collect microbiological and pharmacokinetic samples from approximately 200 children and adults with sepsis admitted to one of four ICUs in Brisbane. In Phase 1, we will evaluate Oxford Nanopore Technologies MinION sequencing direct from blood in 50 blood culture-proven sepsis patients recruited from consecutive patients with suspected sepsis. In Phase 2, a further 50 consecutive patients with suspected sepsis will be recruited in whom MinION sequencing will be combined with Bayesian software-guided (ID-ODS) personalised antimicrobial dosing. The primary outcome is time to effective antimicrobial therapy, defined as trough drug concentrations above the MIC of the pathogen. Secondary outcomes are diagnostic accuracy of MinION sequencing from whole blood, time to pathogen identification and susceptibility testing using sequencing direct from whole blood and from positive blood culture broth.
Rapid pathogen sequencing coupled with antimicrobial dosing software has great potential to overcome the limitations of conventional diagnostics which often result in prolonged inappropriate antimicrobial therapy. Reduced time to optimal antimicrobial therapy may reduce sepsis mortality and ICU length of stay. This pilot study will yield key feasibility data to inform further, urgently needed sepsis studies. Phase 2 of the trial protocol is registered with the Australia New Zealand Clinical Trials Registry (ANZCTR), number ACTRN12620001122943.
Publisher: Springer International Publishing
Date: 2016
Publisher: Springer Science and Business Media LLC
Date: 19-11-2013
Abstract: Y haplogroup analyses are an important component of genealogical reconstruction, population genetic analyses, medical genetics and forensics. These fields are increasingly moving towards use of low-coverage, high throughput sequencing. While there have been methods recently proposed for assignment of Y haplogroups on the basis of high-coverage sequence data, assignment on the basis of low-coverage data remains challenging. We developed a new algorithm, YHap, which uses an imputation framework to jointly predict Y chromosome genotypes and assign Y haplogroups using low coverage population sequence data. We use data from the 1000 Genomes Project to demonstrate that YHap provides accurate Y haplogroup assignment with less than 2x coverage. Borrowing information across multiple samples within a population using an imputation framework enables accurate Y haplogroup assignment.
Publisher: American Medical Association (AMA)
Date: 10-2018
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 09-2022
Publisher: Cold Spring Harbor Laboratory
Date: 30-01-2018
DOI: 10.1101/256719
Abstract: A better understanding of the genomic changes that facilitate the emergence and spread of drug resistant M. tuberculosis strains is required. Short-read sequencing methods have limited capacity to identify long, repetitive genomic regions and gene duplications. We sequenced an extensively drug resistant (XDR) Beijing sub-lineage 2.2.1.1 “epidemic strain” from the Western Province of Papua New Guinea using long-read sequencing (Oxford Nanopore MinION®). With up to 274-fold coverage from a single flow-cell, we assembled a 4,404,947 bp circular genome containing 3670 coding sequences that include the highly repetitive PE/PPE genes. Comparison with Illumina reads indicated a base-level accuracy of 99.95%. Mutations known to confer drug resistance to first and second line drugs were identified and concurred with phenotypic resistance assays. We identified mutations in efflux pump genes (Rv0194), transporters (secA1, glnQ, uspA), cell wall biosynthesis genes (pdk, mmpL, fadD) and virulence genes (mce-gene family, mycp1) that may contribute to the drug resistance phenotype and successful transmission of this strain. Using the newly assembled genome as reference to map raw Illumina reads from representative M. tuberculosis lineages, we detect large insertions relative to the reference genome. We provide a fully annotated genome of a transmissible XDR M. tuberculosis strain from Papua New Guinea using Oxford Nanopore MinION sequencing and provide insight into genomic mechanisms of resistance and virulence. Sample Illumina and MinION sequencing reads generated and analyzed are available in NCBI under project accession number PRJNA386696 ( ra/?term=PRJNA386696 ). The assembled complete genome and its annotations are available in NCBI under accession number CP022704.1 ( ra/?term=CP022704.1 ). We recently characterized a Modern Beijing lineage strain responsible for the drug resistance outbreaks in the Western province, Papua New Guinea.
Although some of the genomic markers responsible for its drug resistance and transmissibility are known, there is a need to elucidate all molecular mechanisms that account for the resistance phenotype, virulence and transmission. Whole genome sequencing using short reads has widely been utilized to study the MTB genome, but it does not generally capture long repetitive regions, as variants in these regions are eliminated during analysis. Illumina instruments are known to have a GC bias, so that GC-rich or AT-rich regions are undersampled, and this effect is exacerbated in MTB, which has approximately 65% GC content. In this study, we utilized Oxford Nanopore Technologies (ONT) MinION sequencing to assemble a high-quality complete genome of an extensively drug resistant strain of a modern Beijing lineage. We were able to assemble all PE/PPE (proline-glutamate/proline-proline-glutamate) gene families, which have high GC content and are repetitive in nature. We show the genomic utility of ONT in offering a more comprehensive understanding of genetic mechanisms that contribute to resistance, virulence and transmission. This is important for setting up predictive analytics platforms and services to support diagnostics and treatment.
Publisher: Elsevier BV
Date: 11-1995
Publisher: Springer Science and Business Media LLC
Date: 07-12-2008
DOI: 10.1038/NG.271
Publisher: Oxford University Press (OUP)
Date: 16-05-2012
DOI: 10.1093/HMG/DDS187
Publisher: Proceedings of the National Academy of Sciences
Date: 31-07-2023
Abstract: The diversity of COVID-19 disease in otherwise healthy people, from seemingly asymptomatic infection to severe life-threatening disease, is not clearly understood. We passaged a naturally occurring near-ancestral SARS-CoV-2 variant, capable of infecting wild-type mice, and identified viral genomic mutations coinciding with the acquisition of severe disease in young adult mice and lethality in aged animals. Transcriptomic analysis of lung tissues from mice with severe disease elucidated a host antiviral response dominated mainly by interferon and IL-6 pathway activation in young mice, while in aged animals, a fatal outcome was dominated by TNF and TGF-β signaling. Congruent with our pathway analysis, we showed that young TNF-deficient mice had mild disease compared to controls and aged TNF-deficient animals were more likely to survive infection. Emerging clinical correlates of disease are consistent with our preclinical studies, and our model may provide value in defining aberrant host responses that are causative of severe COVID-19.
Publisher: Elsevier BV
Date: 05-2011
Publisher: American Society for Microbiology
Date: 26-03-2020
DOI: 10.1128/MRA.00060-20
Abstract: Pandoraea fibrosis is a newly identified Gram-negative bacterial species that was isolated from the respiratory tract of an Australian cystic fibrosis patient. The complete assembled genome sequences of two consecutive isolates (second isolate collected 11 months after antibiotic treatment) from the same individual are presented here.
Publisher: Cold Spring Harbor Laboratory
Date: 21-03-2019
DOI: 10.1101/583674
Abstract: Recently, host whole blood gene expression signatures have been identified for diagnosis of tuberculosis (TB). Absolute quantification of the concentrations of signature transcripts in blood have not been reported, but would facilitate the development of diagnostic tests. To identify minimal transcript signatures, we applied a novel transcript selection procedure to microarray data from African adults comprising 536 patients with TB, other diseases (OD) and latent TB (LTBI), divided into training and test sets. Signatures were validated using reverse transcriptase (RT) - digital PCR (dPCR). A four-transcript signature (GBP6, TMCC1, PRDM1, ARG1) measured using RT-dPCR distinguished TB patients from those with OD (area under the curve (AUC) 93.8%, CI 95%: 82.2 – 100%). A three-transcript signature (FCGR1A, ZNF296, C1QB) differentiated TB from LTBI (AUC 97.3%, CI 95%: 93.3 – 100%), regardless of HIV. These signatures have been validated across platforms and across samples offering strong, quantitative support for their use as diagnostic biomarkers for TB.
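The AUC figures quoted in this abstract can be read as the probability that a randomly chosen TB patient scores higher on the signature than a randomly chosen comparison patient. A minimal sketch of that pairwise estimate follows; the function name and the scores are hypothetical illustrations, not the study's code or data.

```python
def auc_from_scores(case_scores, control_scores):
    """Estimate AUC as the fraction of (case, control) pairs ranked correctly.

    Ties count as half a correct pair (the Mann-Whitney formulation).
    """
    if not case_scores or not control_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Made-up signature scores, for illustration only (not study data).
tb_scores = [0.9, 0.8, 0.4]
od_scores = [0.5, 0.3, 0.2]
auc = auc_from_scores(tb_scores, od_scores)
print(f"AUC = {auc:.3f}")
```

The double loop is quadratic in cohort size, which is fine for a few hundred patients; a rank-based formula would be the usual choice for larger data.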
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 05-2015
Publisher: Oxford University Press (OUP)
Date: 18-03-2020
DOI: 10.1093/CID/CIAA290
Abstract: The role of primary immunodeficiencies (PID) in susceptibility to sepsis remains unknown. It is unclear whether children with sepsis benefit from genetic investigations. We hypothesized that sepsis may represent the first manifestation of underlying PID. We applied whole-exome sequencing (WES) to a national cohort of children with sepsis to identify rare, predicted pathogenic variants in PID genes. We conducted a multicenter, population-based, prospective study including previously healthy children aged ≥28 days and & years admitted with blood culture-proven sepsis. Using a stringent variant filtering procedure, analysis of WES data was restricted to rare, predicted pathogenic variants in 240 PID genes for which increased susceptibility to bacterial infection has been reported. There were 176 children presenting with 185 sepsis episodes who underwent WES (median age, 52 months; interquartile range, 15.4–126.4). There were 41 unique predicted pathogenic PID variants (1 homozygous, 5 hemizygous, and 35 heterozygous) found in 35/176 (20%) patients, including 3/176 (2%) patients carrying variants that were previously reported to lead to PID. The variants occurred in PID genes across all 8 PID categories, as defined by the International Union of Immunological Societies. We did not observe a significant correlation between clinical or laboratory characteristics of patients and the presence or absence of PID variants. Applying WES to a population-based cohort of previously healthy children with bacterial sepsis detected variants of uncertain significance in PID genes in 1 out of 5 children. Future studies need to investigate the functional relevance of these variants to determine whether variants in PID genes contribute to pediatric sepsis susceptibility.
Publisher: Proceedings of the National Academy of Sciences
Date: 06-04-2011
Abstract: Alcohol consumption is a moderately heritable trait, but the genetic basis in humans is largely unknown, despite its clinical and societal importance. We report a genome-wide association study meta-analysis of ∼2.5 million directly genotyped or imputed SNPs with alcohol consumption (gram per day per kilogram body weight) among 12 population-based samples of European ancestry, comprising 26,316 individuals, with replication genotyping in an additional 21,185 individuals. SNP rs6943555 in autism susceptibility candidate 2 gene (AUTS2) was associated with alcohol consumption at genome-wide significance (P = 4 × 10⁻⁸ to P = 4 × 10⁻⁹). We found a genotype-specific expression of AUTS2 in 96 human prefrontal cortex samples (P = 0.026) and significant (P 0.017) differences in expression of AUTS2 in whole-brain extracts of mice selected for differences in voluntary alcohol consumption. Down-regulation of an AUTS2 homolog caused reduced alcohol sensitivity in Drosophila (P 0.001). Our finding of a regulator of alcohol consumption adds knowledge to our understanding of genetic mechanisms influencing alcohol drinking behavior.
Publisher: Cold Spring Harbor Laboratory
Date: 05-05-2017
DOI: 10.1101/134510
Abstract: Targeted sequencing using capture probes has become increasingly popular in clinical applications due to its scalability and cost-effectiveness. The approach also allows for higher sequencing coverage of the targeted regions resulting in better analysis statistical power. However, because of the dynamics of the hybridisation process, it is difficult to evaluate the efficiency of the probe design prior to the experiments which are time consuming and costly. We developed CapSim, a software package for simulation of targeted sequencing. Given a genome sequence and a set of probes, CapSim simulates the fragmentation, the dynamics of probe hybridisation, and the sequencing of the captured fragments on Illumina and PacBio sequencing platforms. The simulated data can be used for evaluating the performance of the analysis pipeline, as well as the efficiency of the probe design. Parameters of the various stages in the sequencing process can also be evaluated in order to optimise the efficacy of the experiments. CapSim is publicly available under BSD license at dcao/capsim .
Publisher: Oxford University Press (OUP)
Date: 08-2011
DOI: 10.1373/CLINCHEM.2010.159558
Abstract: The accurate assignment of alleles embedded within trisomic or duplicated regions is an essential prerequisite for assessing the combined effects of single-nucleotide polymorphisms (SNPs) and genomic copy number. Such an integrated analysis is challenging because heterozygotes for such a SNP may be one of 2 genotypes—AAB or ABB. Established methods for SNP genotyping, however, can have difficulty discriminating between the 2 heterozygous trisomic genotypes. We developed a method for assigning heterozygous trisomic genotypes that uses the ratio of the height of the 2 allele peaks obtained by mass spectrometry after a single-base extension assay. Eighteen COL6A2 (collagen, type VI, alpha 2) SNPs were analyzed in euploid and trisomic individuals by means of a multiplexed single-base extension assay that generated allele-specific oligonucleotides of differing Mr values for detection by MALDI-TOF mass spectrometry. Reference data (mean and SD) for the allele peak height ratios were determined from heterozygous euploid samples. The heterozygous trisomic genotypes were assigned by calculating the z score for each trisomic allele peak height ratio and by considering the sign (+/−) of the z score. Heterozygous trisomic genotypes were assigned in 96.1% (range, 89.9%–100%) of the samples for each SNP analyzed. The genotypes obtained were reproduced in 95 (97.5%) of 97 loci retested in a second assay. Subsequently, the origin of nondisjunction was determined in 108 (82%) of 132 family trios with a Down syndrome child. This approach enabled reliable genotyping of heterozygous trisomic samples and the determination of the origin of nondisjunction in Down syndrome family trios.
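The z-score rule described in this abstract can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: the ratio is taken to be the A-allele peak height divided by the B-allele peak height, so a positive z score suggests AAB and a negative one ABB. The function name, the `min_abs_z` no-call threshold and all numbers are hypothetical.

```python
from statistics import mean, stdev


def assign_trisomic_genotype(ratio, euploid_ratios, min_abs_z=0.0):
    """Return (z, genotype) for one heterozygous trisomic peak height ratio.

    ratio          -- A/B peak height ratio of the trisomic sample (assumed
                      orientation; the abstract does not specify it)
    euploid_ratios -- A/B ratios from heterozygous euploid reference samples
    min_abs_z      -- hypothetical cut-off; |z| at or below it gives no call
    """
    mu = mean(euploid_ratios)
    sd = stdev(euploid_ratios)  # sample standard deviation of the reference
    z = (ratio - mu) / sd
    if abs(z) <= min_abs_z:
        return z, None  # too close to the euploid mean to call
    return z, "AAB" if z > 0 else "ABB"


# Made-up reference ratios and trisomic measurements, for illustration only.
reference = [0.9, 1.0, 1.1, 1.0]
for measured in (2.0, 0.5):
    z, genotype = assign_trisomic_genotype(measured, reference)
    print(f"ratio={measured:.2f} z={z:+.2f} genotype={genotype}")
```

Only the sign of z decides between the two genotypes, which is why the reference mean and SD have to come from heterozygous euploid samples measured with the same assay.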
Publisher: Proceedings of the National Academy of Sciences
Date: 31-03-2003
Abstract: Most modern speech recognition uses probabilistic models to interpret a sequence of sounds. Hidden Markov models, in particular, are used to recognize words. The same techniques have been adapted to find domains in protein sequences of amino acids. To increase word accuracy in speech recognition, language models are used to capture the information that certain word combinations are more likely than others, thus improving detection based on context. However, to date, these context techniques have not been applied to protein domain discovery. Here we show that the application of statistical language modeling methods can significantly enhance domain recognition in protein sequences. As an example, we discover an unannotated Tf_Otx Pfam domain on the cone rod homeobox protein, which suggests a possible mechanism for how the V242M mutation on this protein causes cone-rod dystrophy.
Publisher: Ovid Technologies (Wolters Kluwer Health)
Date: 02-2022
Publisher: Cold Spring Harbor Laboratory
Date: 18-03-2020
DOI: 10.1101/2020.03.16.992933
Abstract: Sequencing technologies have advanced to the point where it is possible to generate high accuracy, haplotype resolved, chromosome scale assemblies. Several long read sequencing technologies are available on the market and a growing number of algorithms have been developed over the last years to assemble the reads generated by those technologies. When starting a new genome project, it is therefore challenging to select the most cost-effective sequencing technology as well as the most appropriate software for assembly and polishing. For this reason, it is important to benchmark different approaches applied to the same sample. Here, we report a comparison of three long read sequencing technologies applied to the de novo assembly of a plant genome, Macadamia jansenii. We have generated sequencing data using Pacific Biosciences (Sequel I), Oxford Nanopore Technologies (PromethION) and BGI (single-tube Long Fragment Read) technologies for the same sample. Several assemblers were benchmarked in the assembly of PacBio and Nanopore reads. Results obtained from combining long read technologies or short read and long read technologies are also presented. The assemblies were compared for contiguity, accuracy and completeness as well as sequencing costs and DNA material requirements. Overall, the three long read technologies produced highly contiguous and complete genome assemblies of Macadamia jansenii. At the time of sequencing, the cost associated with each method was significantly different but continuous improvements in technologies have resulted in greater accuracy, increased throughput and reduced costs. We propose updating this comparison regularly with reports on significant iterations of the sequencing technologies.
Publisher: Elsevier BV
Date: 12-2018
DOI: 10.1016/J.ENVINT.2018.10.040
Abstract: Antibiotic resistance poses an increasing threat to public health. Horizontal gene transfer (HGT) promoted by antibiotics is recognized as a significant pathway to disseminate antibiotic resistance genes (ARGs). However, it is unclear whether non-antibiotic, anti-microbial (NAAM) chemicals can directly promote HGT of ARGs in the environment. We aimed to investigate whether triclosan (TCS), a widely-used NAAM chemical in personal care products, is able to stimulate the conjugative transfer of antibiotic multi-resistance genes carried by plasmid within and across bacterial genera. We established two model mating systems, to investigate intra-genera transfer and inter-genera transfer. Escherichia coli K-12 LE392 carrying IncP-α plasmid RP4 was used as the donor, and E. coli K-12 MG1655 or Pseudomonas putida KT2440 were the intra- and inter-genera recipients, respectively. The mechanisms of the HGT promoted by TCS were unveiled by detecting oxidative stress and cell membrane permeability, in combination with Nanopore sequencing, genome-wide RNA sequencing and proteomic analyses. Exposure of the bacteria to environmentally relevant concentrations of TCS (from 0.02 μg/L to 20 μg/L) significantly stimulated the conjugative transfer of plasmid-encoded multi-resistance genes within and across genera. The TCS exposure promoted ROS generation and damaged bacterial membrane, and caused increased expression of the SOS response regulatory genes umuC, dinB and dinD in the donor. In addition, higher expression levels of ATP synthesis encoding genes in E. coli and P. putida were found with increased TCS dosage. TCS could enhance the conjugative ARGs transfer between bacteria by triggering ROS overproduction at environmentally relevant concentrations. These findings improve our awareness of the hidden risks of NAAM chemicals on the spread of antibiotic resistance.
Publisher: Ovid Technologies (Wolters Kluwer Health)
Date: 02-2011
DOI: 10.1161/CIRCGENETICS.110.940858
Abstract: Transforming growth factor (TGF)-β is a multifunctional peptide that is important in T-cell activation and cardiovascular remodeling, both of which are important features of Kawasaki disease (KD). We postulated that variation in TGF-β signaling might be important in KD susceptibility and disease outcome. We investigated genetic variation in 15 genes belonging to the TGF-β pathway in a total of 771 KD subjects of mainly European descent from the United States, the United Kingdom, Australia, and the Netherlands. We analyzed transcript abundance patterns using microarray and reverse transcriptase–polymerase chain reaction for these same genes, and measured TGF-β2 protein levels in plasma. Genetic variants in TGFB2, TGFBR2, and SMAD3 and their haplotypes were consistently and reproducibly associated with KD susceptibility, coronary artery aneurysm formation, aortic root dilatation, and intravenous immunoglobulin treatment response in different cohorts. A SMAD3 haplotype associated with KD susceptibility replicated in 2 independent cohorts and an intronic single nucleotide polymorphism in a separate haplotype block was also strongly associated (A/G, rs4776338) (P = 0.000022; odds ratio, 1.50; 95% confidence interval, 1.25 to 1.81). Pathway analysis using all 15 genes further confirmed the importance of the TGF-β pathway in KD pathogenesis. Whole-blood transcript abundance for these genes and TGF-β2 plasma protein levels changed dynamically over the course of the illness. These studies suggest that genetic variation in the TGF-β pathway influences KD susceptibility, disease outcome, and response to therapy, and that aortic root and coronary artery Z scores can be used for phenotype/genotype analyses. Analysis of transcript abundance and protein levels further support the importance of this pathway in KD pathogenesis.
Publisher: Springer Science and Business Media LLC
Date: 12-2019
DOI: 10.1186/S13059-019-1852-7
Abstract: A variety of methods have been developed to demultiplex pooled samples in a single cell RNA sequencing (scRNA-seq) experiment which either require hashtag barcodes or sample genotypes prior to pooling. We introduce scSplit which utilizes genetic differences inferred from scRNA-seq data alone to demultiplex pooled samples. scSplit also enables mapping clusters to original samples. Using simulated, merged, and pooled multi-individual datasets, we show that scSplit prediction is highly concordant with demuxlet predictions and is highly consistent with the known truth in cell-hashing dataset. scSplit is ideally suited to samples without external genotype information and is available at: on-xu/scSplit
Publisher: Cold Spring Harbor Laboratory
Date: 14-02-2017
DOI: 10.1101/108365
Abstract: The majority of human chromosome ends remain incompletely assembled due to their highly repetitive structure. In this study, we use BioNano data to anchor and extend chromosome ends from two European trios as well as two unrelated Asian genomes. BioNano assembled chromosome ends are structurally divergent from the reference genome, including both missing sequence (10%) and extensions (22%). These extensions are heritable and in some cases divergent between Asian and European samples. Six ninths of the extension sequence in NA12878 can be confirmed and filled by nanopore data. We identify two sequence families in these sequences which have undergone substantial duplication in multiple primate lineages. We show that these sequence families have arisen from progenitor interstitial sequence on the ancestral primate chromosome 7. Comparison of chromosome end sequences from 15 species revealed that chromosome end missing sequence matches the corresponding phylogenetic relationship and revealed a rate of chromosome extension per chromosome of 0.0020 bp per year on average.
Publisher: Oxford University Press (OUP)
Date: 14-11-2018
DOI: 10.1093/JAC/DKY458
Publisher: Proceedings of the National Academy of Sciences
Date: 13-05-2011
Publisher: Springer Science and Business Media LLC
Date: 13-07-2018
Publisher: National Institute for Health and Care Research
Date: 04-2021
DOI: 10.3310/EME08050
Abstract: Tuberculosis (TB) is a devastating disease for which new diagnostic tests are desperately needed. To validate promising new technologies [namely whole-blood transcriptomics, proteomics, flow cytometry and quantitative reverse transcription-polymerase chain reaction (qRT-PCR)] and existing signatures for the detection of active TB in samples obtained from individuals with suspected active TB. Four substudies, each of which used samples from the biobank collected as part of the interferon gamma release assay (IGRA) in the Diagnostic Evaluation of Active TB study, which was a prospective cohort of patients recruited with suspected TB. Secondary care. Adults aged ≥ 16 years presenting as inpatients or outpatients at 12 NHS hospital trusts in London, Slough, Oxford, Leicester and Birmingham, with suspected active TB. New tests using genome-wide gene expression microarray (transcriptomics), surface-enhanced laser desorption ionisation time-of-flight mass spectrometry/liquid chromatography–mass spectrometry (proteomics), flow cytometry or qRT-PCR. Area under the curve (AUC), sensitivity and specificity were calculated to determine diagnostic accuracy. Positive and negative predictive values were calculated in some cases. A decision tree model was developed to calculate the incremental costs and quality-adjusted life-years of changing from current practice to using the novel tests. The project, and four substudies that assessed the previously published signatures, measured each of the new technologies and performed a health economic analysis in which the best-performing tests were evaluated for cost-effectiveness. The diagnostic accuracy of the transcriptomic tests ranged from an AUC of 0.81 to 0.84 for detecting all TB in our cohort. The performance for detecting culture-confirmed TB or pulmonary TB was better than for highly probable TB or extrapulmonary tuberculosis (EPTB), but was not high enough to be clinically useful.
None of the previously described serum proteomic signatures for active TB provided good diagnostic accuracy, nor did the candidate rule-out tests. Four out of six previously described cellular immune signatures provided a reasonable level of diagnostic accuracy (AUC = 0.78–0.92) for discriminating all TB from those with other disease and latent TB infection in human immunodeficiency virus-negative TB suspects. Two of these assays may be useful in the IGRA-positive population and can provide high positive predictive value. None of the new tests for TB can be considered cost-effective. The diagnostic performance of new tests among the HIV-positive population was either underpowered or not sufficiently achieved in each substudy. Overall, the diagnostic performance of all previously identified ‘signatures’ of TB was lower than previously reported. This probably reflects the nature of the cohort we used, which includes the harder to diagnose groups, such as culture-unconfirmed TB or EPTB, which were under-represented in previous cohorts. We are yet to achieve our secondary objective of deriving novel signatures of TB using our data sets. This was beyond the scope of this report. We recommend that future studies using these technologies target specific subtypes of TB, specifically those groups for which new diagnostic tests are required. This project was funded by the Efficacy and Mechanism Evaluation (EME) programme, a MRC and NIHR partnership.
Publisher: Oxford University Press (OUP)
Date: 23-02-2008
DOI: 10.1093/BIOINFORMATICS/BTN071
Abstract: Motivation: Most genome-wide association studies rely on single nucleotide polymorphism (SNP) analyses to identify causal loci. The increased stringency required for genome-wide analyses (with per-SNP significance threshold typically ≈ 10⁻⁷) means that many real signals will be missed. Thus it is still highly relevant to develop methods with improved power at low type I error. Haplotype-based methods provide a promising approach; however, they suffer from statistical problems such as abundance of rare haplotypes and ambiguity in defining haplotype block boundaries. Results: We have developed an ancestral haplotype clustering (AncesHC) association method which addresses many of these problems. It can be applied to biallelic or multiallelic markers typed in haploid, diploid or multiploid organisms, and also handles missing genotypes. Our model is free from the assumption of a rigid block structure but recognizes a block-like structure if it exists in the data. We employ a Hidden Markov Model (HMM) to cluster the haplotypes into groups of predicted common ancestral origin. We then test each cluster for association with disease by comparing the numbers of cases and controls with 0, 1 and 2 chromosomes in the cluster. We demonstrate the power of this approach by simulation of case-control status under a range of disease models for 1500 outcrossed mice originating from eight inbred lines. Our results suggest that AncesHC has substantially more power than single-SNP analyses to detect disease association, and is also more powerful than the cladistic haplotype clustering method CLADHC. Availability: The software can be downloaded from www.imperial.ac.uk/medicine/people/l.coin Contact: I.coin@imperial.ac.uk Supplementary Information: Supplementary data are available at Bioinformatics online.
Publisher: Oxford University Press (OUP)
Date: 2013
DOI: 10.1534/GENETICS.112.145599
Abstract: In recent years it has emerged that structural variants have a substantial impact on genomic variation. Inversion polymorphisms represent a significant class of structural variant, and despite the challenges in their detection, data on inversions in the human genome are increasing rapidly. Statistical methods for inferring parameters such as the recombination rate and the selection coefficient have generally been developed without accounting for the presence of inversions. Here we exploit new software for simulating inversions in population genetic data, invertFREGENE, to assess the potential impact of inversions on such methods. Using data simulated by invertFREGENE, as well as real data from several sources, we test whether large inversions have a disruptive effect on widely applied population genetics methods for inferring recombination rates, for detecting selection, and for controlling for population structure in genome-wide association studies (GWAS). We find that recombination rates estimated by LDhat are biased downward at inversion loci relative to the true contemporary recombination rates at the loci but that recombination hotspots are not falsely inferred at inversion breakpoints as may have been expected. We find that the integrated haplotype score (iHS) method for detecting selection appears robust to the presence of inversions. Finally, we observe a strong bias in the genome-wide results of principal components analysis (PCA), used to control for population structure in GWAS, in the presence of even a single large inversion, confirming the necessity to thin SNPs by linkage disequilibrium at large physical distances to obtain unbiased results.
Publisher: Proceedings of the National Academy of Sciences
Date: 03-08-2021
Publisher: Cold Spring Harbor Laboratory
Date: 22-06-2023
DOI: 10.1101/2023.06.15.23291261
Abstract: Direct metagenomic sequencing from positive blood culture (BC) broths, to identify bacteria and predict antimicrobial susceptibility, has been previously demonstrated using Illumina-based methods, but is relatively slow. We aimed to evaluate this approach using nanopore sequencing to provide more rapid results. Patients with suspected sepsis in 4 intensive care units were prospectively enrolled. Human-depleted DNA was extracted from positive BC broths and sequenced using nanopore (MinION). Species abundance was estimated using Kraken2, and a cloud-based artificial intelligence (AI) system (AREScloud) provided in silico antimicrobial susceptibility testing (AST) from assembled contigs. These results were compared to conventional identification and phenotypic AST. Genus-level agreement between conventional methods and metagenomic whole genome sequencing (MG-WGS) was 96.2% (50/52), but increased to 100% in monomicrobial infections. In total, 262 high quality AREScloud AST predictions across 24 samples were made, exhibiting categorical agreement (CA) of 89.3%, with major error (ME) and very major error (VME) rates of 10.5% and 12.1%, respectively. Over 90% CA was achieved for some taxa (e.g. Staphylococcus aureus), but was suboptimal for Pseudomonas aeruginosa (CA 50%). In 470 AST predictions across 42 samples, with both high quality and exploratory-only predictions, overall CA, ME and VME rates were 87.7%, 8.3% and 28.4%. VME rates were inflated by false susceptibility calls in a small number of species/antibiotic combinations with few representative resistant isolates. Time to reporting from MG-WGS could be achieved within 8–16 hours from blood culture positivity. Direct metagenomic sequencing from positive BC broths is feasible and can provide accurate predictive AST for some species and antibiotics, but is sub-optimal for a subset of common pathogens, with unacceptably high VME rates. 
Nanopore-based approaches may be faster, but improvements in accuracy are required before they can be considered for clinical use. New developments in nanopore sequencing technology, and training of AI algorithms on larger and more diverse datasets, may improve performance.
Publisher: Oxford University Press (OUP)
Date: 28-10-2018
DOI: 10.1093/BIOINFORMATICS/BTX691
Abstract: Targeted sequencing using capture probes has become increasingly popular in clinical applications due to its scalability and cost-effectiveness. The approach also allows for higher sequencing coverage of the targeted regions resulting in better analysis statistical power. However, because of the dynamics of the hybridization process, it is difficult to evaluate the efficiency of the probe design prior to the experiments which are time consuming and costly. We developed CapSim, a software package for simulation of targeted sequencing. Given a genome sequence and a set of probes, CapSim simulates the fragmentation, the dynamics of probe hybridization and the sequencing of the captured fragments on Illumina and PacBio sequencing platforms. The simulated data can be used for evaluating the performance of the analysis pipeline, as well as the efficiency of the probe design. Parameters of the various stages in the sequencing process can also be evaluated in order to optimize the experiments. CapSim is publicly available under BSD license at github.com/Devika1/capsim. Supplementary data are available at Bioinformatics online.
Publisher: American Association for the Advancement of Science (AAAS)
Date: 14-09-2022
DOI: 10.1126/SCITRANSLMED.ABJ2381
Abstract: Drug-resistant Gram-positive bacterial infections are still a substantial burden on the public health system, with two bacteria (Staphylococcus aureus and Streptococcus pneumoniae) accounting for over 1.5 million drug-resistant infections in the United States alone in 2017. In 2019, 250,000 deaths were attributed to these pathogens globally. We have developed a preclinical glycopeptide antibiotic, MCC5145, that has excellent potency (MIC90 ≤ 0.06 μg/ml) against hundreds of isolates of methicillin-resistant S. aureus (MRSA) and other Gram-positive bacteria, with a greater than 1000-fold margin over mammalian cell cytotoxicity values. The antibiotic has therapeutic in vivo efficacy when dosed subcutaneously in multiple murine models of established bacterial infections, including thigh infection with MRSA and blood septicemia with S. pneumoniae, as well as when dosed orally in an antibiotic-induced Clostridioides difficile infection model. MCC5145 exhibited reduced nephrotoxicity at microbiologically active doses in mice compared to vancomycin. MCC5145 also showed improved activity against biofilms compared to vancomycin, both in vitro and in vivo, and a low propensity to select for drug resistance. Characterization of drug action using a transposon library bioinformatic platform showed a mechanistic distinction from other glycopeptide antibiotics.
Publisher: Elsevier BV
Date: 09-2022
Publisher: Public Library of Science (PLoS)
Date: 10-03-2011
Publisher: Springer Science and Business Media LLC
Date: 07-12-2008
DOI: 10.1038/NG.290
Publisher: Oxford University Press (OUP)
Date: 2004
DOI: 10.1093/NAR/GKH121
Publisher: Springer Science and Business Media LLC
Date: 20-02-2017
DOI: 10.1038/NCOMMS14515
Abstract: Third generation sequencing technologies provide the opportunity to improve genome assemblies by generating long reads spanning most repeat sequences. However, current analysis methods require substantial amounts of sequence data and computational resources to overcome the high error rates. Furthermore, they can only perform analysis after sequencing has completed, resulting in either over-sequencing, or in a low quality assembly due to under-sequencing. Here we present npScarf, which can scaffold and complete short read assemblies while the long read sequencing run is in progress. It reports assembly metrics in real-time so the sequencing run can be terminated once an assembly of sufficient quality is obtained. In assembling four bacterial and one eukaryotic genomes, we show that npScarf can construct more complete and accurate assemblies while requiring less sequencing data and computational resources than existing methods. Our approach offers a time- and resource-effective strategy for completing short read assemblies.
Publisher: Springer Science and Business Media LLC
Date: 27-06-2019
DOI: 10.1038/S41523-019-0113-Y
Abstract: Invasive lobular carcinoma (ILC) is the most common special type of breast cancer, and is characterized by functional loss of E-cadherin, resulting in cellular adhesion defects. ILC typically present as estrogen receptor positive, grade 2 breast cancers, with a good short-term prognosis. Several large-scale molecular profiling studies have now dissected the unique genomics of ILC. We have undertaken an integrative analysis of gene expression and DNA copy number to identify novel drivers and prognostic biomarkers, using in-house (n = 25), METABRIC (n = 125) and TCGA (n = 146) samples. Using in silico integrative analyses, a 194-gene set was derived that is highly prognostic in ILC (P = 1.20 × 10⁻⁵); we named this metagene ‘LobSig’. Assessing a 10-year follow-up period, LobSig outperformed the Nottingham Prognostic Index, PAM50 risk-of-recurrence (Prosigna), OncotypeDx, and Genomic Grade Index (MapQuantDx) in a stepwise, multivariate Cox proportional hazards model, particularly in grade 2 ILC cases (χ², P = 9.0 × 10⁻⁶), which are difficult to prognosticate clinically. Importantly, LobSig status predicted outcome with 94.6% accuracy amongst cases classified as ‘moderate-risk’ according to Nottingham Prognostic Index in the METABRIC cohort. Network analysis identified few candidate pathways, though genesets related to proliferation were identified, and a LobSig-high phenotype was associated with the TCGA proliferative subtype (χ², P 8.86 × 10⁻⁴). ILC with a poor outcome as predicted by LobSig were enriched with mutations in ERBB2, ERBB3, TP53, AKT1 and ROS1. LobSig has the potential to be a clinically relevant prognostic signature and warrants further development.
Publisher: Oxford University Press (OUP)
Date: 05-2019
Publisher: Cold Spring Harbor Laboratory
Date: 19-09-2023
Publisher: Elsevier BV
Date: 05-2016
Publisher: American Medical Association (AMA)
Date: 18-04-2017
Publisher: Microbiology Society
Date: 07-2018
Publisher: Springer Science and Business Media LLC
Date: 2012
Publisher: Springer Science and Business Media LLC
Date: 14-12-2008
DOI: 10.1038/NG.287
Publisher: Cold Spring Harbor Laboratory
Date: 10-01-2018
DOI: 10.1101/246108
Abstract: Tandem repeats comprise a significant proportion of the human genome, including coding and regulatory regions. They are highly prone to repeat number variation and nucleotide mutation due to their repetitive and unstable nature, making them a major source of genomic variation between individuals. Despite recent advances in high throughput sequencing, analysis of tandem repeats in the context of complex diseases is still hindered by technical limitations. We report a novel targeted sequencing approach, which allows simultaneous analysis of hundreds of repeats. We developed a Bayesian algorithm, GtTR, which combines information from a reference long-read dataset with a short read counting approach to genotype tandem repeats at population scale. PCR sizing analysis was used for validation. We used a PacBio long-read sequenced sample to generate a reference tandem repeat genotype dataset with on average 13% absolute deviation from PCR sizing results. Using this reference dataset, GtTR generated estimates of VNTR copy number with accuracy within 95% high posterior density (HPD) intervals of 68% and 83% for capture sequence data and 200X WGS data respectively, improving to 87% and 94% with use of a PCR reference. We show that the genotype resolution increases as a function of depth, such that the median 95% HPD interval lies within 25%, 14%, 12% and 8% of its midpoint copy number value for 30X, 200X WGS, 395X and 800X capture sequence data respectively. We validated nine targets by PCR sizing analysis, and genotype estimates from sequencing results correlated well with PCR results. The novel genotyping approach described here presents a new cost-effective method to explore a previously unrecognized class of repeat variation in GWAS of complex diseases at the population level. Further improvements in accuracy can be obtained by improving accuracy of the reference dataset.
Publisher: IEEE
Date: 12-2008
Publisher: Springer Science and Business Media LLC
Date: 17-06-2019
Publisher: IEEE
Date: 07-2018
Publisher: Oxford University Press (OUP)
Date: 25-04-2014
DOI: 10.1093/HMG/DDU150
Publisher: Frontiers Media SA
Date: 06-04-2022
DOI: 10.3389/FIMMU.2022.832223
Abstract: Better methods to interrogate host-pathogen interactions during Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) infections are imperative to help understand and prevent this disease. Here we implemented RNA-sequencing (RNA-seq) using Oxford Nanopore Technologies (ONT) long-reads to measure differential host gene expression, transcript polyadenylation and isoform usage within various epithelial cell lines permissive and non-permissive for SARS-CoV-2 infection. SARS-CoV-2-infected and mock-infected Vero (African green monkey kidney epithelial cells), Calu-3 (human lung adenocarcinoma epithelial cells), Caco-2 (human colorectal adenocarcinoma epithelial cells) and A549 (human lung carcinoma epithelial cells) were analyzed over time (0, 2, 24, 48 hours). Differential polyadenylation was found to occur in both infected Calu-3 and Vero cells during a late time point (48 hpi), with Gene Ontology (GO) terms such as viral transcription and translation shown to be significantly enriched in Calu-3 data. Poly(A) tails showed increased lengths in the majority of the differentially polyadenylated transcripts in Calu-3 and Vero cell lines (up to ~101 nt in mean poly(A) length, padj = 0.029). Of these genes, ribosomal protein genes such as RPS4X and RPS6 also showed downregulation in expression levels, suggesting the importance of ribosomal protein genes during infection. Furthermore, differential transcript usage was identified in Caco-2, Calu-3 and Vero cells, including transcripts of genes such as GSDMB and KPNA2 , which have previously been implicated in SARS-CoV-2 infections. Overall, these results highlight the potential role of differential polyadenylation and transcript usage in host immune response or viral manipulation of host mechanisms during infection, and therefore, showcase the value of long-read sequencing in identifying less-explored host responses to disease.
Publisher: Cold Spring Harbor Laboratory
Date: 18-02-2020
DOI: 10.1101/2020.02.17.953539
Abstract: A streaming assembly pipeline utilising real-time Oxford Nanopore Technology (ONT) sequencing data is important for saving sequencing resources and reducing time-to-result. A previous approach implemented in npScarf provided an efficient streaming algorithm for hybrid assembly but was relatively prone to mis-assemblies compared to other graph-based methods. Here we present npGraph, a streaming hybrid assembly tool using the assembly graph instead of the separated pre-assembly contigs. It is able to produce a more complete genome assembly by resolving the path finding problem on the assembly graph using long reads as the traversing guide. Application to synthetic and real data from bacterial isolate genomes shows improved accuracy while still maintaining a low computational cost. npGraph also provides a graphical user interface (GUI) which provides a real-time visualisation of the progress of assembly. The tool and source code are available at snguyen/assembly .
Publisher: BMJ
Date: 05-03-2013
Abstract: To determine whether four key neuropsychiatric and sleep related features associated with Parkinson's disease (PD) are associated with the motor handicap and demographic data. The growing number of recognised non-motor features of PD makes routine screening of all these symptoms impractical. Here, we investigated the hypothesis that standard demographic data and the routine assessment of motor signs are associated with the presence of dementia, psychosis, clinically probable rapid eye movement (REM) sleep behavior disorder (cpRBD) and restless legs syndrome (RLS). 775 patients with PD underwent standardised assessment of motor features and the presence of dementia, psychosis, cpRBD and RLS. A stepwise feature elimination procedure with fitted logistic regression models was applied to identify which, if any, combination of demographic and motor factors is associated with each of the four studied non-motor features. A within-study out-of-sample estimate of the power of the predicted values of the models was calculated using standard evaluation procedures. Age and Hoehn & Yahr (H&Y) stage were strongly associated with the presence of dementia (p value < 0.001 for both factors in the final selected model) while a combination of age, disease duration, H&Y stage, dopamine agonists and catechol-O-methyltransferase (COMT) inhibitors was associated with the presence of psychosis. Disease duration and H&Y stage were the significant indicators of cpRBD, and the lack of significant motor asymmetry was the only significant feature associated with RLS-type symptoms, but the evidence of association was weak. Demographic and motor features routinely collected in patients with PD can estimate the occurrence of neuropsychiatric and sleep-related features of PD.
Publisher: Elsevier BV
Date: 2021
DOI: 10.2139/SSRN.3766495
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 11-2021
Publisher: Ovid Technologies (Wolters Kluwer Health)
Date: 04-2011
Publisher: Springer Science and Business Media LLC
Date: 06-05-2022
DOI: 10.1186/S12879-022-07414-2
Abstract: Molecular mechanisms determining the transmission and prevalence of drug resistant tuberculosis (DR-TB) in Papua New Guinea (PNG) are poorly understood. We used genomic and drug susceptibility data to explore the evolutionary history, temporal acquisition of resistance and transmission dynamics of DR-TB across PNG. We performed whole genome sequencing on isolates from Central Public Health Laboratory, PNG, collected 2017–2019. Data analysis was done on a composite dataset that also included 100 genomes previously sequenced from Daru, PNG (2012–2015). Sampled isolates represented 14 of the 22 PNG provinces; the majority (66/94, 70%) came from the National Capital District (NCD). In the composite dataset, 91% of strains were Beijing 2.2.1.1, identified in 13 provinces. Phylogenetic tree of Beijing strains revealed two clades, Daru dominant clade (A) and NCD dominant clade (B). Multi-drug resistance (MDR) was repeatedly and independently acquired, with the first MDR cases in both clades noted to have emerged in the early 1990s, while fluoroquinolone resistance emerged in 2009 (95% highest posterior density 2000–2016). We identified the presence of a frameshift mutation within Rv0678 (p.Asp47fs) which has been suggested to confer resistance to bedaquiline, despite no known exposure to the drug. Overall genomic clustering was significantly associated with rpoC compensatory and inhA promoter mutations (p < 0.001), with high percentage of most genomic clusters (12/14) identified in NCD, reflecting its role as a potential national amplifier. The acquisition and evolution of drug resistance among the major clades of Beijing strain threaten the success of DR-TB treatment in PNG. With continued transmission of this strain in PNG, genotypic drug resistance surveillance using whole genome sequencing is essential for improved public health response to outbreaks. 
With occurrence of resistance to newer drugs such as bedaquiline, knowledge of full drug resistance profiles will be important for optimal treatment selection.
Publisher: Cold Spring Harbor Laboratory
Date: 16-06-2017
DOI: 10.1101/150516
Abstract: Early childhood growth patterns are associated with adult metabolic health, but the underlying mechanisms are unclear. We performed genome-wide meta-analyses and follow-up in up to 22,769 European children for six early growth phenotypes derived from longitudinal data: peak height and weight velocities, age and body mass index (BMI) at adiposity peak (AP ~ 9 months) and rebound (AR ~ 5-6 years). We identified four associated loci (P < 5×10⁻⁸): LEPR/LEPROT with BMI at AP, FTO and TFAP2B with Age at AR and GNPDA2 with BMI at AR. The observed AR-associated SNPs at FTO, TFAP2B and GNPDA2 represent known adult BMI-associated variants. The common BMI at AP associated variant at LEPR/LEPROT was not associated with adult BMI but was associated with LEPROT gene expression levels, especially in subcutaneous fat ( P x10 −51 ). We identify strong positive genetic correlations between early growth and later adiposity traits, and analysis of the full discovery stage results for Age at AR revealed enrichment for insulin-like growth factor 1 (IGF-1) signaling and apolipoprotein pathways. This genome-wide association study suggests mechanistic links between early childhood growth and adiposity in later childhood and adulthood, highlighting these early growth phenotypes as potential targets for the prevention of obesity.
Publisher: American Society for Microbiology
Date: 30-06-2016
Abstract: Klebsiella quasipneumoniae subsp. similipneumoniae strain ATCC 700603, formerly known as K. pneumoniae K6, is known for producing extended-spectrum β-lactamase (ESBL) enzymes that can hydrolyze oxyimino-β-lactams, resulting in resistance to these drugs. We herein report the complete genome of strain ATCC 700603 and show that the ESBL genes are plasmid-encoded.
Publisher: Oxford University Press (OUP)
Date: 02-2020
DOI: 10.1093/GIGASCIENCE/GIAA002
Abstract: Klebsiella pneumoniae frequently harbours multidrug resistance, and current diagnostics struggle to rapidly identify appropriate antibiotics to treat these bacterial infections. The MinION device can sequence native DNA and RNA in real time, providing an opportunity to compare the utility of DNA and RNA for prediction of antibiotic susceptibility. However, the effectiveness of bacterial direct RNA sequencing and base-calling has not previously been investigated. This study interrogated the genome and transcriptome of 4 extensively drug-resistant (XDR) K. pneumoniae clinical isolates; however, further antimicrobial susceptibility testing identified 3 isolates as pandrug-resistant (PDR). The majority of acquired resistance (≥75%) resided on plasmids including several megaplasmids (≥100 kb). DNA sequencing detected most resistance genes (≥70%) within 2 hours of sequencing. Neural network–based base-calling of direct RNA achieved up to 86% identity rate, although ≤23% of reads could be aligned. Direct RNA sequencing (with ∼6 times slower pore translocation) was able to identify (within 10 hours) ≥35% of resistance genes, including those associated with resistance to aminoglycosides, β-lactams, trimethoprim, and sulphonamide and also quinolones, rifampicin, fosfomycin, and phenicol in some isolates. Direct RNA sequencing also identified the presence of operons containing up to 3 resistance genes. Polymyxin-resistant isolates showed a heightened transcription of phoPQ (≥2-fold) and the pmrHFIJKLM operon (≥8-fold). Expression levels estimated from direct RNA sequencing displayed strong correlation (Pearson: 0.86) compared to quantitative real-time PCR across 11 resistance genes. Overall, MinION sequencing rapidly detected the XDR/PDR K. pneumoniae resistome, and direct RNA sequencing provided accurate estimation of expression levels of these genes.
Publisher: Springer New York
Date: 2014
Publisher: Springer Science and Business Media LLC
Date: 2013
DOI: 10.1186/CC12609
Publisher: Cold Spring Harbor Laboratory
Date: 08-05-2017
DOI: 10.1101/134684
Abstract: Extensively drug-resistant Klebsiella pneumoniae (XDR-KP) infections cause high mortality and are disseminating globally. Identifying the genetic basis underpinning resistance allows for rapid diagnosis and treatment. XDR isolates sourced from Greece and Brazil, including nineteen polymyxin-resistant and five polymyxin-susceptible strains, underwent whole genome sequencing. Approximately 90% of polymyxin resistance was enabled by alterations upstream or within mgrB. The most common mutation identified was an insertion at nucleotide position 75 in mgrB via an ISKpn26-like element in the ST258 lineage and ISKpn13 in one ST11 isolate. Three strains acquired an IS1 element upstream of mgrB and another strain had an ISKpn25 insertion at 133 bp. Other isolates had truncations (C28STOP, Q30STOP) or a missense mutation (D31E) affecting mgrB. Complementation assays revealed all mgrB perturbations contributed to resistance. Missense mutations in phoQ (T281M, G385C) were also found to facilitate resistance. Several variants in phoPQ co-segregating with the ISKpn26-like insertion were identified as potential partial suppressor mutations. Three ST258 samples were found to contain subpopulations with different resistance conferring mutations, including the ISKpn26-like insertion colonising with a novel mutation in pmrB (P158R), both confirmed via complementation assays. We also characterized a new multi-drug resistant Klebsiella quasipneumoniae strain ST2401 which was susceptible to polymyxins. These findings highlight the broad spectrum of chromosomal modifications which can facilitate and regulate resistance against polymyxins in K. pneumoniae. Whole genome sequencing of the 24 clinical isolates has been deposited under BioProject PRJNA307517 ( ioproject/PRJNA307517 ). Klebsiella pneumoniae contributes to a high abundance of nosocomial infections and the rapid emergence of antimicrobial resistance hinders treatment. 
Polymyxins are predominantly utilized to treat multidrug-resistant infections, however, resistance to the polymyxins is arising. This increasing prevalence in polymyxin resistance is evident especially in Greece and Brazil. Identifying the genomic variations conferring resistance in clinical isolates from these regions assists with potentially detecting novel alterations and tracing the spread of particular strains. This study commonly found mutations in the gene mgrB, the negative regulator of PhoPQ, known to cause resistance in KP. In the remaining isolates, missense mutations in phoQ were accountable for resistance. Multiple novel mutations were detected to be segregating with mgrB perturbations. This was either due to a mixed heterogeneous sample of two polymyxin-resistant strains, or because of multiple mutations within the same strain. Of interest was the validation of novel mutations in phoPQ segregating with a previously known ISKpn26-like element in disrupted mgrB isolates. Complementation of these phoPQ mutations revealed a reduction in minimum inhibitory concentrations and suggests the first evidence of partial suppressor mutations in KP. This research builds upon our current understanding of heteroresistance, lineage specific mutations and regulatory variations relating to polymyxin resistance.
Publisher: Elsevier BV
Date: 06-2022
DOI: 10.1016/J.SCITOTENV.2022.154043
Abstract: Fishpond sediments are rich in organic carbon and nutrients; thus, they can be used as potential fertilizers and soil conditioners. However, sediments can be contaminated with toxic elements (TEs), which have to be immobilized to allow sediment reutilization. Addition of biochars (BCs) to contaminated sediments may enhance their nutrient content and stabilize TEs, which adds value to their reutilization. Consequently, this study evaluated the performance of BCs derived from Taraxacum mongolicum Hand-Mazz (TMBC), Tribulus terrestris (TTBC), and rice straw (RSBC) for Cu, Cr, and Zn stabilization and for the enhancement of nutrient content in the fishpond sediments from San Jiang (SJ) and Tan Niu (TN), China. All BCs, particularly TMBC, significantly reduced the average concentrations of Cr, Cu, and Zn in the overlying water (up to 51% for Cr, 71% for Cu, and 68% for Zn) and in the sediment pore water (up to 77% for Cr, 76% for Cu, and 50% for Zn), and also reduced metal leachability (up to 47% for Cr, 60% for Cu, and 62% for Zn), as compared to the control. The acid soluble fraction accounted for the highest portion of the total content of Cr (43-44%), Cu (38-43%), and Zn (42-45%), followed by the reducible, oxidizable, and residual fractions; this indicates a high potential risk. As compared with the control, TMBC was more effective in reducing the average concentrations of the acid soluble Cr (15-22%), Cu (35-53%), and Zn (21-39%). Added BCs altered the metals' acid soluble fraction by shifting it to the oxidizable and residual fractions. Moreover, TMBC improved the macronutrient status in both sediments. This work provides a pathway for TEs remediation of sediments and gives novel insights into the utilization of BC-treated fishpond sediments as fertilizers for crop production.
Publisher: Springer Berlin Heidelberg
Date: 2011
Publisher: Microbiology Society
Date: 08-08-2023
Abstract: Tuberculosis is a global pandemic disease with a rising burden of antimicrobial resistance. As a result, the World Health Organization (WHO) has a goal of enabling universal access to drug susceptibility testing (DST). Given the slowness of and infrastructure requirements for phenotypic DST, whole-genome sequencing, followed by genotype-based prediction of DST, now provides a route to achieving this. Since a central component of genotypic DST is to detect the presence of any known resistance-causing mutations, a natural approach is to use a reference graph that allows encoding of known variation. We have developed DrPRG (Drug resistance Prediction with Reference Graphs) using the bacterial reference graph method Pandora. First, we outline the construction of a Mycobacterium tuberculosis drug resistance reference graph. The graph is built from a global dataset of isolates with varying drug susceptibility profiles, thus capturing common and rare resistance- and susceptible-associated haplotypes. We benchmark DrPRG against the existing graph-based tool Mykrobe and the haplotype-based approach of TBProfiler using 44 709 and 138 publicly available Illumina and Nanopore samples with associated phenotypes. We find that DrPRG has significantly improved sensitivity and specificity for some drugs compared to these tools, with no significant decreases. It uses significantly less computational memory than both tools, and provides significantly faster runtimes, except when runtime is compared to Mykrobe with Nanopore data. We discover and discuss novel insights into resistance-conferring variation for M. tuberculosis – including deletion of genes katG and pncA – and suggest mutations that may warrant reclassification as associated with resistance.
Publisher: Elsevier BV
Date: 04-2011
Abstract: High birth weight is associated with adult body mass index (BMI). We hypothesized that birth weight and BMI may partly share a common genetic background. The objective was to examine the associations of 12 established BMI variants in or near the NEGR1, SEC16B, TMEM18, ETV5, GNPDA2, BDNF, MTCH2, BCDIN3D, SH2B1, FTO, MC4R, and KCTD15 genes and their additive score with birth weight. A meta-analysis was conducted with the use of 1) the European Prospective Investigation into Cancer and Nutrition (EPIC)-Norfolk, Hertfordshire, Fenland, and European Youth Heart Study cohorts (n(max) = 14,060); 2) data extracted from the Early Growth Genetics Consortium meta-analysis of 6 genome-wide association studies for birth weight (n(max) = 10,623); and 3) all published data (n(max) = 14,837). Only the MTCH2 and FTO loci showed a nominally significant association with birth weight. The BMI-increasing allele of the MTCH2 variant (rs10838738) was associated with a lower birth weight (β ± SE: -13 ± 5 g/allele; P = 0.012; n = 23,680), and the BMI-increasing allele of the FTO variant (rs1121980) was associated with a higher birth weight (β ± SE: 11 ± 4 g/allele; P = 0.013; n = 28,219). These results were not significant after correction for multiple testing. Obesity-susceptibility loci have a small or no effect on weight at birth. Some evidence of an association was found for the MTCH2 and FTO loci, ie, lower and higher birth weight, respectively. These findings may provide new insights into the underlying mechanisms by which these loci confer an increased risk of obesity.
Publisher: BMJ
Date: 06-2020
Abstract: Analysis of vector integration sites in gene-modified cells can provide critical information on clonality and potential biological impact on nearby genes. Current short-read next-generation sequencing methods require specialized instruments and large batch runs. We used nanopore sequencing to analyze the vector integration sites of T cells transduced by the gammaretroviral vector, SFG.iCasp9.2A.ΔCD19. DNA from oligoclonal cell lines and polyclonal clinical samples was restriction enzyme digested with two 6-cutters, NcoI and BspHI, and the flanking genomic DNA amplified by inverse PCR or cassette ligation PCR. Following nested PCR and barcoding, the amplicons were sequenced on the Oxford Nanopore platform. Reads were filtered for quality, trimmed, and aligned. A custom tool was developed to cluster reads and merge overlapping clusters. Both inverse PCR and cassette ligation PCR could successfully amplify flanking genomic DNA, with cassette ligation PCR showing less bias. The 4.8 million raw reads were grouped into 12,186 clusters and 6410 clones. The 3′ long terminal repeat (LTR)-genome junction could be resolved within a 5-nucleotide span for a majority of clusters and within one nucleotide span for clusters with ≥5 reads. The chromosomal distributions of the insertional sites and their predilection for regions proximate to transcription start sites were consistent with previous reports for gammaretroviral vector integrants as analyzed by short-read next-generation sequencing. Our study shows that it is feasible to use nanopore sequencing to map polyclonal vector integration sites. The assay is scalable and requires minimum capital, which together enable cost-effective and timely analysis. Further refinement is required to reduce amplification bias and improve single nucleotide resolution.
Publisher: Springer Science and Business Media LLC
Date: 11-12-1994
DOI: 10.1038/NATURE15393
Publisher: Cold Spring Harbor Laboratory
Date: 04-05-2023
DOI: 10.1101/2023.05.04.539481
Abstract: 2. The dominant paradigm for analysing genetic variation relies on a central idea: all genomes in a species can be described as minor differences from a single reference genome. However, this approach can be problematic or inadequate for bacteria, where there can be significant sequence divergence within a species. Reference graphs are an emerging solution to the reference bias issues implicit in the “single-reference” model. Such a graph represents variation at multiple scales within a population – e.g., nucleotide- and locus-level. The genetic causes of drug resistance in bacteria have proven comparatively easy to decode compared with studies of human diseases. For example, it is possible to predict resistance to numerous anti-tuberculosis drugs by simply testing for the presence of a list of single nucleotide polymorphisms and insertion/deletions, commonly referred to as a catalogue. We developed DrPRG (Drug resistance Prediction with Reference Graphs) using the bacterial reference graph method Pandora. First, we outline the construction of a Mycobacterium tuberculosis drug resistance reference graph, a process that can be replicated for other species. The graph is built from a global dataset of isolates with varying drug susceptibility profiles, thus capturing common and rare resistance- and susceptible-associated haplotypes. We benchmark DrPRG against the existing graph-based tool Mykrobe and the haplotype-based approach of TBProfiler using 44,709 and 138 publicly available Illumina and Nanopore samples with associated phenotypes. We find DrPRG has significantly improved sensitivity and specificity for some drugs compared to these tools, with no significant decreases. It uses significantly less computational memory than both tools, and provides significantly faster runtimes, except when runtime is compared to Mykrobe on Nanopore data. We discover and discuss novel insights into resistance-conferring variation for M. tuberculosis – including deletion of genes katG and pncA – and suggest mutations that may warrant reclassification as associated with resistance.
3. Mycobacterium tuberculosis is the bacterium responsible for tuberculosis (TB). TB is one of the leading causes of death worldwide; before the coronavirus pandemic it was the leading cause of death from a single pathogen. Drug-resistant TB incidence has recently increased, making the detection of resistance even more vital. In this study, we develop a new software tool to predict drug resistance from whole-genome sequence data of the pathogen using new reference graph models to represent a reference genome. We evaluate it on M. tuberculosis against existing tools for resistance prediction and show improved performance. Using our method, we discover new resistance-associated variations and discuss reclassification of a selection of existing mutations. As such, this work contributes to TB drug resistance diagnostic efforts. In addition, the method could be applied to any bacterial species, so is of interest to anyone working on antimicrobial resistance.
4. The authors confirm all supporting data, code and protocols have been provided within the article or through supplementary data files. The software method presented in this work, DrPRG, is freely available from GitHub under an MIT license at bhall88/drprg. We used commit 9492f25 for all results via a Singularity[1] container from the URI docker://quay.io/mbhall88/drprg:9492f25. All code used to generate results for this study is available on GitHub at bhall88/drprg-paper. All data used in this work are freely available from the SRA/ENA/DRA and a copy of the datasheet with all associated phenotype information can be downloaded from the archived repository at 0.5281/zenodo.7819984 or found in the previously mentioned GitHub repository. The Mycobacterium tuberculosis index used in this work is available to download through DrPRG via the command drprg index --download mtb@20230308 or from GitHub at bhall88/drprg-index.
Publisher: Springer Science and Business Media LLC
Date: 30-10-2020
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 03-2020
Publisher: Elsevier BV
Date: 09-2018
DOI: 10.1016/J.ENVINT.2018.06.004
Abstract: Antibiotic resistance poses a major threat to public health. Overuse and misuse of antibiotics are generally recognized as the key factors contributing to antibiotic resistance. However, whether non-antibiotic, anti-microbial (NAAM) chemicals can directly induce antibiotic resistance is unclear. We aim to investigate whether the exposure to a NAAM chemical triclosan (TCS) has an impact on inducing antibiotic resistance on Escherichia coli. Here, we report that at a concentration of 0.2 mg/L TCS induces multi-drug resistance in wild-type Escherichia coli after 30-day TCS exposure. The oxidative stress induced by TCS caused genetic mutations in genes such as fabI, frdD, marR, acrR and soxR, and subsequent up-regulation of the transcription of genes encoding beta-lactamases and multi-drug efflux pumps, together with down-regulation of genes related to membrane permeability. The findings advance our understanding of the potential role of NAAM chemicals in the dissemination of antibiotic resistance in microbes, and highlight the need for controlling biocide applications.
Publisher: Oxford University Press (OUP)
Date: 22-08-2014
DOI: 10.1093/BIOINFORMATICS/BTU475
Abstract: Motivation: Exome sequencing technologies have transformed the field of Mendelian genetics and allowed for efficient detection of genomic variants in protein-coding regions. The target enrichment process that is intrinsic to exome sequencing is inherently imperfect, generating large amounts of unintended off-target sequence. Off-target data are characterized by very low and highly heterogeneous coverage and are usually discarded by exome analysis pipelines. We posit that off-target read depth is a rich, but overlooked, source of information that could be mined to detect intergenic copy number variation (CNV). We propose cnvOffSeq, a novel normalization framework for off-target read depth that is based on local adaptive singular value decomposition (SVD). This method is designed to address the heterogeneity of the underlying data and allows for accurate and precise CNV detection and genotyping in off-target regions. Results: cnvOffSeq was benchmarked on whole-exome sequencing samples from the 1000 Genomes Project. In a set of 104 gold standard intergenic deletions, our method achieved a sensitivity of 57.5% and a specificity of 99.2%, while maintaining a low FDR of 5%. For gold standard deletions longer than 5 kb, cnvOffSeq achieves a sensitivity of 90.4% without increasing the FDR. cnvOffSeq outperforms both whole-genome and whole-exome CNV detection methods considerably and is shown to offer a substantial improvement over naïve local SVD. Availability and Implementation: cnvOffSeq is available at /cnvoffseq/ Contact: evangelos.bellos09@imperial.ac.uk or l.coin@imb.uq.edu.au Supplementary information: Supplementary data are available at Bioinformatics online.
Publisher: Public Library of Science (PLoS)
Date: 30-11-2009
Publisher: Springer Science and Business Media LLC
Date: 15-04-2012
DOI: 10.1038/NG.2245
Publisher: Springer Berlin Heidelberg
Date: 2011
Publisher: Elsevier BV
Date: 04-2011
Publisher: Springer International Publishing
Date: 2015
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 10-2010
Publisher: American Medical Association (AMA)
Date: 07-2009
Publisher: Cold Spring Harbor Laboratory
Date: 15-12-2021
DOI: 10.1101/2021.12.14.472725
Abstract: Better methods to interrogate host-pathogen interactions during Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) infections are imperative to help understand and prevent this disease. Here we implemented RNA-sequencing (RNA-seq) combined with the Oxford Nanopore Technologies (ONT) long-reads to measure differential host gene expression, transcript polyadenylation and isoform usage within various epithelial cell lines permissive and non-permissive for SARS-CoV-2 infection. SARS-CoV-2-infected and mock-infected Vero (African green monkey kidney epithelial cells), Calu-3 (human lung adenocarcinoma epithelial cells), Caco-2 (human colorectal adenocarcinoma epithelial cells) and A549 (human lung carcinoma epithelial cells) were analysed over time (0, 2, 24, 48 hours). Differential polyadenylation was found to occur in both infected Calu-3 and Vero cells during a late time point (48 hpi), with Gene Ontology (GO) terms such as viral transcription and translation shown to be significantly enriched in Calu-3 data. Poly(A) tails showed increased lengths in the majority of the differentially polyadenylated transcripts in Calu-3 and Vero cell lines (up to ~136 nt in mean poly(A) length, padj = 0.029). Of these genes, ribosomal protein genes such as RPS4X and RPS6 also showed downregulation in expression levels, suggesting the importance of ribosomal protein genes during infection. Furthermore, differential transcript usage was identified in Caco-2, Calu-3 and Vero cells, including transcripts of genes such as GSDMB and KPNA2 , which have previously been implicated in SARS-CoV-2 infections. Overall, these results highlight the potential role of differential polyadenylation and transcript usage in host immune response or viral manipulation of host mechanisms during infection, and therefore, showcase the value of long-read sequencing in identifying less-explored host responses to disease.
Publisher: Elsevier BV
Date: 07-2014
Publisher: Springer Science and Business Media LLC
Date: 2012
Publisher: Public Library of Science (PLoS)
Date: 06-03-2009
Publisher: Cold Spring Harbor Laboratory
Date: 06-03-2023
DOI: 10.1101/2023.03.04.531075
Abstract: Assessing the impact of SARS-CoV-2 variants on the host is crucial with continuous emergence of new variants. We employed single-cell sequencing to investigate host transcriptomic response to ancestral and Alpha-strain SARS-CoV-2 infections within air-liquid-interface human nasal epithelial cells from adults and adolescents. Strong innate immune responses were observed across lowly-infected and bystander cell-types, and heightened in Alpha-infection. Contrastingly, the innate immune response of highly-infected cells was like mock-control cells. Alpha highly-infected cells showed increased expression of protein refolding genes compared with ancestral-strain-infected adolescent cells. Oxidative phosphorylation- and translation-related genes were down-regulated in bystander cells versus infected and mock-control cells, suggesting that the down-regulation is protective and up-regulation supports viral activity. Infected adult cells revealed up-regulation of these pathways compared with infected adolescents, implying enhanced pro-viral states in infected adults. Overall, this highlights the complexity of cell-type-, age- and viral-strain-dependent host epithelial responses to SARS-CoV-2 and the value of air-liquid-interface cultures.
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 12-2022
Publisher: Public Library of Science (PLoS)
Date: 20-01-2021
DOI: 10.1371/JOURNAL.PCBI.1008586
Abstract: A streaming assembly pipeline utilising real-time Oxford Nanopore Technology (ONT) sequencing data is important for saving sequencing resources and reducing time-to-result. A previous approach implemented in npScarf provided an efficient streaming algorithm for hybrid assembly but was relatively prone to mis-assemblies compared to other graph-based methods. Here we present npGraph, a streaming hybrid assembly tool using the assembly graph instead of the separated pre-assembly contigs. It is able to produce a more complete genome assembly by resolving the path finding problem on the assembly graph using long reads as the traversing guide. Application to synthetic and real data from bacterial isolate genomes shows improved accuracy while still maintaining a low computational cost. npGraph also provides a graphical user interface (GUI) which provides a real-time visualisation of the progress of assembly. The tool and source code are available at snguyen/assembly.
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 07-2014
Publisher: Springer Science and Business Media LLC
Date: 27-11-2019
DOI: 10.1038/S41598-019-53721-1
Abstract: Fever is the most common reason that children present to Emergency Departments. Clinical signs and symptoms suggestive of bacterial infection are often non-specific, and there is no definitive test for the accurate diagnosis of infection. The ‘omics’ approaches to identifying biomarkers from the host-response to bacterial infection are promising. In this study, lipidomic analysis was carried out with plasma samples obtained from febrile children with confirmed bacterial infection (n = 20) and confirmed viral infection (n = 20). We show for the first time that bacterial and viral infections produce distinct profiles in the host lipidome. Some species of glycerophosphoinositol, sphingomyelin, lysophosphatidylcholine and cholesterol sulfate were higher in the confirmed virus infected group, while some species of fatty acids, glycerophosphocholine, glycerophosphoserine, lactosylceramide and bilirubin were lower in the confirmed virus infected group when compared with the confirmed bacterial infected group. A combination of three lipids achieved an area under the receiver operating characteristic (ROC) curve of 0.911 (95% CI 0.81 to 0.98). This pilot study demonstrates the potential of metabolic biomarkers to assist clinicians in distinguishing bacterial from viral infection in febrile children, to facilitate effective clinical management and to limit the inappropriate use of antibiotics.
Publisher: Cold Spring Harbor Laboratory
Date: 22-05-2016
DOI: 10.1101/054783
Abstract: Genome assemblies obtained from short read sequencing technologies are often fragmented into many contigs because of the abundance of repetitive sequences. Long read sequencing technologies allow the generation of reads spanning most repeat sequences, providing the opportunity to complete these genome assemblies. However, substantial amounts of sequence data and computational resources are required to overcome the high per-base error rate inherent to these technologies. Furthermore, most existing methods only assemble the genomes after sequencing has completed which could result in either generation of more sequence data at greater cost than required or a low-quality assembly if insufficient data are generated. Here we present the first computational method which utilises real-time nanopore sequencing to scaffold and complete short-read assemblies while the long read sequence data is being generated. The method reports the progress of completing the assembly in real-time so users can terminate the sequencing once an assembly of sufficient quality and completeness is obtained. We use our method to complete four bacterial genomes and one eukaryotic genome, and show that it is able to construct more complete and more accurate assemblies, and at the same time, requires less sequencing data and computational resources than existing pipelines. We also demonstrate that the method can facilitate real-time analyses of positional information such as identification of bacterial genes encoded in plasmids and pathogenicity islands.
Publisher: Springer International Publishing
Date: 2018
Publisher: Elsevier BV
Date: 04-2023
Publisher: Elsevier BV
Date: 04-2015
Publisher: Springer Science and Business Media LLC
Date: 08-10-2019
DOI: 10.1007/S00122-019-03437-7
Abstract: β-Carotene content in sweetpotato is associated with the Orange and phytoene synthase genes; due to physical linkage of phytoene synthase with sucrose synthase, β-carotene and starch content are negatively correlated. In populations depending on sweetpotato for food security, starch is an important source of calories, while β-carotene is an important source of provitamin A. The negative association between the two traits contributes to the low nutritional quality of sweetpotato consumed, especially in sub-Saharan Africa. Using a biparental mapping population of 315 F1 progeny generated from a cross between an orange-fleshed and a non-orange-fleshed sweetpotato variety, we identified two major quantitative trait loci (QTL) on linkage group (LG) three (LG3) and twelve (LG12) affecting starch, β-carotene, and their correlated traits, dry matter and flesh color. Analysis of parental haplotypes indicated that these two regions acted pleiotropically to reduce starch content and increase β-carotene in genotypes carrying the orange-fleshed parental haplotype at the LG3 locus. Phytoene synthase and sucrose synthase, the rate-limiting and linked genes located within the QTL on LG3 involved in the carotenoid and starch biosynthesis, respectively, were differentially expressed in Beauregard versus Tanzania storage roots. The Orange gene, the molecular switch for chromoplast biogenesis, located within the QTL on LG12, while not differentially expressed, was expressed in developing roots of the parental genotypes. We conclude that these two QTL regions act together in a cis and trans manner to inhibit starch biosynthesis in amyloplasts and enhance chromoplast biogenesis, carotenoid biosynthesis, and accumulation in orange-fleshed sweetpotato. Understanding the genetic basis of this negative association between starch and β-carotene will inform future sweetpotato breeding strategies targeting sweetpotato for food and nutritional security.
Publisher: Association for Computing Machinery (ACM)
Date: 2012
Abstract: Service-centric solutions usually require rich context to fully deliver and better reflect on the underlying applications. We present a novel use of context in the form of customized user interface services with the concept of User Interface as a Service (UIaaS). UIaaS takes user profiles as input to generate context-aware interface services. Such interface services can be used as context to augment semantic services with contextual information leading to UIaaS as a Context (UIaaSaaC). The added serendipitous benefit of the proposed concept is that the composition of a customized user interface with the requested service is performed by the service composition engine, as is the case with any other services. We use a special-purpose language (called User Interface Description Language (UIDL)) to model and realize user interfaces as services. We use a real-life e-government application, human services delivery for the citizens, as a proof-of-concept. We also present a comprehensive evaluation of the proposed approach using a functional evaluation and a nonfunctional evaluation consisting of an end user usability test and expert usability reviews.
Publisher: Cold Spring Harbor Laboratory
Date: 22-03-2017
DOI: 10.1101/119271
Abstract: The assembly of whole-chromosome pseudomolecules for plant genomes remains challenging due to polyploidy and high repeat content. We developed an approach for constructing complete pseudomolecules for polyploid species using genotyping-by-sequencing data from outcrossing mapping populations coupled with high coverage whole genome sequence data of a reference genome. Our approach combines de novo assembly with linkage mapping to arrange scaffolds into pseudomolecules. We show that the method is able to reconstruct simulated chromosomes for both diploid and tetraploid genomes. Comparisons to three existing genetic mapping tools show that our method outperforms the other methods in accuracy on both grouping and ordering, and is robust to the presence of substantial amounts of missing data and genotyping errors. We applied our method to three real datasets including a diploid Ipomoea trifida and two tetraploid potato mapping populations. The linkage maps show significant concordance with the reference chromosomes. We resolved seven assembly errors for the published Ipomoea trifida genome assembly as well as anchored an unplaced scaffold in the published potato genome.
Publisher: Cold Spring Harbor Laboratory
Date: 25-09-2017
DOI: 10.1101/193631
Abstract: The pathogenesis of severe Plasmodium falciparum malaria is incompletely understood. Since the pathogenic stage of the parasite is restricted to blood, dual RNA-sequencing of host and parasite transcripts in blood can reveal their interactions at a systemic scale. Here we identify human and parasite gene expression associated with severe disease features in Gambian children. Differences in parasite load explained up to 99% of differential expression of human genes but only a third of the differential expression of parasite genes. Co-expression analyses showed a remarkable co-regulation of host and parasite genes controlling translation, and host granulopoiesis genes uniquely co-regulated and differentially expressed in severe malaria. Our results indicate that high parasite load is the proximal stimulus for severe P. falciparum malaria, that there is an unappreciated role for many parasite genes in determining virulence, and hint at a molecular arms-race between host and parasite to synthesise protein products.
Publisher: Springer Science and Business Media LLC
Date: 02-2010
DOI: 10.1038/NATURE08727
Publisher: Springer Science and Business Media LLC
Date: 25-04-2016
DOI: 10.1038/NG.3559
Publisher: MDPI AG
Date: 09-02-2023
DOI: 10.3390/PATHOGENS12020289
Abstract: Vascular wilt caused by the ascomycete fungal pathogen Fusarium oxysporum f. sp. cubense (Foc) is a major constraint of banana production around the world. The virulent race, namely Tropical Race 4, can infect all Cavendish-type banana plants and is now widespread across the globe, causing devastating losses to global banana production. In this study, we characterized Foc Subtropical Race 4 (STR4) resistance in a wild banana relative which, through estimated genome size and ancestry analysis, was confirmed to be Musa acuminata ssp. malaccensis. Using a self-derived F2 population segregating for STR4 resistance, quantitative trait loci sequencing (QTL-seq) was performed on bulks consisting of resistant and susceptible individuals. Changes in SNP index between the bulks revealed a major QTL located on the distal end of the long arm of chromosome 3. Multiple resistance genes are present in this region. Identification of chromosome regions conferring resistance to Foc can facilitate marker assisted selection in breeding programs and paves the way towards identifying genes underpinning resistance.
Publisher: Cold Spring Harbor Laboratory
Date: 06-02-2022
DOI: 10.1101/2022.02.05.479210
Abstract: Genomic neighbor typing enables heuristic inference of bacterial lineages and phenotypes from nanopore sequencing data. However, small reference databases may not be sufficiently representative of the diversity of lineages and genotypes present in a collection of isolates. In this study, we explore the use of genomic neighbor typing for surveillance of community-associated Staphylococcus aureus outbreaks in Papua New Guinea (PNG) and Far North Queensland, Australia (FNQ). We developed Sketchy, an implementation of genomic neighbor typing that queries exhaustive whole genome reference databases using MinHash. Evaluations were conducted using nanopore read simulations and six species-wide reference sketches (4832 - 47616 genomes), as well as two S. aureus outbreak data sets sequenced at low depth using a sequential multiplex library protocol on the MinION (n = 160, with matching Illumina data). Heuristic inference of lineages and antimicrobial resistance profiles allowed us to conduct multiplex genotyping in situ at the Papua New Guinea Institute of Medical Research in Goroka, on low-throughput Flongle adapters and using multiple successive libraries on the same MinION flow cell (n = 24 - 48). Comparison to phylogenetically informed genomic neighbor typing with RASE on the dominant outbreak sequence type suggests slightly better performance at predicting lineage-scale genotypes using large sketch sizes, but inferior performance in resolving clade-specific genotypes (methicillin resistance). Sketchy can be used for large-scale bacterial outbreak surveillance and in challenging sequencing scenarios, but improvements to clade-specific genotype inference are needed for diagnostic applications. Sketchy is available open-source at: steinig/sketchy
Publisher: Springer International Publishing
Date: 2017
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 07-2013
DOI: 10.1109/TSC.2012.7
Publisher: Springer Science and Business Media LLC
Date: 10-05-2009
DOI: 10.1038/NG.361
Publisher: ACM
Date: 11-09-2022
Publisher: Elsevier BV
Date: 09-2019
DOI: 10.1016/J.HUMIMM.2019.04.020
Abstract: Kawasaki disease (KD) is a pediatric vasculitis caused by an unknown trigger in genetically susceptible children. The incidence varies widely across genetically diverse populations. Several associations with HLA Class I alleles have been reported in single cohort studies. Using a genetic approach, from the nine single nucleotide variants (SNVs) associated with KD susceptibility in children of European descent, we identified SNVs near the HLA-C (rs6906846) and HLA-B genes (rs2254556) whose association was replicated in a Japanese descent cohort (rs6906846 p = 0.01, rs2254556 p = 0.005). The risk allele (A at rs6906846) was also associated with HLA-C*07:02 and HLA-C*04:01 in both US multi-ethnic and Japanese cohorts and HLA-C*12:02 only in the Japanese cohort. The risk A-allele was associated with eight non-conservative amino acid substitutions (amino acid positions) Asp or Ser (9), Arg (14), Ala (49), Ala (73), Ala (90), Arg (97), Phe or Ser (99), and Phe or Ser (116) in the HLA-C peptide binding groove that binds peptides for presentation to cytotoxic T cells (CTL). This raises the possibility of increased affinity to a "KD peptide" that contributes to the vasculitis of KD in genetically susceptible children.
Publisher: Elsevier BV
Date: 09-2022
Publisher: Elsevier BV
Date: 04-2022
DOI: 10.1016/J.ENVPOL.2022.118877
Abstract: The effects of catalytic hydrothermal (HT) pretreatment on animal manure followed by the addition of hydrochar on the nutrients recovery have not yet been investigated using a combination of chemical, microscopic, and spectroscopic techniques. Therefore, a catalytic HT process was employed to pretreat swine manure without additives (manure-HT) and with H
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 05-2023
Publisher: Cold Spring Harbor Laboratory
Date: 07-11-2019
DOI: 10.1101/833897
Abstract: Vector integration site analysis can be important in the follow-up of patients who received gene-modified cells, but current platforms based on next-generation sequencing are expensive and relatively inaccessible. We analyzed polyclonal T cells transduced by a gammaretroviral vector, SFG.iCasp9.2A.ΔCD19, from a clinical trial. Following restriction enzyme digestion, the unknown flanking genomic sequences were amplified by inverse polymerase chain reaction (PCR) or cassette ligation PCR. Nanopore sequencing could identify thousands of unique integration sites within polyclonal samples, with cassette ligation PCR showing less bias. The assay is scalable and requires minimum capital, which together enable cost-effective and timely analysis.
Publisher: PeerJ
Date: 06-03-2018
DOI: 10.7287/PEERJ.PREPRINTS.26624V1
Abstract: There is substantial subtelomeric interstitial telomeric sequence (ITS) in the human genome; however, the origin of these sequences is not well understood. We investigate the possibility that these ITS have arisen via a process of chromosome end extension to the telomere sequence. By analysing the relationship between subtelomeric duplication and ITS, we identify multiple ITS which were the ancestral chromosome telomeric capping sequence. Comparison of chromosome terminal sequence between 15 species reveals an ongoing evolutionary process of chromosome extension, with an average extension rate of 0.0020 bp per year per chromosome. Analysis of SNP data from 1000 genomes demonstrates reduced SNP diversity in subtelomeric regions, indicating that many terminal regions are younger than the remaining autosomal sequence.
Publisher: Oxford University Press (OUP)
Date: 16-09-2014
DOI: 10.1093/NAR/GKU849
Publisher: Cold Spring Harbor Laboratory
Date: 31-07-2023
DOI: 10.1101/2023.07.28.23293197
Abstract: Multisystem inflammatory syndrome in children (MIS-C) is a rare but serious hyperinflammatory complication following infection with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The mechanisms underpinning the pathophysiology of MIS-C are poorly understood. Moreover, clinically distinguishing MIS-C from other childhood infectious and inflammatory conditions, such as Kawasaki Disease (KD) or severe bacterial and viral infections, is challenging due to overlapping clinical and laboratory features. We aimed to determine a set of plasma protein biomarkers that could discriminate MIS-C from those other diseases. Seven candidate protein biomarkers for MIS-C were selected based on literature and from whole blood RNA-Sequencing data from patients with MIS-C and other diseases. Plasma concentrations of ARG1, CCL20, CD163, CORIN, CXCL9, PCSK9 and ADAMTS2 were quantified in MIS-C (n=22), KD (n=23), definite bacterial (DB, n=28) and viral (DV, n=27) disease, and healthy controls (n=8). Logistic regression models were used to determine the discriminatory ability of individual proteins and protein combinations to identify MIS-C, and association with severity of illness. Plasma levels of CD163, CXCL9, and PCSK9 were significantly elevated in MIS-C with a combined AUC of 86% (95% CI: 76.8%-95.1%) for discriminating MIS-C from other childhood diseases. Lower ARG1 and CORIN plasma levels were significantly associated with severe MIS-C cases requiring oxygen, inotropes or with shock. Our findings demonstrate the feasibility of a host protein biomarker signature for MIS-C and may provide new insight into its pathophysiology.
Publisher: Oxford University Press (OUP)
Date: 08-06-2011
DOI: 10.1093/HMG/DDR248
Abstract: Rheumatoid arthritis (RA) is the commonest chronic, systemic, inflammatory disorder affecting ∼1% of the world population. It has a strong genetic component and a growing number of associated genes have been discovered in genome-wide association studies (GWAS), which nevertheless only account for 23% of the total genetic risk. We aimed to identify additional susceptibility loci through the analysis of GWAS in the context of biological function. We bridge the gap between pathway and gene-oriented analyses of GWAS, by introducing a pathway-driven gene stability-selection methodology that identifies potential causal genes in the top-associated disease pathways that may be driving the pathway association signals. We analysed the WTCCC and the NARAC studies of ∼5000 and ∼2000 subjects, respectively. We examined 700 pathways comprising ∼8000 genes. Ranking pathways by significance revealed that the NARAC top-ranked ∼6% laid within the top 10% of WTCCC. Gene selection on those pathways identified 58 genes in WTCCC and 61 in NARAC; 21 of those were common (P(overlap) < 10(-21)), of which 16 were novel discoveries. Among the identified genes, we validated 10 known RA associations in WTCCC and 13 in NARAC, not discovered using single-SNP approaches on the same data. Gene ontology functional enrichment analysis on the identified genes showed significant over-representation of signalling activity (P < 10(-29)) in both studies. Our findings suggest a novel model of RA genetic predisposition, which involves cell-membrane receptors and genes in second messenger signalling systems, in addition to genes that regulate immune responses, which have been the focus of interest previously.
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 07-2010
DOI: 10.1109/TSC.2010.34
Publisher: Ovid Technologies (Wolters Kluwer Health)
Date: 12-2016
DOI: 10.1161/CIRCGENETICS.116.001533
Abstract: Kawasaki disease (KD) is an acute pediatric vasculitis in which host genetics influence both susceptibility to KD and the formation of coronary artery aneurysms. Variants discovered by genome-wide association studies and linkage studies only partially explain the influence of genetics on KD susceptibility. To search for additional functional genetic variation, we performed pathway and gene stability analysis on a genome-wide association study data set. Pathway analysis using European genome-wide association study data identified 100 significantly associated pathways (P ×10−4). Gene stability selection identified 116 single nucleotide polymorphisms in 26 genes that were responsible for driving the pathway associations, and gene ontology analysis demonstrated enrichment for calcium transport (P = 1.05×10−4). Three single nucleotide polymorphisms in solute carrier family 8, member 1 (SLC8A1), a sodium/calcium exchanger encoding NCX1, were validated in an independent Japanese genome-wide association study data set (meta-analysis P = 0.0001). Patients homozygous for the A (risk) allele of rs13017968 had higher rates of coronary artery abnormalities (P = 0.029). NCX1, the protein encoded by SLC8A1, was expressed in spindle-shaped and inflammatory cells in the aneurysm wall. Increased intracellular calcium mobilization was observed in B cell lines from healthy controls carrying the risk allele. Pathway-based association analysis followed by gene stability selection proved to be a valuable tool for identifying risk alleles in a rare disease with complex genetics. The role of SLC8A1 polymorphisms in altering calcium flux in cells that mediate coronary artery damage in KD suggests that this pathway may be a therapeutic target and supports the study of calcineurin inhibitors in acute KD.
Publisher: Springer Science and Business Media LLC
Date: 10-08-2013
Publisher: Cold Spring Harbor Laboratory
Date: 04-08-2017
DOI: 10.1101/172601
Abstract: An outbreak of multi-drug resistant tuberculosis has been reported on Daru Island, Papua New Guinea. The Mycobacterium tuberculosis strains driving this outbreak and the temporal accrual of drug resistance mutations have not been described. We analyzed 100 isolates using whole genome sequencing and found 95 belonged to a single modern Beijing strain cluster. Molecular dating suggested acquisition of streptomycin and isoniazid resistance in the 1960s, with virulence potentially enhanced by a mycP1 mutation. The outbreak cluster demonstrated a high degree of co-resistance between isoniazid and ethionamide (80/95; 84.2%) attributed to an inhA promoter mutation combined with inhA and ndh coding mutations. Multidrug resistance (MDR), observed in 78/95 samples, emerged with the acquisition of a typical rpoB mutation together with a compensatory rpoC mutation in the 1980s. There was independent acquisition of fluoroquinolone and aminoglycoside resistance with evidence of local transmission of extensively-drug resistant (XDR) strains from 2009. These findings underscore the importance of whole-genome sequencing in informing an effective public health response to MDR/XDR M. tuberculosis.
Publisher: Public Library of Science (PLoS)
Date: 26-02-2010
Publisher: Oxford University Press (OUP)
Date: 04-08-2004
DOI: 10.1093/BIOINFORMATICS/BTH942
Abstract: Motivation: Pseudogenes are the remnants of genomic sequences of genes which are no longer functional. They are frequent in most eukaryotic genomes, and an important resource for comparative genomics. However, pseudogenes are often mis-annotated as functional genes in sequence databases. Current methods for identifying pseudogenes include methods which rely on the presence of stop codons and frameshifts, as well as methods based on the ratio of non-silent to silent nucleotide substitution rates (dN/dS). A recent survey concluded that 50% of human pseudogenes have no detectable truncation in their pseudo-coding regions, indicating that the former methods lack sensitivity. The latter methods have been used to find sets of genes enriched for pseudogenes, but are not specific enough to accurately separate pseudogenes from expressed genes. Results: We introduce a program called pseudogene inference from loss of constraint (PSILC) which incorporates novel methods for separating pseudogenes from functional genes. The methods calculate the log-odds score that evolution along the final branch of the gene tree to the query gene has been according to the following constraints: (i) a neutral nucleotide model compared to a Pfam domain encoding model (PSILCnuc/dom); (ii) a protein coding model compared to a Pfam domain encoding model (PSILCprot/dom). Using the manual annotation of human chromosome 6, we show that both these methods result in a more accurate classification of pseudogenes than dN/dS when a Pfam domain alignment is available. Availability: PSILC is available from www.sanger.ac.uk/Software/PSILC
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2023
Publisher: Elsevier BV
Date: 10-2021
Publisher: Springer New York
Date: 2014
Publisher: Cold Spring Harbor Laboratory
Date: 15-05-2018
DOI: 10.1101/321463
Abstract: Improved methods are needed to identify host mechanisms which directly protect against human infectious diseases in order to develop better vaccines and therapeutics 1,2. Pathogen load determines the outcome of many infections 3, and is a consequence of pathogen multiplication rate, duration of the infection, and inhibition or killing of pathogen by the host (resistance). If these determinants of pathogen load could be quantified then their mechanistic correlates might be determined. In humans the timing of infection is rarely known and treatment cannot usually be withheld to monitor serial changes in pathogen load and host response. Here we present an approach to overcome this and identify potential mechanisms of resistance which control parasite load in Plasmodium falciparum malaria. Using a mathematical model of longitudinal infection dynamics for orientation, we made individualized estimates of parasite multiplication and growth inhibition in Gambian children at presentation with acute malaria and used whole blood RNA-sequencing to identify their correlates. We identified novel roles for secreted proteases cathepsin G and matrix metallopeptidase 9 (MMP9) as direct effector molecules which inhibit P. falciparum growth. Cathepsin G acts on the erythrocyte membrane, cleaving surface receptors required for parasite invasion, whilst MMP9 acts on the parasite. In contrast, the type 1 interferon response and expression of CXCL10 (IFN-γ-inducible protein of 10 kDa, IP-10) were detrimental to control of parasite growth. Natural variation in iron status and plasma levels of complement factor H were determinants of parasite multiplication rate. Our findings demonstrate the importance of accounting for the dynamic interaction between host and pathogen when seeking to identify correlates of protection, and reveal novel mechanisms controlling parasite growth in humans. This approach could be extended to identify additional mechanistic correlates of natural- and vaccine-induced immunity to malaria and other infections.
Publisher: Springer Science and Business Media LLC
Date: 20-01-2022
DOI: 10.1186/S12885-021-09160-1
Abstract: Circulating cell-free DNA (cfDNA) in the plasma of cancer patients contains cell-free tumour DNA (ctDNA) derived from tumour cells and it has been widely recognized as a non-invasive source of tumour DNA for diagnosis and prognosis of cancer. Molecular profiling of ctDNA is often performed using targeted sequencing or low-coverage whole genome sequencing (WGS) to identify tumour specific somatic mutations or somatic copy number aberrations (sCNAs). However, these approaches cannot efficiently detect all tumour-derived genomic changes in ctDNA. We performed WGS analysis of cfDNA from 4 breast cancer patients and 2 patients with benign tumours. We sequenced matched germline DNA for all 6 patients and tumour samples from the breast cancer patients. All samples were sequenced on Illumina HiSeqXTen sequencing platform and achieved approximately 30x, 60x and 100x coverage on germline, tumour and plasma DNA samples, respectively. The mutational burden of the plasma samples (1.44 somatic mutations/Mb of genome) was higher than the matched tumour samples. However, 90% of high confidence somatic cfDNA variants were not detected in matched tumour samples and were found to comprise two background plasma mutational signatures. In contrast, cfDNA from the di-nucleosome fraction (300 bp–350 bp) had much higher proportion (30%) of variants shared with tumour. Despite high coverage sequencing we were unable to detect sCNAs in plasma samples. Deep sequencing analysis of plasma samples revealed higher fraction of unique somatic mutations in plasma samples, which were not detected in matched tumour samples. Sequencing of di-nucleosome bound cfDNA fragments may increase recovery of tumour mutations from plasma.
Publisher: Springer Berlin Heidelberg
Date: 2010
Publisher: Cold Spring Harbor Laboratory
Date: 05-2021
DOI: 10.1101/2021.04.30.442218
Abstract: Nanopore sequencing and phylodynamic modelling have been used to reconstruct the transmission dynamics of viral epidemics, but their application to bacterial pathogens has remained challenging. Here, we implement Random Forest models for single nucleotide polymorphism (SNP) polishing to estimate divergence and effective reproduction numbers (Re) of two community-associated, methicillin-resistant Staphylococcus aureus (MRSA) outbreaks in remote Far North Queensland and Papua New Guinea (n = 159). Successive bar-coded panels of S. aureus isolates (2 × 12 per MinION) sequenced at low-coverage (5x - 10x) provided sufficient data to accurately infer assembly genotypes with high recall when compared with Illumina references. De novo SNP calling with Clair was followed by SNP polishing using intra- and inter-species models trained on Snippy reference calls. Models achieved sufficient resolution on ST93 outbreak sequence types (70 - 90% accuracy and precision) for phylodynamic modelling from lineage-wide hybrid alignments and birth-death skyline models in BEAST2. Our method reproduced phylogenetic topology, geographical source of the outbreaks, and indications of sustained transmission (Re > 1). We provide Nextflow pipelines that implement SNP polisher training, evaluation, and outbreak alignments, enabling reconstruction of within-lineage transmission dynamics for infection control of bacterial disease outbreaks using nanopore sequencing.
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 05-2022
Publisher: Springer Science and Business Media LLC
Date: 09-11-2018
DOI: 10.1038/S41598-018-34774-0
Abstract: The majority of human chromosome ends remain incompletely assembled due to their highly repetitive structure. In this study, we use BioNano data to anchor and extend chromosome ends from two European trios as well as two unrelated Asian genomes. At least 11 BioNano assembled chromosome ends are structurally divergent from the reference genome, including both missing sequence and extensions. These extensions are heritable and in some cases divergent between Asian and European samples. Six out of nine predicted extension sequences from NA12878 can be confirmed and filled by nanopore data. We identify two multi-kilobase sequence families both enriched more than 100-fold in extension sequence (p-values < 1e-5) whose origins can be traced to interstitial sequence on ancestral primate chromosome 7. Extensive sub-telomeric duplication of these families has occurred in the human lineage subsequent to divergence from chimpanzees.
Publisher: Royal Society of Chemistry (RSC)
Date: 2019
DOI: 10.1039/C9LC00978G
Abstract: High throughput screening of phage display libraries for target binding molecules using electrohydrodynamic nanomixing and nanopore sequencing.
Publisher: Springer Science and Business Media LLC
Date: 2011
Publisher: Springer Science and Business Media LLC
Date: 30-05-2010
DOI: 10.1038/NMETH.1466
Abstract: Although genome-wide association studies have uncovered single-nucleotide polymorphisms (SNPs) associated with complex disease, these variants account for a small portion of heritability. Some contribution to this 'missing heritability' may come from copy-number variants (CNVs), in particular rare CNVs, but assessment of this contribution remains challenging because of the difficulty in accurately genotyping CNVs, particularly small variants. We report a population-based approach for the identification of CNVs that integrates data from multiple samples and platforms. Our algorithm, cnvHap, jointly learns a chromosome-wide haplotype model of CNVs and cluster-based models of allele intensity at each probe. Using data for 50 French individuals assayed on four separate platforms, we found that cnvHap correctly detected at least 14% more deleted and 50% more amplified genotypes than PennCNV or QuantiSNP, with an 82% and 115% improvement for aberrations containing <10 probes. Combining data from multiple platforms additionally improved sensitivity.
Publisher: IEEE
Date: 06-2014
DOI: 10.1109/ICWS.2014.44
Publisher: Springer International Publishing
Date: 2018
Publisher: Association for Computing Machinery (ACM)
Date: 14-07-2021
DOI: 10.1145/3452332
Abstract: We propose a novel Infrastructure-as-a-Service composition framework that selects an optimal set of consumer requests according to the provider’s qualitative preferences on long-term service provisions. Decision variables are included in the temporal conditional preference networks to represent qualitative preferences for both short-term and long-term consumers. The global preference ranking of a set of requests is computed using a k-d tree indexing-based temporal similarity measure approach. We propose an extended three-dimensional Q-learning approach to maximize the global preference ranking. We design the on-policy-based sequential selection learning approach that applies the length of request to accept or reject requests in a composition. The proposed on-policy-based learning method reuses historical experiences or policies of sequential optimization using an agglomerative clustering approach. Experimental results prove the feasibility of the proposed framework.
Publisher: American Society for Microbiology
Date: 06-2018
Abstract: Transcriptomics, the analysis of genome-wide RNA expression, is a common approach to investigate host and pathogen processes in infectious diseases. Technical and bioinformatic advances have permitted increasingly thorough analyses of the association of RNA expression with fundamental biology, immunity, pathogenesis, diagnosis, and prognosis. Transcriptomic approaches can now be used to realize a previously unattainable goal, the simultaneous study of RNA expression in host and pathogen, in order to better understand their interactions. This exciting prospect is not without challenges, especially as focus moves from interactions in vitro under tightly controlled conditions to tissue- and systems-level interactions in animal models and natural and experimental infections in humans. Here we review the contribution of transcriptomic studies to the understanding of malaria, a parasitic disease which has exerted a major influence on human evolution and continues to cause a huge global burden of disease. We consider malaria a paradigm for the transcriptomic assessment of systemic host-pathogen interactions in humans, because much of the direct host-pathogen interaction occurs within the blood, a readily sampled compartment of the body. We illustrate lessons learned from transcriptomic studies of malaria and how these lessons may guide studies of host-pathogen interactions in other infectious diseases. We propose that the potential of transcriptomic studies to improve the understanding of malaria as a disease remains partly untapped because of limitations in study design rather than as a consequence of technological constraints. Further advances will require the integration of transcriptomic data with analytical approaches from other scientific disciplines, including epidemiology and mathematical modeling.
Publisher: Springer International Publishing
Date: 2018
Publisher: Springer Science and Business Media LLC
Date: 18-01-2009
DOI: 10.1038/NG.301
Abstract: We analyzed genome-wide association data from 1,380 Europeans with early-onset and morbid adult obesity and 1,416 age-matched normal-weight controls. Thirty-eight markers showing strong association were further evaluated in 14,186 European subjects. In addition to FTO and MC4R, we detected significant association of obesity with three new risk loci in NPC1 (endosomal/lysosomal Niemann-Pick C1 gene, P = 2.9 x 10(-7)), near MAF (encoding the transcription factor c-MAF, P = 3.8 x 10(-13)) and near PTER (phosphotriesterase-related gene, P = 2.1 x 10(-7)).
Publisher: Springer Science and Business Media LLC
Date: 16-07-2018
Publisher: Massachusetts Medical Society
Date: 05-2014
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 11-2022
Publisher: Centers for Disease Control and Prevention (CDC)
Date: 03-2019
Publisher: Springer Science and Business Media LLC
Date: 2004
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 08-2009
Publisher: Microbiology Society
Date: 04-05-2021
Abstract: Tuberculosis is a leading public health priority in eastern Malaysia. Knowledge of the genomic epidemiology of tuberculosis can help tailor public health interventions. Our aims were to determine tuberculosis genomic epidemiology and characterize resistance mutations in the ethnically diverse city of Kota Kinabalu, Sabah, located at the nexus of Malaysia, Indonesia, Philippines and Brunei. We used an archive of prospectively collected Mycobacterium tuberculosis samples paired with epidemiological data. We collected sputum and demographic data from consecutive consenting outpatients with pulmonary tuberculosis at the largest tuberculosis clinic from 2012 to 2014, and selected samples from tuberculosis inpatients from the tertiary referral centre during 2012–2014 and 2016–2017. Two hundred and eight M. tuberculosis sequences were available for analysis, representing 8 % of cases notified during the study periods. Whole-genome phylogenetic analysis demonstrated that most strains were lineage 1 (195/208, 93.8 %), with the remainder being lineages 2 (8/208, 3.8 %) or 4 (5/208, 2.4 %). Lineages or sub-lineages were not associated with patient ethnicity. The lineage 1 strains were diverse, with sub-lineage 1.2.1 being dominant (192, 98 %). Lineage 1.2.1.3 isolates were geographically most widely distributed. The greatest diversity occurred in a border town sub-district. The time to the most recent common ancestor for the three major lineage 1.2.1 clades was estimated to be the year 1966 (95 % HPD 1948–1976). An association was found between failure of culture conversion by week 8 of treatment and infection with lineage 2 (4/6, 67 %) compared with lineage 1 strains (4/83, 5 %) (P < .001), supporting evidence of greater virulence of lineage 2 strains. Eleven potential transmission clusters (SNP difference ≤12) were identified; at least five included people living in different sub-districts. Some linked cases spanned the whole 4-year study period. One cluster involved a multidrug-resistant tuberculosis strain matching a drug-susceptible strain from 3 years earlier. Drug resistance mutations were uncommon, but revealed one phenotype–genotype mismatch in a genotypically multidrug-resistant isolate, and rare nonsense mutations within the katG gene in two isolates. Consistent with the regionally mobile population, M. tuberculosis strains in Kota Kinabalu were diverse, although several lineage 1 strains dominated and were locally well established. Transmission clusters – uncommonly identified, likely attributable to incomplete sampling – showed clustering occurring across the community, not confined to households or sub-districts. The findings indicate that public health priorities should include active case finding and early institution of tuberculosis management in mobile populations, while there is a need to upscale effective contact investigation beyond households to include other contacts within social networks.
Publisher: Springer Science and Business Media LLC
Date: 26-05-2017
DOI: 10.1038/IJO.2017.126
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 09-2022
Publisher: Association for Computing Machinery (ACM)
Date: 24-03-2017
DOI: 10.1145/2983528
Abstract: Mapping out the challenges and strategies for the widespread adoption of service computing.
Publisher: American Association for the Advancement of Science (AAAS)
Date: 27-06-2018
DOI: 10.1126/SCITRANSLMED.AAR3619
Abstract: Host and parasite RNA sequencing is combined with parasite load estimates to reveal mechanisms associated with human severe malaria.
Publisher: Oxford University Press (OUP)
Date: 02-2012
DOI: 10.1534/GENETICS.111.135657
Abstract: Systematic nonrandom mating in populations results in genetic stratification and is predominantly caused by geographic separation, providing the opportunity to infer individuals’ birthplace from genetic data. Such inference has been demonstrated for individuals’ country of birth, but here we use data from the Northern Finland Birth Cohort 1966 (NFBC1966) to investigate the characteristics of genetic structure within a population and subsequently develop a method for inferring location to a finer scale. Principal component analysis (PCA) shows that while the first PCs are particularly informative for location, there is also location information in the higher-order PCs, but it cannot be captured by a linear model. We introduce a new method, pcLOCATE, which is able to exploit this information to improve the accuracy of location inference. pcLOCATE uses individuals’ PC values to estimate the probability of birth in each town and then averages over all towns to give an estimated longitude and latitude of birth using a fully Bayesian model. We apply pcLOCATE to the NFBC1966 data to estimate parental birthplace, testing with successively more PCs and finding the model with the top 23 PCs most accurate, with a median distance of 23 km between the estimated and the true location. pcLOCATE predicts the most recent residence of NFBC1966 individuals to a median distance of 47 km. We also apply pcLOCATE to Indian individuals from the London Life Sciences Prospective Population Study (LOLIPOP) data, and find that birthplace is predicted to a median distance of 54 km from the true location. A method with such accuracy is potentially valuable in population genetics and forensics.
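The averaging step described in the abstract above (per-town birth probabilities combined into a single estimated longitude and latitude) can be illustrated with a minimal sketch. It assumes the per-town probabilities have already been estimated and are passed in as a plain dictionary; the function name, town labels, and coordinates are hypothetical and are not taken from pcLOCATE itself, whose fully Bayesian model over PC values is not reproduced here.

```python
from typing import Dict, Tuple


def expected_location(
    town_probs: Dict[str, float],
    town_coords: Dict[str, Tuple[float, float]],
) -> Tuple[float, float]:
    """Probability-weighted mean (longitude, latitude) over candidate towns.

    town_probs  : hypothetical per-town birth probabilities (need not sum to 1;
                  they are normalised here).
    town_coords : hypothetical (longitude, latitude) for each town, in degrees.
    """
    total = sum(town_probs.values())
    if total <= 0:
        raise ValueError("town probabilities must have a positive sum")

    # Weighted average of coordinates; a plain arithmetic mean is a reasonable
    # approximation only over a small region that does not cross the
    # antimeridian or approach a pole.
    lon = sum(p * town_coords[t][0] for t, p in town_probs.items()) / total
    lat = sum(p * town_coords[t][1] for t, p in town_probs.items()) / total
    return lon, lat


# Illustrative values only (not data from the study).
probs = {"town_a": 0.75, "town_b": 0.25}
coords = {"town_a": (24.0, 65.0), "town_b": (26.0, 66.0)}
estimate = expected_location(probs, coords)
```

With the illustrative inputs above, the estimate lies three quarters of the way from town_b towards town_a, which is the behaviour a probability-weighted average should have.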
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 07-2022
Publisher: ACM
Date: 09-05-2023
Publisher: Public Library of Science (PLoS)
Date: 22-10-2013
Publisher: Cold Spring Harbor Laboratory
Date: 18-02-2018
DOI: 10.1101/267302
Abstract: Antibiotic resistance poses a major threat to public health. Overuse and misuse of antibiotics are generally recognised as the key factors contributing to antibiotic resistance. However, whether non-antibiotic, anti-microbial (NAAM) chemicals can directly induce antibiotic resistance is unclear. We aim to investigate whether the exposure to a NAAM chemical triclosan (TCS) has an impact on inducing antibiotic resistance on Escherichia coli. Here, we report that at a concentration of 0.2 mg/L TCS induces multi-drug resistance in wild-type Escherichia coli after 30-day TCS exposure. The oxidative stress induced by TCS caused genetic mutations in genes such as fabI, frdD, marR, acrR and soxR, and subsequent up-regulation of the transcription of genes encoding beta-lactamase and multi-drug efflux pump, together with down-regulation of genes related to membrane permeability. The findings advance our understanding of the potential role of NAAM chemicals in the dissemination of antibiotic resistance in microbes, and highlight the need for controlling biocide applications.
Publisher: IEEE
Date: 06-2014
DOI: 10.1109/ICWS.2014.24
Publisher: Springer Science and Business Media LLC
Date: 06-04-2010
DOI: 10.1038/NG.567
Publisher: Elsevier BV
Date: 08-2021
Publisher: Cold Spring Harbor Laboratory
Date: 04-02-2016
DOI: 10.1101/038828
Abstract: Accurate identification of copy number alterations is an essential step in understanding the events driving tumor progression. While a variety of algorithms have been developed to use high-throughput sequencing data to profile copy number changes, no tool is able to reliably characterize ploidy and genotype absolute copy number from tumor samples which contain less than 40% tumor cells. To increase our power to resolve the copy number profile from low-cellularity tumor samples, we developed a novel approach which pre-phases heterozygote germline SNPs in order to replace the commonly used ‘B-allele frequency’ with a more powerful ‘parental-haplotype frequency’. We apply our tool - sCNAphase - to characterize the copy number and loss-of-heterozygosity profiles of four publicly available breast cancer cell-lines. Comparisons to previous spectral karyotyping and microarray studies revealed that sCNAphase reliably identified overall ploidy as well as the individual copy number mutations from each cell-line. Analysis of artificial cell-line mixtures demonstrated the capacity of this method to determine the level of tumor cellularity, consistently identify sCNAs and characterize ploidy in samples with as little as 10% tumor cells. This novel methodology has the potential to bring sCNA profiling to low-cellularity tumors, a form of cancer unable to be accurately studied by current methods.
Publisher: Springer Berlin Heidelberg
Date: 2015
Publisher: Public Library of Science (PLoS)
Date: 08-10-2013
Publisher: Cold Spring Harbor Laboratory
Date: 17-06-2019
DOI: 10.1101/673251
Abstract: Tandem repeats (TRs) are highly prone to variation in copy numbers due to their repetitive and unstable nature, which makes them a major source of genomic variation between individuals. However, population variation of TRs has not been widely explored due to the limitations of existing tools, which are either low-throughput or restricted to a small subset of TRs. Here, we used a SureSelect targeted sequencing approach combined with Nanopore sequencing to overcome these limitations. We achieved an average of 3062-fold target enrichment on a panel of 142 TR loci, generating an average of 97X sequence coverage on 7 samples utilizing 2 MinION flow-cells with 200 ng of input DNA per sample. We identified a subset of 110 TR loci with length less than 2 kb and GC content greater than 25%, for which we achieved an average genotyping rate of 75%, increasing to 91% for the highest-coverage sample. Alleles estimated from targeted long-read sequencing were concordant with gold standard PCR sizing analysis and moreover highly correlated with alleles estimated from whole genome long-read sequencing. We demonstrate a targeted long-read sequencing approach that enables simultaneous analysis of hundreds of TRs with accuracy comparable to PCR sizing analysis. Our approach is feasible to scale for more targets and more samples, facilitating large-scale analysis of TRs.
Publisher: Springer Berlin Heidelberg
Date: 2015
Publisher: Oxford University Press (OUP)
Date: 28-11-2017
DOI: 10.1093/NAR/GKW1086
Publisher: Public Library of Science (PLoS)
Date: 26-06-2009
Publisher: Springer New York
Date: 06-08-2013
Publisher: Springer Berlin Heidelberg
Date: 2011
Publisher: Springer Science and Business Media LLC
Date: 12-2008
Abstract: The power of haplotype-based methods for association studies, identification of regions under selection, and ancestral inference, is well-established for diploid organisms. For polyploids, however, the difficulty of determining phase has limited such approaches. Polyploidy is common in plants and is also observed in animals. Partial polyploidy is sometimes observed in humans (e.g. trisomy 21 Down's syndrome), and it arises more frequently in some human tissues. Local changes in ploidy, known as copy number variations (CNV), arise throughout the genome. Here we present a method, implemented in the software polyHap, for the inference of haplotype phase and missing observations from polyploid genotypes. PolyHap allows each individual to have a different ploidy, but ploidy cannot vary over the genomic region analysed. It employs a hidden Markov model (HMM) and a sampling algorithm to infer haplotypes jointly in multiple individuals and to obtain a measure of uncertainty in its inferences. In the simulation study, we combine real haplotype data to create artificial diploid, triploid, and tetraploid genotypes, and use these to demonstrate that polyHap performs well, in terms of both switch error rate in recovering phase and imputation error rate for missing genotypes. To our knowledge, there is no comparable software for phasing a large, densely genotyped region of chromosome from triploids and tetraploids, while for diploids we found polyHap to be more accurate than fastPhase. We also compare the results of polyHap to SATlotyper on an experimentally haplotyped tetraploid dataset of 12 SNPs, and show that polyHap is more accurate. With the availability of large SNP data in polyploids and CNV regions, we believe that polyHap, our proposed method for inferring haplotypic phase from genotype data, will be useful in enabling researchers analysing such data to exploit the power of haplotype-based analyses.
Publisher: Oxford University Press (OUP)
Date: 10-11-2015
DOI: 10.1093/BIOINFORMATICS/BTV658
Abstract: Motivation: The recently released Oxford Nanopore MinION sequencing platform presents many innovative features opening up potential for a range of applications not previously possible. Among these features, the ability to sequence in real-time provides a unique opportunity for many time-critical applications. While many software packages have been developed to analyze its data, there is still a lack of toolkits that support the streaming and real-time analysis of MinION sequencing data. Results: We developed npReader, an open-source software package to facilitate real-time analysis of MinION sequencing data. npReader can simultaneously extract sequence reads and stream them to downstream analysis pipelines while the samples are being sequenced on the MinION device. It provides a command line interface for easy integration into a bioinformatics workflow, as well as a graphical user interface which concurrently displays the statistics of the run. It also provides an application programming interface for development of streaming algorithms in order to fully utilize the extent of nanopore sequencing potential. Availability and implementation: npReader is written in Java and is freely available at dcao/npReader. Contact: m.cao1@uq.edu.au or l.coin@imb.uq.edu.au
Publisher: Springer Science and Business Media LLC
Date: 12-01-2018
DOI: 10.1038/S41598-017-18528-Y
Abstract: Mycobacterium tuberculosis (M. tuberculosis) survives and multiplies inside human macrophages by subversion of immune mechanisms. Although these immune evasion strategies are well characterised functionally, the underlying molecular mechanisms are poorly understood. Here we show that during infection of human whole blood with M. tuberculosis, host gene transcriptional suppression, rather than activation, is the predominant response. Spatial, temporal and functional characterisation of repressed genes revealed their involvement in pathogen sensing and phagocytosis, degradation within the phagolysosome and antigen processing and presentation. To identify mechanisms underlying suppression of multiple immune genes we undertook epigenetic analyses. We identified significantly differentially expressed microRNAs with known targets in suppressed genes. In addition, after searching regions upstream of the start of transcription of suppressed genes for common sequence motifs, we discovered novel enriched composite sequence patterns, which corresponded to Alu repeat elements, transposable elements known to have wide ranging influences on gene expression. Our findings suggest that to survive within infected cells, mycobacteria exploit a complex immune “molecular off switch” controlled by both microRNAs and Alu regulatory elements.
Publisher: Cold Spring Harbor Laboratory
Date: 18-08-2017
DOI: 10.1101/178103
Abstract: Detection of genomic inversions remains challenging. Many existing methods primarily target inversions with a non-repetitive breakpoint, leaving inverted repeat (IR) mediated non-allelic homologous recombination (NAHR) inversions largely unexplored. We present npInv, a novel tool specifically for detecting and genotyping NAHR inversions using long read sub-alignment of long read sequencing data. We use npInv to generate a whole-genome inversion map for NA12878 consisting of 30 NAHR inversions (of which 15 are novel), including all previously known NAHR-mediated inversions in NA12878 with flanking IR less than 7 kb. Our genotyping accuracy on this dataset was 94%. We used PCR to confirm the presence of two of these novel NAHR inversions. We show that there is a near-linear relationship between the length of flanking IR and the size of the NAHR inversion.
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 05-2016
Publisher: American Association for the Advancement of Science (AAAS)
Date: 06-09-2019
Abstract: Longitudinal data find a new variant controlling BMI in infancy and reveal genetic differences between infant and adult BMI.
Publisher: Elsevier BV
Date: 05-2021
Location: United Kingdom of Great Britain and Northern Ireland
Location: United Kingdom of Great Britain and Northern Ireland
Location: United Kingdom of Great Britain and Northern Ireland
Start Date: 2017
End Date: 2019
Funder: Australian Research Council
Start Date: 2015
End Date: 2015
Funder: Australian Research Council
Start Date: 2017
End Date: 2021
Funder: National Health and Medical Research Council
Start Date: 11-2012
End Date: 03-2017
Amount: $617,528.00
Funder: Australian Research Council
Start Date: 05-2019
End Date: 12-2023
Amount: $466,000.00
Funder: Australian Research Council
Start Date: 2015
End Date: 01-2016
Amount: $540,000.00
Funder: Australian Research Council
Start Date: 2017
End Date: 12-2019
Amount: $419,500.00
Funder: Australian Research Council
Start Date: 2014
End Date: 06-2017
Amount: $90,000.00
Funder: Australian Research Council