ARDC Research Link Australia

Publication

A Trust Prediction Model for Service Web

Publisher: IEEE

Date: 11-2011

DOI: 10.1109/TRUSTCOM.2011.35

Publication

Evaluating the Genome and Resistome of Extensively Drug-Resistant Klebsiella pneumoniae using Native DNA and RNA Nanopore Sequencing

Publisher: Cold Spring Harbor Laboratory

Date: 29-11-2018

DOI: 10.1101/482661

Abstract: Klebsiella pneumoniae frequently harbour multidrug resistance, and current methodologies are struggling to rapidly discern feasible antibiotics to treat these infections. While rapid DNA sequencing has been proposed for prediction of resistance profiles, the role of rapid RNA sequencing has yet to be fully explored. The MinION sequencer can sequence native DNA and RNA in real time, providing an opportunity to contrast the utility of DNA and RNA for prediction of drug susceptibility. This study interrogated the genome and transcriptome of four extensively drug-resistant (XDR) K. pneumoniae clinical isolates. The majority of acquired resistance (≥75%) resided on plasmids, including several megaplasmids (≥100 kbp). DNA sequencing identified most resistance genes (≥70%) within 2 hours of sequencing. Direct RNA sequencing (with ∼6x slower pore translocation) was able to identify ≥35% of resistance genes within 10 hours of sequencing, including aminoglycoside, β-lactam, trimethoprim and sulphonamide resistance genes, and in some isolates also quinolone, rifampicin, fosfomycin and phenicol resistance genes. Polymyxin-resistant isolates showed heightened transcription of phoPQ (≥2-fold) and the pmrHFIJKLM operon (≥8-fold). Expression levels estimated from direct RNA sequencing correlated strongly with qRT-PCR across 11 resistance genes (Pearson: 0.86). Overall, MinION sequencing rapidly detected the XDR K. pneumoniae resistome, and direct RNA sequencing revealed differential expression of these genes.

Publication

Multifactorial chromosomal variants regulate polymyxin resistance in extensively drug-resistant Klebsiella pneumoniae

Publisher: Microbiology Society

Date: 03-2018

DOI: 10.1099/MGEN.0.000158

Publication

An integrated map of genetic variation from 1,092 human genomes

Publisher: Springer Science and Business Media LLC

Date: 31-10-2012

DOI: 10.1038/NATURE11632

Publication

Rare Genomic Structural Variants in Complex Disease: Lessons from the Replication of Associations with Obesity

Publisher: Public Library of Science (PLoS)

Date: 12-03-2013

DOI: 10.1371/JOURNAL.PONE.0058048

Publication

A Trust Ontology for Semantic Services

Publisher: IEEE

Date: 07-2010

DOI: 10.1109/SCC.2010.42

Publication

Web Service management system for bioinformatics research: a case study

Publisher: Springer Science and Business Media LLC

Date: 11-02-2011

DOI: 10.1007/S11761-011-0076-9

Publication

Streaming algorithms for identification of pathogens and antibiotic resistance potential from real-time MinION™ sequencing

Publisher: Oxford University Press (OUP)

Date: 26-07-2016

DOI: 10.1186/S13742-016-0137-2

Publication

Inferring combined CNV/SNP haplotypes from genotype data

Publisher: Oxford University Press (OUP)

Date: 20-04-2010

DOI: 10.1093/BIOINFORMATICS/BTQ157

Abstract: Motivation: Copy number variations (CNVs) are increasingly recognized as a substantial source of individual genetic variation, and hence there is growing interest in investigating the evolutionary history of CNVs as well as their impact on complex disease susceptibility. CNV/SNP haplotypes are critical for this research, but although many methods have been proposed for inferring integer copy number, few have been designed for inferring CNV haplotypic phase, and none of these is applicable at genome-wide scale. Here, we present a method for inferring missing CNV genotypes, predicting CNV allelic configuration and inferring CNV haplotypic phase from SNP/CNV genotype data. Our method, implemented in the software polyHap v2.0, is based on a hidden Markov model that models the joint haplotype structure between CNVs and SNPs. Thus, the haplotypic phases of CNVs and SNPs are inferred simultaneously. A sampling algorithm is employed to obtain a measure of confidence/credibility for each estimate. Results: We generated diploid phase-known CNV–SNP genotype datasets by pairing male X chromosome CNV–SNP haplotypes. We show that polyHap provides accurate estimates of missing CNV genotypes, allelic configuration and CNV haplotypic phase on these datasets. We applied our method to a non-simulated dataset: a region on Chromosome 2 encompassing a short deletion. The results confirm that polyHap's accuracy extends to real-life datasets. Availability: Our method is implemented in version 2.0 of the polyHap software package and can be downloaded from www.imperial.ac.uk/medicine/people/l.coin Contact: l.coin@imperial.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.

Publication

Small Deletion Variants Have Stable Breakpoints Commonly Associated with Alu Elements

Publisher: Public Library of Science (PLoS)

Date: 29-08-2008

DOI: 10.1371/JOURNAL.PONE.0003104

Publication

Real-time demultiplexing Nanopore barcoded sequencing data with npBarcode

Publisher: Cold Spring Harbor Laboratory

Date: 04-05-2017

DOI: 10.1101/134155

Abstract: The recently introduced barcoding protocol for Oxford Nanopore sequencing has increased the versatility of the technology. Several bioinformatic tools have been developed to demultiplex the barcoded reads, but none of them supports streaming analysis. This limits the use of pooled sequencing in real-time applications, which is one of the main advantages of the technology. We introduce npBarcode, an open-source and cross-platform tool for barcode demultiplexing in a streaming fashion. npBarcode can be seamlessly integrated into a streaming analysis pipeline. The tool also provides a friendly graphical user interface through npReader, allowing real-time visual monitoring of the sequencing progress of barcoded samples. We show that npBarcode achieves accuracy comparable to the other alternatives. npBarcode is bundled in Japsa, a Java toolkit for genome analysis, and is freely available at snguyen/npBarcode.

Publication

Identification of Reduced Host Transcriptomic Signatures for Tuberculosis Disease and Digital PCR-Based Validation and Quantification

Publisher: Frontiers Media SA

Date: 02-03-2021

DOI: 10.3389/FIMMU.2021.637164

Abstract: Recently, host whole blood gene expression signatures have been identified for diagnosis of tuberculosis (TB). Absolute quantification of the concentrations of signature transcripts in blood has not been reported but would facilitate diagnostic test development. To identify minimal transcript signatures, we applied a transcript selection procedure to microarray data from African adults comprising 536 patients with TB, other diseases (OD) and latent TB (LTBI), divided into training and test sets. Signatures were further investigated using reverse transcriptase digital PCR (RT-dPCR). A four-transcript signature (GBP6, TMCC1, PRDM1, and ARG1) measured using RT-dPCR distinguished TB patients from those with OD (area under the curve (AUC) 93.8%, 95% CI 82.2–100%). A three-transcript signature (FCGR1A, ZNF296, and C1QB) differentiated TB from LTBI (AUC 97.3%, 95% CI 93.3–100%), regardless of HIV status. These signatures have been validated across platforms and across samples, offering strong, quantitative support for their use as diagnostic biomarkers for TB.

Publication

Highly interconnected genes in disease-specific networks are enriched for disease-associated polymorphisms

Publisher: Springer Science and Business Media LLC

Date: 2012

DOI: 10.1186/GB-2012-13-6-R46

Publication

Improving the humification and phosphorus flow during swine manure composting: A trial for enhancing the beneficial applications of hazardous biowastes

Publisher: Elsevier BV

Date: 03-2022

DOI: 10.1016/J.JHAZMAT.2021.127906

Abstract: Improving the recovery of organic matter and phosphorus (P) from hazardous biowastes such as swine manure using acidic substrates (ASs) in conjunction with aerobic composting is of great interest. This work aimed to investigate the effects of ASs on the humification and/or P migration as well as on microbial succession during the swine manure composting, employing multivariate and multiscale approaches. Adding ASs, derived from wood vinegar and humic acid, increased the degree of humification and thermal stability of the compost. The

Publication

Genotype-free demultiplexing of pooled single-cell RNA-seq

Publisher: Cold Spring Harbor Laboratory

Date: 07-03-2019

DOI: 10.1101/570614

Abstract: A variety of experimental and computational methods have been developed to demultiplex samples from pooled individuals in a single-cell RNA sequencing (scRNA-Seq) experiment, which either require adding information (such as hashtag barcodes) or measuring information (such as genotypes) prior to pooling. We introduce scSplit, which utilises genetic differences inferred from scRNA-Seq data alone to demultiplex pooled samples. scSplit also extracts a minimal set of high-confidence presence/absence genotypes in each cluster, which can be used to map clusters to original samples. Using a range of simulated, merged individual-sample and pooled multi-individual scRNA-Seq datasets, we show that scSplit is highly accurate and concordant with demuxlet predictions. Furthermore, scSplit predictions are highly consistent with the known truth in a cell-hashing dataset. We also show that multiplexed scRNA-Seq can be used to reduce batch effects caused by technical biases. scSplit is ideally suited to samples for which external genome-wide genotype data cannot be obtained (for example, non-model organisms), or for which it is impossible to obtain unmixed samples directly, such as mixtures of genetically distinct tumour cells, or mixed infections. scSplit is available at: on-xu/scSplit

Publication

Transcriptional and epi-transcriptional dynamics of SARS-CoV-2 during cellular infection

Publisher: Cold Spring Harbor Laboratory

Date: 22-12-2020

DOI: 10.1101/2020.12.22.423893

Abstract: SARS-CoV-2 uses subgenomic (sg)RNA to produce viral proteins for replication and immune evasion. We applied long-read RNA and cDNA sequencing to in vitro human and primate infection models to study transcriptional dynamics. Transcription-regulating sequence (TRS)-dependent sgRNA was upregulated earlier in infection than TRS-independent sgRNA. An abundant class of TRS-independent sgRNA consisting of a portion of ORF1ab containing nsp1 joined to ORF10 and 3’UTR was upregulated at 48 hours post infection in human cell lines. We identified double-junction sgRNA containing both TRS-dependent and independent junctions. We found multiple sites at which the SARS-CoV-2 genome is consistently more modified than sgRNA, and that sgRNA modifications are stable across transcript clusters, host cells and time since infection. Our work highlights the dynamic nature of the SARS-CoV-2 transcriptome during its replication cycle. Our results are available via an interactive web-app at coinlab.mdhs.unimelb.edu.au/ .

Publication

Molecular Methods for Pathogenic Bacteria Detection and Recent Advances in Wastewater Analysis

Publisher: MDPI AG

Date: 12-12-2021

DOI: 10.3390/W13243551

Abstract: With increasing concerns about public health and the development of molecular techniques, new detection tools and combinations of existing approaches have increased the capacity for monitoring pathogenic bacteria by exploring new biomarkers, increasing the sensitivity and accuracy of detection and quantification, and analyzing various genes such as functional genes and antimicrobial resistance genes (ARG). Molecular methods are gradually emerging as the most popular detection approach for pathogens, in addition to conventional culture-based plate enumeration methods. The analysis of pathogens in wastewater and the back-estimation of infections in the community, also known as wastewater-based epidemiology (WBE), is an emerging methodology with great potential to supplement current surveillance systems for monitoring infectious diseases and providing early warning of outbreaks. However, wastewater is a complex matrix that poses substantial challenges to the analytical performance of molecular methods. This review synthesized the literature on typical pathogenic bacteria in wastewater, types of biomarkers, molecular methods for bacterial analysis, and their recent advances in wastewater analysis. The advantages and limitations of these molecular methods were evaluated, and their prospects in WBE were discussed to provide insight for future development.

Publication

invertFREGENE: software for simulating inversions in population genetic data

Publisher: Oxford University Press (OUP)

Date: 26-01-2010

DOI: 10.1093/BIOINFORMATICS/BTQ029

Abstract: Summary: Inversions are a common form of structural variation, which may have a marked effect on the genome and on methods to infer quantities of interest, such as those relating to population structure and natural selection. However, due to the challenge of detecting inversions, little is presently known about their impact. Software to simulate inversions could provide a better understanding of how to detect and account for them, but while there are several software packages for simulating population genetic data, none incorporates inversion polymorphisms. Here, we describe a software package, modified from the forward-in-time simulator FREGENE, which simulates the evolution of an inversion polymorphism of specified length, location, frequency and age in a population of sequences. We describe previously unreported signatures of inversions in SNP data observed in invertFREGENE results and in a known inversion in humans. Availability: C++ source code and user manual are available for download from www.ebi.ac.uk/projects/BARGEN/ under the GPL licence. Contact: l.coin@ic.ac.uk, c.hoggart@ic.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.

Publication

Detection of Streptococcus pyogenes M1UK in Australia and characterization of the mutation driving enhanced expression of superantigen SpeA

Publisher: Springer Science and Business Media LLC

Date: 24-02-2023

DOI: 10.1038/S41467-023-36717-4

Abstract: A new variant of Streptococcus pyogenes serotype M1 (designated ‘M1UK’) has been reported in the United Kingdom, linked with seasonal scarlet fever surges and a marked increase in invasive infections, and exhibiting enhanced expression of the superantigen SpeA. The progenitor S. pyogenes ‘M1global’ and M1UK clones can be differentiated by 27 SNPs and 4 indels, yet the mechanism for speA upregulation is unknown. Here we investigate the previously unappreciated expansion of M1UK in Australia, where it is now isolated from the majority of serious infections caused by serotype M1 S. pyogenes. M1UK sub-lineages circulating in Australia also contain a novel toxin repertoire associated with epidemic scarlet fever-causing S. pyogenes in Asia. A single SNP in the 5’ transcriptional leader sequence of the transfer-messenger RNA gene ssrA drives enhanced SpeA superantigen expression as a result of ssrA terminator read-through in the M1UK lineage. This represents a previously unappreciated mechanism of toxin expression and urges enhanced international surveillance.

Publication

Qualitative economic model for long-term IaaS composition

Publisher: Springer International Publishing

Date: 2016

DOI: 10.1007/978-3-319-46295-0_20

Publication

Meta-path based service recommendation in heterogeneous information networks

Publisher: Springer International Publishing

Date: 2016

DOI: 10.1007/978-3-319-46295-0_23

Publication

Enabling Privacy Preserving Mobile Advertising via Private Information Retrieval

Publisher: IEEE

Date: 10-2017

DOI: 10.1109/LCN.2017.63

Publication

Comparison of long-read methods for sequencing and assembly of a plant genome

Publisher: Oxford University Press (OUP)

Date: 12-2020

DOI: 10.1093/GIGASCIENCE/GIAA146

Abstract: Sequencing technologies have advanced to the point where it is possible to generate high-accuracy, haplotype-resolved, chromosome-scale assemblies. Several long-read sequencing technologies are available, and a growing number of algorithms have been developed to assemble the reads generated by those technologies. When starting a new genome project, it is therefore challenging to select the most cost-effective sequencing technology, as well as the most appropriate software for assembly and polishing. It is thus important to benchmark different approaches applied to the same sample. Here, we report a comparison of 3 long-read sequencing technologies applied to the de novo assembly of a plant genome, Macadamia jansenii. We have generated sequencing data using Pacific Biosciences (Sequel I), Oxford Nanopore Technologies (PromethION), and BGI (single-tube Long Fragment Read) technologies for the same sample. Several assemblers were benchmarked in the assembly of Pacific Biosciences and Nanopore reads. Results obtained from combining long-read technologies or short-read and long-read technologies are also presented. The assemblies were compared for contiguity, base accuracy, and completeness, as well as sequencing costs and DNA material requirements. The 3 long-read technologies produced highly contiguous and complete genome assemblies of M. jansenii. At the time of sequencing, the cost associated with each method was significantly different, but continuous improvements in technologies have resulted in greater accuracy, increased throughput, and reduced costs. We propose updating this comparison regularly with reports on significant iterations of the sequencing technologies.

Publication

Digerati – A multipath parallel hybrid deep learning framework for the identification of mycobacterial PE/PPE proteins

Publisher: Elsevier BV

Date: 09-2023

DOI: 10.1016/J.COMPBIOMED.2023.107155

Publication

Profiling copy number alterations in cell-free tumour DNA using a single-reference

Publisher: Cold Spring Harbor Laboratory

Date: 28-03-2018

DOI: 10.1101/290171

Abstract: The accurate detection of copy number alterations from the analysis of circulating cell-free tumour DNA (ctDNA) in blood is essential to realising the potential of liquid biopsies. However, currently available approaches require a large number of plasma samples from healthy individuals, sequenced using the same platform and protocols, to act as a reference panel. Obtaining this reference panel can be challenging and prohibitively expensive, and it limits the ability to migrate to improved sequencing platforms and protocols. We developed qCNV and sCNA-seq, two distinct tools that together provide a new approach for profiling somatic copy number alterations (sCNA) through the analysis of cell-free DNA (cfDNA) without a reference panel. Our approach was designed to identify sCNA from cfDNA through the analysis of a single plasma sample and a matched normal DNA sample, both of which can be obtained from the same blood draw. qCNV is an efficient method for extracting read depth from BAM files, and sCNA-seq is a method that uses a probabilistic model of read depth to infer the copy number segmentation of the tumour. We compared the results from our pipeline to the established copy number profile of a cell line, as well as to the results from plasma-Seq analysis of cfDNA-like mixtures and real clinical datasets. With a single, unmatched, germline reference sample, our pipeline recapitulated the known copy number profile of a cell line and demonstrated results similar to those obtained from plasma-Seq. With less than 1X genome coverage, our approach identified clinically relevant sCNA in samples with as little as 20% tumour DNA. When applied to plasma samples from cancer patients, our pipeline identified clinically significant mutations. These results show it is possible to identify therapeutically relevant copy number mutations from plasma samples without the need to generate a reference panel from a large number of healthy individuals. Together with the range of sequencing platforms supported by our qCNV+sCNA-Seq pipeline, as well as the Galaxy implementation of this solution, this pipeline makes cfDNA profiling more accessible and makes it easier to identify sCNA from the plasma of cancer patients.

Publication

Childhood tuberculosis is associated with decreased abundance of T cell gene transcripts and impaired T cell function

Publisher: Public Library of Science (PLoS)

Date: 15-11-2017

DOI: 10.1371/JOURNAL.PONE.0185973

Publication

Identification of regulatory variants associated with genetic susceptibility to meningococcal disease

Publisher: Springer Science and Business Media LLC

Date: 06-05-2019

DOI: 10.1038/S41598-019-43292-6

Abstract: Non-coding genetic variants play an important role in driving susceptibility to complex diseases, but their characterization remains challenging. Here, we employed a novel approach to interrogate the genetic risk of such polymorphisms more systematically by targeting specific regulatory regions relevant to the phenotype studied. We applied this method to meningococcal disease susceptibility, using the DNA binding pattern of RELA (an NF-κB subunit and master regulator of the response to infection) under bacterial stimuli in nasopharyngeal epithelial cells. We designed a custom panel to cover these RELA binding sites and used it for targeted sequencing in cases and controls. Variant calling and association analysis were performed, followed by validation of candidate polymorphisms by genotyping in three independent cohorts. We identified two new polymorphisms, rs4823231 and rs11913168, showing signs of association with meningococcal disease susceptibility. In addition, using our genomic data as well as publicly available resources, we found evidence that these SNPs have potential regulatory effects on the ATXN10 and LIF genes, respectively. The variants and related candidate genes are relevant for infectious diseases and may make an important contribution to meningococcal disease pathology. Finally, we described a novel genetic association approach that could be applied to other phenotypes.

Publication

Chiron: Translating nanopore raw signal directly into nucleotide sequence using deep learning

Publisher: Cold Spring Harbor Laboratory

Date: 23-08-2017

DOI: 10.1101/179531

Abstract: Sequencing by translocating DNA fragments through an array of nanopores is a rapidly maturing technology which offers faster and cheaper sequencing than other approaches. However, accurately deciphering the DNA sequence from the noisy and complex electrical signal is challenging. Here, we report Chiron, the first deep learning model to achieve end-to-end basecalling: directly translating the raw signal to DNA sequence without the error-prone segmentation step. Trained with only a small set of 4000 reads, we show that our model provides state-of-the-art basecalling accuracy even on previously unseen species. Chiron achieves basecalling speeds of over 2000 bases per second using desktop computer graphics processing units.

Publication

Octapeptin C4 Induces Less Resistance and Novel Mutations in an Epidemic Carbapenemase-producing Klebsiella pneumoniae ST258 Clinical Isolate Compared to Polymyxins

Publisher: Cold Spring Harbor Laboratory

Date: 28-04-2018

DOI: 10.1101/309674

Abstract: Polymyxin B and E (colistin) have been pivotal in the treatment of extensively drug-resistant (XDR) Gram-negative bacterial infections, with increasing use over the past decade. Unfortunately, resistance to these antibiotics is rapidly emerging. The structurally related octapeptin C4 (OctC4) has shown significant potency against XDR bacteria, including polymyxin-resistant (Pmx-R) strains, but its mode of action remains undefined. We sought to compare and contrast the in vitro acquisition of resistance in XDR Klebsiella pneumoniae (ST258) to all three lipopeptides to help elucidate the mode of action of the drugs and potential mechanisms of resistance evolution. Strikingly, 20 days of exposure to the polymyxins resulted in a dramatic (1000-fold) increase in the minimum inhibitory concentration (MIC) for the polymyxins, reflecting the evolution of resistance seen in clinical isolates, whereas for OctC4 only a 4-fold increase was observed. No cross-resistance was observed between the polymyxin- and octapeptin-induced resistant strains. Sequencing revealed previously known gene alterations for polymyxin resistance, including crrB, mgrB, pmrB, phoPQ and yciM, and novel mutations in qseC. In contrast, mutations in mlaDF and pqiB, genes related to phospholipid transport, were found in octapeptin-resistant isolates. Mutation effects were validated via complementation assays. These genetic variations were reflected in phenotypic changes to lipid A. Pmx-R isolates increased 4-amino-4-deoxy-arabinose fortification of the phosphate groups of lipid A, whereas OctC4-induced strains harbored a higher abundance of hydroxymyristate and palmitoylate. The results reveal a mode of action differing from that of the polymyxins, which provides hope for future therapeutics to combat the increasing threat of XDR bacteria.

Publication

Mirror extreme BMI phenotypes associated with gene dosage at the chromosome 16p11.2 locus

Publisher: Springer Science and Business Media LLC

Date: 31-08-2011

DOI: 10.1038/NATURE10406

Publication

Nanoq: ultra-fast quality control for nanopore reads

Publisher: The Open Journal

Date: 08-01-2022

DOI: 10.21105/JOSS.02991

Publication

Chiron: translating nanopore raw signal directly into nucleotide sequence using deep learning

Publisher: Oxford University Press (OUP)

Date: 10-04-2018

DOI: 10.1093/GIGASCIENCE/GIY037

Publication

MultiPhen: Joint Model of Multiple Phenotypes Can Increase Discovery in GWAS

Publisher: Public Library of Science (PLoS)

Date: 02-05-2012

DOI: 10.1371/JOURNAL.PONE.0034861

Publication

Privacy-Preserving User Profile Matching in Social Networks

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 08-2020

DOI: 10.1109/TKDE.2019.2912748

Publication

famCNV: copy number variant association for quantitative traits in families

Publisher: Oxford University Press (OUP)

Date: 05-05-2011

DOI: 10.1093/BIOINFORMATICS/BTR264

Abstract: Summary: A program package to enable genome-wide association of copy number variants (CNVs) with quantitative phenotypes in families of arbitrary size and complexity. Intensity signals that act as proxies for the number of copies are modeled in a variance component framework, and association with traits is assessed through formal likelihood testing. Availability and implementation: The Java package is made available at www.imperial.ac.uk/medicine/people/m.falchi/. Contact: m.falchi@imperial.ac.uk

Publication

Service mining for internet of things

Publisher: Springer International Publishing

Date: 2016

DOI: 10.1007/978-3-319-46295-0_36

Publication

Computational analysis and prediction of PE_PGRS proteins using machine learning

Publisher: Elsevier BV

Date: 2022

DOI: 10.1016/J.CSBJ.2022.01.019

Publication

Signatures Of Tspan8 Variants Associated With Human Metabolic Regulation And Diseases

Publisher: Cold Spring Harbor Laboratory

Date: 19-11-2020

DOI: 10.1101/2020.11.17.386839

Abstract: Here, with the example of common copy number variation (CNV) in the TSPAN8 gene, we present an important piece of work in the field of CNV detection, CNV association with complex human traits such as ¹H NMR metabolomic phenotypes, and an example of functional characterization of CNVs among human induced pluripotent stem cells (HipSci). We report TSPAN8 exon 11 as a new locus associated with metabolomic regulation and show that its biology is associated with several metabolic diseases such as type 2 diabetes (T2D), obesity and cancer. Our results further demonstrate the power of multivariate association models over univariate methods and define new metabolomic signatures for several new genomic loci, which can act as a catalyst for new diagnostic and therapeutic approaches.

Publication

Porpoise: a new approach for accurate prediction of RNA pseudouridine sites

Publisher: Oxford University Press (OUP)

Date: 05-07-2021

DOI: 10.1093/BIB/BBAB245

Abstract: Pseudouridine is a ubiquitous RNA modification present in eukaryotes and prokaryotes, which plays a vital role in various biological processes. Almost all kinds of RNAs are subject to this modification. However, it remains a great challenge to identify pseudouridine sites via experimental approaches, which require expensive and time-consuming experimental research. Therefore, computational approaches that can perform accurate in silico identification of pseudouridine sites from the large amount of RNA sequence data are highly desirable and can aid in the functional elucidation of this critical modification. Here, we propose a new computational approach, termed Porpoise, to accurately identify pseudouridine sites from RNA sequence data. Porpoise builds upon a comprehensive evaluation of 18 frequently used feature encoding schemes, from which four types of features are selected: binary features, pseudo k-tuple composition, nucleotide chemical property and position-specific trinucleotide propensity based on single-strand (PSTNPss). The selected features are fed into a stacked ensemble learning framework to construct an effective stacked model. Both cross-validation tests on the benchmark dataset and independent tests show that Porpoise achieves better predictive performance than several state-of-the-art approaches. The application of model interpretation tools demonstrates the importance of PSTNPs for the performance of the trained models. This new method is anticipated to facilitate community-wide efforts to identify putative pseudouridine sites and formulate novel testable biological hypotheses.

Publication

Rapid diagnosis of Capnocytophaga canimorsus septic shock in an immunocompetent individual using real-time Nanopore sequencing: a case report

Publisher: Springer Science and Business Media LLC

Date: 24-07-2019

DOI: 10.1186/S12879-019-4173-2

Publication

Trust Management in Cloud Services

Publisher: Springer International Publishing

Date: 2014

DOI: 10.1007/978-3-319-12250-2

Publication

Multi-clonal evolution of multi-drug-resistant/extensively drug-resistant Mycobacterium tuberculosis in a high-prevalence setting of Papua New Guinea for over three decades.

Publisher: Microbiology Society

Date: 02-2018

DOI: 10.1099/MGEN.0.000147

Publication

A census of human cancer genes

Publisher: Springer Science and Business Media LLC

Date: 03-2004

DOI: 10.1038/NRC1299

Publication

Genetic variability in the regulation of gene expression in ten regions of the human brain

Publisher: Springer Science and Business Media LLC

Date: 31-08-2014

DOI: 10.1038/NN.3801

Publication

Investigation of the HIN200 Locus in UK SLE Families Identifies Novel Copy Number Variants

Publisher: Wiley

Date: 14-03-2011

DOI: 10.1111/J.1469-1809.2011.00641.X

Abstract: We undertook a candidate locus study of the HIN200 gene cluster on 1q21-23 in UK systemic lupus erythematosus (SLE) families. To date, despite mounting evidence demonstrating the importance of these proteins in autoimmune disease, cancer, apoptosis, inflammation, and cell cycle arrest, there has been a dearth of data with respect to the genetic characterisation of the HIN200 locus in SLE or any other disease. We typed 83 single nucleotide polymorphisms (SNPs) across 317 kb of the HIN200 cluster in 428 UK SLE families and sought replication from a European-American lupus cohort. We do not find strong evidence of SNP association in either cohort. Interestingly, we do observe a trend for association with certain HIN200 SNPs and serologic subphenotypes in UK SLE that parallels the association of lupus antibodies with the orthologous murine locus. Furthermore, we find the HIN200 locus to be unexpectedly complex in terms of genetic structural organisation. We have identified a number of copy number variants (CNVs) in this region in healthy French males, HapMap samples, and UK SLE families. In summary, candidate interferon signalling genes show evidence of common CNV in human SLE and healthy subjects. The impact of these CNVs in health and disease remains to be determined.

Publication

Insights into population structure of East African sweetpotato cultivars from hybrid assembly of chloroplast genomes

Publisher: F1000 Research Ltd

Date: 05-09-2018

DOI: 10.12688/GATESOPENRES.12856.1

Abstract: Background: The chloroplast (cp) genome is an important resource for studying plant diversity and phylogeny. Assembly of the cp genomes from next-generation sequencing data is complicated by the presence of two large inverted repeats contained in the cp DNA. Methods: We constructed a complete circular cp genome assembly for the hexaploid sweetpotato using extremely low coverage ( ×) Oxford Nanopore whole-genome sequencing (WGS) data coupled with Illumina sequencing data for polishing. Results: The sweetpotato cp genome of 161,274 bp contains 152 genes, of which there are 96 protein coding genes, 8 rRNA genes and 48 tRNA genes. Using the cp genome assembly as a reference, we constructed complete cp genome assemblies for a further 17 sweetpotato cultivars from East Africa and an I. triloba line using Illumina WGS data. Analysis of the sweetpotato cp genomes demonstrated the presence of two distinct subpopulations in East Africa. Phylogenetic analysis of the cp genomes of the species from the Convolvulaceae Ipomoea section Batatas revealed that the most closely related diploid wild species of the hexaploid sweetpotato is I. trifida. Conclusions: Nanopore long reads are helpful in construction of cp genome assemblies, especially in solving the two long inverted repeats. We are generally able to extract cp sequences from WGS data of sufficiently high coverage for assembly of cp genomes. The cp genomes can be used to investigate the population structure and the phylogenetic relationship for the sweetpotato.

Publication

Insights into population structure of East African sweetpotato cultivars from hybrid assembly of chloroplast genomes

Publisher: F1000 Research Ltd

Date: 21-07-2020

DOI: 10.12688/GATESOPENRES.12856.2

Abstract: Background: The chloroplast (cp) genome is an important resource for studying plant diversity and phylogeny. Assembly of the cp genomes from next-generation sequencing data is complicated by the presence of two large inverted repeats contained in the cp DNA. Methods: We constructed a complete circular cp genome assembly for the hexaploid sweetpotato using extremely low coverage ( ×) Oxford Nanopore whole-genome sequencing (WGS) data coupled with Illumina sequencing data for polishing. Results: The sweetpotato cp genome of 161,274 bp contains 152 genes, of which there are 96 protein coding genes, 8 rRNA genes and 48 tRNA genes. Using the cp genome assembly as a reference, we constructed complete cp genome assemblies for a further 17 sweetpotato cultivars from East Africa and an I. triloba line using Illumina WGS data. Analysis of the sweetpotato cp genomes demonstrated the presence of two distinct subpopulations in East Africa. Phylogenetic analysis of the cp genomes of the species from the Convolvulaceae Ipomoea section Batatas revealed that the most closely related diploid wild species of the hexaploid sweetpotato is I. trifida. Conclusions: Nanopore long reads are helpful in construction of cp genome assemblies, especially in solving the two long inverted repeats. We are generally able to extract cp sequences from WGS data of sufficiently high coverage for assembly of cp genomes. The cp genomes can be used to investigate the population structure and the phylogenetic relationship for the sweetpotato.

Publication

Multi-Use Trust in Crowdsourced IoT Services

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 03-2023

DOI: 10.1109/TSC.2022.3160469

Publication

Genome sequences of two diploid wild relatives of cultivated sweetpotato reveal targets for genetic improvement

Publisher: Springer Science and Business Media LLC

Date: 02-11-2018

DOI: 10.1038/S41467-018-06983-8

Abstract: Sweetpotato [Ipomoea batatas (L.) Lam.] is a globally important staple food crop, especially for sub-Saharan Africa. Agronomic improvement of sweetpotato has lagged behind other major food crops due to a lack of genomic and genetic resources and inherent challenges in breeding a heterozygous, clonally propagated polyploid. Here, we report the genome sequences of its two diploid relatives, I. trifida and I. triloba, and show that these high-quality genome assemblies are robust references for hexaploid sweetpotato. Comparative and phylogenetic analyses reveal insights into the ancient whole-genome triplication history of Ipomoea and evolutionary relationships within the Batatas complex. Using resequencing data from 16 genotypes widely used in African breeding programs, genes and alleles associated with carotenoid biosynthesis in storage roots are identified, which may enable efficient breeding of varieties with high provitamin A content. These resources will facilitate genome-enabled breeding in this important food security crop.

Publication

Direct RNA sequencing and early evolution of SARS-CoV-2

Publisher: Cold Spring Harbor Laboratory

Date: 07-03-2020

DOI: 10.1101/2020.03.05.976167

Abstract: Fundamental aspects of SARS-CoV-2 biology remain to be described, having the potential to provide insight to the response effort for this high-priority pathogen. Here we describe the first native RNA sequence of SARS-CoV-2, detailing the coronaviral transcriptome and epitranscriptome, and share these data publicly. A data-driven inference of viral genetic features and evolutionary rate is also made. The rapid sharing of sequence information throughout the SARS-CoV-2 pandemic represents an inflection point for public health and genomic epidemiology, providing early insights into the biology and evolution of this emerging pathogen.

Publication

Personalized API recommendation via implicit preference modeling

Publisher: Springer International Publishing

Date: 2016

DOI: 10.1007/978-3-319-46295-0_44

Publication

DeepGenGrep: a general deep learning-based predictor for multiple genomic signals and regions

Publisher: Oxford University Press (OUP)

Date: 07-07-2022

DOI: 10.1093/BIOINFORMATICS/BTAC454

Abstract: Accurate annotation of different genomic signals and regions (GSRs) from DNA sequences is fundamentally important for understanding gene structure, regulation and function. Numerous efforts have been made to develop machine learning-based predictors for in silico identification of GSRs. However, it remains a great challenge to identify GSRs as the performance of most existing approaches is unsatisfactory. As such, it is highly desirable to develop more accurate computational methods for GSRs prediction. In this study, we propose a general deep learning framework termed DeepGenGrep, a general predictor for the systematic identification of multiple different GSRs from genomic DNA sequences. DeepGenGrep leverages the power of hybrid neural networks comprising a three-layer convolutional neural network and a two-layer long short-term memory to effectively learn useful feature representations from sequences. Benchmarking experiments demonstrate that DeepGenGrep outperforms several state-of-the-art approaches on identifying polyadenylation signals, translation initiation sites and splice sites across four eukaryotic species including Homo sapiens, Mus musculus, Bos taurus and Drosophila melanogaster. Overall, DeepGenGrep represents a useful tool for the high-throughput and cost-effective identification of potential GSRs in eukaryotic genomes. The webserver and source code are freely available at bigdata.biocie.cn/deepgengrep/home and Github (x-cie/DeepGenGrep/). Supplementary data are available at Bioinformatics online.

Publication

Optimising Treatment Outcomes for Children and Adults Through Rapid Genome Sequencing of Sepsis Pathogens. A Study Protocol for a Prospective, Multi-Centre Trial (DIRECT)

Publisher: Frontiers Media SA

Date: 23-06-2021

DOI: 10.3389/FCIMB.2021.667680

Abstract: Sepsis contributes significantly to morbidity and mortality globally. In Australia, 20,000 develop sepsis every year, resulting in 5,000 deaths, and more than AUD$846 million in expenditure. Prompt, appropriate antibiotic therapy is effective in improving outcomes in sepsis. Conventional culture-based methods to identify appropriate therapy have limited yield and take days to complete. Recently, nanopore technology has enabled rapid sequencing with real-time analysis of pathogen DNA. We set out to demonstrate the feasibility and diagnostic accuracy of pathogen sequencing direct from clinical samples, and estimate the impact of this approach on time to effective therapy when integrated with personalised software-guided antimicrobial dosing in children and adults on ICU with sepsis. The DIRECT study is a pilot prospective, non-randomized multicentre trial of an integrated diagnostic and therapeutic algorithm combining rapid direct pathogen sequencing and software-guided, personalised antibiotic dosing in children and adults with sepsis on ICU. DIRECT will collect microbiological and pharmacokinetic samples from approximately 200 children and adults with sepsis admitted to one of four ICUs in Brisbane. In Phase 1, we will evaluate Oxford Nanopore Technologies MinION sequencing direct from blood in 50 blood culture-proven sepsis patients recruited from consecutive patients with suspected sepsis. In Phase 2, a further 50 consecutive patients with suspected sepsis will be recruited in whom MinION sequencing will be combined with Bayesian software-guided (ID-ODS) personalised antimicrobial dosing. The primary outcome is time to effective antimicrobial therapy, defined as trough drug concentrations above the MIC of the pathogen. Secondary outcomes are diagnostic accuracy of MinION sequencing from whole blood, time to pathogen identification and susceptibility testing using sequencing direct from whole blood and from positive blood culture broth. 
Rapid pathogen sequencing coupled with antimicrobial dosing software has great potential to overcome the limitations of conventional diagnostics which often result in prolonged inappropriate antimicrobial therapy. Reduced time to optimal antimicrobial therapy may reduce sepsis mortality and ICU length of stay. This pilot study will yield key feasibility data to inform further, urgently needed sepsis studies. Phase 2 of the trial protocol is registered with the ANZCTR (ACTRN12620001122943). Registered with the Australia New Zealand Clinical Trials Registry Number ACTRN12620001122943

Publication

Temporal pattern based QoS prediction

Publisher: Springer International Publishing

Date: 2016

DOI: 10.1007/978-3-319-48743-4_18

Publication

YHap: a population model for probabilistic assignment of Y haplogroups from re-sequencing data

Publisher: Springer Science and Business Media LLC

Date: 19-11-2013

DOI: 10.1186/1471-2105-14-331

Abstract: Y haplogroup analyses are an important component of genealogical reconstruction, population genetic analyses, medical genetics and forensics. These fields are increasingly moving towards use of low-coverage, high throughput sequencing. While there have been methods recently proposed for assignment of Y haplogroups on the basis of high-coverage sequence data, assignment on the basis of low-coverage data remains challenging. We developed a new algorithm, YHap, which uses an imputation framework to jointly predict Y chromosome genotypes and assign Y haplogroups using low coverage population sequence data. We use data from the 1000 genomes project to demonstrate that YHap provides accurate Y haplogroup assignment with less than 2x coverage. Borrowing information across multiple samples within a population using an imputation framework enables accurate Y haplogroup assignment.

Publication

Diagnosis of Kawasaki Disease Using a Minimal Whole-Blood Gene Expression Signature

Publisher: American Medical Association (AMA)

Date: 10-2018

DOI: 10.1001/JAMAPEDIATRICS.2018.2293

Publication

Drone-as-a-Service Composition Under Uncertainty

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 09-2022

DOI: 10.1109/TSC.2021.3066006

Publication

A complete high quality nanopore-only assembly of an XDR Mycobacterium tuberculosis Beijing lineage strain identifies novel variation in repetitive PE/PPE gene regions

Publisher: Cold Spring Harbor Laboratory

Date: 30-01-2018

DOI: 10.1101/256719

Abstract: A better understanding of the genomic changes that facilitate the emergence and spread of drug resistant M. tuberculosis strains is required. Short-read sequencing methods have limited capacity to identify long, repetitive genomic regions and gene duplications. We sequenced an extensively drug resistant (XDR) Beijing sub-lineage 2.2.1.1 “epidemic strain” from the Western Province of Papua New Guinea using long-read sequencing (Oxford Nanopore MinION®). With up to 274-fold coverage from a single flow-cell, we assembled a 4,404,947 bp circular genome containing 3670 coding sequences that include the highly repetitive PE/PPE genes. Comparison with Illumina reads indicated a base-level accuracy of 99.95%. Mutations known to confer drug resistance to first and second line drugs were identified and concurred with phenotypic resistance assays. We identified mutations in efflux pump genes (Rv0194), transporters (secA1, glnQ, uspA), cell wall biosynthesis genes (pdk, mmpL, fadD) and virulence genes (mce-gene family, mycp1) that may contribute to the drug resistance phenotype and successful transmission of this strain. Using the newly assembled genome as reference to map raw Illumina reads from representative M. tuberculosis lineages, we detect large insertions relative to the reference genome. We provide a fully annotated genome of a transmissible XDR M. tuberculosis strain from Papua New Guinea using Oxford Nanopore MinION sequencing and provide insight into genomic mechanisms of resistance and virulence. Sample Illumina and MinION sequencing reads generated and analyzed are available in NCBI under project accession number PRJNA386696 (ra/?term=PRJNA386696). The assembled complete genome and its annotations are available in NCBI under accession number CP022704.1 (ra/?term=CP022704.1). We recently characterized a Modern Beijing lineage strain responsible for the drug resistance outbreaks in the Western province, Papua New Guinea. 
With some of the genomic markers responsible for its drug resistance and transmissibility known, there is a need to elucidate all molecular mechanisms that account for the resistance phenotype, virulence and transmission. Whole genome sequencing using short reads has been widely used to study the MTB genome, but it does not generally capture long repetitive regions, as variants in these regions are eliminated during analysis. Illumina instruments are known to have a GC bias, so that GC-rich or AT-rich regions are undersampled, and this effect is exacerbated in MTB, which has approximately 65% GC content. In this study, we utilized Oxford Nanopore Technologies (ONT) MinION sequencing to assemble a high-quality complete genome of an extensively drug resistant strain of a modern Beijing lineage. We were able to assemble all PE/PPE (proline-glutamate/proline-proline-glutamate) gene families, which have high GC content and are repetitive in nature. We show the genomic utility of ONT in offering a more comprehensive understanding of genetic mechanisms that contribute to resistance, virulence and transmission. This is important for setting up predictive analytics platforms and services to support diagnostics and treatment.

Publication

On building a hyperdistributed database

Publisher: Elsevier BV

Date: 11-1995

DOI: 10.1016/0306-4379(95)00030-8

Publication

Genome-wide association analysis of metabolic traits in a birth cohort from a founder population

Publisher: Springer Science and Business Media LLC

Date: 07-12-2008

DOI: 10.1038/NG.271

Publication

Novel association approach for variable number tandem repeats (VNTRs) identifies DOCK5 as a susceptibility gene for severe obesity

Publisher: Oxford University Press (OUP)

Date: 16-05-2012

DOI: 10.1093/HMG/DDS187

Publication

SARS-CoV-2 mouse adaptation selects virulence mutations that cause TNF-driven age-dependent severe disease with human correlates

Publisher: Proceedings of the National Academy of Sciences

Date: 31-07-2023

DOI: 10.1073/PNAS.2301689120

Abstract: The diversity of COVID-19 disease in otherwise healthy people, from seemingly asymptomatic infection to severe life-threatening disease, is not clearly understood. We passaged a naturally occurring near-ancestral SARS-CoV-2 variant, capable of infecting wild-type mice, and identified viral genomic mutations coinciding with the acquisition of severe disease in young adult mice and lethality in aged animals. Transcriptomic analysis of lung tissues from mice with severe disease elucidated a host antiviral response dominated mainly by interferon and IL-6 pathway activation in young mice, while in aged animals, a fatal outcome was dominated by TNF and TGF-β signaling. Congruent with our pathway analysis, we showed that young TNF-deficient mice had mild disease compared to controls and aged TNF-deficient animals were more likely to survive infection. Emerging clinical correlates of disease are consistent with our preclinical studies, and our model may provide value in defining aberrant host responses that are causative of severe COVID-19.

Publication

Building enterprise mashups

Publisher: Elsevier BV

Date: 05-2011

DOI: 10.1016/J.FUTURE.2010.10.004

Publication

Complete Genome Sequences of Clinical Pandoraea fibrosis Isolates

Publisher: American Society for Microbiology

Date: 26-03-2020

DOI: 10.1128/MRA.00060-20

Abstract: Pandoraea fibrosis is a newly identified Gram-negative bacterial species that was isolated from the respiratory tract of an Australian cystic fibrosis patient. The complete assembled genome sequences of two consecutive isolates (second isolate collected 11 months after antibiotic treatment) from the same individual are presented here.

Publication

Identification of reduced host transcriptomic signatures for tuberculosis and digital PCR-based validation and quantification

Publisher: Cold Spring Harbor Laboratory

Date: 21-03-2019

DOI: 10.1101/583674

Abstract: Recently, host whole blood gene expression signatures have been identified for diagnosis of tuberculosis (TB). Absolute quantification of the concentrations of signature transcripts in blood has not been reported, but would facilitate the development of diagnostic tests. To identify minimal transcript signatures, we applied a novel transcript selection procedure to microarray data from African adults comprising 536 patients with TB, other diseases (OD) and latent TB (LTBI), divided into training and test sets. Signatures were validated using reverse transcriptase (RT) digital PCR (dPCR). A four-transcript signature (GBP6, TMCC1, PRDM1, ARG1) measured using RT-dPCR distinguished TB patients from those with OD (area under the curve (AUC) 93.8%, 95% CI 82.2–100%). A three-transcript signature (FCGR1A, ZNF296, C1QB) differentiated TB from LTBI (AUC 97.3%, 95% CI 93.3–100%), regardless of HIV status. These signatures have been validated across platforms and across samples, offering strong, quantitative support for their use as diagnostic biomarkers for TB.

Publication

CCCloud: Context-aware and credible cloud service selection based on subjective assessment and objective assessment

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 05-2015

DOI: 10.1109/TSC.2015.2413111

Publication

Whole-exome Sequencing for the Identification of Rare Variants in Primary Immunodeficiency Genes in Children With Sepsis: A Prospective, Population-based Cohort Study

Publisher: Oxford University Press (OUP)

Date: 18-03-2020

DOI: 10.1093/CID/CIAA290

Abstract: The role of primary immunodeficiencies (PID) in susceptibility to sepsis remains unknown. It is unclear whether children with sepsis benefit from genetic investigations. We hypothesized that sepsis may represent the first manifestation of underlying PID. We applied whole-exome sequencing (WES) to a national cohort of children with sepsis to identify rare, predicted pathogenic variants in PID genes. We conducted a multicenter, population-based, prospective study including previously healthy children aged ≥28 days and & years admitted with blood culture-proven sepsis. Using a stringent variant filtering procedure, analysis of WES data was restricted to rare, predicted pathogenic variants in 240 PID genes for which increased susceptibility to bacterial infection has been reported. There were 176 children presenting with 185 sepsis episodes who underwent WES (median age, 52 months; interquartile range, 15.4–126.4). There were 41 unique predicted pathogenic PID variants (1 homozygous, 5 hemizygous, and 35 heterozygous) found in 35/176 (20%) patients, including 3/176 (2%) patients carrying variants that were previously reported to lead to PID. The variants occurred in PID genes across all 8 PID categories, as defined by the International Union of Immunological Societies. We did not observe a significant correlation between clinical or laboratory characteristics of patients and the presence or absence of PID variants. Applying WES to a population-based cohort of previously healthy children with bacterial sepsis detected variants of uncertain significance in PID genes in 1 out of 5 children. Future studies need to investigate the functional relevance of these variants to determine whether variants in PID genes contribute to pediatric sepsis susceptibility.

Publication

Genome-wide association and genetic functional studies identify autism susceptibility candidate 2 gene (AUTS2) in the regulation of alco

Publisher: Proceedings of the National Academy of Sciences

Date: 06-04-2011

DOI: 10.1073/PNAS.1017288108

Abstract: Alcohol consumption is a moderately heritable trait, but the genetic basis in humans is largely unknown, despite its clinical and societal importance. We report a genome-wide association study meta-analysis of ∼2.5 million directly genotyped or imputed SNPs with alcohol consumption (gram per day per kilogram body weight) among 12 population-based samples of European ancestry, comprising 26,316 individuals, with replication genotyping in an additional 21,185 individuals. SNP rs6943555 in autism susceptibility candidate 2 gene (AUTS2) was associated with alcohol consumption at genome-wide significance (P = 4 × 10⁻⁸ to P = 4 × 10⁻⁹). We found a genotype-specific expression of AUTS2 in 96 human prefrontal cortex samples (P = 0.026) and significant (P < 0.017) differences in expression of AUTS2 in whole-brain extracts of mice selected for differences in voluntary alcohol consumption. Down-regulation of an AUTS2 homolog caused reduced alcohol sensitivity in Drosophila (P < 0.001). Our finding of a regulator of alcohol consumption adds knowledge to our understanding of genetic mechanisms influencing alcohol drinking behavior.

Publication

Simulating the Dynamics of Targeted Capture Sequencing with CapSim

Publisher: Cold Spring Harbor Laboratory

Date: 05-05-2017

DOI: 10.1101/134510

Abstract: Targeted sequencing using capture probes has become increasingly popular in clinical applications due to its scalability and cost-effectiveness. The approach also allows for higher sequencing coverage of the targeted regions resulting in better analysis statistical power. However, because of the dynamics of the hybridisation process, it is difficult to evaluate the efficiency of the probe design prior to the experiments which are time consuming and costly. We developed CapSim, a software package for simulation of targeted sequencing. Given a genome sequence and a set of probes, CapSim simulates the fragmentation, the dynamics of probe hybridisation, and the sequencing of the captured fragments on Illumina and PacBio sequencing platforms. The simulated data can be used for evaluating the performance of the analysis pipeline, as well as the efficiency of the probe design. Parameters of the various stages in the sequencing process can also be evaluated in order to optimise the efficacy of the experiments. CapSim is publicly available under BSD license at dcao/capsim.
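
The fragmentation and capture stages described above can be illustrated with a deliberately simplified sketch. It is not CapSim's model or API: it shears genome copies at random, keeps any fragment that overlaps a probe by at least a minimum number of bases (a crude stand-in for hybridisation, whose dynamics are ignored), and reports what was captured. The exponential fragment-length distribution, the overlap threshold, the probe coordinates and all function names are hypothetical assumptions.

```python
import random


def fragment(genome_len, mean_len, rng):
    """Shear one genome copy [0, genome_len) into contiguous fragments.

    Lengths are drawn from an exponential distribution with the given mean
    (a simplifying assumption, not CapSim's fragment-length model).
    """
    frags, start = [], 0
    while start < genome_len:
        length = max(1, round(rng.expovariate(1.0 / mean_len)))
        end = min(genome_len, start + length)
        frags.append((start, end))
        start = end
    return frags


def overlap(a, b):
    """Number of bases shared by half-open intervals a and b."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def capture(frags, probes, min_overlap):
    """Keep fragments sharing at least min_overlap bases with any probe.

    This all-or-nothing overlap rule stands in for the probe-hybridisation
    dynamics that CapSim actually simulates.
    """
    return [f for f in frags if any(overlap(f, p) >= min_overlap for p in probes)]


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed for reproducible runs
    genome_len = 100_000
    probes = [(10_000, 10_120), (55_000, 55_120)]  # two hypothetical 120-bp probes
    captured = []
    for _ in range(50):  # 50 hypothetical genome copies in the library
        captured += capture(fragment(genome_len, 300, rng), probes, min_overlap=60)
    captured_bp = sum(end - start for start, end in captured)
    print(f"{len(captured)} fragments captured, {captured_bp} bp in total")
```

A fuller simulation would make capture probability depend on overlap and hybridisation conditions, then pass the captured fragments to a read simulator for the sequencing stage; both steps are omitted here.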

Publication

Accurate Single-Nucleotide Polymorphism Allele Assignment in Trisomic or Duplicated Regions by Using a Single Base–Extension Assay with MALDI-TOF Mass Spectrometry

Publisher: Oxford University Press (OUP)

Date: 08-2011

DOI: 10.1373/CLINCHEM.2010.159558

Abstract: The accurate assignment of alleles embedded within trisomic or duplicated regions is an essential prerequisite for assessing the combined effects of single-nucleotide polymorphisms (SNPs) and genomic copy number. Such an integrated analysis is challenging because heterozygotes for such a SNP may be one of 2 genotypes—AAB or ABB. Established methods for SNP genotyping, however, can have difficulty discriminating between the 2 heterozygous trisomic genotypes. We developed a method for assigning heterozygous trisomic genotypes that uses the ratio of the height of the 2 allele peaks obtained by mass spectrometry after a single-base extension assay. Eighteen COL6A2 (collagen, type VI, alpha 2) SNPs were analyzed in euploid and trisomic individuals by means of a multiplexed single-base extension assay that generated allele-specific oligonucleotides of differing Mr values for detection by MALDI-TOF mass spectrometry. Reference data (mean and SD) for the allele peak height ratios were determined from heterozygous euploid samples. The heterozygous trisomic genotypes were assigned by calculating the z score for each trisomic allele peak height ratio and by considering the sign (+/−) of the z score. Heterozygous trisomic genotypes were assigned in 96.1% (range, 89.9%–100%) of the samples for each SNP analyzed. The genotypes obtained were reproduced in 95 (97.5%) of 97 loci retested in a second assay. Subsequently, the origin of nondisjunction was determined in 108 (82%) of 132 family trios with a Down syndrome child. This approach enabled reliable genotyping of heterozygous trisomic samples and the determination of the origin of nondisjunction in Down syndrome family trios.
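
The z-score assignment described above can be sketched as follows, assuming the peak height ratio is defined as height(A) / height(B), so that an AAB trisomic heterozygote gives a ratio above the euploid AB reference and an ABB heterozygote a ratio below it. The function names, the rejection threshold `z_min` and the example ratios are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev


def reference_stats(euploid_ratios):
    """Mean and sample SD of A/B peak-height ratios from heterozygous euploid (AB) samples."""
    return mean(euploid_ratios), stdev(euploid_ratios)


def assign_trisomic_genotype(ratio, ref_mean, ref_sd, z_min=2.0):
    """Assign AAB or ABB from the sign of the z score of an A/B peak-height ratio.

    Assumption: ratio = height(A) / height(B). A 2:1 dosage of allele A raises
    the ratio above the euploid 1:1 reference, so a positive z score suggests
    AAB and a negative one ABB. Calls with |z| < z_min (a hypothetical
    cut-off) are left unassigned and returned as None.
    """
    z = (ratio - ref_mean) / ref_sd
    if abs(z) < z_min:
        return None, z
    return ("AAB" if z > 0 else "ABB"), z


if __name__ == "__main__":
    # Hypothetical reference ratios measured in euploid heterozygotes.
    ref_mean, ref_sd = reference_stats([0.95, 1.02, 1.00, 0.98, 1.05])
    for r in (1.9, 0.52, 1.01):
        genotype, z = assign_trisomic_genotype(r, ref_mean, ref_sd)
        print(f"ratio={r:.2f} z={z:+.1f} genotype={genotype}")
```

Using the sign of the z score, rather than fixed ratio cut-offs, lets each SNP carry its own reference mean and SD, which accommodates allele-specific differences in peak height between assays.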

Publication

Enhanced protein domain discovery by using language modeling techniques from speech recognition.

Publisher: Proceedings of the National Academy of Sciences

Date: 31-03-2003

DOI: 10.1073/PNAS.0737502100

Abstract: Most modern speech recognition uses probabilistic models to interpret a sequence of sounds. Hidden Markov models, in particular, are used to recognize words. The same techniques have been adapted to find domains in protein sequences of amino acids. To increase word accuracy in speech recognition, language models are used to capture the information that certain word combinations are more likely than others, thus improving detection based on context. However, to date, these context techniques have not been applied to protein domain discovery. Here we show that the application of statistical language modeling methods can significantly enhance domain recognition in protein sequences. As an example, we discover an unannotated Tf_Otx Pfam domain on the cone rod homeobox protein, which suggests a possible mechanism for how the V242M mutation on this protein causes cone-rod dystrophy.
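
The context idea above can be illustrated by combining a per-domain profile-HMM log-score with a bigram language model over neighbouring domains, much as speech recognisers combine acoustic and language-model scores. This is a toy sketch in that spirit, not the authors' method: the training architectures, the add-one smoothing, the weight `lm_weight`, the placeholder domain name `Unrelated` and the HMM scores are all hypothetical.

```python
import math
from collections import Counter


def train_bigram(architectures):
    """Count domain bigrams (with start/end markers) over known domain architectures."""
    bigrams, unigrams, vocab = Counter(), Counter(), set()
    for arch in architectures:
        seq = ["<s>"] + list(arch) + ["</s>"]
        vocab.update(seq)
        for prev, cur in zip(seq, seq[1:]):
            bigrams[(prev, cur)] += 1
            unigrams[prev] += 1
    return bigrams, unigrams, len(vocab)


def log_prob(prev, cur, model):
    """Add-one smoothed log P(cur | prev) under the bigram model."""
    bigrams, unigrams, v = model
    return math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + v))


def rescore(prev_domain, candidates, model, lm_weight=1.0):
    """Rank candidate domains by HMM log-score plus weighted language-model log-probability."""
    scored = {
        d: s + lm_weight * log_prob(prev_domain, d, model)
        for d, s in candidates.items()
    }
    return max(scored, key=scored.get), scored


if __name__ == "__main__":
    # Hypothetical training architectures and HMM log-scores for two competing hits.
    model = train_bigram([["Homeobox", "Tf_Otx"], ["Homeobox", "Tf_Otx"], ["Homeobox", "OAR"]])
    hits = {"Tf_Otx": -5.0, "Unrelated": -4.5}
    print(rescore("Homeobox", hits, model, lm_weight=0.0)[0])  # profile score alone
    print(rescore("Homeobox", hits, model)[0])  # profile score plus domain context
```

With `lm_weight=0.0` the slightly better-scoring placeholder hit wins; with context switched on, the hit that commonly follows a homeobox domain in the training architectures is preferred instead.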

Publication

Understanding Detrimental Host Response to Infection-The Promise of Transcriptomics∗

Publisher: Ovid Technologies (Wolters Kluwer Health)

Date: 02-2022

DOI: 10.1097/PCC.0000000000002870

Publication

Comparison of long read methods for sequencing and assembly of a plant genome

Publisher: Cold Spring Harbor Laboratory

Date: 18-03-2020

DOI: 10.1101/2020.03.16.992933

Abstract: Sequencing technologies have advanced to the point where it is possible to generate high accuracy, haplotype resolved, chromosome scale assemblies. Several long read sequencing technologies are available on the market and a growing number of algorithms have been developed over the last years to assemble the reads generated by those technologies. When starting a new genome project, it is therefore challenging to select the most cost-effective sequencing technology as well as the most appropriate software for assembly and polishing. For this reason, it is important to benchmark different approaches applied to the same sample. Here, we report a comparison of three long read sequencing technologies applied to the de novo assembly of a plant genome, Macadamia jansenii. We have generated sequencing data using Pacific Biosciences (Sequel I), Oxford Nanopore Technologies (PromethION) and BGI (single-tube Long Fragment Read) technologies for the same sample. Several assemblers were benchmarked in the assembly of PacBio and Nanopore reads. Results obtained from combining long read technologies or short read and long read technologies are also presented. The assemblies were compared for contiguity, accuracy and completeness as well as sequencing costs and DNA material requirements. Overall, the three long read technologies produced highly contiguous and complete genome assemblies of Macadamia jansenii. At the time of sequencing, the cost associated with each method was significantly different but continuous improvements in technologies have resulted in greater accuracy, increased throughput and reduced costs. We propose updating this comparison regularly with reports on significant iterations of the sequencing technologies.

Publication

Triclosan at environmentally relevant concentrations promotes horizontal transfer of multidrug resistance genes within and across bacterial genera

Publisher: Elsevier BV

Date: 12-2018

DOI: 10.1016/J.ENVINT.2018.10.040

Abstract: Antibiotic resistance poses an increasing threat to public health. Horizontal gene transfer (HGT) promoted by antibiotics is recognized as a significant pathway to disseminate antibiotic resistance genes (ARGs). However, it is unclear whether non-antibiotic, anti-microbial (NAAM) chemicals can directly promote HGT of ARGs in the environment. We aimed to investigate whether triclosan (TCS), a widely-used NAAM chemical in personal care products, is able to stimulate the conjugative transfer of antibiotic multi-resistance genes carried by plasmid within and across bacterial genera. We established two model mating systems, to investigate intra-genera transfer and inter-genera transfer. Escherichia coli K-12 LE392 carrying IncP-α plasmid RP4 was used as the donor, and E. coli K-12 MG1655 or Pseudomonas putida KT2440 were the intra- and inter-genera recipients, respectively. The mechanisms of the HGT promoted by TCS were unveiled by detecting oxidative stress and cell membrane permeability, in combination with Nanopore sequencing, genome-wide RNA sequencing and proteomic analyses. Exposure of the bacteria to environmentally relevant concentrations of TCS (from 0.02 μg/L to 20 μg/L) significantly stimulated the conjugative transfer of plasmid-encoded multi-resistance genes within and across genera. The TCS exposure promoted ROS generation and damaged bacterial membrane, and caused increased expression of the SOS response regulatory genes umuC, dinB and dinD in the donor. In addition, higher expression levels of ATP synthesis encoding genes in E. coli and P. putida were found with increased TCS dosage. TCS could enhance the conjugative ARGs transfer between bacteria by triggering ROS overproduction at environmentally relevant concentrations. These findings improve our awareness of the hidden risks of NAAM chemicals on the spread of antibiotic resistance.

Publication

Transforming Growth Factor-β Signaling Pathway in Patients With Kawasaki Disease

Publisher: Ovid Technologies (Wolters Kluwer Health)

Date: 02-2011

DOI: 10.1161/CIRCGENETICS.110.940858

Abstract: Transforming growth factor (TGF)-β is a multifunctional peptide that is important in T-cell activation and cardiovascular remodeling, both of which are important features of Kawasaki disease (KD). We postulated that variation in TGF-β signaling might be important in KD susceptibility and disease outcome. We investigated genetic variation in 15 genes belonging to the TGF-β pathway in a total of 771 KD subjects of mainly European descent from the United States, the United Kingdom, Australia, and the Netherlands. We analyzed transcript abundance patterns using microarray and reverse transcriptase–polymerase chain reaction for these same genes, and measured TGF-β2 protein levels in plasma. Genetic variants in TGFB2, TGFBR2, and SMAD3 and their haplotypes were consistently and reproducibly associated with KD susceptibility, coronary artery aneurysm formation, aortic root dilatation, and intravenous immunoglobulin treatment response in different cohorts. A SMAD3 haplotype associated with KD susceptibility replicated in 2 independent cohorts and an intronic single nucleotide polymorphism in a separate haplotype block was also strongly associated (A/G, rs4776338) (P = 0.000022; odds ratio, 1.50; 95% confidence interval, 1.25 to 1.81). Pathway analysis using all 15 genes further confirmed the importance of the TGF-β pathway in KD pathogenesis. Whole-blood transcript abundance for these genes and TGF-β2 plasma protein levels changed dynamically over the course of the illness. These studies suggest that genetic variation in the TGF-β pathway influences KD susceptibility, disease outcome, and response to therapy, and that aortic root and coronary artery Z scores can be used for phenotype/genotype analyses. Analysis of transcript abundance and protein levels further support the importance of this pathway in KD pathogenesis.

Publication

Genotype-free demultiplexing of pooled single-cell RNA-seq

Publisher: Springer Science and Business Media LLC

Date: 12-2019

DOI: 10.1186/S13059-019-1852-7

Abstract: A variety of methods have been developed to demultiplex pooled samples in a single-cell RNA sequencing (scRNA-seq) experiment, but they require either hashtag barcodes or sample genotypes prior to pooling. We introduce scSplit, which utilizes genetic differences inferred from scRNA-seq data alone to demultiplex pooled samples. scSplit also enables mapping clusters to original samples. Using simulated, merged, and pooled multi-individual datasets, we show that scSplit prediction is highly concordant with demuxlet predictions and is highly consistent with the known truth in a cell-hashing dataset. scSplit is ideally suited to samples without external genotype information and is available at: on-xu/scSplit

Publication

Ongoing human chromosome end extension revealed by analysis of BioNano and nanopore data

Publisher: Cold Spring Harbor Laboratory

Date: 14-02-2017

DOI: 10.1101/108365

Abstract: The majority of human chromosome ends remain incompletely assembled due to their highly repetitive structure. In this study, we use BioNano data to anchor and extend chromosome ends from two European trios as well as two unrelated Asian genomes. BioNano-assembled chromosome ends are structurally divergent from the reference genome, including both missing sequence (10%) and extensions (22%). These extensions are heritable and in some cases divergent between Asian and European samples. Six ninths of the extension sequence in NA12878 can be confirmed and filled by nanopore data. We identify two sequence families in these sequences which have undergone substantial duplication in multiple primate lineages. We show that these sequence families have arisen from progenitor interstitial sequence on the ancestral primate chromosome 7. Comparison of chromosome end sequences from 15 species revealed that chromosome end missing sequence matches the corresponding phylogenetic relationship, and revealed an average chromosome extension rate of 0.0020 bp per chromosome per year.

Publication

Octapeptin C4 and polymyxin resistance occur via distinct pathways in an epidemic XDR Klebsiella pneumoniae ST258 isolate

Publisher: Oxford University Press (OUP)

Date: 14-11-2018

DOI: 10.1093/JAC/DKY458

Publication

Correction for Schumann et al., Genome-wide association and genetic functional studies identify autism susceptibility candidate 2 gene (AUTS2) in the regulation of alcoh

Publisher: Proceedings of the National Academy of Sciences

Date: 13-05-2011

DOI: 10.1073/PNAS.1106917108

Publication

npInv: Accurate detection and genotyping of inversions using long read sub-alignment

Publisher: Springer Science and Business Media LLC

Date: 13-07-2018

DOI: 10.1186/S12859-018-2252-9

Publication

New technologies for diagnosing active TB: the VANTDET diagnostic accuracy study

Publisher: National Institute for Health and Care Research

Date: 04-2021

DOI: 10.3310/EME08050

Abstract: Tuberculosis (TB) is a devastating disease for which new diagnostic tests are desperately needed. To validate promising new technologies [namely whole-blood transcriptomics, proteomics, flow cytometry and quantitative reverse transcription-polymerase chain reaction (qRT-PCR)] and existing signatures for the detection of active TB in samples obtained from individuals with suspected active TB. Four substudies, each of which used samples from the biobank collected as part of the interferon gamma release assay (IGRA) in the Diagnostic Evaluation of Active TB study, which was a prospective cohort of patients recruited with suspected TB. Secondary care. Adults aged ≥ 16 years presenting as inpatients or outpatients at 12 NHS hospital trusts in London, Slough, Oxford, Leicester and Birmingham, with suspected active TB. New tests using genome-wide gene expression microarray (transcriptomics), surface-enhanced laser desorption ionisation time-of-flight mass spectrometry/liquid chromatography–mass spectrometry (proteomics), flow cytometry or qRT-PCR. Area under the curve (AUC), sensitivity and specificity were calculated to determine diagnostic accuracy. Positive and negative predictive values were calculated in some cases. A decision tree model was developed to calculate the incremental costs and quality-adjusted life-years of changing from current practice to using the novel tests. The project, and four substudies that assessed the previously published signatures, measured each of the new technologies and performed a health economic analysis in which the best-performing tests were evaluated for cost-effectiveness. The diagnostic accuracy of the transcriptomic tests ranged from an AUC of 0.81 to 0.84 for detecting all TB in our cohort. The performance for detecting culture-confirmed TB or pulmonary TB was better than for highly probable TB or extrapulmonary tuberculosis (EPTB), but was not high enough to be clinically useful. 
None of the previously described serum proteomic signatures for active TB provided good diagnostic accuracy, nor did the candidate rule-out tests. Four out of six previously described cellular immune signatures provided a reasonable level of diagnostic accuracy (AUC = 0.78–0.92) for discriminating all TB from those with other disease and latent TB infection in human immunodeficiency virus (HIV)-negative TB suspects. Two of these assays may be useful in the IGRA-positive population and can provide high positive predictive value. None of the new tests for TB can be considered cost-effective. In the HIV-positive population, each substudy was either underpowered or did not achieve sufficient diagnostic performance. Overall, the diagnostic performance of all previously identified ‘signatures’ of TB was lower than previously reported. This probably reflects the nature of the cohort we used, which includes the harder-to-diagnose groups, such as culture-unconfirmed TB or EPTB, which were under-represented in previous cohorts. We are yet to achieve our secondary objective of deriving novel signatures of TB using our data sets; this was beyond the scope of this report. We recommend that future studies using these technologies target specific subtypes of TB, specifically those groups for which new diagnostic tests are required. This project was funded by the Efficacy and Mechanism Evaluation (EME) programme, an MRC and NIHR partnership.
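The diagnostic-accuracy measures used in this study (sensitivity, specificity, predictive values and AUC) have standard definitions. A minimal Python sketch of those definitions follows, assuming a binary reference standard and one continuous test score per person; the function names and example inputs are hypothetical and are not data from the study.

```python
from typing import Sequence


def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard diagnostic-accuracy measures from a 2x2 confusion table."""
    return {
        "sensitivity": tp / (tp + fn),  # fraction of TB cases the test calls positive
        "specificity": tn / (tn + fp),  # fraction of non-TB cases the test calls negative
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


def auc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """AUC as the probability that a random case scores higher than a random
    non-case (Mann-Whitney formulation); tied scores count as one half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```

PPV and NPV depend on disease prevalence in the tested population, which is one reason predictive values reported for a selected subgroup (such as IGRA-positive patients) need not carry over to other settings.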

Publication

Disease association tests by inferring ancestral haplotypes using a hidden markov model

Publisher: Oxford University Press (OUP)

Date: 23-02-2008

DOI: 10.1093/BIOINFORMATICS/BTN071

Abstract: Motivation: Most genome-wide association studies rely on single nucleotide polymorphism (SNP) analyses to identify causal loci. The increased stringency required for genome-wide analyses (with per-SNP significance threshold typically ≈ 10⁻⁷) means that many real signals will be missed. Thus it is still highly relevant to develop methods with improved power at low type I error. Haplotype-based methods provide a promising approach; however, they suffer from statistical problems such as an abundance of rare haplotypes and ambiguity in defining haplotype block boundaries. Results: We have developed an ancestral haplotype clustering (AncesHC) association method which addresses many of these problems. It can be applied to biallelic or multiallelic markers typed in haploid, diploid or multiploid organisms, and also handles missing genotypes. Our model is free from the assumption of a rigid block structure but recognizes a block-like structure if it exists in the data. We employ a Hidden Markov Model (HMM) to cluster the haplotypes into groups of predicted common ancestral origin. We then test each cluster for association with disease by comparing the numbers of cases and controls with 0, 1 and 2 chromosomes in the cluster. We demonstrate the power of this approach by simulation of case-control status under a range of disease models for 1500 outcrossed mice originating from eight inbred lines. Our results suggest that AncesHC has substantially more power than single-SNP analyses to detect disease association, and is also more powerful than the cladistic haplotype clustering method CLADHC. Availability: The software can be downloaded from www.imperial.ac.uk/medicine/people/l.coin Contact: I.coin@imperial.ac.uk Supplementary Information: Supplementary data are available at Bioinformatics online.
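The final AncesHC step compares, for each ancestral cluster, how many cases and controls carry 0, 1 or 2 chromosomes assigned to it. The abstract does not name the test statistic, so the sketch below assumes a Pearson chi-square test on that 2 × 3 table (2 degrees of freedom). It illustrates the comparison only; it is not the published AncesHC code, and `cluster_association_test` is a hypothetical name.

```python
import math


def cluster_association_test(cases: list[int], controls: list[int]) -> tuple[float, float]:
    """Pearson chi-square test on a 2x3 table of cases/controls carrying
    0, 1 or 2 chromosomes assigned to one ancestral haplotype cluster.
    Returns (statistic, p_value) with df = (2 - 1) * (3 - 1) = 2."""
    if len(cases) != 3 or len(controls) != 3:
        raise ValueError("expected counts for 0, 1 and 2 cluster copies")
    rows = [cases, controls]
    row_totals = [sum(r) for r in rows]
    col_totals = [cases[j] + controls[j] for j in range(3)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:  # an empty column contributes nothing
                stat += (observed - expected) ** 2 / expected
    # For 2 degrees of freedom the chi-square upper tail is exactly exp(-x/2).
    return stat, math.exp(-stat / 2)
```

Using the closed-form tail for 2 degrees of freedom keeps the sketch free of external dependencies; a real analysis would more likely use a statistics library and handle sparse tables explicitly.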

Publication

The Effect of Genomic Inversions on Estimation of Population Genetic Parameters from SNP Data

Publisher: Oxford University Press (OUP)

Date: 2013

DOI: 10.1534/GENETICS.112.145599

Abstract: In recent years it has emerged that structural variants have a substantial impact on genomic variation. Inversion polymorphisms represent a significant class of structural variant, and despite the challenges in their detection, data on inversions in the human genome are increasing rapidly. Statistical methods for inferring parameters such as the recombination rate and the selection coefficient have generally been developed without accounting for the presence of inversions. Here we exploit new software for simulating inversions in population genetic data, invertFREGENE, to assess the potential impact of inversions on such methods. Using data simulated by invertFREGENE, as well as real data from several sources, we test whether large inversions have a disruptive effect on widely applied population genetics methods for inferring recombination rates, for detecting selection, and for controlling for population structure in genome-wide association studies (GWAS). We find that recombination rates estimated by LDhat are biased downward at inversion loci relative to the true contemporary recombination rates at the loci but that recombination hotspots are not falsely inferred at inversion breakpoints as may have been expected. We find that the integrated haplotype score (iHS) method for detecting selection appears robust to the presence of inversions. Finally, we observe a strong bias in the genome-wide results of principal components analysis (PCA), used to control for population structure in GWAS, in the presence of even a single large inversion, confirming the necessity to thin SNPs by linkage disequilibrium at large physical distances to obtain unbiased results.

Publication

No evidence of SARS-CoV-2 reverse transcription and integration as the origin of chimeric transcripts in patient tissues

Publisher: Proceedings of the National Academy of Sciences

Date: 03-08-2021

DOI: 10.1073/PNAS.2109066118

Publication

Rapid nanopore metagenomic sequencing and predictive susceptibility testing of positive blood cultures from intensive care patients with sepsis

Publisher: Cold Spring Harbor Laboratory

Date: 22-06-2023

DOI: 10.1101/2023.06.15.23291261

Abstract: Direct metagenomic sequencing from positive blood culture (BC) broths, to identify bacteria and predict antimicrobial susceptibility, has been previously demonstrated using Illumina-based methods, but is relatively slow. We aimed to evaluate this approach using nanopore sequencing to provide more rapid results. Patients with suspected sepsis in 4 intensive care units were prospectively enrolled. Human-depleted DNA was extracted from positive BC broths and sequenced using nanopore (MinION). Species abundance was estimated using Kraken2, and a cloud-based artificial intelligence (AI) system (AREScloud) provided in silico antimicrobial susceptibility testing (AST) from assembled contigs. These results were compared to conventional identification and phenotypic AST. Genus-level agreement between conventional methods and metagenomic whole genome sequencing (MG-WGS) was 96.2% (50/52), increasing to 100% in monomicrobial infections. In total, 262 high-quality AREScloud AST predictions across 24 samples were made, exhibiting categorical agreement (CA) of 89.3%, with major error (ME) and very major error (VME) rates of 10.5% and 12.1%, respectively. Over 90% CA was achieved for some taxa (e.g. Staphylococcus aureus), but CA was suboptimal for Pseudomonas aeruginosa (50%). In 470 AST predictions across 42 samples, including both high-quality and exploratory-only predictions, overall CA, ME and VME rates were 87.7%, 8.3% and 28.4%, respectively. VME rates were inflated by false susceptibility calls in a small number of species/antibiotic combinations with few representative resistant isolates. MG-WGS results could be reported within 8-16 hours of blood culture positivity. Direct metagenomic sequencing from positive BC broths is feasible and can provide accurate predictive AST for some species and antibiotics, but is sub-optimal for a subset of common pathogens, with unacceptably high VME rates. 
Nanopore-based approaches may be faster, but improvements in accuracy are required before they can be considered for clinical use. New developments in nanopore sequencing technology, and training of AI algorithms on larger and more diverse datasets, may improve performance.
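Categorical agreement and the two error rates reported above can be sketched as follows. The abstract does not state the denominators, so this assumes the common convention: ME counts false-resistant calls over phenotypically susceptible results, and VME counts false-susceptible calls over phenotypically resistant results. The function name and the 'S'/'R' encoding are hypothetical.

```python
def ast_error_rates(pairs):
    """pairs: iterable of (predicted, phenotype) calls, each 'S' or 'R'.

    Returns percentages for categorical agreement (CA), major error (ME:
    predicted R but phenotypically S) and very major error (VME: predicted S
    but phenotypically R), using the assumed denominators described above."""
    pairs = list(pairs)
    agree = sum(1 for pred, true in pairs if pred == true)
    on_susceptible = [pred for pred, true in pairs if true == "S"]
    on_resistant = [pred for pred, true in pairs if true == "R"]
    major = sum(1 for pred in on_susceptible if pred == "R")
    very_major = sum(1 for pred in on_resistant if pred == "S")

    def pct(num, den):
        return 100.0 * num / den if den else float("nan")

    return {
        "CA": pct(agree, len(pairs)),
        "ME": pct(major, len(on_susceptible)),
        "VME": pct(very_major, len(on_resistant)),
    }
```

Because the VME denominator counts only phenotypically resistant results, a handful of false-susceptible calls in a combination with few resistant isolates can raise the VME rate sharply, which is consistent with the explanation given in the abstract.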

Publication

Simulating the dynamics of targeted capture sequencing with CapSim

Publisher: Oxford University Press (OUP)

Date: 28-10-2018

DOI: 10.1093/BIOINFORMATICS/BTX691

Abstract: Targeted sequencing using capture probes has become increasingly popular in clinical applications due to its scalability and cost-effectiveness. The approach also allows for higher sequencing coverage of the targeted regions, resulting in greater statistical power for analysis. However, because of the dynamics of the hybridization process, it is difficult to evaluate the efficiency of a probe design prior to the experiments, which are time-consuming and costly. We developed CapSim, a software package for simulation of targeted sequencing. Given a genome sequence and a set of probes, CapSim simulates the fragmentation, the dynamics of probe hybridization and the sequencing of the captured fragments on Illumina and PacBio sequencing platforms. The simulated data can be used for evaluating the performance of the analysis pipeline, as well as the efficiency of the probe design. Parameters of the various stages in the sequencing process can also be evaluated in order to optimize the experiments. CapSim is publicly available under the BSD license at github.com/Devika1/capsim. Supplementary data are available at Bioinformatics online.
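A toy version of the capture step may help picture what such a simulator tracks. It assumes uniformly placed fragments with exponentially distributed lengths that count as captured whenever they overlap a probe interval by a fixed minimum number of bases. This deliberately ignores the hybridization kinetics that CapSim models and is not CapSim's algorithm; `simulate_capture`, `mean_len` and `min_overlap` are hypothetical.

```python
import random


def simulate_capture(genome_length, probes, n_fragments, mean_len=300, min_overlap=60, seed=0):
    """Toy capture simulation: draw random fragments and keep those that overlap
    any probe interval (start, end) by at least min_overlap bases.
    Returns the captured fragments as (start, end) half-open intervals."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    captured = []
    for _ in range(n_fragments):
        # Exponential fragment length, clamped to [1, genome_length].
        length = min(genome_length, max(1, int(rng.expovariate(1 / mean_len))))
        start = rng.randrange(0, genome_length - length + 1)
        end = start + length
        for p_start, p_end in probes:
            if min(end, p_end) - max(start, p_start) >= min_overlap:
                captured.append((start, end))
                break  # count each fragment once even if it hits several probes
    return captured
```

Comparing the number of captured fragments across alternative probe sets gives a crude analogue of the probe-design evaluation described above; a realistic simulator would replace the fixed overlap rule with a model of hybridization efficiency.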

Publication

A lipoglycopeptide antibiotic for Gram-positive biofilm-related infections

Publisher: American Association for the Advancement of Science (AAAS)

Date: 14-09-2022

DOI: 10.1126/SCITRANSLMED.ABJ2381

Abstract: Drug-resistant Gram-positive bacterial infections are still a substantial burden on the public health system, with two bacteria (Staphylococcus aureus and Streptococcus pneumoniae) accounting for over 1.5 million drug-resistant infections in the United States alone in 2017. In 2019, 250,000 deaths were attributed to these pathogens globally. We have developed a preclinical glycopeptide antibiotic, MCC5145, that has excellent potency (MIC90 ≤ 0.06 μg/ml) against hundreds of isolates of methicillin-resistant S. aureus (MRSA) and other Gram-positive bacteria, with a greater than 1000-fold margin over mammalian cell cytotoxicity values. The antibiotic has therapeutic in vivo efficacy when dosed subcutaneously in multiple murine models of established bacterial infections, including thigh infection with MRSA and blood septicemia with S. pneumoniae, as well as when dosed orally in an antibiotic-induced Clostridioides difficile infection model. MCC5145 exhibited reduced nephrotoxicity at microbiologically active doses in mice compared to vancomycin. MCC5145 also showed improved activity against biofilms compared to vancomycin, both in vitro and in vivo, and a low propensity to select for drug resistance. Characterization of drug action using a transposon library bioinformatic platform showed a mechanistic distinction from other glycopeptide antibiotics.

Publication

Variation in CFHR3 determines susceptibility to meningococcal disease by controlling factor H concentrations

Publisher: Elsevier BV

Date: 09-2022

DOI: 10.1016/J.AJHG.2022.08.001

Publication

Genome-wide association analysis identifies variants associated with nonalcoholic fatty liver disease that have distinct effects on metabolic traits

Publisher: Public Library of Science (PLoS)

Date: 10-03-2011

DOI: 10.1371/JOURNAL.PGEN.1001324

Publication

Variants in MTNR1B influence fasting glucose levels

Publisher: Springer Science and Business Media LLC

Date: 07-12-2008

DOI: 10.1038/NG.290

Publication

The Pfam protein families database

Publisher: Oxford University Press (OUP)

Date: 2004

DOI: 10.1093/NAR/GKH121

Publication

Scaffolding and completing genome assemblies in real-time with nanopore sequencing

Publisher: Springer Science and Business Media LLC

Date: 20-02-2017

DOI: 10.1038/NCOMMS14515

Abstract: Third-generation sequencing technologies provide the opportunity to improve genome assemblies by generating long reads spanning most repeat sequences. However, current analysis methods require substantial amounts of sequence data and computational resources to overcome the high error rates. Furthermore, they can only perform analysis after sequencing has completed, resulting in either over-sequencing or a low-quality assembly due to under-sequencing. Here we present npScarf, which can scaffold and complete short-read assemblies while the long-read sequencing run is in progress. It reports assembly metrics in real time so the sequencing run can be terminated once an assembly of sufficient quality is obtained. In assembling four bacterial genomes and one eukaryotic genome, we show that npScarf can construct more complete and accurate assemblies while requiring less sequencing data and computational resources than existing methods. Our approach offers a time- and resource-effective strategy for completing short-read assemblies.
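npScarf reports assembly metrics during the run so that sequencing can be stopped early. The abstract does not list which metrics or thresholds are used, so the sketch below assumes N50 as the reported metric and a hypothetical stopping rule (few enough contigs covering most of an expected genome size). `n50` and `should_stop` are illustrative names, not npScarf functions.

```python
def n50(lengths):
    """N50: the largest length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


def should_stop(contig_lengths, expected_genome_size, max_contigs, min_fraction):
    """Hypothetical stopping rule: stop once the assembly has collapsed to at
    most max_contigs contigs that together cover at least min_fraction of the
    expected genome size."""
    total = sum(contig_lengths)
    return len(contig_lengths) <= max_contigs and total >= min_fraction * expected_genome_size
```

In a streaming setting such a check would be re-evaluated each time a new batch of long reads updates the scaffolds; `max_contigs` would need to allow for plasmids when assembling bacterial isolates.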

Publication

LobSig is a multigene predictor of outcome in invasive lobular carcinoma

Publisher: Springer Science and Business Media LLC

Date: 27-06-2019

DOI: 10.1038/S41523-019-0113-Y

Abstract: Invasive lobular carcinoma (ILC) is the most common special type of breast cancer, and is characterized by functional loss of E-cadherin, resulting in cellular adhesion defects. ILC typically presents as estrogen receptor-positive, grade 2 breast cancer, with a good short-term prognosis. Several large-scale molecular profiling studies have now dissected the unique genomics of ILC. We have undertaken an integrative analysis of gene expression and DNA copy number to identify novel drivers and prognostic biomarkers, using in-house (n = 25), METABRIC (n = 125) and TCGA (n = 146) samples. Using in silico integrative analyses, a 194-gene set was derived that is highly prognostic in ILC (P = 1.20 × 10⁻⁵); we named this metagene ‘LobSig’. Assessing a 10-year follow-up period, LobSig outperformed the Nottingham Prognostic Index, PAM50 risk-of-recurrence (Prosigna), OncotypeDx, and Genomic Grade Index (MapQuantDx) in a stepwise, multivariate Cox proportional hazards model, particularly in grade 2 ILC cases (χ², P = 9.0 × 10⁻⁶), which are difficult to prognosticate clinically. Importantly, LobSig status predicted outcome with 94.6% accuracy amongst cases classified as ‘moderate-risk’ according to the Nottingham Prognostic Index in the METABRIC cohort. Network analysis identified few candidate pathways, though genesets related to proliferation were identified, and a LobSig-high phenotype was associated with the TCGA proliferative subtype (χ², P < 8.86 × 10⁻⁴). ILCs with a poor outcome as predicted by LobSig were enriched with mutations in ERBB2, ERBB3, TP53, AKT1 and ROS1. LobSig has the potential to be a clinically relevant prognostic signature and warrants further development.

Publication

Correction to: Chiron: translating nanopore raw signal directly into nucleotide sequence using deep learning

Publisher: Oxford University Press (OUP)

Date: 05-2019

DOI: 10.1093/GIGASCIENCE/GIZ049

Publication

Pangenome databases provide superior host removal and mycobacteria classification from clinical metagenomic data

Publisher: Cold Spring Harbor Laboratory

Date: 19-09-2023

DOI: 10.1101/2023.09.18.558339

Publication

Point Mutations in Exon 1B of APC Reveal Gastric Adenocarcinoma and Proximal Polyposis of the Stomach as a Familial Adenomatous Polyposis Variant

Publisher: Elsevier BV

Date: 05-2016

DOI: 10.1016/J.AJHG.2016.03.001

Publication

Diagnosis of Bacterial Infection Using a 2-Transcript Host RNA Signature in Febrile Infants 60 Days or Younger

Publisher: American Medical Association (AMA)

Date: 18-04-2017

DOI: 10.1001/JAMA.2017.1365

Publication

A complete high-quality MinION nanopore assembly of an extensively drug-resistant Mycobacterium tuberculosis Beijing lineage strain identifies novel variation in repetitive PE/PPE gene regions

Publisher: Microbiology Society

Date: 07-2018

DOI: 10.1099/MGEN.0.000188

Publication

cnvHiTSeq: integrative models for high-resolution copy number variation detection and genotyping using population sequencing data

Publisher: Springer Science and Business Media LLC

Date: 2012

DOI: 10.1186/GB-2012-13-12-R120

Publication

Six new loci associated with body mass index highlight a neuronal influence on body weight regulation

Publisher: Springer Science and Business Media LLC

Date: 14-12-2008

DOI: 10.1038/NG.287

Publication

GtTR: Bayesian estimation of absolute tandem repeat copy number using sequence capture and high throughput sequencing

Publisher: Cold Spring Harbor Laboratory

Date: 10-01-2018

DOI: 10.1101/246108

Abstract: Tandem repeats comprise a significant proportion of the human genome, including coding and regulatory regions. They are highly prone to repeat number variation and nucleotide mutation due to their repetitive and unstable nature, making them a major source of genomic variation between individuals. Despite recent advances in high-throughput sequencing, analysis of tandem repeats in the context of complex diseases is still hindered by technical limitations. We report a novel targeted sequencing approach, which allows simultaneous analysis of hundreds of repeats. We developed a Bayesian algorithm, GtTR, which combines information from a reference long-read dataset with a short-read counting approach to genotype tandem repeats at population scale. PCR sizing analysis was used for validation. We used a PacBio long-read sequenced sample to generate a reference tandem repeat genotype dataset with on average 13% absolute deviation from PCR sizing results. Using this reference dataset, GtTR generated estimates of VNTR copy number with accuracy within 95% high posterior density (HPD) intervals of 68% and 83% for capture sequence data and 200X WGS data, respectively, improving to 87% and 94% with use of a PCR reference. We show that the genotype resolution increases as a function of depth, such that the median 95% HPD interval lies within 25%, 14%, 12% and 8% of its midpoint copy number value for 30X and 200X WGS data and 395X and 800X capture sequence data, respectively. We validated nine targets by PCR sizing analysis, and genotype estimates from sequencing results correlated well with PCR results. The novel genotyping approach described here presents a new cost-effective method to explore a previously unrecognized class of repeat variation in GWAS of complex diseases at the population level. Further improvements can be obtained by improving the accuracy of the reference dataset.
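To make the 95% HPD intervals concrete: given a posterior over integer copy numbers, the HPD set is the smallest set of values holding at least 95% of the posterior mass. The abstract does not specify GtTR's likelihood, so this sketch assumes a flat prior and a Poisson read-count likelihood with a known expected read count per repeat copy. All names and parameters are hypothetical, and this is not GtTR's model.

```python
import math


def copy_number_posterior(read_count, reads_per_copy, max_copies):
    """Posterior over copy numbers 1..max_copies under a flat prior and a
    Poisson read-count likelihood with mean copies * reads_per_copy
    (an illustrative assumption, not GtTR's published likelihood)."""
    copies = range(1, max_copies + 1)
    log_like = []
    for c in copies:
        mu = c * reads_per_copy
        log_like.append(read_count * math.log(mu) - mu - math.lgamma(read_count + 1))
    peak = max(log_like)
    weights = [math.exp(v - peak) for v in log_like]  # subtract the max for numerical stability
    total = sum(weights)
    return {c: w / total for c, w in zip(copies, weights)}


def hpd_interval(posterior, mass=0.95):
    """Smallest set of copy numbers holding >= mass of the posterior, reported
    as (min, max); for a unimodal posterior this set is an interval."""
    chosen, acc = [], 0.0
    for c, p in sorted(posterior.items(), key=lambda kv: kv[1], reverse=True):
        chosen.append(c)
        acc += p
        if acc >= mass:
            break
    return min(chosen), max(chosen)
```

In this toy model, deeper sequencing raises the expected count per copy, which sharpens the Poisson likelihood and narrows the HPD interval, mirroring the depth dependence reported in the abstract.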

Publication

Bio-Sense: A system for supporting sharing and exploration in bioinformatics using semantic web services

Publisher: IEEE

Date: 12-2008

DOI: 10.1109/ESCIENCE.2008.95

Publication

Modelling pathogen load dynamics to elucidate mechanistic determinants of host–Plasmodium falciparum interactions

Publisher: Springer Science and Business Media LLC

Date: 17-06-2019

DOI: 10.1038/S41564-019-0474-X

Publication

Trust in Social-Sensor Cloud Service

Publisher: IEEE

Date: 07-2018

DOI: 10.1109/ICWS.2018.00061

Publication

Genome-wide association study of sexual maturation in males and females highlights a role for body mass and menarche loci in male puberty

Publisher: Oxford University Press (OUP)

Date: 25-04-2014

DOI: 10.1093/HMG/DDU150

Publication

Long-Read RNA Sequencing Identifies Polyadenylation Elongation and Differential Transcript Usage of Host Transcripts During SARS-CoV-2 In Vitro Infection

Publisher: Frontiers Media SA

Date: 06-04-2022

DOI: 10.3389/FIMMU.2022.832223

Abstract: Better methods to interrogate host-pathogen interactions during Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) infections are imperative to help understand and prevent this disease. Here we implemented RNA-sequencing (RNA-seq) using Oxford Nanopore Technologies (ONT) long-reads to measure differential host gene expression, transcript polyadenylation and isoform usage within various epithelial cell lines permissive and non-permissive for SARS-CoV-2 infection. SARS-CoV-2-infected and mock-infected Vero (African green monkey kidney epithelial cells), Calu-3 (human lung adenocarcinoma epithelial cells), Caco-2 (human colorectal adenocarcinoma epithelial cells) and A549 (human lung carcinoma epithelial cells) were analyzed over time (0, 2, 24, 48 hours). Differential polyadenylation was found to occur in both infected Calu-3 and Vero cells during a late time point (48 hpi), with Gene Ontology (GO) terms such as viral transcription and translation shown to be significantly enriched in Calu-3 data. Poly(A) tails showed increased lengths in the majority of the differentially polyadenylated transcripts in Calu-3 and Vero cell lines (up to ~101 nt in mean poly(A) length, padj = 0.029). Of these genes, ribosomal protein genes such as RPS4X and RPS6 also showed downregulation in expression levels, suggesting the importance of ribosomal protein genes during infection. Furthermore, differential transcript usage was identified in Caco-2, Calu-3 and Vero cells, including transcripts of genes such as GSDMB and KPNA2, which have previously been implicated in SARS-CoV-2 infections. Overall, these results highlight the potential role of differential polyadenylation and transcript usage in host immune response or viral manipulation of host mechanisms during infection, and therefore showcase the value of long-read sequencing in identifying less-explored host responses to disease.

Publication

Real-time resolution of short-read assembly graph using ONT long reads

Publisher: Cold Spring Harbor Laboratory

Date: 18-02-2020

DOI: 10.1101/2020.02.17.953539

Abstract: A streaming assembly pipeline utilising real-time Oxford Nanopore Technology (ONT) sequencing data is important for saving sequencing resources and reducing time-to-result. A previous approach implemented in npScarf provided an efficient streaming algorithm for hybrid assembly but was relatively prone to mis-assemblies compared to other graph-based methods. Here we present npGraph, a streaming hybrid assembly tool that uses the assembly graph instead of separate pre-assembled contigs. It is able to produce more complete genome assemblies by solving the path-finding problem on the assembly graph, using long reads to guide the traversal. Application to synthetic and real data from bacterial isolate genomes shows improved accuracy while maintaining a low computational cost. npGraph also provides a graphical user interface (GUI) for real-time visualisation of assembly progress. The tool and source code are available at snguyen/assembly.

Publication

Demographic and motor features associated with the occurrence of neuropsychiatric and sleep complications of Parkinson's disease

Publisher: BMJ

Date: 05-03-2013

DOI: 10.1136/JNNP-2012-304440

Abstract: To determine whether four key neuropsychiatric and sleep-related features associated with Parkinson's disease (PD) are associated with motor handicap and demographic data. The growing number of recognised non-motor features of PD makes routine screening of all these symptoms impractical. Here, we investigated the hypothesis that standard demographic data and the routine assessment of motor signs are associated with the presence of dementia, psychosis, clinically probable rapid eye movement (REM) sleep behavior disorder (cpRBD) and restless legs syndrome (RLS). 775 patients with PD underwent standardised assessment of motor features and the presence of dementia, psychosis, cpRBD and RLS. A stepwise feature elimination procedure with fitted logistic regression models was applied to identify which, if any, combination of demographic and motor factors is associated with each of the four studied non-motor features. A within-study out-of-sample estimate of the predictive power of the models was calculated using standard evaluation procedures. Age and Hoehn and Yahr (H&Y) stage were strongly associated with the presence of dementia (p < 0.001 for both factors in the final selected model), while a combination of age, disease duration, H&Y stage, dopamine agonists and catechol-O-methyltransferase (COMT) inhibitors was associated with the presence of psychosis. Disease duration and H&Y stage were the significant indicators of cpRBD, and the lack of significant motor asymmetry was the only significant feature associated with RLS-type symptoms, but the evidence of association was weak. Demographic and motor features routinely collected in patients with PD can estimate the occurrence of neuropsychiatric and sleep-related features of PD.

Publication

Signatures of TSPAN8 Variants Associated with Human Metabolic Regulation and Diseases

Publisher: Elsevier BV

Date: 2021

DOI: 10.2139/SSRN.3766495

Publication

Web Application Resource Requirements Estimation Based on the Workload Latent Features

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 11-2021

DOI: 10.1109/TSC.2019.2918776

Publication

Correction

Publisher: Ovid Technologies (Wolters Kluwer Health)

Date: 04-2011

DOI: 10.1161/HCG.0B013E31821BBDA9

Publication

Evolution and spread of a highly drug resistant strain of Mycobacterium tuberculosis in Papua New Guinea

Publisher: Springer Science and Business Media LLC

Date: 06-05-2022

DOI: 10.1186/S12879-022-07414-2

Abstract: Molecular mechanisms determining the transmission and prevalence of drug-resistant tuberculosis (DR-TB) in Papua New Guinea (PNG) are poorly understood. We used genomic and drug susceptibility data to explore the evolutionary history, temporal acquisition of resistance and transmission dynamics of DR-TB across PNG. We performed whole genome sequencing on isolates from the Central Public Health Laboratory, PNG, collected 2017–2019. Data analysis was done on a composite dataset that also included 100 genomes previously sequenced from Daru, PNG (2012–2015). Sampled isolates represented 14 of the 22 PNG provinces; the majority (66/94, 70%) came from the National Capital District (NCD). In the composite dataset, 91% of strains were Beijing 2.2.1.1, identified in 13 provinces. The phylogenetic tree of Beijing strains revealed two clades, a Daru-dominant clade (A) and an NCD-dominant clade (B). Multidrug resistance (MDR) was repeatedly and independently acquired, with the first MDR cases in both clades noted to have emerged in the early 1990s, while fluoroquinolone resistance emerged in 2009 (95% highest posterior density 2000–2016). We identified a frameshift mutation within Rv0678 (p.Asp47fs), which has been suggested to confer resistance to bedaquiline, despite no known exposure to the drug. Overall, genomic clustering was significantly associated with rpoC compensatory and inhA promoter mutations (p < 0.001), with most genomic clusters (12/14) identified in NCD, reflecting its role as a potential national amplifier. The acquisition and evolution of drug resistance among the major clades of the Beijing strain threaten the success of DR-TB treatment in PNG. With continued transmission of this strain in PNG, genotypic drug resistance surveillance using whole genome sequencing is essential for improved public health response to outbreaks. 
With occurrence of resistance to newer drugs such as bedaquiline, knowledge of full drug resistance profiles will be important for optimal treatment selection.

Publication

Genetic architecture of early childhood growth phenotypes gives insights into their link with later obesity

Publisher: Cold Spring Harbor Laboratory

Date: 16-06-2017

DOI: 10.1101/150516

Abstract: Early childhood growth patterns are associated with adult metabolic health, but the underlying mechanisms are unclear. We performed genome-wide meta-analyses and follow-up in up to 22,769 European children for six early growth phenotypes derived from longitudinal data: peak height and weight velocities, and age and body mass index (BMI) at adiposity peak (AP ~ 9 months) and at adiposity rebound (AR ~ 5–6 years). We identified four associated loci (P < 5 × 10⁻⁸): LEPR/LEPROT with BMI at AP, FTO and TFAP2B with Age at AR, and GNPDA2 with BMI at AR. The observed AR-associated SNPs at FTO, TFAP2B and GNPDA2 represent known adult BMI-associated variants. The common variant at LEPR/LEPROT associated with BMI at AP was not associated with adult BMI but was associated with LEPROT gene expression levels, especially in subcutaneous fat (P = … × 10⁻⁵¹). We identify strong positive genetic correlations between early growth and later adiposity traits, and analysis of the full discovery stage results for Age at AR revealed enrichment for insulin-like growth factor 1 (IGF-1) signaling and apolipoprotein pathways. This genome-wide association study suggests mechanistic links between early childhood growth and adiposity in later childhood and adulthood, highlighting these early growth phenotypes as potential targets for the prevention of obesity.

Publication

Complete genome sequence of Klebsiella quasipneumoniae subsp. similipneumoniae strain ATCC 700603

Publisher: American Society for Microbiology

Date: 30-06-2016

DOI: 10.1128/GENOMEA.00438-16

Abstract: Klebsiella quasipneumoniae subsp. similipneumoniae strain ATCC 700603, formerly known as K. pneumoniae K6, is known for producing extended-spectrum β-lactamase (ESBL) enzymes that can hydrolyze oxyimino-β-lactams, resulting in resistance to these drugs. We herein report the complete genome of strain ATCC 700603 and show that the ESBL genes are plasmid-encoded.

Publication

Evaluating the genome and resistome of extensively drug-resistant Klebsiella pneumoniae using native DNA and RNA Nanopore sequencing

Publisher: Oxford University Press (OUP)

Date: 02-2020

DOI: 10.1093/GIGASCIENCE/GIAA002

Abstract: Klebsiella pneumoniae frequently harbours multidrug resistance, and current diagnostics struggle to rapidly identify appropriate antibiotics to treat these bacterial infections. The MinION device can sequence native DNA and RNA in real time, providing an opportunity to compare the utility of DNA and RNA for prediction of antibiotic susceptibility. However, the effectiveness of bacterial direct RNA sequencing and base-calling has not previously been investigated. This study interrogated the genome and transcriptome of 4 extensively drug-resistant (XDR) K. pneumoniae clinical isolates; however, further antimicrobial susceptibility testing identified 3 isolates as pandrug-resistant (PDR). The majority of acquired resistance (≥75%) resided on plasmids, including several megaplasmids (≥100 kb). DNA sequencing detected most resistance genes (≥70%) within 2 hours of sequencing. Neural network–based base-calling of direct RNA achieved up to 86% identity rate, although ≤23% of reads could be aligned. Direct RNA sequencing (with ∼6 times slower pore translocation) was able to identify (within 10 hours) ≥35% of resistance genes, including those associated with resistance to aminoglycosides, β-lactams, trimethoprim, and sulphonamide, and also quinolones, rifampicin, fosfomycin, and phenicol in some isolates. Direct RNA sequencing also identified the presence of operons containing up to 3 resistance genes. Polymyxin-resistant isolates showed a heightened transcription of phoPQ (≥2-fold) and the pmrHFIJKLM operon (≥8-fold). Expression levels estimated from direct RNA sequencing displayed strong correlation (Pearson: 0.86) compared to quantitative real-time PCR across 11 resistance genes. Overall, MinION sequencing rapidly detected the XDR/PDR K. pneumoniae resistome, and direct RNA sequencing provided accurate estimation of expression levels of these genes.
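The Pearson correlation reported above (0.86 between direct RNA sequencing and quantitative real-time PCR across 11 genes) is the standard product-moment coefficient. A minimal sketch of that calculation follows. The input lists are hypothetical placeholders rather than data from the study. The log-scaled inputs are also an assumption, because the abstract does not say how the expression values were transformed.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance numerator and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical per-gene values (NOT data from the study), e.g. log-scaled
# direct RNA expression estimates vs. qRT-PCR relative expression.
rna_estimates = [2.1, 3.4, 1.0, 4.8, 2.9]
qpcr_estimates = [1.8, 3.9, 0.7, 5.1, 2.5]
print(f"Pearson r = {pearson_r(rna_estimates, qpcr_estimates):.2f}")
```

The coefficient measures linear agreement only. Two methods can correlate strongly even when one of them consistently over- or under-estimates expression.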

Publication

Advanced Web Services

Publisher: Springer New York

Date: 2014

DOI: 10.1007/978-1-4614-7535-4

Publication

A new scoring system derived from base excess and platelet count at presentation predicts mortality in paediatric meningococcal sepsis

Publisher: Springer Science and Business Media LLC

Date: 2013

DOI: 10.1186/CC12609

Publication

Multifactorial Chromosomal Variants Regulate Polymyxin Resistance in Extensively Drug-Resistant Klebsiella pneumoniae

Publisher: Cold Spring Harbor Laboratory

Date: 08-05-2017

DOI: 10.1101/134684

Abstract: Extensively drug-resistant Klebsiella pneumoniae (XDR-KP) infections cause high mortality and are disseminating globally. Identifying the genetic basis underpinning resistance allows for rapid diagnosis and treatment. XDR isolates sourced from Greece and Brazil, including nineteen polymyxin-resistant and five polymyxin-susceptible strains, underwent whole genome sequencing. Approximately 90% of polymyxin resistance was enabled by alterations upstream of or within mgrB. The most common mutation identified was an insertion at nucleotide position 75 in mgrB via an ISKpn26-like element in the ST258 lineage and ISKpn13 in one ST11 isolate. Three strains acquired an IS1 element upstream of mgrB, and another strain had an ISKpn25 insertion at 133 bp. Other isolates had truncations (C28STOP, Q30STOP) or a missense mutation (D31E) affecting mgrB. Complementation assays revealed that all mgrB perturbations contributed to resistance. Missense mutations in phoQ (T281M, G385C) were also found to facilitate resistance. Several variants in phoPQ co-segregating with the ISKpn26-like insertion were identified as potential partial suppressor mutations. Three ST258 samples were found to contain subpopulations with different resistance-conferring mutations, including the ISKpn26-like insertion colonising with a novel mutation in pmrB (P158R), both confirmed via complementation assays. We also characterized a new multidrug-resistant Klebsiella quasipneumoniae strain, ST2401, which was susceptible to polymyxins. These findings highlight the broad spectrum of chromosomal modifications that can facilitate and regulate resistance against polymyxins in K. pneumoniae. Whole genome sequencing data for the 24 clinical isolates have been deposited under BioProject PRJNA307517 ( ioproject/PRJNA307517 ). Klebsiella pneumoniae contributes to a high abundance of nosocomial infections, and the rapid emergence of antimicrobial resistance hinders treatment. 
Polymyxins are predominantly used to treat multidrug-resistant infections; however, resistance to the polymyxins is arising. This increasing prevalence of polymyxin resistance is especially evident in Greece and Brazil. Identifying the genomic variations conferring resistance in clinical isolates from these regions can help detect novel alterations and trace the spread of particular strains. The most common finding in this study was mutations in the gene mgrB, the negative regulator of PhoPQ, which are known to cause resistance in KP. In the remaining isolates, missense mutations in phoQ accounted for resistance. Multiple novel mutations were detected segregating with mgrB perturbations. This was due either to a mixed heterogeneous sample of two polymyxin-resistant strains or to multiple mutations within the same strain. Of particular interest was the validation of novel mutations in phoPQ segregating with a previously known ISKpn26-like element in isolates with disrupted mgrB. Complementation of these phoPQ mutations revealed a reduction in minimum inhibitory concentrations, providing the first suggested evidence of partial suppressor mutations in KP. This research builds upon our current understanding of heteroresistance, lineage-specific mutations and regulatory variations relating to polymyxin resistance.

Publication

Herbal plants- and rice straw-derived biochars reduced metal mobilization in fishpond sediments and improved their potential as fertilizers

Publisher: Elsevier BV

Date: 06-2022

DOI: 10.1016/J.SCITOTENV.2022.154043

Abstract: Fishpond sediments are rich in organic carbon and nutrients; thus, they can be used as potential fertilizers and soil conditioners. However, sediments can be contaminated with toxic elements (TEs), which have to be immobilized to allow sediment reutilization. Adding biochars (BCs) to contaminated sediments may enhance their nutrient content and stabilize TEs, which would add value to their reutilization. Consequently, this study evaluated the performance of BCs derived from Taraxacum mongolicum Hand-Mazz (TMBC), Tribulus terrestris (TTBC), and rice straw (RSBC) for Cu, Cr, and Zn stabilization and for the enhancement of nutrient content in the fishpond sediments from San Jiang (SJ) and Tan Niu (TN), China. Compared with the control, all BCs, particularly TMBC, significantly reduced the average concentrations of Cr, Cu, and Zn in the overlying water (up to 51% for Cr, 71% for Cu, and 68% for Zn) and in the sediment pore water (up to 77% for Cr, 76% for Cu, and 50% for Zn). They also reduced metal leachability (up to 47% for Cr, 60% for Cu, and 62% for Zn). The acid-soluble fraction accounted for the largest portion of the total content of Cr (43–44%), Cu (38–43%), and Zn (42–45%), followed by the reducible, oxidizable, and residual fractions; this indicates a high potential risk. Compared with the control, TMBC was more effective in reducing the average concentrations of acid-soluble Cr (15–22%), Cu (35–53%), and Zn (21–39%). Added BCs altered the metals' acid-soluble fraction by shifting it to the oxidizable and residual fractions. Moreover, TMBC improved the macronutrient status in both sediments. This work provides a pathway for TE remediation of sediments and gives novel insights into the utilization of BC-treated fishpond sediments as fertilizers for crop production.

Publication

QoS analysis for web service compositions based on probabilistic QoS

Publisher: Springer Berlin Heidelberg

Date: 2011

DOI: 10.1007/978-3-642-25535-9_4

Publication

Drug resistance prediction for Mycobacterium tuberculosis with reference graphs

Publisher: Microbiology Society

Date: 08-08-2023

DOI: 10.1099/MGEN.0.001081

Abstract: Tuberculosis is a global pandemic disease with a rising burden of antimicrobial resistance. As a result, the World Health Organization (WHO) has a goal of enabling universal access to drug susceptibility testing (DST). Phenotypic DST is slow and has substantial infrastructure requirements, so whole-genome sequencing followed by genotype-based prediction of DST now offers a route to achieving this goal. Since a central component of genotypic DST is to detect the presence of any known resistance-causing mutations, a natural approach is to use a reference graph that allows encoding of known variation. We have developed DrPRG (Drug resistance Prediction with Reference Graphs) using the bacterial reference graph method Pandora. First, we outline the construction of a Mycobacterium tuberculosis drug resistance reference graph. The graph is built from a global dataset of isolates with varying drug susceptibility profiles, thus capturing common and rare resistance- and susceptible-associated haplotypes. We benchmark DrPRG against the existing graph-based tool Mykrobe and the haplotype-based approach of TBProfiler using 44,709 and 138 publicly available Illumina and Nanopore samples with associated phenotypes. We find that DrPRG has significantly improved sensitivity and specificity for some drugs compared to these tools, with no significant decreases. It uses significantly less computational memory than both tools, and provides significantly faster runtimes, except when runtime is compared to Mykrobe with Nanopore data. We discover and discuss novel insights into resistance-conferring variation for M. tuberculosis – including deletion of genes katG and pncA – and suggest mutations that may warrant reclassification as associated with resistance.
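Benchmarks such as the DrPRG comparison above summarise per-drug performance as sensitivity and specificity against phenotypic DST. The sketch below computes those two ratios for a single drug. It assumes each sample has one binary predicted call and one binary phenotype. The function name and the toy inputs are illustrative only and are not taken from DrPRG or its paper.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    predicted_resistant: Sequence[bool], phenotype_resistant: Sequence[bool]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of resistance calls against phenotypic DST."""
    if len(predicted_resistant) != len(phenotype_resistant):
        raise ValueError("prediction and phenotype lists must be the same length")
    tp = fn = tn = fp = 0
    for pred, truth in zip(predicted_resistant, phenotype_resistant):
        if truth and pred:
            tp += 1  # phenotypically resistant, called resistant
        elif truth:
            fn += 1  # phenotypically resistant, missed
        elif pred:
            fp += 1  # phenotypically susceptible, wrongly called resistant
        else:
            tn += 1  # phenotypically susceptible, called susceptible
    # NaN signals that a class is absent, so the ratio is undefined.
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Hypothetical calls for one drug across five samples (NOT study data).
sens, spec = sensitivity_specificity(
    predicted_resistant=[True, True, False, False, True],
    phenotype_resistant=[True, False, False, True, True],
)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```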

Publication

Obesity-susceptibility loci have a limited influence on birth weight: a meta-analysis of up to 28,219 individuals

Publisher: Elsevier BV

Date: 04-2011

DOI: 10.3945/AJCN.110.000828

Abstract: High birth weight is associated with adult body mass index (BMI). We hypothesized that birth weight and BMI may partly share a common genetic background. The objective was to examine the associations with birth weight of 12 established BMI variants in or near the NEGR1, SEC16B, TMEM18, ETV5, GNPDA2, BDNF, MTCH2, BCDIN3D, SH2B1, FTO, MC4R, and KCTD15 genes, and of their additive score. A meta-analysis was conducted with the use of 1) the European Prospective Investigation into Cancer and Nutrition (EPIC)-Norfolk, Hertfordshire, Fenland, and European Youth Heart Study cohorts (n(max) = 14,060); 2) data extracted from the Early Growth Genetics Consortium meta-analysis of 6 genome-wide association studies for birth weight (n(max) = 10,623); and 3) all published data (n(max) = 14,837). Only the MTCH2 and FTO loci showed a nominally significant association with birth weight. The BMI-increasing allele of the MTCH2 variant (rs10838738) was associated with a lower birth weight (β ± SE: -13 ± 5 g/allele; P = 0.012; n = 23,680), and the BMI-increasing allele of the FTO variant (rs1121980) was associated with a higher birth weight (β ± SE: 11 ± 4 g/allele; P = 0.013; n = 28,219). These results were not significant after correction for multiple testing. Obesity-susceptibility loci have a small or no effect on weight at birth. Some evidence of an association was found for the MTCH2 and FTO loci, i.e., lower and higher birth weight, respectively. These findings may provide new insights into the underlying mechanisms by which these loci confer an increased risk of obesity.
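The abstract does not say which multiple-testing correction was applied. As an illustrative assumption only, a Bonferroni correction over the 12 variants gives a per-test threshold of 0.05/12 ≈ 0.0042. The nominal P values of 0.012 and 0.013 do not reach that threshold. The sketch also shows one common way to build an unweighted additive score, a count of BMI-increasing alleles. Whether the study weighted its score is not stated here.

```python
from typing import Iterable


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def additive_allele_score(risk_allele_counts: Iterable[int]) -> int:
    """Unweighted score: total BMI-increasing alleles carried (0, 1 or 2 per variant)."""
    counts = list(risk_allele_counts)
    if any(c not in (0, 1, 2) for c in counts):
        raise ValueError("each variant contributes 0, 1 or 2 alleles")
    return sum(counts)


# Assumption: one test per variant (12 tests), alpha = 0.05.
threshold = bonferroni_threshold(0.05, 12)
for locus, p in {"MTCH2 rs10838738": 0.012, "FTO rs1121980": 0.013}.items():
    verdict = "significant" if p < threshold else "not significant"
    print(f"{locus}: P = {p} vs threshold {threshold:.4f} -> {verdict}")
```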

Publication

Nanopore sequencing as a scalable, cost-effective platform for analyzing polyclonal vector integration sites following clinical T cell therapy

Publisher: BMJ

Date: 06-2020

DOI: 10.1136/JITC-2019-000299

Abstract: Analysis of vector integration sites in gene-modified cells can provide critical information on clonality and potential biological impact on nearby genes. Current short-read next-generation sequencing methods require specialized instruments and large batch runs. We used nanopore sequencing to analyze the vector integration sites of T cells transduced by the gammaretroviral vector, SFG.iCasp9.2A.ΔCD19. DNA from oligoclonal cell lines and polyclonal clinical samples was digested with two 6-cutter restriction enzymes, NcoI and BspHI, and the flanking genomic DNA was amplified by inverse PCR or cassette ligation PCR. Following nested PCR and barcoding, the amplicons were sequenced on the Oxford Nanopore platform. Reads were filtered for quality, trimmed, and aligned. A custom tool was developed to cluster reads and merge overlapping clusters. Both inverse PCR and cassette ligation PCR could successfully amplify flanking genomic DNA, with cassette ligation PCR showing less bias. The 4.8 million raw reads were grouped into 12,186 clusters and 6410 clones. The 3′ long terminal repeat (LTR)-genome junction could be resolved within a 5-nucleotide span for a majority of clusters and within a one-nucleotide span for clusters with ≥5 reads. The chromosomal distributions of the insertion sites and their predilection for regions proximate to transcription start sites were consistent with previous reports for gammaretroviral vector integrants analyzed by short-read next-generation sequencing. Our study shows that it is feasible to use nanopore sequencing to map polyclonal vector integration sites. The assay is scalable and requires minimal capital, which together enable cost-effective and timely analysis. Further refinement is required to reduce amplification bias and improve single-nucleotide resolution.
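The abstract mentions a custom tool that clusters reads and merges overlapping clusters, but it does not describe the tool's algorithm. The following hypothetical sketch shows one simple approach. It sorts mapped LTR-genome junction coordinates and chains reads on the same chromosome and strand whose positions lie within a small window. The 5 bp default window is an assumed parameter. The sketch then reports the most frequent coordinate in each cluster.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class Junction(NamedTuple):
    """Hypothetical per-read record: where the 3' LTR read-through meets the genome."""
    chrom: str
    strand: str
    pos: int


def cluster_junctions(junctions: Iterable[Junction], window: int = 5) -> List[List[Junction]]:
    """Single-linkage clustering of junctions (window is an assumed parameter).

    After sorting, a read joins the current cluster when it is on the same
    chromosome and strand and lies within `window` bp of the previous read,
    so overlapping groups are merged automatically.
    """
    clusters: List[List[Junction]] = []
    for j in sorted(junctions):
        if clusters:
            prev = clusters[-1][-1]
            if (prev.chrom, prev.strand) == (j.chrom, j.strand) and j.pos - prev.pos <= window:
                clusters[-1].append(j)
                continue
        clusters.append([j])
    return clusters


def consensus_position(cluster: List[Junction]) -> int:
    """Most frequently observed junction coordinate in a cluster."""
    return Counter(j.pos for j in cluster).most_common(1)[0][0]


reads = [
    Junction("chr1", "+", 100), Junction("chr1", "+", 100),
    Junction("chr1", "+", 102), Junction("chr1", "+", 120),
    Junction("chr2", "-", 5000),
]
for cluster in cluster_junctions(reads):
    print(cluster[0].chrom, cluster[0].strand, consensus_position(cluster), len(cluster))
```

Because the chaining is single-linkage, a cluster can span more than `window` bp in total. A production tool might cap cluster width or weight coordinates by read quality.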

Publication

A global reference for human genetic variation

Publisher: Springer Science and Business Media LLC

Date: 01-10-2015

DOI: 10.1038/NATURE15393

Publication

Drug resistance prediction for Mycobacterium tuberculosis with reference graphs

Publisher: Cold Spring Harbor Laboratory

Date: 04-05-2023

DOI: 10.1101/2023.05.04.539481

Abstract: The dominant paradigm for analysing genetic variation relies on a central idea: all genomes in a species can be described as minor differences from a single reference genome. However, this approach can be problematic or inadequate for bacteria, where there can be significant sequence divergence within a species. Reference graphs are an emerging solution to the reference bias issues implicit in the “single-reference” model. Such a graph represents variation at multiple scales within a population, e.g., at the nucleotide and locus level. The genetic causes of drug resistance in bacteria have proven comparatively easy to decode compared with studies of human diseases. For example, it is possible to predict resistance to numerous anti-tuberculosis drugs by simply testing for the presence of a list of single nucleotide polymorphisms and insertion/deletions, commonly referred to as a catalogue. We developed DrPRG (Drug resistance Prediction with Reference Graphs) using the bacterial reference graph method Pandora. First, we outline the construction of a Mycobacterium tuberculosis drug resistance reference graph, a process that can be replicated for other species. The graph is built from a global dataset of isolates with varying drug susceptibility profiles, thus capturing common and rare resistance- and susceptible-associated haplotypes. We benchmark DrPRG against the existing graph-based tool Mykrobe and the haplotype-based approach of TBProfiler using 44,709 and 138 publicly available Illumina and Nanopore samples with associated phenotypes. We find DrPRG has significantly improved sensitivity and specificity for some drugs compared to these tools, with no significant decreases. It uses significantly less computational memory than both tools, and provides significantly faster runtimes, except when runtime is compared to Mykrobe on Nanopore data. We discover and discuss novel insights into resistance-conferring variation for M. tuberculosis – including deletion of genes katG and pncA – and suggest mutations that may warrant reclassification as associated with resistance.
Mycobacterium tuberculosis is the bacterium responsible for tuberculosis (TB). TB is one of the leading causes of death worldwide; before the coronavirus pandemic, it was the leading cause of death from a single pathogen. Drug-resistant TB incidence has recently increased, making the detection of resistance even more vital. In this study, we develop a new software tool that predicts drug resistance from whole-genome sequence data of the pathogen, using new reference graph models to represent a reference genome. We evaluate it on M. tuberculosis against existing tools for resistance prediction and show improved performance. Using our method, we discover new resistance-associated variations and discuss reclassification of a selection of existing mutations. As such, this work contributes to TB drug resistance diagnostic efforts. In addition, the method could be applied to any bacterial species, so it is of interest to anyone working on antimicrobial resistance.
The authors confirm all supporting data, code and protocols have been provided within the article or through supplementary data files. The software method presented in this work, DrPRG, is freely available from GitHub under an MIT license at bhall88/drprg. We used commit 9492f25 for all results via a Singularity[1] container from the URI docker://quay.io/mbhall88/drprg:9492f25. All code used to generate results for this study is available on GitHub at bhall88/drprg-paper. All data used in this work are freely available from the SRA/ENA/DRA, and a copy of the datasheet with all associated phenotype information can be downloaded from the archived repository at 0.5281/zenodo.7819984 or found in the previously mentioned GitHub repository. The Mycobacterium tuberculosis index used in this work is available to download through DrPRG via the command drprg index --download mtb@20230308 or from GitHub at bhall88/drprg-index.
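The catalogue approach described above amounts to a lookup of called variants against a table of known resistance mutations. A minimal sketch follows. The three entries are a tiny illustrative subset of well-established associations: rpoB S450L with rifampicin, katG S315T with isoniazid, and gyrA D94G with fluoroquinolones. They are not DrPRG's index or the WHO catalogue. The sketch also omits the reference-graph genotyping step that DrPRG performs before any lookup.

```python
from typing import Dict, Iterable, Set, Tuple

# (gene, mutation) as reported by an upstream variant caller.
Variant = Tuple[str, str]

# Illustrative mini-catalogue: a tiny subset of well-known associations,
# NOT DrPRG's index or any complete resistance catalogue.
CATALOGUE: Dict[Variant, str] = {
    ("rpoB", "S450L"): "rifampicin",
    ("katG", "S315T"): "isoniazid",
    ("gyrA", "D94G"): "fluoroquinolones",
}


def predict_resistance(
    called_variants: Iterable[Variant], catalogue: Dict[Variant, str] = CATALOGUE
) -> Dict[str, Set[Variant]]:
    """Map each drug to the catalogued variants found in a sample.

    Drugs with no catalogued hit are absent from the result; no hit only means
    'no known resistance mutation found', not proof of susceptibility.
    """
    hits: Dict[str, Set[Variant]] = {}
    for variant in called_variants:
        drug = catalogue.get(variant)
        if drug is not None:
            hits.setdefault(drug, set()).add(variant)
    return hits


# gyrA S95T is not in this mini-catalogue, so it produces no call here.
sample_variants = [("rpoB", "S450L"), ("katG", "S315T"), ("gyrA", "S95T")]
print(predict_resistance(sample_variants))
```

An exact-match lookup like this cannot represent gene deletions, such as the loss of katG or pncA discussed above. It also cannot represent unseen alleles at catalogued positions. Encoding such variation is part of the motivation for the graph-based approach.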

Publication

Assembly of whole-chromosome pseudomolecules for polyploid plant genomes using outbred mapping populations

Publisher: Springer Science and Business Media LLC

Date: 30-10-2020

DOI: 10.1038/S41588-020-00717-7

Publication

Social-Sensor Composition for Tapestry Scenes

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 03-2020

DOI: 10.1109/TSC.2020.2974741

Publication

Non-antibiotic antimicrobial triclosan induces multiple antibiotic resistance through genetic mutation

Publisher: Elsevier BV

Date: 09-2018

DOI: 10.1016/J.ENVINT.2018.06.004

Abstract: Antibiotic resistance poses a major threat to public health. Overuse and misuse of antibiotics are generally recognized as the key factors contributing to antibiotic resistance. However, whether non-antibiotic, anti-microbial (NAAM) chemicals can directly induce antibiotic resistance is unclear. We aimed to investigate whether exposure to the NAAM chemical triclosan (TCS) induces antibiotic resistance in Escherichia coli. Here, we report that at a concentration of 0.2 mg/L, TCS induces multi-drug resistance in wild-type Escherichia coli after 30 days of exposure. The oxidative stress induced by TCS caused genetic mutations in genes such as fabI, frdD, marR, acrR and soxR. These mutations were followed by up-regulation of the transcription of genes encoding beta-lactamases and multi-drug efflux pumps, together with down-regulation of genes related to membrane permeability. The findings advance our understanding of the potential role of NAAM chemicals in the dissemination of antibiotic resistance in microbes, and highlight the need for controlling biocide applications.

Publication

cnvOffSeq: detecting intergenic copy number variation using off-target exome sequencing data

Publisher: Oxford University Press (OUP)

Date: 22-08-2014

DOI: 10.1093/BIOINFORMATICS/BTU475

Abstract: Motivation: Exome sequencing technologies have transformed the field of Mendelian genetics and allowed for efficient detection of genomic variants in protein-coding regions. The target enrichment process that is intrinsic to exome sequencing is inherently imperfect, generating large amounts of unintended off-target sequence. Off-target data are characterized by very low and highly heterogeneous coverage and are usually discarded by exome analysis pipelines. We posit that off-target read depth is a rich, but overlooked, source of information that could be mined to detect intergenic copy number variation (CNV). We propose cnvOffSeq, a novel normalization framework for off-target read depth that is based on local adaptive singular value decomposition (SVD). This method is designed to address the heterogeneity of the underlying data and allows for accurate and precise CNV detection and genotyping in off-target regions. Results: cnvOffSeq was benchmarked on whole-exome sequencing samples from the 1000 Genomes Project. In a set of 104 gold standard intergenic deletions, our method achieved a sensitivity of 57.5% and a specificity of 99.2%, while maintaining a low FDR of 5%. For gold standard deletions longer than 5 kb, cnvOffSeq achieves a sensitivity of 90.4% without increasing the FDR. cnvOffSeq considerably outperforms both whole-genome and whole-exome CNV detection methods and offers a substantial improvement over naïve local SVD. Availability and Implementation: cnvOffSeq is available at /cnvoffseq/ Contact: evangelos.bellos09@imperial.ac.uk or l.coin@imb.uq.edu.au Supplementary information: Supplementary data are available at Bioinformatics online.
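cnvOffSeq's normalization is based on local adaptive SVD. The sketch below substitutes a simpler global variant to show only the core idea. It first centres each window of a samples-by-windows read-depth matrix. It then removes the top k singular components, which tend to capture systematic, sample-wide coverage biases. Copy number changes may then stand out in the residual depth. The function name and the choice of k are assumptions, and this is not cnvOffSeq's implementation.

```python
import numpy as np


def svd_normalize(depth: np.ndarray, k: int) -> np.ndarray:
    """Return residual read depth after removing the top-k singular components.

    `depth` is a (samples x windows) matrix. Each window (column) is centred on
    its mean across samples before the SVD. This is a simplified global SVD,
    not cnvOffSeq's local adaptive method.
    """
    if depth.ndim != 2:
        raise ValueError("depth must be a 2-D (samples x windows) matrix")
    if not 0 <= k <= min(depth.shape):
        raise ValueError("k must be between 0 and min(n_samples, n_windows)")
    centred = depth - depth.mean(axis=0, keepdims=True)
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    s_residual = s.copy()
    s_residual[:k] = 0.0          # drop the k strongest (systematic) components
    return (u * s_residual) @ vt  # rebuild the matrix from the remaining components


# Hypothetical example: 8 samples x 50 windows of simulated depth with a shared
# per-sample scaling (a systematic bias) plus noise.
rng = np.random.default_rng(0)
window_profile = rng.uniform(5, 20, size=50)
sample_scale = rng.uniform(0.5, 1.5, size=(8, 1))
depth = sample_scale * window_profile + rng.normal(0, 1, size=(8, 50))
residual = svd_normalize(depth, k=1)
print(residual.shape)
```

Choosing k is a trade-off. Too few components leave bias in the residual, and too many can absorb genuine copy number signal that several samples share.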

Publication

Pathway Analysis of GWAS Provides New Insights into Genetic Susceptibility to 3 Inflammatory Diseases

Publisher: Public Library of Science (PLoS)

Date: 30-11-2009

DOI: 10.1371/JOURNAL.PONE.0008068

Publication

Common variants at 6q22 and 17q21 are associated with intracranial volume

Publisher: Springer Science and Business Media LLC

Date: 15-04-2012

DOI: 10.1038/NG.2245

Publication

A smart user interface for service-oriented web

Publisher: Springer Berlin Heidelberg

Date: 2011

DOI: 10.1007/978-3-642-24396-7_25

Publication

TTC12-ANKK1-DRD2 and CHRNA5-CHRNA3-CHRNB4 Influence Different Pathways Leading to Smoking Behavior from Adolescence to Mid-Adulthood

Publisher: Elsevier BV

Date: 04-2011

DOI: 10.1016/J.BIOPSYCH.2010.09.055

Publication

An efficient method to find the optimal social trust path in contextual social graphs

Publisher: Springer International Publishing

Date: 2015

DOI: 10.1007/978-3-319-18123-3_24

Publication

Adaptive Subspace Symbolization for Content-Based Video Detection

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 10-2010

DOI: 10.1109/TKDE.2009.171

Publication

Genetic Loci Associated With C-Reactive Protein Levels and Risk of Coronary Heart Disease

Publisher: American Medical Association (AMA)

Date: 07-2009

DOI: 10.1001/JAMA.2009.954

Publication

Long-read RNA sequencing identifies polyadenylation elongation and differential transcript usage of host transcripts during SARS-CoV-2 in vitro infection

Publisher: Cold Spring Harbor Laboratory

Date: 15-12-2021

DOI: 10.1101/2021.12.14.472725

Abstract: Better methods to interrogate host-pathogen interactions during Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) infections are imperative to help understand and prevent this disease. Here we used RNA sequencing (RNA-seq) with Oxford Nanopore Technologies (ONT) long reads to measure differential host gene expression, transcript polyadenylation and isoform usage within various epithelial cell lines permissive and non-permissive for SARS-CoV-2 infection. SARS-CoV-2-infected and mock-infected Vero (African green monkey kidney epithelial cells), Calu-3 (human lung adenocarcinoma epithelial cells), Caco-2 (human colorectal adenocarcinoma epithelial cells) and A549 (human lung carcinoma epithelial cells) were analysed over time (0, 2, 24, 48 hours). Differential polyadenylation was found to occur in both infected Calu-3 and Vero cells at a late time point (48 hpi), with Gene Ontology (GO) terms such as viral transcription and translation shown to be significantly enriched in Calu-3 data. Poly(A) tails showed increased lengths in the majority of the differentially polyadenylated transcripts in Calu-3 and Vero cell lines (up to ~136 nt in mean poly(A) length, padj = 0.029). Of these genes, ribosomal protein genes such as RPS4X and RPS6 also showed downregulation in expression levels, suggesting the importance of ribosomal protein genes during infection. Furthermore, differential transcript usage was identified in Caco-2, Calu-3 and Vero cells, including transcripts of genes such as GSDMB and KPNA2, which have previously been implicated in SARS-CoV-2 infections. Overall, these results highlight the potential role of differential polyadenylation and transcript usage in host immune response or viral manipulation of host mechanisms during infection, and therefore showcase the value of long-read sequencing in identifying less-explored host responses to disease.

Publication

Effects of Long-Term Averaging of Quantitative Blood Pressure Traits on the Detection of Genetic Associations

Publisher: Elsevier BV

Date: 07-2014

DOI: 10.1016/J.AJHG.2014.06.002

Publication

Bioinformatics: living on the edge

Publisher: Springer Science and Business Media LLC

Date: 2012

DOI: 10.1186/GB-2012-13-10-321

Publication

Genetic Determinants of Height Growth Assessed Longitudinally from Infancy to Adulthood in the Northern Finland Birth Cohort 1966

Publisher: Public Library of Science (PLoS)

Date: 06-03-2009

DOI: 10.1371/JOURNAL.PGEN.1000409

Publication

Uncovering strain- and age-dependent differences in innate immune response to SARS-CoV-2 infection in nasal epithelia using 10X single-cell sequencing

Publisher: Cold Spring Harbor Laboratory

Date: 06-03-2023

DOI: 10.1101/2023.03.04.531075

Abstract: Assessing the impact of SARS-CoV-2 variants on the host is crucial given the continuous emergence of new variants. We employed single-cell sequencing to investigate the host transcriptomic response to ancestral and Alpha-strain SARS-CoV-2 infections within air-liquid-interface human nasal epithelial cells from adults and adolescents. Strong innate immune responses were observed across lowly-infected and bystander cell types, and these responses were heightened in Alpha infection. In contrast, the innate immune response of highly infected cells resembled that of mock-control cells. Alpha highly-infected cells showed increased expression of protein refolding genes compared with ancestral-strain-infected adolescent cells. Oxidative phosphorylation- and translation-related genes were down-regulated in bystander cells versus infected and mock-control cells, suggesting that the down-regulation is protective and up-regulation supports viral activity. Infected adult cells revealed up-regulation of these pathways compared with infected adolescents, implying enhanced pro-viral states in infected adults. Overall, this highlights the complexity of cell-type-, age- and viral-strain-dependent host epithelial responses to SARS-CoV-2 and the value of air-liquid-interface cultures.

Publication

Formulating Cost-Effective Data Distribution Strategies Online for Edge Cache Systems

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 12-2022

DOI: 10.1109/TPDS.2022.3185250

Publication

Real-time resolution of short-read assembly graph using ONT long reads

Publisher: Public Library of Science (PLoS)

Date: 20-01-2021

DOI: 10.1371/JOURNAL.PCBI.1008586

Abstract: A streaming assembly pipeline utilising real-time Oxford Nanopore Technologies (ONT) sequencing data is important for saving sequencing resources and reducing time-to-result. A previous approach implemented in npScarf provided an efficient streaming algorithm for hybrid assembly but was relatively prone to mis-assemblies compared to other graph-based methods. Here we present npGraph, a streaming hybrid assembly tool that uses the assembly graph instead of separate pre-assembled contigs. It produces more complete genome assemblies by solving the path-finding problem on the assembly graph, using long reads to guide the traversal. Application to synthetic and real data from bacterial isolate genomes shows improved accuracy while maintaining a low computational cost. npGraph also provides a graphical user interface (GUI) that visualises the progress of assembly in real time. The tool and source code are available at snguyen/assembly.

Publication

Guest Editorial: Special Issue on Clouds for Social Computing

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 07-2014

DOI: 10.1109/TSC.2014.2313414

Publication

Plasma lipid profiles discriminate bacterial from viral infection in febrile children

Publisher: Springer Science and Business Media LLC

Date: 27-11-2019

DOI: 10.1038/S41598-019-53721-1

Abstract: Fever is the most common reason that children present to Emergency Departments. Clinical signs and symptoms suggestive of bacterial infection are often non-specific, and there is no definitive test for the accurate diagnosis of infection. The ‘omics’ approaches to identifying biomarkers from the host response to bacterial infection are promising. In this study, lipidomic analysis was carried out with plasma samples obtained from febrile children with confirmed bacterial infection (n = 20) and confirmed viral infection (n = 20). We show for the first time that bacterial and viral infections produce distinct profiles in the host lipidome. Compared with the confirmed bacterial infection group, the confirmed viral infection group had higher levels of some species of glycerophosphoinositol, sphingomyelin, lysophosphatidylcholine and cholesterol sulfate, and lower levels of some species of fatty acids, glycerophosphocholine, glycerophosphoserine, lactosylceramide and bilirubin. A combination of three lipids achieved an area under the receiver operating characteristic (ROC) curve of 0.911 (95% CI 0.81 to 0.98). This pilot study demonstrates the potential of metabolic biomarkers to help clinicians distinguish bacterial from viral infection in febrile children, facilitating effective clinical management and limiting inappropriate use of antibiotics.
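The area under the ROC curve reported above can be computed without drawing the curve. It equals the probability that a randomly chosen case from one group scores higher than a randomly chosen case from the other, with ties counted as one half. This is the Mann-Whitney formulation. In the sketch, two choices are assumptions because the abstract does not specify them: which group is treated as "positive", and how the three lipids are combined into one score.

```python
from typing import Sequence


def roc_auc(positive_scores: Sequence[float], negative_scores: Sequence[float]) -> float:
    """AUC as the probability that a positive scores above a negative (ties count 0.5)."""
    if not positive_scores or not negative_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical combined-lipid scores (NOT study data).
bacterial = [0.9, 0.8, 0.75, 0.4]
viral = [0.3, 0.5, 0.2, 0.45]
print(f"AUC = {roc_auc(bacterial, viral):.3f}")
```

The pairwise loop is quadratic in group size, which is fine for groups of about 20 children. For large cohorts, a rank-based computation gives the same value more efficiently.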

Publication

Scaffolding and Completing Genome Assemblies in Real-time with Nanopore Sequencing

Publisher: Cold Spring Harbor Laboratory

Date: 22-05-2016

DOI: 10.1101/054783

Abstract: Genome assemblies obtained from short-read sequencing technologies are often fragmented into many contigs because of the abundance of repetitive sequences. Long-read sequencing technologies allow the generation of reads spanning most repeat sequences, providing the opportunity to complete these genome assemblies. However, substantial amounts of sequence data and computational resources are required to overcome the high per-base error rate inherent to these technologies. Furthermore, most existing methods only assemble the genomes after sequencing has completed, which could result in either the generation of more sequence data at greater cost than required or a low-quality assembly if insufficient data are generated. Here we present the first computational method that utilises real-time nanopore sequencing to scaffold and complete short-read assemblies while the long-read sequence data are being generated. The method reports the progress of completing the assembly in real time, so users can terminate the sequencing once an assembly of sufficient quality and completeness is obtained. We use our method to complete four bacterial genomes and one eukaryotic genome, and show that it is able to construct more complete and more accurate assemblies while requiring less sequencing data and fewer computational resources than existing pipelines. We also demonstrate that the method can facilitate real-time analyses of positional information, such as identification of bacterial genes encoded in plasmids and pathogenicity islands.

Publication

Stance and credibility based trust in social-sensor cloud services

Publisher: Springer International Publishing

Date: 2018

DOI: 10.1007/978-3-030-02925-8_13

Publication

In-sewer decay and partitioning of Campylobacter jejuni and Campylobacter coli and implications for their wastewater surveillance

Publisher: Elsevier BV

Date: 04-2023

DOI: 10.1016/J.WATRES.2023.119737

Publication

Efficient agglomerative hierarchical clustering

Publisher: Elsevier BV

Date: 04-2015

DOI: 10.1016/J.ESWA.2014.09.054

Publication

Quantitative trait loci and differential gene expression analyses reveal the genetic basis for negatively associated β-carotene and starch content in hexaploid sweetpotato [Ipomoea batatas (L.) Lam.]

Publisher: Springer Science and Business Media LLC

Date: 08-10-2019

DOI: 10.1007/S00122-019-03437-7

Abstract: β-Carotene content in sweetpotato is associated with the Orange and phytoene synthase genes; due to physical linkage of phytoene synthase with sucrose synthase, β-carotene and starch content are negatively correlated. In populations depending on sweetpotato for food security, starch is an important source of calories, while β-carotene is an important source of provitamin A. The negative association between the two traits contributes to the low nutritional quality of the sweetpotato consumed, especially in sub-Saharan Africa. Using a biparental mapping population of 315 F1 progeny generated from a cross between an orange-fleshed and a non-orange-fleshed sweetpotato variety, we identified two major quantitative trait loci (QTL) on linkage groups (LG) three (LG3) and twelve (LG12) affecting starch, β-carotene, and their correlated traits, dry matter and flesh color. Analysis of parental haplotypes indicated that these two regions acted pleiotropically to reduce starch content and increase β-carotene in genotypes carrying the orange-fleshed parental haplotype at the LG3 locus. Phytoene synthase and sucrose synthase, the rate-limiting and linked genes located within the QTL on LG3 and involved in carotenoid and starch biosynthesis, respectively, were differentially expressed in Beauregard versus Tanzania storage roots. The Orange gene, the molecular switch for chromoplast biogenesis, located within the QTL on LG12, was not differentially expressed but was expressed in developing roots of the parental genotypes. We conclude that these two QTL regions act together in a cis and trans manner to inhibit starch biosynthesis in amyloplasts and enhance chromoplast biogenesis, carotenoid biosynthesis, and accumulation in orange-fleshed sweetpotato. Understanding the genetic basis of this negative association between starch and β-carotene will inform future sweetpotato breeding strategies targeting sweetpotato for food and nutritional security.

Publication

Context-sensitive user interfaces for semantic services

Publisher: Association for Computing Machinery (ACM)

Date: 2012

DOI: 10.1145/2078316.2078322

Abstract: Service-centric solutions usually require rich context to fully deliver and better reflect on the underlying applications. We present a novel use of context in the form of customized user interface services with the concept of User Interface as a Service (UIaaS). UIaaS takes user profiles as input to generate context-aware interface services. Such interface services can be used as context to augment semantic services with contextual information, leading to UIaaS as a Context (UIaaSaaC). An added serendipitous benefit of the proposed concept is that the composition of a customized user interface with the requested service is performed by the service composition engine, as with any other service. We use a special-purpose language, the User Interface Description Language (UIDL), to model and realize user interfaces as services. We use a real-life e-government application, human services delivery for citizens, as a proof of concept. We also present a comprehensive evaluation of the proposed approach using a functional evaluation and a non-functional evaluation consisting of an end-user usability test and expert usability reviews.

Publication

Assembly of whole-chromosome pseudomolecules for polyploid plant genomes using outcrossed mapping populations

Publisher: Cold Spring Harbor Laboratory

Date: 22-03-2017

DOI: 10.1101/119271

Abstract: The assembly of whole-chromosome pseudomolecules for plant genomes remains challenging due to polyploidy and high repeat content. We developed an approach for constructing complete pseudomolecules for polyploid species using genotyping-by-sequencing data from outcrossing mapping populations coupled with high coverage whole genome sequence data of a reference genome. Our approach combines de novo assembly with linkage mapping to arrange scaffolds into pseudomolecules. We show that the method is able to reconstruct simulated chromosomes for both diploid and tetraploid genomes. Comparisons to three existing genetic mapping tools show that our method outperforms the other methods in accuracy on both grouping and ordering, and is robust to the presence of substantial amounts of missing data and genotyping errors. We applied our method to three real datasets including a diploid Ipomoea trifida and two tetraploid potato mapping populations. The linkage maps show significant concordance with the reference chromosomes. We resolved seven assembly errors for the published Ipomoea trifida genome assembly as well as anchored an unplaced scaffold in the published potato genome.

Publication

Integrated pathogen load and dual transcriptome analysis of systemic host-pathogen interactions in severe malaria

Publisher: Cold Spring Harbor Laboratory

Date: 25-09-2017

DOI: 10.1101/193631

Abstract: The pathogenesis of severe Plasmodium falciparum malaria is incompletely understood. Since the pathogenic stage of the parasite is restricted to blood, dual RNA-sequencing of host and parasite transcripts in blood can reveal their interactions at a systemic scale. Here we identify human and parasite gene expression associated with severe disease features in Gambian children. Differences in parasite load explained up to 99% of differential expression of human genes but only a third of the differential expression of parasite genes. Co-expression analyses showed a remarkable co-regulation of host and parasite genes controlling translation, and host granulopoiesis genes uniquely co-regulated and differentially expressed in severe malaria. Our results indicate that high parasite load is the proximal stimulus for severe P. falciparum malaria, that there is an unappreciated role for many parasite genes in determining virulence, and hint at a molecular arms-race between host and parasite to synthesise protein products.

Publication

A new highly penetrant form of obesity due to deletions on chromosome 16p11.2

Publisher: Springer Science and Business Media LLC

Date: 02-2010

DOI: 10.1038/NATURE08727

Publication

Punctuated bursts in human male demography inferred from 1,244 worldwide Y-chromosome sequences

Publisher: Springer Science and Business Media LLC

Date: 25-04-2016

DOI: 10.1038/NG.3559

Publication

Identification of a Major QTL-Controlling Resistance to the Subtropical Race 4 of Fusarium oxysporum f. sp. cubense in Musa acuminata ssp. malaccensis

Publisher: MDPI AG

Date: 09-02-2023

DOI: 10.3390/PATHOGENS12020289

Abstract: Vascular wilt caused by the ascomycete fungal pathogen Fusarium oxysporum f. sp. cubense (Foc) is a major constraint on banana production around the world. The virulent race, namely Tropical Race 4, can infect all Cavendish-type banana plants and is now widespread across the globe, causing devastating losses to global banana production. In this study, we characterized Foc Subtropical Race 4 (STR4) resistance in a wild banana relative which, through estimated genome size and ancestry analysis, was confirmed to be Musa acuminata ssp. malaccensis. Using a self-derived F2 population segregating for STR4 resistance, quantitative trait loci sequencing (QTL-seq) was performed on bulks consisting of resistant and susceptible individuals. Changes in SNP index between the bulks revealed a major QTL located on the distal end of the long arm of chromosome 3. Multiple resistance genes are present in this region. Identification of chromosome regions conferring resistance to Foc can facilitate marker-assisted selection in breeding programs and paves the way towards identifying genes underpinning resistance.

Publication

Genomic neighbor typing for bacterial outbreak surveillance

Publisher: Cold Spring Harbor Laboratory

Date: 06-02-2022

DOI: 10.1101/2022.02.05.479210

Abstract: Genomic neighbor typing enables heuristic inference of bacterial lineages and phenotypes from nanopore sequencing data. However, small reference databases may not be sufficiently representative of the diversity of lineages and genotypes present in a collection of isolates. In this study, we explore the use of genomic neighbor typing for surveillance of community-associated Staphylococcus aureus outbreaks in Papua New Guinea (PNG) and Far North Queensland, Australia (FNQ). We developed Sketchy, an implementation of genomic neighbor typing that queries exhaustive whole genome reference databases using MinHash. Evaluations were conducted using nanopore read simulations and six species-wide reference sketches (4,832–47,616 genomes), as well as two S. aureus outbreak data sets sequenced at low depth using a sequential multiplex library protocol on the MinION (n = 160, with matching Illumina data). Heuristic inference of lineages and antimicrobial resistance profiles allowed us to conduct multiplex genotyping in situ at the Papua New Guinea Institute of Medical Research in Goroka, on low-throughput Flongle adapters and using multiple successive libraries on the same MinION flow cell (n = 24–48). Comparison to phylogenetically informed genomic neighbor typing with RASE on the dominant outbreak sequence type suggests slightly better performance at predicting lineage-scale genotypes using large sketch sizes, but inferior performance in resolving clade-specific genotypes (methicillin resistance). Sketchy can be used for large-scale bacterial outbreak surveillance and in challenging sequencing scenarios, but improvements to clade-specific genotype inference are needed for diagnostic applications. Sketchy is available open-source at: steinig/sketchy
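The abstract names MinHash without describing it. As a generic illustration only (not Sketchy's code; the k-mer size, number of hash functions, salted BLAKE2b hashing and the toy sequences below are all assumptions), this Python sketch shows how MinHash signatures estimate the Jaccard similarity of two sequences' k-mer sets, the basic comparison behind MinHash-based reference database queries.

```python
import hashlib


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def minhash_signature(items: set[str], num_hashes: int) -> list[int]:
    """Keep, for each of num_hashes salted hash functions, the minimum hash over items."""
    signature = []
    for salt in range(num_hashes):
        # A different 8-byte salt gives an (approximately) independent hash function.
        salt_bytes = salt.to_bytes(8, "little")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(item.encode(), digest_size=8, salt=salt_bytes).digest(),
                "little",
            )
            for item in items
        ))
    return signature


def estimate_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """The fraction of positions with matching minima estimates |A ∩ B| / |A ∪ B|."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


if __name__ == "__main__":
    # Hypothetical toy sequences that differ at a single base near the end.
    query = "ACGTACGTTAGCCGATAGGCTTACGATCGATCGGATC"
    reference = "ACGTACGTTAGCCGATAGGCTTACGATCGATCGGTTC"
    a, b = kmers(query, 5), kmers(reference, 5)
    exact = len(a & b) / len(a | b)
    estimate = estimate_jaccard(minhash_signature(a, 256), minhash_signature(b, 256))
    print(f"exact Jaccard = {exact:.3f}, MinHash estimate = {estimate:.3f}")
```

Because each signature has a fixed length regardless of genome size, many reference genomes can be compared against a query cheaply; the estimate's error shrinks roughly with the square root of the number of hash functions.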

Publication

Probabilistic qualitative preference matching in long-term IaaS composition

Publisher: Springer International Publishing

Date: 2017

DOI: 10.1007/978-3-319-69035-3_18

Publication

QoS analysis for web service compositions with complex structures

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 07-2013

DOI: 10.1109/TSC.2012.7

Publication

Genome-wide association study identifies eight loci associated with blood pressure

Publisher: Springer Science and Business Media LLC

Date: 10-05-2009

DOI: 10.1038/NG.361

Publication

You Are What You Use: Usage-based Profiling in IoT Environments

Publisher: ACM

Date: 11-09-2022

DOI: 10.1145/3544793.3560360

Publication

HLA-C variants associated with amino acid substitutions in the peptide binding groove influence susceptibility to Kawasaki disease

Publisher: Elsevier BV

Date: 09-2019

DOI: 10.1016/J.HUMIMM.2019.04.020

Abstract: Kawasaki disease (KD) is a pediatric vasculitis caused by an unknown trigger in genetically susceptible children. The incidence varies widely across genetically diverse populations. Several associations with HLA Class I alleles have been reported in single cohort studies. Using a genetic approach, from the nine single nucleotide variants (SNVs) associated with KD susceptibility in children of European descent, we identified SNVs near the HLA-C (rs6906846) and HLA-B genes (rs2254556) whose association was replicated in a Japanese descent cohort (rs6906846 p = 0.01, rs2254556 p = 0.005). The risk allele (A at rs6906846) was also associated with HLA-C*07:02 and HLA-C*04:01 in both US multi-ethnic and Japanese cohorts and HLA-C*12:02 only in the Japanese cohort. The risk A-allele was associated with eight non-conservative amino acid substitutions (amino acid positions) Asp or Ser (9), Arg (14), Ala (49), Ala (73), Ala (90), Arg (97), Phe or Ser (99), and Phe or Ser (116) in the HLA-C peptide binding groove that binds peptides for presentation to cytotoxic T cells (CTL). This raises the possibility of increased affinity to a "KD peptide" that contributes to the vasculitis of KD in genetically susceptible children.

Publication

Assessment of the 2021 WHO Mycobacterium tuberculosis drug resistance mutation catalogue on an independent dataset

Publisher: Elsevier BV

Date: 09-2022

DOI: 10.1016/S2666-5247(22)00151-3

Publication

Impact of catalytic hydrothermal treatment and Ca/Al-modified hydrochar on lability, sorption, and speciation of phosphorus in swine manure: Microscopic and spectroscopic investigations

Publisher: Elsevier BV

Date: 04-2022

DOI: 10.1016/J.ENVPOL.2022.118877

Abstract: The effects of catalytic hydrothermal (HT) pretreatment on animal manure followed by the addition of hydrochar on the nutrients recovery have not yet been investigated using a combination of chemical, microscopic, and spectroscopic techniques. Therefore, a catalytic HT process was employed to pretreat swine manure without additives (manure-HT) and with H

Publication

Automatically Building Service-Based Systems With Function Relaxation

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 05-2023

DOI: 10.1109/TCYB.2022.3164767

Publication

Analysis of polyclonal vector integration sites using Nanopore sequencing as a scalable, cost-effective platform

Publisher: Cold Spring Harbor Laboratory

Date: 07-11-2019

DOI: 10.1101/833897

Abstract: Vector integration site analysis can be important in the follow-up of patients who received gene-modified cells, but current platforms based on next-generation sequencing are expensive and relatively inaccessible. We analyzed polyclonal T cells transduced by a gammaretroviral vector, SFG.iCasp9.2A.ΔCD19, from a clinical trial. Following restriction enzyme digestion, the unknown flanking genomic sequences were amplified by inverse polymerase chain reaction (PCR) or cassette ligation PCR. Nanopore sequencing could identify thousands of unique integration sites within polyclonal samples, with cassette ligation PCR showing less bias. The assay is scalable and requires minimal capital, which together enable cost-effective and timely analysis.

Publication

Evolutionary analysis of chromosome end extension

Publisher: PeerJ

Date: 06-03-2018

DOI: 10.7287/PEERJ.PREPRINTS.26624V1

Abstract: There are substantial amounts of subtelomeric interstitial telomeric sequence (ITS) in the human genome; however, the origin of these sequences is not well understood. We investigate the possibility that these ITS have arisen via a process of chromosome end extension to the telomere sequence. By analysing the relationship between subtelomeric duplication and ITS, we identify multiple ITS that were the ancestral chromosome telomeric capping sequence. Comparison of chromosome terminal sequence between 15 species reveals an ongoing evolutionary process of chromosome extension, with an average extension rate of 0.0020 bp per year per chromosome. Analysis of SNP data from the 1000 Genomes Project demonstrates reduced SNP diversity in subtelomeric regions, indicating that many terminal regions are younger than the remaining autosomal sequence.

Publication

cnvCapSeq: detecting copy number variation in long-range targeted resequencing data

Publisher: Oxford University Press (OUP)

Date: 16-09-2014

DOI: 10.1093/NAR/GKU849

Publication

Plasma protein biomarkers distinguish Multisystem Inflammatory Syndrome in Children (MIS-C) from other pediatric infectious and inflammatory diseases

Publisher: Cold Spring Harbor Laboratory

Date: 31-07-2023

DOI: 10.1101/2023.07.28.23293197

Abstract: Multisystem inflammatory syndrome in children (MIS-C) is a rare but serious hyperinflammatory complication following infection with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The mechanisms underpinning the pathophysiology of MIS-C are poorly understood. Moreover, clinically distinguishing MIS-C from other childhood infectious and inflammatory conditions, such as Kawasaki Disease (KD) or severe bacterial and viral infections, is challenging due to overlapping clinical and laboratory features. We aimed to determine a set of plasma protein biomarkers that could discriminate MIS-C from those other diseases. Seven candidate protein biomarkers for MIS-C were selected based on the literature and on whole blood RNA-Sequencing data from patients with MIS-C and other diseases. Plasma concentrations of ARG1, CCL20, CD163, CORIN, CXCL9, PCSK9 and ADAMTS2 were quantified in MIS-C (n=22), KD (n=23), definite bacterial (DB, n=28) and viral (DV, n=27) disease, and healthy controls (n=8). Logistic regression models were used to determine the discriminatory ability of individual proteins and protein combinations to identify MIS-C, and their association with severity of illness. Plasma levels of CD163, CXCL9, and PCSK9 were significantly elevated in MIS-C, with a combined AUC of 86% (95% CI: 76.8%-95.1%) for discriminating MIS-C from other childhood diseases. Lower ARG1 and CORIN plasma levels were significantly associated with severe MIS-C cases requiring oxygen or inotropes, or presenting with shock. Our findings demonstrate the feasibility of a host protein biomarker signature for MIS-C and may provide new insight into its pathophysiology.

Publication

Pathway-driven gene stability selection of two rheumatoid arthritis GWAS identifies and validates new susceptibility genes in receptor mediated signalling pathways

Publisher: Oxford University Press (OUP)

Date: 08-06-2011

DOI: 10.1093/HMG/DDR248

Abstract: Rheumatoid arthritis (RA) is the commonest chronic, systemic, inflammatory disorder, affecting ∼1% of the world population. It has a strong genetic component, and a growing number of associated genes have been discovered in genome-wide association studies (GWAS), which nevertheless only account for 23% of the total genetic risk. We aimed to identify additional susceptibility loci through the analysis of GWAS in the context of biological function. We bridge the gap between pathway and gene-oriented analyses of GWAS by introducing a pathway-driven gene stability-selection methodology that identifies potential causal genes in the top-associated disease pathways that may be driving the pathway association signals. We analysed the WTCCC and the NARAC studies of ∼5000 and ∼2000 subjects, respectively. We examined 700 pathways comprising ∼8000 genes. Ranking pathways by significance revealed that the NARAC top-ranked ∼6% lay within the top 10% of WTCCC. Gene selection on those pathways identified 58 genes in WTCCC and 61 in NARAC; 21 of those were common (P(overlap) < 10⁻²¹), of which 16 were novel discoveries. Among the identified genes, we validated 10 known RA associations in WTCCC and 13 in NARAC, not discovered using single-SNP approaches on the same data. Gene ontology functional enrichment analysis on the identified genes showed significant over-representation of signalling activity (P < 10⁻²⁹) in both studies. Our findings suggest a novel model of RA genetic predisposition, which involves cell-membrane receptors and genes in second messenger signalling systems, in addition to genes that regulate immune responses, which have been the focus of interest previously.

Publication

End-to-end service support for mashups

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 07-2010

DOI: 10.1109/TSC.2010.34

Publication

Genetic Variation in the SLC8A1 Calcium Signaling Pathway Is Associated With Susceptibility to Kawasaki Disease and Coronary Artery Abnormalities

Publisher: Ovid Technologies (Wolters Kluwer Health)

Date: 12-2016

DOI: 10.1161/CIRCGENETICS.116.001533

Abstract: Kawasaki disease (KD) is an acute pediatric vasculitis in which host genetics influence both susceptibility to KD and the formation of coronary artery aneurysms. Variants discovered by genome-wide association studies and linkage studies only partially explain the influence of genetics on KD susceptibility. To search for additional functional genetic variation, we performed pathway and gene stability analysis on a genome-wide association study data set. Pathway analysis using European genome-wide association study data identified 100 significantly associated pathways (P ×10⁻⁴). Gene stability selection identified 116 single nucleotide polymorphisms in 26 genes that were responsible for driving the pathway associations, and gene ontology analysis demonstrated enrichment for calcium transport (P = 1.05×10⁻⁴). Three single nucleotide polymorphisms in solute carrier family 8, member 1 (SLC8A1), a sodium/calcium exchanger encoding NCX1, were validated in an independent Japanese genome-wide association study data set (meta-analysis P = 0.0001). Patients homozygous for the A (risk) allele of rs13017968 had higher rates of coronary artery abnormalities (P = 0.029). NCX1, the protein encoded by SLC8A1, was expressed in spindle-shaped and inflammatory cells in the aneurysm wall. Increased intracellular calcium mobilization was observed in B cell lines from healthy controls carrying the risk allele. Pathway-based association analysis followed by gene stability selection proved to be a valuable tool for identifying risk alleles in a rare disease with complex genetics. The role of SLC8A1 polymorphisms in altering calcium flux in cells that mediate coronary artery damage in KD suggests that this pathway may be a therapeutic target and supports the study of calcineurin inhibitors in acute KD.

Publication

Trusting the Social Web: issues and challenges

Publisher: Springer Science and Business Media LLC

Date: 10-08-2013

DOI: 10.1007/S11280-013-0252-2

Publication

Multi-clonal evolution of MDR/XDR M. tuberculosis in a high prevalence setting in Papua New Guinea over three decades

Publisher: Cold Spring Harbor Laboratory

Date: 04-08-2017

DOI: 10.1101/172601

Abstract: An outbreak of multidrug-resistant tuberculosis has been reported on Daru Island, Papua New Guinea. The Mycobacterium tuberculosis strains driving this outbreak and the temporal accrual of drug resistance mutations have not been described. We analyzed 100 isolates using whole genome sequencing and found 95 belonged to a single modern Beijing strain cluster. Molecular dating suggested acquisition of streptomycin and isoniazid resistance in the 1960s, with virulence potentially enhanced by a mycP1 mutation. The outbreak cluster demonstrated a high degree of co-resistance between isoniazid and ethionamide (80/95, 84.2%), attributed to an inhA promoter mutation combined with inhA and ndh coding mutations. Multidrug resistance (MDR), observed in 78/95 samples, emerged with the acquisition of a typical rpoB mutation together with a compensatory rpoC mutation in the 1980s. There was independent acquisition of fluoroquinolone and aminoglycoside resistance, with evidence of local transmission of extensively drug-resistant (XDR) strains from 2009. These findings underscore the importance of whole-genome sequencing in informing an effective public health response to MDR/XDR M. tuberculosis.

Publication

Genome-Wide Association Study Reveals Multiple Loci Associated with Primary Tooth Development during Infancy

Publisher: Public Library of Science (PLoS)

Date: 26-02-2010

DOI: 10.1371/JOURNAL.PGEN.1000856

Publication

Improved techniques for the identification of pseudogenes

Publisher: Oxford University Press (OUP)

Date: 04-08-2004

DOI: 10.1093/BIOINFORMATICS/BTH942

Abstract: Motivation: Pseudogenes are the remnants of genomic sequences of genes that are no longer functional. They are frequent in most eukaryotic genomes and are an important resource for comparative genomics. However, pseudogenes are often mis-annotated as functional genes in sequence databases. Current methods for identifying pseudogenes include methods that rely on the presence of stop codons and frameshifts, as well as methods based on the ratio of non-silent to silent nucleotide substitution rates (dN/dS). A recent survey concluded that 50% of human pseudogenes have no detectable truncation in their pseudo-coding regions, indicating that the former methods lack sensitivity. The latter methods have been used to find sets of genes enriched for pseudogenes, but are not specific enough to accurately separate pseudogenes from expressed genes. Results: We introduce a program called pseudogene inference from loss of constraint (PSILC), which incorporates novel methods for separating pseudogenes from functional genes. The methods calculate the log-odds score that evolution along the final branch of the gene tree leading to the query gene has been under one constraint model rather than another: (i) a neutral nucleotide model compared to a Pfam domain encoding model (PSILCnuc/dom); and (ii) a protein coding model compared to a Pfam domain encoding model (PSILCprot/dom). Using the manual annotation of human chromosome 6, we show that both methods give a more accurate classification of pseudogenes than dN/dS when a Pfam domain alignment is available. Availability: PSILC is available from www.sanger.ac.uk/Software/PSILC

Publication

EdgeDis: Enabling Fast, Economical, and Reliable Data Dissemination for Mobile Edge Computing

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 2023

DOI: 10.1109/TSC.2023.3328991

Publication

Data-driven estimation of COVID-19 community prevalence through wastewater-based epidemiology

Publisher: Elsevier BV

Date: 10-2021

DOI: 10.1016/J.SCITOTENV.2021.147947

Publication

Web Services Foundations

Publisher: Springer New York

Date: 2014

DOI: 10.1007/978-1-4614-7518-7

Publication

Estimating parasite load dynamics to reveal novel resistance mechanisms to human malaria

Publisher: Cold Spring Harbor Laboratory

Date: 15-05-2018

DOI: 10.1101/321463

Abstract: Improved methods are needed to identify host mechanisms which directly protect against human infectious diseases in order to develop better vaccines and therapeutics 1,2. Pathogen load determines the outcome of many infections 3, and is a consequence of pathogen multiplication rate, duration of the infection, and inhibition or killing of pathogen by the host (resistance). If these determinants of pathogen load could be quantified then their mechanistic correlates might be determined. In humans the timing of infection is rarely known and treatment cannot usually be withheld to monitor serial changes in pathogen load and host response. Here we present an approach to overcome this and identify potential mechanisms of resistance which control parasite load in Plasmodium falciparum malaria. Using a mathematical model of longitudinal infection dynamics for orientation, we made individualized estimates of parasite multiplication and growth inhibition in Gambian children at presentation with acute malaria and used whole blood RNA-sequencing to identify their correlates. We identified novel roles for secreted proteases cathepsin G and matrix metallopeptidase 9 (MMP9) as direct effector molecules which inhibit P. falciparum growth. Cathepsin G acts on the erythrocyte membrane, cleaving surface receptors required for parasite invasion, whilst MMP9 acts on the parasite. In contrast, the type 1 interferon response and expression of CXCL10 (IFN-γ-inducible protein of 10 kDa, IP-10) were detrimental to control of parasite growth. Natural variation in iron status and plasma levels of complement factor H were determinants of parasite multiplication rate. Our findings demonstrate the importance of accounting for the dynamic interaction between host and pathogen when seeking to identify correlates of protection, and reveal novel mechanisms controlling parasite growth in humans.
This approach could be extended to identify additional mechanistic correlates of natural- and vaccine-induced immunity to malaria and other infections.

Publication

Whole genome deep sequencing analysis of cell-free DNA in samples with low tumour content

Publisher: Springer Science and Business Media LLC

Date: 20-01-2022

DOI: 10.1186/S12885-021-09160-1

Abstract: Circulating cell-free DNA (cfDNA) in the plasma of cancer patients contains cell-free tumour DNA (ctDNA) derived from tumour cells, and it has been widely recognized as a non-invasive source of tumour DNA for the diagnosis and prognosis of cancer. Molecular profiling of ctDNA is often performed using targeted sequencing or low-coverage whole genome sequencing (WGS) to identify tumour-specific somatic mutations or somatic copy number aberrations (sCNAs). However, these approaches cannot efficiently detect all tumour-derived genomic changes in ctDNA. We performed WGS analysis of cfDNA from 4 breast cancer patients and 2 patients with benign tumours. We sequenced matched germline DNA for all 6 patients and tumour samples from the breast cancer patients. All samples were sequenced on the Illumina HiSeq X Ten sequencing platform and achieved approximately 30x, 60x and 100x coverage on germline, tumour and plasma DNA samples, respectively. The mutational burden of the plasma samples (1.44 somatic mutations/Mb of genome) was higher than that of the matched tumour samples. However, 90% of high-confidence somatic cfDNA variants were not detected in matched tumour samples and were found to comprise two background plasma mutational signatures. In contrast, cfDNA from the di-nucleosome fraction (300 bp–350 bp) had a much higher proportion (30%) of variants shared with tumour. Despite high-coverage sequencing, we were unable to detect sCNAs in plasma samples. Deep sequencing analysis of plasma samples revealed a higher fraction of unique somatic mutations in plasma samples, which were not detected in matched tumour samples. Sequencing of di-nucleosome-bound cfDNA fragments may increase recovery of tumour mutations from plasma.

Publication

Managing web services: An application in bioinformatics

Publisher: Springer Berlin Heidelberg

Date: 2010

DOI: 10.1007/978-3-642-17358-5_64

Publication

Phylodynamic modelling of bacterial outbreaks using nanopore sequencing

Publisher: Cold Spring Harbor Laboratory

Date: 05-2021

DOI: 10.1101/2021.04.30.442218

Abstract: Nanopore sequencing and phylodynamic modelling have been used to reconstruct the transmission dynamics of viral epidemics, but their application to bacterial pathogens has remained challenging. Here, we implement Random Forest models for single nucleotide polymorphism (SNP) polishing to estimate divergence and effective reproduction numbers (Re) of two community-associated, methicillin-resistant Staphylococcus aureus (MRSA) outbreaks in remote Far North Queensland and Papua New Guinea (n = 159). Successive barcoded panels of S. aureus isolates (2 × 12 per MinION) sequenced at low coverage (5x-10x) provided sufficient data to accurately infer assembly genotypes with high recall when compared with Illumina references. De novo SNP calling with Clair was followed by SNP polishing using intra- and inter-species models trained on Snippy reference calls. Models achieved sufficient resolution on ST93 outbreak sequence types (70-90% accuracy and precision) for phylodynamic modelling from lineage-wide hybrid alignments and birth-death skyline models in BEAST2. Our method reproduced phylogenetic topology, geographical source of the outbreaks, and indications of sustained transmission (Re > 1). We provide Nextflow pipelines that implement SNP polisher training, evaluation, and outbreak alignments, enabling reconstruction of within-lineage transmission dynamics for infection control of bacterial disease outbreaks using nanopore sequencing.

Publication

Composing Energy Services in a Crowdsourced IoT Environment

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 05-2022

DOI: 10.1109/TSC.2020.2980258

Publication

Ongoing human chromosome end extension revealed by analysis of BioNano and nanopore data

Publisher: Springer Science and Business Media LLC

Date: 09-11-2018

DOI: 10.1038/S41598-018-34774-0

Abstract: The majority of human chromosome ends remain incompletely assembled due to their highly repetitive structure. In this study, we use BioNano data to anchor and extend chromosome ends from two European trios as well as two unrelated Asian genomes. At least 11 BioNano-assembled chromosome ends are structurally divergent from the reference genome, including both missing sequence and extensions. These extensions are heritable and in some cases divergent between Asian and European samples. Six out of nine predicted extension sequences from NA12878 can be confirmed and filled by nanopore data. We identify two multi-kilobase sequence families, both enriched more than 100-fold in extension sequence (p-values < 1e-5), whose origins can be traced to interstitial sequence on ancestral primate chromosome 7. Extensive sub-telomeric duplication of these families has occurred in the human lineage subsequent to divergence from chimpanzees.

Publication

Retooling phage display with electrohydrodynamic nanomixing and nanopore sequencing

Publisher: Royal Society of Chemistry (RSC)

Date: 2019

DOI: 10.1039/C9LC00978G

Abstract: High throughput screening of phage display libraries for target binding molecules using electrohydrodynamic nanomixing and nanopore sequencing.

Publication

Haplotype and isoform specific expression estimation using multi-mapping RNA-seq reads

Publisher: Springer Science and Business Media LLC

Date: 2011

DOI: 10.1186/GB-2011-12-2-R13

Publication

cnvHap: an integrative population and haplotype–based multiplatform model of SNPs and CNVs

Publisher: Springer Science and Business Media LLC

Date: 30-05-2010

DOI: 10.1038/NMETH.1466

Abstract: Although genome-wide association studies have uncovered single-nucleotide polymorphisms (SNPs) associated with complex disease, these variants account for a small portion of heritability. Some contribution to this 'missing heritability' may come from copy-number variants (CNVs), in particular rare CNVs, but assessment of this contribution remains challenging because of the difficulty in accurately genotyping CNVs, particularly small variants. We report a population-based approach for the identification of CNVs that integrates data from multiple samples and platforms. Our algorithm, cnvHap, jointly learns a chromosome-wide haplotype model of CNVs and cluster-based models of allele intensity at each probe. Using data for 50 French individuals assayed on four separate platforms, we found that cnvHap correctly detected at least 14% more deleted and 50% more amplified genotypes than PennCNV or QuantiSNP, with an 82% and 115% improvement for aberrations containing <10 probes. Combining data from multiple platforms additionally improved sensitivity.

Publication

Spatio-temporal composition of Sensor Cloud services

Publisher: IEEE

Date: 06-2014

DOI: 10.1109/ICWS.2014.44

Publication

Crowdsourcing Energy as a Service

Publisher: Springer International Publishing

Date: 2018

DOI: 10.1007/978-3-030-03596-9_24

Publication

Sequential Learning-based IaaS Composition

Publisher: Association for Computing Machinery (ACM)

Date: 14-07-2021

DOI: 10.1145/3452332

Abstract: We propose a novel Infrastructure-as-a-Service composition framework that selects an optimal set of consumer requests according to the provider’s qualitative preferences on long-term service provisions. Decision variables are included in the temporal conditional preference networks to represent qualitative preferences for both short-term and long-term consumers. The global preference ranking of a set of requests is computed using a k -d tree indexing-based temporal similarity measure approach. We propose an extended three-dimensional Q-learning approach to maximize the global preference ranking. We design the on-policy-based sequential selection learning approach that applies the length of request to accept or reject requests in a composition. The proposed on-policy-based learning method reuses historical experiences or policies of sequential optimization using an agglomerative clustering approach. Experimental results prove the feasibility of the proposed framework.

Publication

Transcriptomic Studies of Malaria: a Paradigm for Investigation of Systemic Host-Pathogen Interactions

Publisher: American Society for Microbiology

Date: 06-2018

DOI: 10.1128/MMBR.00071-17

Abstract: Transcriptomics, the analysis of genome-wide RNA expression, is a common approach to investigate host and pathogen processes in infectious diseases. Technical and bioinformatic advances have permitted increasingly thorough analyses of the association of RNA expression with fundamental biology, immunity, pathogenesis, diagnosis, and prognosis. Transcriptomic approaches can now be used to realize a previously unattainable goal, the simultaneous study of RNA expression in host and pathogen, in order to better understand their interactions. This exciting prospect is not without challenges, especially as focus moves from interactions in vitro under tightly controlled conditions to tissue- and systems-level interactions in animal models and natural and experimental infections in humans. Here we review the contribution of transcriptomic studies to the understanding of malaria, a parasitic disease which has exerted a major influence on human evolution and continues to cause a huge global burden of disease. We consider malaria a paradigm for the transcriptomic assessment of systemic host-pathogen interactions in humans, because much of the direct host-pathogen interaction occurs within the blood, a readily sampled compartment of the body. We illustrate lessons learned from transcriptomic studies of malaria and how these lessons may guide studies of host-pathogen interactions in other infectious diseases. We propose that the potential of transcriptomic studies to improve the understanding of malaria as a disease remains partly untapped because of limitations in study design rather than as a consequence of technological constraints. Further advances will require the integration of transcriptomic data with analytical approaches from other scientific disciplines, including epidemiology and mathematical modeling.

Publication

Social-sensor composition for scene analysis

Publisher: Springer International Publishing

Date: 2018

DOI: 10.1007/978-3-030-03596-9_25

Publication

Genome-wide association study for early-onset and morbid adult obesity identifies three new risk loci in European populations

Publisher: Springer Science and Business Media LLC

Date: 18-01-2009

DOI: 10.1038/NG.301

Abstract: We analyzed genome-wide association data from 1,380 Europeans with early-onset and morbid adult obesity and 1,416 age-matched normal-weight controls. Thirty-eight markers showing strong association were further evaluated in 14,186 European subjects. In addition to FTO and MC4R, we detected significant association of obesity with three new risk loci in NPC1 (endosomal/lysosomal Niemann-Pick C1 gene, P = 2.9 x 10(-7)), near MAF (encoding the transcription factor c-MAF, P = 3.8 x 10(-13)) and near PTER (phosphotriesterase-related gene, P = 2.1 x 10(-7)).

Publication

GtTR: Bayesian estimation of absolute tandem repeat copy number using sequence capture and high throughput sequencing

Publisher: Springer Science and Business Media LLC

Date: 16-07-2018

DOI: 10.1186/S12859-018-2282-3

Publication

Diagnosis of Childhood Tuberculosis and Host RNA Expression in Africa

Publisher: Massachusetts Medical Society

Date: 05-2014

DOI: 10.1056/NEJMOA1303657

Publication

Reputation Bootstrapping for Composite Services Using CP-Nets

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 11-2022

DOI: 10.1109/TSC.2021.3084928

Publication

Cross-Border Movement of Highly Drug-Resistant Mycobacterium tuberculosis from Papua New Guinea to Australia through Torres Strait Protected Zone, 2010–2015

Publisher: Centers for Disease Control and Prevention (CDC)

Date: 03-2019

DOI: 10.3201/EID2503.181003

Publication

Enhanced protein domain discovery using taxonomy

Publisher: Springer Science and Business Media LLC

Date: 2004

DOI: 10.1186/1471-2105-5-56

Publication

An Efficient Near-Duplicate Video Shot Detection Method Using Shot-Based Interest Points

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 08-2009

DOI: 10.1109/TMM.2009.2021794

Publication

Genomic epidemiology of tuberculosis in eastern Malaysia: insights for strengthening public health responses

Publisher: Microbiology Society

Date: 04-05-2021

DOI: 10.1099/MGEN.0.000573

Abstract: Tuberculosis is a leading public health priority in eastern Malaysia. Knowledge of the genomic epidemiology of tuberculosis can help tailor public health interventions. Our aims were to determine tuberculosis genomic epidemiology and characterize resistance mutations in the ethnically diverse city of Kota Kinabalu, Sabah, located at the nexus of Malaysia, Indonesia, the Philippines and Brunei. We used an archive of prospectively collected Mycobacterium tuberculosis samples paired with epidemiological data. We collected sputum and demographic data from consecutive consenting outpatients with pulmonary tuberculosis at the largest tuberculosis clinic from 2012 to 2014, and selected samples from tuberculosis inpatients from the tertiary referral centre during 2012–2014 and 2016–2017. Two hundred and eight M. tuberculosis sequences were available for analysis, representing 8 % of cases notified during the study periods. Whole-genome phylogenetic analysis demonstrated that most strains were lineage 1 (195/208, 93.8 %), with the remainder being lineages 2 (8/208, 3.8 %) or 4 (5/208, 2.4 %). Lineages or sub-lineages were not associated with patient ethnicity. The lineage 1 strains were diverse, with sub-lineage 1.2.1 being dominant (192, 98 %). Lineage 1.2.1.3 isolates were geographically most widely distributed. The greatest diversity occurred in a border town sub-district. The time to the most recent common ancestor for the three major lineage 1.2.1 clades was estimated to be the year 1966 (95 % HPD 1948–1976). An association was found between failure of culture conversion by week 8 of treatment and infection with lineage 2 (4/6, 67 %) compared with lineage 1 strains (4/83, 5 %) (P < .001), supporting evidence of greater virulence of lineage 2 strains. Eleven potential transmission clusters (SNP difference ≤12) were identified; at least five included people living in different sub-districts. Some linked cases spanned the whole 4-year study period.
One cluster involved a multidrug-resistant tuberculosis strain matching a drug-susceptible strain from 3 years earlier. Drug resistance mutations were uncommon, but revealed one phenotype–genotype mismatch in a genotypically multidrug-resistant isolate, and rare nonsense mutations within the katG gene in two isolates. Consistent with the regionally mobile population, M. tuberculosis strains in Kota Kinabalu were diverse, although several lineage 1 strains dominated and were locally well established. Transmission clusters – uncommonly identified, likely attributable to incomplete sampling – showed clustering occurring across the community, not confined to households or sub-districts. The findings indicate that public health priorities should include active case finding and early institution of tuberculosis management in mobile populations, while there is a need to upscale effective contact investigation beyond households to include other contacts within social networks.

Publication

Metabolic profiling of polycystic ovary syndrome reveals interactions with abdominal obesity

Publisher: Springer Science and Business Media LLC

Date: 26-05-2017

DOI: 10.1038/IJO.2017.126

Publication

A Deep Reinforcement Learning Approach for Composing Moving IoT Services

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 09-2022

DOI: 10.1109/TSC.2021.3064329

Publication

A service computing manifesto

Publisher: Association for Computing Machinery (ACM)

Date: 24-03-2017

DOI: 10.1145/2983528

Abstract: Mapping out the challenges and strategies for the widespread adoption of service computing.

Publication

Integrated pathogen load and dual transcriptome analysis of systemic host-pathogen interactions in severe malaria

Publisher: American Association for the Advancement of Science (AAAS)

Date: 27-06-2018

DOI: 10.1126/SCITRANSLMED.AAR3619

Abstract: Host and parasite RNA sequencing is combined with parasite load estimates to reveal mechanisms associated with human severe malaria.

Publication

Fine-Scale Estimation of Location of Birth from Genome-Wide Single-Nucleotide Polymorphism Data

Publisher: Oxford University Press (OUP)

Date: 02-2012

DOI: 10.1534/GENETICS.111.135657

Abstract: Systematic nonrandom mating in populations results in genetic stratification and is predominantly caused by geographic separation, providing the opportunity to infer individuals’ birthplace from genetic data. Such inference has been demonstrated for individuals’ country of birth, but here we use data from the Northern Finland Birth Cohort 1966 (NFBC1966) to investigate the characteristics of genetic structure within a population and subsequently develop a method for inferring location to a finer scale. Principal component analysis (PCA) shows that while the first PCs are particularly informative for location, there is also location information in the higher-order PCs, but it cannot be captured by a linear model. We introduce a new method, pcLOCATE, which is able to exploit this information to improve the accuracy of location inference. pcLOCATE uses individuals’ PC values to estimate the probability of birth in each town and then averages over all towns to give an estimated longitude and latitude of birth using a fully Bayesian model. We apply pcLOCATE to the NFBC1966 data to estimate parental birthplace, testing with successively more PCs and finding the model with the top 23 PCs most accurate, with a median distance of 23 km between the estimated and the true location. pcLOCATE predicts the most recent residence of NFBC1966 individuals to a median distance of 47 km. We also apply pcLOCATE to Indian individuals from the London Life Sciences Prospective Population Study (LOLIPOP) data, and find that birthplace is predicted to a median distance of 54 km from the true location. A method with such accuracy is potentially valuable in population genetics and forensics.

Publication

Multi-Perspective Trust Management Framework for Crowdsourced IoT Services

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 07-2022

DOI: 10.1109/TSC.2021.3052219

Publication

Activity-based Profiling for Energy Harvesting Estimation

Publisher: ACM

Date: 09-05-2023

DOI: 10.1145/3583120.3589833

Publication

Detection of Tuberculosis in HIV-Infected and -Uninfected African Adults Using Whole Blood RNA Expression Signatures: A Case-Control Study

Publisher: Public Library of Science (PLoS)

Date: 22-10-2013

DOI: 10.1371/JOURNAL.PMED.1001538

Publication

Non-antibiotic antimicrobial triclosan induces multiple antibiotic resistance through genetic mutation

Publisher: Cold Spring Harbor Laboratory

Date: 18-02-2018

DOI: 10.1101/267302

Abstract: Antibiotic resistance poses a major threat to public health. Overuse and misuse of antibiotics are generally recognised as the key factors contributing to antibiotic resistance. However, whether non-antibiotic, anti-microbial (NAAM) chemicals can directly induce antibiotic resistance is unclear. We aim to investigate whether exposure to the NAAM chemical triclosan (TCS) has an impact on inducing antibiotic resistance in Escherichia coli. Here, we report that at a concentration of 0.2 mg/L, TCS induces multi-drug resistance in wild-type Escherichia coli after 30-day TCS exposure. The oxidative stress induced by TCS caused genetic mutations in genes such as fabI, frdD, marR, acrR and soxR, and subsequent up-regulation of the transcription of genes encoding beta-lactamase and multi-drug efflux pumps, together with down-regulation of genes related to membrane permeability. The findings advance our understanding of the potential role of NAAM chemicals in the dissemination of antibiotic resistance in microbes, and highlight the need for controlling biocide applications.

Publication

Context-aware cloud service selection based on comparison and aggregation of user subjective assessment and objective performance assessment

Publisher: IEEE

Date: 06-2014

DOI: 10.1109/ICWS.2014.24

Publication

Variants in ADCY5 and near CCNL1 are associated with fetal growth and birth weight

Publisher: Springer Science and Business Media LLC

Date: 06-04-2010

DOI: 10.1038/NG.567

Publication

Signatures of TSPAN8 variants associated with human metabolic regulation and diseases

Publisher: Elsevier BV

Date: 08-2021

DOI: 10.1016/J.ISCI.2021.102893

Publication

sCNAphase: using haplotype resolved read depth to genotype somatic copy number alterations from low cellularity aneuploid tumors

Publisher: Cold Spring Harbor Laboratory

Date: 04-02-2016

DOI: 10.1101/038828

Abstract: Accurate identification of copy number alterations is an essential step in understanding the events driving tumor progression. While a variety of algorithms have been developed to use high-throughput sequencing data to profile copy number changes, no tool is able to reliably characterize ploidy and genotype absolute copy number from tumor samples which contain less than 40% tumor cells. To increase our power to resolve the copy number profile from low-cellularity tumor samples, we developed a novel approach which pre-phases heterozygous germline SNPs in order to replace the commonly used ‘B-allele frequency’ with a more powerful ‘parental-haplotype frequency’. We apply our tool, sCNAphase, to characterize the copy number and loss-of-heterozygosity profiles of four publicly available breast cancer cell-lines. Comparisons to previous spectral karyotyping and microarray studies revealed that sCNAphase reliably identified overall ploidy as well as the individual copy number mutations from each cell-line. Analysis of artificial cell-line mixtures demonstrated the capacity of this method to determine the level of tumor cellularity, consistently identify sCNAs and characterize ploidy in samples with as little as 10% tumor cells. This novel methodology has the potential to bring sCNA profiling to low-cellularity tumors, a form of cancer unable to be accurately studied by current methods.

Publication

Spatio-temporal composition of crowdsourced services

Publisher: Springer Berlin Heidelberg

Date: 2015

DOI: 10.1007/978-3-662-48616-0_26

Publication

Dysregulation of Complement System and CD4+ T Cell Activation Pathways Implicated in Allergic Response

Publisher: Public Library of Science (PLoS)

Date: 08-10-2013

DOI: 10.1371/JOURNAL.PONE.0074821

Publication

High-throughput multiplexed tandem repeat genotyping using targeted long-read sequencing

Publisher: Cold Spring Harbor Laboratory

Date: 17-06-2019

DOI: 10.1101/673251

Abstract: Tandem repeats (TRs) are highly prone to variation in copy numbers due to their repetitive and unstable nature, which makes them a major source of genomic variation between individuals. However, population variation of TRs has not been widely explored due to the limitations of existing tools, which are either low-throughput or restricted to a small subset of TRs. Here, we used a SureSelect targeted sequencing approach combined with Nanopore sequencing to overcome these limitations. We achieved an average of 3062-fold target enrichment on a panel of 142 TR loci, generating an average of 97X sequence coverage on 7 samples utilizing 2 MinION flow-cells with 200ng of input DNA per sample. We identified a subset of 110 TR loci with length less than 2kb and GC content greater than 25%, for which we achieved an average genotyping rate of 75%, increasing to 91% for the highest-coverage sample. Alleles estimated from targeted long-read sequencing were concordant with gold standard PCR sizing analysis and moreover highly correlated with alleles estimated from whole genome long-read sequencing. We demonstrate a targeted long-read sequencing approach that enables simultaneous analysis of hundreds of TRs with accuracy comparable to PCR sizing analysis. Our approach is feasible to scale to more targets and more samples, facilitating large-scale analysis of TRs.

Publication

Optimizing Long-term IaaS Service Composition

Publisher: Springer Berlin Heidelberg

Date: 2015

DOI: 10.1007/978-3-662-48616-0_22

Publication

sCNAphase: Using haplotype resolved read depth to genotype somatic copy number alterations from low cellularity aneuploid tumors

Publisher: Oxford University Press (OUP)

Date: 28-11-2017

DOI: 10.1093/NAR/GKW1086

Publication

Genome-Wide Association Scan Meta-Analysis Identifies Three Loci Influencing Adiposity and Fat Distribution

Publisher: Public Library of Science (PLoS)

Date: 26-06-2009

DOI: 10.1371/JOURNAL.PGEN.1000508

Publication

Service Trust Management for E-Government Applications

Publisher: Springer New York

Date: 06-08-2013

DOI: 10.1007/978-1-4614-7535-4_14

Publication

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics): Preface

Publisher: Springer Berlin Heidelberg

Date: 2011

DOI: 10.1007/978-3-642-24434-6

Publication

Inference of haplotypic phase and missing genotypes in polyploid organisms and variable copy number genomic regions

Publisher: Springer Science and Business Media LLC

Date: 12-2008

DOI: 10.1186/1471-2105-9-513

Abstract: The power of haplotype-based methods for association studies, identification of regions under selection, and ancestral inference is well-established for diploid organisms. For polyploids, however, the difficulty of determining phase has limited such approaches. Polyploidy is common in plants and is also observed in animals. Partial polyploidy is sometimes observed in humans (e.g. trisomy 21 Down's syndrome), and it arises more frequently in some human tissues. Local changes in ploidy, known as copy number variations (CNV), arise throughout the genome. Here we present a method, implemented in the software polyHap, for the inference of haplotype phase and missing observations from polyploid genotypes. PolyHap allows each individual to have a different ploidy, but ploidy cannot vary over the genomic region analysed. It employs a hidden Markov model (HMM) and a sampling algorithm to infer haplotypes jointly in multiple individuals and to obtain a measure of uncertainty in its inferences. In the simulation study, we combine real haplotype data to create artificial diploid, triploid, and tetraploid genotypes, and use these to demonstrate that polyHap performs well, in terms of both switch error rate in recovering phase and imputation error rate for missing genotypes. To our knowledge, there is no comparable software for phasing a large, densely genotyped region of chromosome from triploids and tetraploids, while for diploids we found polyHap to be more accurate than fastPhase. We also compare the results of polyHap to SATlotyper on an experimentally haplotyped tetraploid dataset of 12 SNPs, and show that polyHap is more accurate. With the availability of large SNP data in polyploids and CNV regions, we believe that polyHap, our proposed method for inferring haplotypic phase from genotype data, will be useful in enabling researchers analysing such data to exploit the power of haplotype-based analyses.

Publication

Realtime analysis and visualization of MinION sequencing data with npReader

Publisher: Oxford University Press (OUP)

Date: 10-11-2015

DOI: 10.1093/BIOINFORMATICS/BTV658

Abstract: Motivation: The recently released Oxford Nanopore MinION sequencing platform presents many innovative features opening up potential for a range of applications not previously possible. Among these features, the ability to sequence in real-time provides a unique opportunity for many time-critical applications. While many software packages have been developed to analyze its data, there is still a lack of toolkits that support the streaming and real-time analysis of MinION sequencing data. Results: We developed npReader, an open-source software package to facilitate real-time analysis of MinION sequencing data. npReader can simultaneously extract sequence reads and stream them to downstream analysis pipelines while the samples are being sequenced on the MinION device. It provides a command line interface for easy integration into a bioinformatics workflow, as well as a graphical user interface which concurrently displays the statistics of the run. It also provides an application programming interface for development of streaming algorithms in order to fully utilize the extent of nanopore sequencing potential. Availability and implementation: npReader is written in Java and is freely available at dcao/npReader. Contact: m.cao1@uq.edu.au or l.coin@imb.uq.edu.au

Publication

Mycobacterium tuberculosis Exploits a Molecular Off Switch of the Immune System for Intracellular Survival

Publisher: Springer Science and Business Media LLC

Date: 12-01-2018

DOI: 10.1038/S41598-017-18528-Y

Abstract: Mycobacterium tuberculosis (M. tuberculosis) survives and multiplies inside human macrophages by subversion of immune mechanisms. Although these immune evasion strategies are well characterised functionally, the underlying molecular mechanisms are poorly understood. Here we show that during infection of human whole blood with M. tuberculosis, host gene transcriptional suppression, rather than activation, is the predominant response. Spatial, temporal and functional characterisation of repressed genes revealed their involvement in pathogen sensing and phagocytosis, degradation within the phagolysosome and antigen processing and presentation. To identify mechanisms underlying suppression of multiple immune genes we undertook epigenetic analyses. We identified significantly differentially expressed microRNAs with known targets in suppressed genes. In addition, after searching regions upstream of the start of transcription of suppressed genes for common sequence motifs, we discovered novel enriched composite sequence patterns, which corresponded to Alu repeat elements, transposable elements known to have wide ranging influences on gene expression. Our findings suggest that to survive within infected cells, mycobacteria exploit a complex immune “molecular off switch” controlled by both microRNAs and Alu regulatory elements.

Publication

npInv: accurate detection and genotyping of inversions mediated by non-allelic homologous recombination using long read sub-alignment

Publisher: Cold Spring Harbor Laboratory

Date: 18-08-2017

DOI: 10.1101/178103

Abstract: Detection of genomic inversions remains challenging. Many existing methods primarily target inversions with a non-repetitive breakpoint, leaving inverted repeat (IR) mediated non-allelic homologous recombination (NAHR) inversions largely unexplored. We present npInv, a novel tool specifically for detecting and genotyping NAHR inversions using long read sub-alignment of long read sequencing data. We use npInv to generate a whole-genome inversion map for NA12878 consisting of 30 NAHR inversions (of which 15 are novel), including all previously known NAHR mediated inversions in NA12878 with flanking IR less than 7kb. Our genotyping accuracy on this dataset was 94%. We used PCR to confirm presence of two of these novel NAHR inversions. We show that there is a near linear relationship between the length of flanking IR and the size of the NAHR inversion.

Publication

Privacy Protection for Wireless Medical Sensor Data

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 05-2016

DOI: 10.1109/TDSC.2015.2406699

Publication

GWAS on longitudinal growth traits reveals different genetic factors influencing infant, child, and adult BMI

Publisher: American Association for the Advancement of Science (AAAS)

Date: 06-09-2019

DOI: 10.1126/SCIADV.AAW3095

Abstract: Longitudinal data find a new variant controlling BMI in infancy and reveal genetic differences between infant and adult BMI.

Publication

Transcriptional and epi-transcriptional dynamics of SARS-CoV-2 during cellular infection

Publisher: Elsevier BV

Date: 05-2021

DOI: 10.1016/J.CELREP.2021.109108

Lachlan Coin

Researcher

Publications

A Trust Prediction Model for Service Web

Evaluating the Genome and Resistome of Extensively Drug-Resistant Klebsiella pneumoniae using Native DNA and RNA Nanopore Sequencing

Multifactorial chromosomal variants regulate polymyxin resistance in extensively drug-resistant Klebsiella pneumoniae

An integrated map of genetic variation from 1,092 human genomes

Rare Genomic Structural Variants in Complex Disease: Lessons from the Replication of Associations with Obesity

A Trust Ontology for Semantic Services

Web Service management system for bioinformatics research: a case study

Streaming algorithms for identification of pathogens and antibiotic resistance potential from real-time MinION™ sequencing

Inferring combined CNV/SNP haplotypes from genotype data

Small Deletion Variants Have Stable Breakpoints Commonly Associated with Alu Elements

Real-time demultiplexing Nanopore barcoded sequencing data with npBarcode

Identification of Reduced Host Transcriptomic Signatures for Tuberculosis Disease and Digital PCR-Based Validation and Quantification

Highly interconnected genes in disease-specific networks are enriched for disease-associated polymorphisms

Improving the humification and phosphorus flow during swine manure composting: A trial for enhancing the beneficial applications of hazardous biowastes

Genotype-free demultiplexing of pooled single-cell RNA-seq

Transcriptional and epi-transcriptional dynamics of SARS-CoV-2 during cellular infection

Molecular Methods for Pathogenic Bacteria Detection and Recent Advances in Wastewater Analysis

invertFREGENE: software for simulating inversions in population genetic data

Detection of Streptococcus pyogenes M1UK in Australia and characterization of the mutation driving enhanced expression of superantigen SpeA

Qualitative economic model for long-term IaaS composition

Meta-path based service recommendation in heterogeneous information networks

Enabling Privacy Preserving Mobile Advertising via Private Information Retrieval

Comparison of long-read methods for sequencing and assembly of a plant genome

Digerati – A multipath parallel hybrid deep learning framework for the identification of mycobacterial PE/PPE proteins

Profiling copy number alterations in cell-free tumour DNA using a single-reference

Childhood tuberculosis is associated with decreased abundance of T cell gene transcripts and impaired T cell function

Identification of regulatory variants associated with genetic susceptibility to meningococcal disease

Chiron: Translating nanopore raw signal directly into nucleotide sequence using deep learning

Octapeptin C4 Induces Less Resistance and Novel Mutations in an Epidemic Carbapenemase-producing Klebsiella pneumoniae ST258 Clinical Isolate Compared to Polymyxins

Mirror extreme BMI phenotypes associated with gene dosage at the chromosome 16p11.2 locus

Nanoq: ultra-fast quality control for nanopore reads

Chiron: translating nanopore raw signal directly into nucleotide sequence using deep learning

MultiPhen: Joint Model of Multiple Phenotypes Can Increase Discovery in GWAS

Privacy-Preserving User Profile Matching in Social Networks

famCNV: copy number variant association for quantitative traits in families

Service mining for internet of things

Computational analysis and prediction of PE_PGRS proteins using machine learning

Signatures Of Tspan8 Variants Associated With Human Metabolic Regulation And Diseases

Porpoise: a new approach for accurate prediction of RNA pseudouridine sites

Rapid diagnosis of Capnocytophaga canimorsus septic shock in an immunocompetent individual using real-time Nanopore sequencing: a case report

Trust Management in Cloud Services

Multi-clonal evolution of multi-drug-resistant/extensively drug-resistant Mycobacterium tuberculosis in a high-prevalence setting of Papua New Guinea for over three decades

A census of human cancer genes

Genetic variability in the regulation of gene expression in ten regions of the human brain

Investigation of the HIN200 Locus in UK SLE Families Identifies Novel Copy Number Variants

Insights into population structure of East African sweetpotato cultivars from hybrid assembly of chloroplast genomes

Multi-Use Trust in Crowdsourced IoT Services

Genome sequences of two diploid wild relatives of cultivated sweetpotato reveal targets for genetic improvement

Direct RNA sequencing and early evolution of SARS-CoV-2

Personalized API recommendation via implicit preference modeling

DeepGenGrep: a general deep learning-based predictor for multiple genomic signals and regions

Optimising Treatment Outcomes for Children and Adults Through Rapid Genome Sequencing of Sepsis Pathogens. A Study Protocol for a Prospective, Multi-Centre Trial (DIRECT)

Temporal pattern based QoS prediction

YHap: a population model for probabilistic assignment of Y haplogroups from re-sequencing data

Diagnosis of Kawasaki Disease Using a Minimal Whole-Blood Gene Expression Signature

Drone-as-a-Service Composition Under Uncertainty

A complete high quality nanopore-only assembly of an XDR Mycobacterium tuberculosis Beijing lineage strain identifies novel variation in repetitive PE/PPE gene regions

On building a hyperdistributed database

Genome-wide association analysis of metabolic traits in a birth cohort from a founder population

Novel association approach for variable number tandem repeats (VNTRs) identifies DOCK5 as a susceptibility gene for severe obesity

SARS-CoV-2 mouse adaptation selects virulence mutations that cause TNF-driven age-dependent severe disease with human correlates

Building enterprise mashups

Complete Genome Sequences of Clinical Pandoraea fibrosis Isolates

Identification of reduced host transcriptomic signatures for tuberculosis and digital PCR-based validation and quantification

CCCloud: Context-aware and credible cloud service selection based on subjective assessment and objective assessment

Whole-exome Sequencing for the Identification of Rare Variants in Primary Immunodeficiency Genes in Children With Sepsis: A Prospective, Population-based Cohort Study

Genome-wide association and genetic functional studies identify autism susceptibility candidate 2 gene (AUTS2) in the regulation of alco

Simulating the Dynamics of Targeted Capture Sequencing with CapSim

Accurate Single-Nucleotide Polymorphism Allele Assignment in Trisomic or Duplicated Regions by Using a Single Base–Extension Assay with MALDI-TOF Mass Spectrometry

Enhanced protein domain discovery by using language modeling techniques from speech recognition

Understanding Detrimental Host Response to Infection – The Promise of Transcriptomics