ORCID Profile
0000-0001-6339-2644
Current Organisations
Australian National University, École Polytechnique Fédérale de Lausanne
In Research Link Australia (RLA), "Research Topics" refer to ANZSRC FOR and SEO codes. These topics are either sourced from ANZSRC FOR and SEO codes listed in researchers' related grants or generated by a large language model (LLM) based on their publications.
Gene Expression (incl. Microarray and other genome-wide approaches) | Genomics | Bioinformatics | Genetics
Expanding Knowledge in the Biological Sciences | Computer Software and Services not elsewhere classified | Expanding Knowledge in the Medical and Health Sciences
Publisher: Springer Science and Business Media LLC
Date: 29-08-2012
Abstract: Large-scale sequencing of genomes has enabled the inference of phylogenies based on the evolution of genomic architecture, under such events as rearrangements, duplications, and losses. Many evolutionary models and associated algorithms have been designed over the last few years and have found use in comparative genomics and phylogenetic inference. However, the assessment of phylogenies built from such data has not been properly addressed to date. The standard method used in sequence-based phylogenetic inference is the bootstrap, but it relies on a large number of homologous characters that can be resampled; yet in the case of rearrangements, the entire genome is a single character. Alternatives such as the jackknife suffer from the same problem, while likelihood tests cannot be applied in the absence of well established probabilistic models. We present a new approach to the assessment of distance-based phylogenetic inference from whole-genome data; our approach combines features of the jackknife and the bootstrap and remains nonparametric. For each feature of our method, we give an equivalent feature in the sequence-based framework; we also present the results of extensive experimental testing, in both sequence-based and genome-based frameworks. Through the feature-by-feature comparison and the experimental results, we show that our bootstrapping approach is on par with the classic phylogenetic bootstrap used in sequence-based reconstruction, and we establish the clear superiority of the classic bootstrap for sequence data and of our corresponding new approach for rearrangement data over proposed variants. Finally, we test our approach on a small dataset of mammalian genomes, verifying that the support values match current thinking about the respective branches. Our method is the first to provide a standard of assessment to match that of the classic phylogenetic bootstrap for aligned sequences. Its support values follow a similar scale and its receiver-operating characteristics are nearly identical, indicating that it provides similar levels of sensitivity and specificity. Thus our assessment method makes it possible to conduct phylogenetic analyses on whole genomes with the same degree of confidence as for analyses on aligned sequences. Extensions to search-based inference methods such as maximum parsimony and maximum likelihood are possible, but remain to be thoroughly tested.
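As a rough illustration of the classic sequence-based bootstrap that the abstract above uses as its point of comparison, the following Python sketch resamples the columns of an alignment with replacement. The function name `bootstrap_alignment` and the toy alignment are assumptions made for this example. It does not reproduce the authors' rearrangement-based method, which the abstract does not specify in detail.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate of an alignment.

    `alignment` is a list of equal-length strings (one per taxon).
    Columns are drawn uniformly with replacement, so the replicate has
    the same number of columns as the input.
    """
    if not alignment:
        return []
    n_cols = len(alignment[0])
    if any(len(seq) != n_cols for seq in alignment):
        raise ValueError("all sequences must have the same length")
    # Choose n_cols column indices with replacement.
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    # Apply the same column sample to every sequence.
    return ["".join(seq[c] for c in cols) for seq in alignment]


if __name__ == "__main__":
    toy = ["ACGT", "ACGA", "TCGA"]  # hypothetical toy alignment
    for row in bootstrap_alignment(toy, random.Random(0)):
        print(row)
```

In practice a tree would be inferred from each of many replicates, and the support of a branch would be the fraction of replicate trees that contain it.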
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 06-2022
Publisher: Springer International Publishing
Date: 2022
Publisher: Springer Nature Switzerland
Date: 2023
DOI: 10.1007/978-3-031-29119-7_1
Abstract: With the high mutation rate in viruses, a mixture of closely related viral strains (called viral quasispecies) often co-infect an individual host. Reconstructing individual strains from viral quasispecies is a key step to characterizing the viral population, revealing strain-level genetic variability, and providing insights into biomedical and clinical studies. Reference-based approaches of reconstructing viral strains suffer from the lack of high-quality references due to high mutation rates and biased variant calling introduced by a selected reference. De novo methods require no references but face challenges due to errors in reads, the high similarity of quasispecies, and uneven abundance of strains. In this paper, we propose VStrains, a de novo approach for reconstructing strains from viral quasispecies. VStrains incorporates contigs, paired-end reads, and coverage information to iteratively extract the strain-specific paths from assembly graphs. We benchmark VStrains against multiple state-of-the-art de novo and reference-based approaches on both simulated and real datasets. Experimental results demonstrate that VStrains achieves the best overall performance on both simulated and real datasets under a comprehensive set of metrics such as genome fraction, duplication ratio, NGA50, error rate, etc. Availability: VStrains is freely available at github.com/MetaGenTools/VStrains.
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 03-2019
Publisher: Springer Berlin Heidelberg
Date: 2009
Publisher: Springer Science and Business Media LLC
Date: 2010
Publisher: Mary Ann Liebert Inc
Date: 10-2009
Abstract: As data about genomic architecture accumulates, genomic rearrangements have attracted increasing attention. One of the main rearrangement mechanisms, inversions (also called reversals), was characterized by Hannenhalli and Pevzner and this characterization in turn extended by various authors. The characterization relies on the concepts of breakpoints, cycles, and obstructions colorfully named hurdles and fortresses. In this paper, we study the probability of generating a hurdle in the process of sorting a permutation if one does not take special precautions to avoid them (as in a randomized algorithm, for instance). To do this we revisit and extend the work of Caprara and of Bergeron by providing simple and exact characterizations of the probability of encountering a hurdle in a random permutation. Using similar methods we provide the first asymptotically tight analysis of the probability that a fortress exists in a random permutation. Finally, we study other aspects of hurdles, both analytically and through experiments: when are they created in a sequence of sorting inversions, how much later are they detected, and how much work may need to be undone to return to a sorting sequence.
Publisher: Springer Berlin Heidelberg
Date: 2008
Publisher: Cold Spring Harbor Laboratory
Date: 05-09-2020
DOI: 10.1101/2020.09.04.260208
Abstract: Coral reefs are the epitome of species diversity, yet the number of described scleractinian coral species, the framework-builders of coral reefs, remains moderate by comparison. DNA sequencing studies are rapidly challenging this notion by exposing a wealth of undescribed diversity, but the evolutionary and ecological significance of this diversity remains largely unclear. Here, we present an annotated genome for one of the most ubiquitous corals in the Indo-Pacific (Pachyseris speciosa), and uncover through a comprehensive genomic and phenotypic assessment that it comprises morphologically indistinguishable, but ecologically divergent cryptic lineages. Demographic modelling based on whole-genome resequencing disproved that morphological crypsis was due to recent divergence, and instead indicated ancient morphological stasis. Although the lineages occur sympatrically across shallow and mesophotic habitats, extensive genotyping using a rapid diagnostic assay revealed differentiation of their ecological distributions. Leveraging “common garden” conditions facilitated by the overlapping distributions, we assessed physiological and quantitative skeletal traits and demonstrated concurrent phenotypic differentiation. Lastly, spawning observations of genotyped colonies highlighted the potential role of temporal reproductive isolation in the limited admixture, with consistent genomic signatures in genes related to morphogenesis and reproduction. Overall, our findings demonstrate how ecologically and phenotypically divergent coral species can evolve despite morphological stasis, and provide new leads into the potential mechanisms facilitating such divergence in sympatry. More broadly, they indicate that our current taxonomic framework for reef-building corals may be scratching the surface of the ecologically relevant diversity on coral reefs, consequently limiting our ability to protect or restore this diversity effectively.
Publisher: Mary Ann Liebert Inc
Date: 05-2015
Abstract: Computing the edit distance between two genomes is a basic problem in the study of genome evolution. The double-cut-and-join (DCJ) model has formed the basis for most algorithmic research on rearrangements over the last few years. The edit distance under the DCJ model can be computed in linear time for genomes without duplicate genes, while the problem becomes NP-hard in the presence of duplicate genes. In this article, we propose an integer linear programming (ILP) formulation to compute the DCJ distance between two genomes with duplicate genes. We also provide an efficient preprocessing approach to simplify the ILP formulation while preserving optimality. Comparison on simulated genomes demonstrates that our method outperforms MSOAR in computing the edit distance, especially when the genomes contain long duplicated segments. We also apply our method to assign orthologous gene pairs among human, mouse, and rat genomes, where once again our method outperforms MSOAR.
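The linear-time case mentioned in the abstract above (genomes without duplicate genes) can be sketched for the simplest setting: two circular, single-chromosome genomes over the same gene set, where the DCJ distance is the number of genes minus the number of cycles in the adjacency graph. The Python below is a minimal sketch under those assumptions. The function names and the signed-integer encoding of genes are choices made for this example, and it does not cover linear chromosomes, duplicate genes, or the ILP formulation proposed in the paper.

```python
def extremity_partners(genome):
    """Map each gene extremity to its neighbour in a circular signed genome.

    A gene g > 0 is read tail-to-head, a gene g < 0 head-to-tail.
    Extremities are encoded as (gene_id, 't') or (gene_id, 'h').
    """
    partner = {}
    n = len(genome)
    for i in range(n):
        g, nxt = genome[i], genome[(i + 1) % n]
        right = (abs(g), "h") if g > 0 else (abs(g), "t")
        left = (abs(nxt), "t") if nxt > 0 else (abs(nxt), "h")
        partner[right] = left
        partner[left] = right
    return partner


def dcj_distance_circular(a, b):
    """DCJ distance between two circular unichromosomal genomes.

    Assumes both genomes contain the same genes, each exactly once.
    Uses d = N - C, with N genes and C alternating cycles.
    """
    genes_a = sorted(abs(g) for g in a)
    genes_b = sorted(abs(g) for g in b)
    if genes_a != genes_b or len(set(genes_a)) != len(genes_a):
        raise ValueError("genomes must share the same duplicate-free gene set")
    adj_a = extremity_partners(a)
    adj_b = extremity_partners(b)
    seen = set()
    cycles = 0
    for start in adj_a:
        if start in seen:
            continue
        cycles += 1
        x = start
        # Alternate between an adjacency of A and an adjacency of B
        # until the walk returns to an extremity already visited.
        while x not in seen:
            seen.add(x)
            y = adj_a[x]
            seen.add(y)
            x = adj_b[y]
    return len(a) - cycles


if __name__ == "__main__":
    print(dcj_distance_circular([1, 2, 3], [1, -2, 3]))
```

Each extremity is visited once, so the running time is linear in the number of genes once the gene-set check is done.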
Publisher: Springer Science and Business Media LLC
Date: 09-11-2017
DOI: 10.1038/S41598-017-15484-5
Abstract: Phylogenetic studies aim to discover evolutionary relationships and histories. These studies are based on similarities of morphological characters and molecular sequences. Currently, widely accepted phylogenetic approaches are based on multiple sequence alignments, which analyze shared gene datasets and concatenate/coalesce these results to a final phylogeny with maximum support. However, these approaches still have limitations, and often have conflicting results with each other. Reconstructing ancestral genomes helps us understand mechanisms and corresponding consequences of evolution. Most existing genome level phylogeny and ancestor reconstruction methods can only process simplified real genome datasets or simulated datasets with identical genome content, unique genome markers, and limited types of evolutionary events. Here, we provide an alternative way to resolve phylogenetic problems based on analyses of real genome data. We use phylogenetic signals from all types of genome level evolutionary events, and overcome the conflicting issues existing in traditional phylogenetic approaches. Further, we build an automated computational pipeline to reconstruct phylogenies and ancestral genomes for two high-resolution real yeast genome datasets. Comparison results with recent studies and publications show that we reconstruct very accurate and robust phylogenies and ancestors. Finally, we identify and analyze the conserved syntenic blocks among reconstructed ancestral genomes and present yeast species.
Publisher: Mary Ann Liebert Inc
Date: 03-2018
Abstract: Genome rearrangement is known as one of the main evolutionary mechanisms on the genomic level. Phylogenetic analysis based on rearrangement played a crucial role in biological research in the past decades, especially with the increasing availability of fully sequenced genomes. In general, phylogenetic analysis aims to solve two problems: small parsimony problem (SPP) and big parsimony problem (BPP). Maximum parsimony is a popular approach for SPP and BPP, which relies on iteratively solving an NP-hard problem, the median problem. As a result, current median solvers and phylogenetic inference methods based on the median problem all face serious problems on scalability and cannot be applied to data sets with large and distant genomes. In this article, we propose a new median solver for gene order data that combines double-cut-and-join sorting with the simulated annealing algorithm. Based on this median solver, we built a new phylogenetic inference method to solve both SPP and BPP problems. Our experimental results show that the new median solver achieves an excellent performance on simulated data sets, and the phylogenetic inference tool built based on the new median solver has a better performance than other existing methods.
Publisher: Springer London
Date: 2013
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2022
Publisher: Springer Science and Business Media LLC
Date: 10-2017
Publisher: Springer Science and Business Media LLC
Date: 08-11-2014
Publisher: Mary Ann Liebert Inc
Date: 09-2011
Abstract: The rapid accumulation of whole-genome data has renewed interest in the study of genomic rearrangements. Comparative genomics, evolutionary biology, and cancer research all require models and algorithms to elucidate the mechanisms, history, and consequences of these rearrangements. However, even simple models lead to NP-hard problems, particularly in the area of phylogenetic analysis. Current approaches are limited to small collections of genomes and low-resolution data (typically a few hundred syntenic blocks). Moreover, whereas phylogenetic analyses from sequence data are deemed incomplete unless bootstrapping scores (a measure of confidence) are given for each tree edge, no equivalent to bootstrapping exists for rearrangement-based phylogenetic analysis. We describe a fast and accurate algorithm for rearrangement analysis that scales up, in both time and accuracy, to modern high-resolution genomic data. We also describe a novel approach to estimate the robustness of results, an equivalent to the bootstrapping analysis used in sequence-based phylogenetic reconstruction. We present the results of extensive testing on both simulated and real data showing that our algorithm returns very accurate results, while scaling linearly with the size of the genomes and cubically with their number. We also present extensive experimental results showing that our approach to robustness testing provides excellent estimates of confidence, which, moreover, can be tuned to trade off thresholds between false positives and false negatives. Together, these two novel approaches enable us to attack heretofore intractable problems, such as phylogenetic inference for high-resolution vertebrate genomes, as we demonstrate on a set of six vertebrate genomes with 8,380 syntenic blocks. A copy of the software is available on demand.
Publisher: American Chemical Society (ACS)
Date: 10-11-2017
DOI: 10.1021/ACS.ANALCHEM.7B03820
Abstract: Bacterial typing is of great importance in clinical diagnosis, environmental monitoring, food safety analysis, and biological research. Matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS) is now widely used to analyze bacterial samples. Identification of bacteria at the species level can be realized by matching the mass spectra of samples against a library of mass spectra of known bacteria. Nevertheless, in order to reasonably type bacteria, identification accuracy should be further improved. Herein, we propose a new framework to the identification and assessment for MALDI-MS based bacterial analysis. Our approach combines new measures for spectra similarity and a novel bootstrapping assessment. We tested our approach on a general data set containing the mass spectra of 1741 strains of bacteria and another challenging data set containing 250 strains, including 40 strains in the Bacillus cereus group that were previously claimed to be impossible to resolve by MALDI-MS. With the bootstrapping assessment, we achieved much more reliable predictions at both the genus and species level, and enabled to resolve the Bacillus cereus group. To the best of the authors' knowledge, our method is the first to provide a statistical assessment to MALDI-MS based bacterial typing that could lead to more reliable bacterial typing.
Publisher: Springer International Publishing
Date: 2015
Publisher: No publisher found
Date: 2019
Publisher: No publisher found
Date: 2019
Publisher: Springer International Publishing
Date: 2014
Publisher: Springer Science and Business Media LLC
Date: 12-2012
Publisher: Oxford University Press (OUP)
Date: 11-10-2012
DOI: 10.1093/BIOINFORMATICS/BTS603
Abstract: Summary: TIBA is a tool to reconstruct phylogenetic trees from rearrangement data that consist of ordered lists of synteny blocks (or genes), where each synteny block is shared with all of its homologues in the input genomes. The evolution of these synteny blocks, through rearrangement operations, is modelled by the uniform Double-Cut-and-Join model. Using a true distance estimate under this model and simple distance-based methods, TIBA reconstructs a phylogeny of the input genomes. Unlike any previous tool for inferring phylogenies from rearrangement data, TIBA uses novel methods of robustness estimation to provide support values for the edges in the inferred tree. Availability: lcbb.epfl.ch/softwares/tiba.html. Contact: vaibhav.rajan@epfl.ch
Publisher: Springer Berlin Heidelberg
Date: 2011
Publisher: Springer Science and Business Media LLC
Date: 22-10-2018
Publisher: Springer Science and Business Media LLC
Date: 08-08-2014
Publisher: Cold Spring Harbor Laboratory
Date: 07-05-2018
Abstract: Although segmental duplications (SDs) represent hotbeds for genomic rearrangements and emergence of new genes, there are still no easy-to-use tools for identifying SDs. Moreover, while most previous studies focused on recently emerged SDs, detection of ancient SDs remains an open problem. We developed an SDquest algorithm for SD finding and applied it to analyzing SDs in human, gorilla, and mouse genomes. Our results demonstrate that previous studies missed many SDs in these genomes and show that SDs account for at least 6.05% of the human genome (version hg19), a 17% increase as compared to the previous estimate. Moreover, SDquest classified 6.42% of the latest GRCh38 version of the human genome as SDs, a large increase as compared to previous studies. We thus propose to re-evaluate evolution of SDs based on their accurate representation across multiple genomes. Toward this goal, we analyzed the complex mosaic structure of SDs and decomposed mosaic SDs into elementary SDs, a prerequisite for follow-up evolutionary analysis. We also introduced the concept of the breakpoint graph of mosaic SDs that revealed SD hotspots and suggested that some SDs may have originated from circular extrachromosomal DNA (ecDNA), not unlike ecDNA that contributes to accelerated evolution in cancer.
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 07-2012
Publisher: Springer Berlin Heidelberg
Date: 2017
Publisher: MDPI AG
Date: 17-05-2010
DOI: 10.3390/RS2051378
Publisher: Springer Berlin Heidelberg
Date: 2010
Publisher: Springer Science and Business Media LLC
Date: 10-2014
Publisher: Springer Science and Business Media LLC
Date: 04-2019
DOI: 10.1038/S41587-019-0072-8
Abstract: Accurate genome assembly is hampered by repetitive regions. Although long single molecule sequencing reads are better able to resolve genomic repeats than short-read data, most long-read assembly algorithms do not provide the repeat characterization necessary for producing optimal assemblies. Here, we present Flye, a long-read assembly algorithm that generates arbitrary paths in an unknown repeat graph, called disjointigs, and constructs an accurate repeat graph from these error-riddled disjointigs. We benchmark Flye against five state-of-the-art assemblers and show that it generates better or comparable assemblies, while being an order of magnitude faster. Flye nearly doubled the contiguity of the human genome assembly (as measured by the NGA50 assembly quality metric) compared with existing assemblers.
Publisher: Proceedings of the National Academy of Sciences
Date: 12-12-2016
Abstract: When the long reads generated using single-molecule sequencing (SMS) technology were made available, most researchers were skeptical about the ability of existing algorithms to generate high-quality assemblies from long error-prone reads. Nevertheless, recent algorithmic breakthroughs resulted in many successful SMS sequencing projects. However, as the recent assemblies of important plant pathogens illustrate, the problem of assembling long error-prone reads is far from being resolved even in the case of relatively short bacterial genomes. We propose an algorithmic approach for assembling long error-prone reads and describe the ABruijn assembler, which results in accurate genome reconstructions.
Publisher: Springer Berlin Heidelberg
Date: 2013
Publisher: Springer Science and Business Media LLC
Date: 09-01-2020
DOI: 10.1038/S41467-019-13866-Z
Abstract: Data-independent acquisition (DIA) is an emerging technology for quantitative proteomic analysis of large cohorts of samples. However, sample-specific spectral libraries built by data-dependent acquisition (DDA) experiments are required prior to DIA analysis, which is time-consuming and limits the identification/quantification by DIA to the peptides identified by DDA. Herein, we propose DeepDIA, a deep learning-based approach to generate in silico spectral libraries for DIA analysis. We demonstrate that the quality of in silico libraries predicted by instrument-specific models using DeepDIA is comparable to that of experimental libraries, and outperforms libraries generated by global models. With peptide detectability prediction, in silico libraries can be built directly from protein sequence databases. We further illustrate that DeepDIA can break through the limitation of DDA on peptide/protein detection, and enhance DIA analysis on human serum samples compared to the state-of-the-art protocol using a DDA library. We expect this work expanding the toolbox for DIA proteomics.
Publisher: Springer Science and Business Media LLC
Date: 04-2019
Publisher: Springer Science and Business Media LLC
Date: 30-07-2011
Publisher: Springer Science and Business Media LLC
Date: 2010
Publisher: Elsevier BV
Date: 07-2018
Publisher: World Scientific Pub Co Pte Lt
Date: 04-2007
DOI: 10.1142/S0219720007002643
Abstract: In protein identification by tandem mass spectrometry, it is critical to accurately predict the theoretical spectrum for a peptide sequence. To date, the widely-used database searching methods adopted simple statistical models for predicting. For some peptide, these models usually yield a theoretical spectrum with a significant deviation from the experimental one. In this paper, in order to derive an improved predicting model, we utilized a non-linear programming model to quantify the factors impacting peptide fragmentation. Then, an iterative algorithm was proposed to solve this optimization problem. Upon a training set of 1803 spectra, the experimental result showed a good agreement with some known principles about peptide fragmentation, such as the tendency to cleave at the middle of peptide, and Pro's preference of the N-terminal cleavage. Moreover, upon a testing set of 941 spectra, comparison of the predicted spectra against the experimental ones showed that this method can generate reasonable predictions. The results in this paper can offer help to both database searching and de novo methods.
Publisher: Springer Berlin Heidelberg
Date: 2015
Publisher: American Chemical Society (ACS)
Date: 20-12-2007
DOI: 10.1021/PR070479V
Abstract: In protein identification through tandem mass spectrometry, it is critical to accurately predict the theoretical spectrum for a peptide sequence. The widely used prediction models, such as SEQUEST and MASCOT, ignore the intensity of the ions with important neutral losses, including water loss and ammonia loss. However, ignoring these neutral losses results in a significant deviation between the predicted theoretical spectrum and its experimental counterpart. Here, based on the "one peak, multiple explanations" observation, we proposed an expectation-maximization (EM) method to automatically learn the probabilities of water loss and ammonia loss for each amino acid. Then we employed these probabilities to design an improved statistical model for theoretical spectrum prediction. We implemented these methods and tested them on practical data. On a training set containing 1803 spectra, the experimental results show a good agreement with some known knowledge about neutral losses, such as the tendency of water loss from Asp, Glu, Ser, and Thr. Furthermore, on a testing set containing 941 spectra, the improved similarity between the experimental and predicted spectra demonstrates that this method can generate more reasonable predictions relative to the model that ignores neutral losses. As an application of the derived probabilities, we implemented a database searching method adopting the improved theoretical spectrum model with neutral loss ions estimated. Experimental results on Keller's data set demonstrate that this method can identify peptides more accurately than SEQUEST. In another application to validate SEQUEST's results, the reported peptide-spectrum pairs are reranked with respect to the similarity between experimental and predicted spectra. Experimental results on both LTQ and QSTAR data sets suggest that this reranking strategy can effectively distinguish the false negative predictions reported by SEQUEST.
Publisher: Springer International Publishing
Date: 2022
Publisher: Cold Spring Harbor Laboratory
Date: 21-10-2022
DOI: 10.1101/2022.10.21.513181
Abstract: With the high mutation rate in viruses, a mixture of closely related viral strains (called viral quasispecies) often co-infect an individual host. Reconstructing individual strains from viral quasispecies is a key step to characterizing the viral population, revealing strain-level genetic variability, and providing insights into biomedical and clinical studies. Reference-based approaches of reconstructing viral strains suffer from the lack of high-quality references due to high mutation rates and biased variant calling introduced by a selected reference. De novo methods require no references but face challenges due to errors in reads, the high similarity of quasispecies, and uneven abundance of strains. In this paper, we propose VStrains, a de novo approach for reconstructing strains from viral quasispecies. VStrains incorporates contigs, paired-end reads, and coverage information to iteratively extract the strain-specific paths from assembly graphs. We benchmark VStrains against multiple state-of-the-art de novo and reference-based approaches on both simulated and real datasets. Experimental results demonstrate that VStrains achieves the best overall performance on both simulated and real datasets under a comprehensive set of metrics such as genome fraction, duplication ratio, NGA50, error rate, etc. VStrains is freely available at github.com/MetaGenTools/VStrains.
Publisher: Springer Science and Business Media LLC
Date: 11-02-2010
Publisher: Springer Berlin Heidelberg
Date: 2006
DOI: 10.1007/11851561_29
Publisher: Schloss Dagstuhl - Leibniz-Zentrum für Informatik
Date: 2020
Publisher: Springer Science and Business Media LLC
Date: 2016
Publisher: Oxford University Press (OUP)
Date: 07-2020
DOI: 10.1093/BIOINFORMATICS/BTAA441
Abstract: Metagenomics studies have provided key insights into the composition and structure of microbial communities found in different environments. Among the techniques used to analyse metagenomic data, binning is considered a crucial step to characterize the different species of micro-organisms present. The use of short-read data in most binning tools poses several limitations, such as insufficient species-specific signal, and the emergence of long-read sequencing technologies offers us opportunities to surmount them. However, most current metagenomic binning tools have been developed for short reads. The few tools that can process long reads either do not scale with increasing input size or require a database with reference genomes that are often unknown. In this article, we present MetaBCC-LR, a scalable reference-free binning method which clusters long reads directly based on their k-mer coverage histograms and oligonucleotide composition. We evaluate MetaBCC-LR on multiple simulated and real metagenomic long-read datasets with varying coverages and error rates. Our experiments demonstrate that MetaBCC-LR substantially outperforms state-of-the-art reference-free binning tools, achieving ∼13% improvement in F1-score and ∼30% improvement in ARI compared to the best previous tools. Moreover, we show that using MetaBCC-LR before long-read assembly helps to enhance the assembly quality while significantly reducing the assembly cost in terms of time and memory usage. The efficiency and accuracy of MetaBCC-LR pave the way for more effective long-read-based metagenomics analyses to support a wide range of applications. The source code is freely available at: nuradhawick/MetaBCC-LR. Supplementary data are available at Bioinformatics online.
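One of the two read features named in the abstract above, oligonucleotide composition, can be illustrated with a normalised trinucleotide frequency vector. The Python below is only a sketch: the function name `composition_vector`, the choice of k = 3, and the decision to skip k-mers containing non-ACGT characters are assumptions for this example, and it does not merge reverse complements or reproduce how MetaBCC-LR itself computes or clusters these profiles.

```python
from itertools import product


def composition_vector(read, k=3):
    """Return the normalised k-mer frequency vector of a read.

    The vector has 4**k entries in lexicographic order over "ACGT".
    Windows containing any other character (for example "N") are skipped.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    total = 0
    for i in range(len(read) - k + 1):
        j = index.get(read[i:i + k])
        if j is not None:
            counts[j] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [c / total for c in counts]


if __name__ == "__main__":
    vec = composition_vector("ACGACGTTNACG")
    print(len(vec), round(sum(vec), 6))
```

Vectors of this kind could then be passed to any clustering routine. Reads from the same species tend to have similar composition, which is the signal a binning tool exploits.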
Publisher: Mary Ann Liebert Inc
Date: 09-2011
Abstract: Genomic rearrangements have been studied since the beginnings of modern genetics and models for such rearrangements have been the subject of many papers over the last 10 years. However, none of the extant models can predict the evolution of genomic organization into circular unichromosomal genomes (as in most prokaryotes) and linear multichromosomal genomes (as in most eukaryotes). Very few of these models support gene duplications and losses--yet these events may be more common in evolutionary history than rearrangements and themselves cause apparent rearrangements. We propose a new evolutionary model that integrates gene duplications and losses with genome rearrangements and that leads to genomes with either one (or a very few) circular chromosome or a collection of linear chromosomes. Our model is based on existing rearrangement models and inherits their linear-time algorithms for pairwise distance computation (for rearrangement only). Moreover, our model predictions fit observations about the evolution of gene family sizes and agree with the existing predictions about the growth in the number of chromosomes in eukaryotic genomes.
Publisher: Oxford University Press (OUP)
Date: 07-2008
DOI: 10.1093/BIOINFORMATICS/BTN148
Abstract: Motivation: Modern techniques can yield the ordering and strandedness of genes on each chromosome of a genome; such data already exists for hundreds of organisms. The evolutionary mechanisms through which the set of the genes of an organism is altered and reordered are of great interest to systematists, evolutionary biologists, comparative genomicists and biomedical researchers. Perhaps the most basic concept in this area is that of evolutionary distance between two genomes: under a given model of genomic evolution, how many events most likely took place to account for the difference between the two genomes? Results: We present a method to estimate the true evolutionary distance between two genomes under the ‘double-cut-and-join’ (DCJ) model of genome rearrangement, a model under which a single multichromosomal operation accounts for all genomic rearrangement events: inversion, transposition, translocation, block interchange and chromosomal fusion and fission. Our method relies on a simple structural characterization of a genome pair and is both analytically and computationally tractable. We provide analytical results to describe the asymptotic behavior of genomes under the DCJ model, as well as experimental results on a wide variety of genome structures to exemplify the very high accuracy (and low variance) of our estimator. Our results provide a tool for accurate phylogenetic reconstruction from multichromosomal gene rearrangement data as well as a theoretical basis for refinements of the DCJ model to account for biological constraints. Availability: All of our software is available in source form under GPL at lcbb.epfl.ch. Contact: bernard.moret@epfl.ch
Publisher: Elsevier BV
Date: 11-2019
Publisher: Springer Science and Business Media LLC
Date: 22-09-2016
Publisher: Society for Industrial & Applied Mathematics (SIAM)
Date: 2008
DOI: 10.1137/060664112
Publisher: Springer International Publishing
Date: 2020
Publisher: Cold Spring Harbor Laboratory
Date: 17-05-2020
DOI: 10.1101/2020.05.17.100305
Abstract: The development of DNA sequencing technologies provides the opportunity to call heterozygous SNPs for each individual. SNP calling is a fundamental problem of genetic analysis and has many applications, such as gene-disease diagnosis, drug design, and ancestry inference. Reference-based SNP calling approaches generate highly accurate results, but they face serious limitations especially when high-quality reference genomes are not available for many species. Although reference-free approaches have the potential to call SNPs without using the reference genome, they have not been widely applied on large and complex genomes because existing approaches suffer from low recall/precision or high runtime. We develop a reference-free algorithm Kmer2SNP to call SNPs directly from raw reads. Kmer2SNP first computes the k-mer frequency distribution from reads and identifies potential heterozygous k-mers which only appear in one haplotype. Kmer2SNP then constructs a graph by choosing these heterozygous k-mers as vertices and connecting edges between pairs of heterozygous k-mers that might correspond to SNPs. Kmer2SNP further assigns a weight to each edge using overlapping information between heterozygous k-mers, computes a maximum weight matching and finally outputs SNPs as edges between k-mer pairs in the matching. We benchmark Kmer2SNP against reference-free methods including hybrid (assembly-based) and assembly-free methods on both simulated and real datasets. Experimental results show that Kmer2SNP achieves better SNP calling quality while being an order of magnitude faster than the state-of-the-art methods. Kmer2SNP shows the potential of calling SNPs only using k-mers from raw reads without assembly. The source code is freely available at anboANU/Kmer2SNP.
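The first two steps described above (count k-mers, then pair low-frequency k-mers that could be the two alleles of a SNP) can be illustrated with a toy sketch. This is not the Kmer2SNP implementation. It assumes an odd k, a user-supplied count window `[low, high]` standing in for the heterozygous peak of the k-mer frequency distribution, and candidate pairs that differ only at the middle base. It ignores reverse complements, sequencing errors, edge weights and the maximum weight matching step. All function and parameter names are hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in the reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def candidate_snp_pairs(reads, k, low, high):
    """Pair k-mers with count in [low, high] that differ only at the middle base."""
    if k % 2 == 0:
        raise ValueError("k must be odd so that a middle base exists")
    mid = k // 2
    counts = count_kmers(reads, k)
    heterozygous = [kmer for kmer, c in counts.items() if low <= c <= high]

    # Group k-mers by their sequence with the middle base masked out.
    groups = {}
    for kmer in heterozygous:
        key = kmer[:mid] + "N" + kmer[mid + 1:]
        groups.setdefault(key, []).append(kmer)

    # Keep only unambiguous groups: exactly two k-mers sharing both flanks.
    pairs = [tuple(sorted(g)) for g in groups.values() if len(g) == 2]
    return sorted(pairs)


if __name__ == "__main__":
    hap_a = "ACGTACGGTCA"
    hap_b = "ACGTATGGTCA"  # differs from hap_a at one position (C vs T)
    print(candidate_snp_pairs([hap_a, hap_b], k=5, low=1, high=1))
```

In this toy input, k-mers shared by both haplotypes occur twice and fall outside the window, so only haplotype-specific k-mers are considered for pairing.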
Publisher: Elsevier BV
Date: 06-2021
DOI: 10.1016/J.CUB.2021.03.028
Abstract: Coral reefs are the epitome of species diversity, yet the number of described scleractinian coral species, the framework-builders of coral reefs, remains moderate by comparison. DNA sequencing studies are rapidly challenging this notion by exposing a wealth of undescribed diversity, but the evolutionary and ecological significance of this diversity remains largely unclear. Here, we present an annotated genome for one of the most ubiquitous corals in the Indo-Pacific (Pachyseris speciosa) and uncover, through a comprehensive genomic and phenotypic assessment, that it comprises morphologically indistinguishable but ecologically divergent lineages. Demographic modeling based on whole-genome resequencing indicated that morphological crypsis (across micro- and macromorphological traits) was due to ancient morphological stasis rather than recent divergence. Although the lineages occur sympatrically across shallow and mesophotic habitats, extensive genotyping using a rapid molecular assay revealed differentiation of their ecological distributions. Leveraging "common garden" conditions facilitated by the overlapping distributions, we assessed physiological and quantitative skeletal traits and demonstrated concurrent phenotypic differentiation. Lastly, spawning observations of genotyped colonies highlighted the potential role of temporal reproductive isolation in the limited admixture, with consistent genomic signatures in genes related to morphogenesis and reproduction. Overall, our findings demonstrate the presence of ecologically and phenotypically divergent coral species without substantial morphological differentiation and provide new leads into the potential mechanisms facilitating such divergence. More broadly, they indicate that our current taxonomic framework for reef-building corals may be scratching the surface of the ecologically relevant diversity on coral reefs, consequently limiting our ability to protect or restore this diversity effectively.
Publisher: Mary Ann Liebert Inc
Date: 03-2010
Abstract: The study of genomic inversions (or reversals) has been a mainstay of computational genomics for nearly 20 years. After the initial breakthrough of Hannenhalli and Pevzner, who gave the first polynomial-time algorithm for sorting signed permutations by inversions, improved algorithms have been designed, culminating with an optimal linear-time algorithm for computing the inversion distance and a subquadratic algorithm for providing a shortest sequence of inversions--also known as sorting by inversions. Remaining open was the question of whether sorting by inversions could be done in O(n log n) time. In this article, we present a qualified answer to this question, by providing two new sorting algorithms, a simple and fast randomized algorithm and a deterministic refinement. The deterministic algorithm runs in time O(n log n + kn), where k is a data-dependent parameter. We provide the results of extensive experiments showing that both the average and the standard deviation for k are small constants, independent of the size of the permutation. We conclude (but do not prove) that almost all signed permutations can be sorted by inversions in O(n log n) time.
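To make the operation concrete, the sketch below shows what a signed inversion does to a permutation, together with a deliberately naive sorting procedure. It is not any of the algorithms mentioned in the abstract: the naive procedure takes quadratic time and generally does not produce a shortest sequence of inversions. Function names are hypothetical, and positions are 0-based and inclusive.

```python
def invert(perm, i, j):
    """Reverse the segment perm[i..j] (inclusive) and flip the sign of each element."""
    return perm[:i] + [-x for x in reversed(perm[i:j + 1])] + perm[j + 1:]


def naive_sort_by_inversions(perm):
    """Sort a signed permutation of 1..n with at most 2n inversions.

    Places elements 1, 2, ..., n one at a time: one inversion brings the
    element into position, a second one fixes its sign if needed.
    Returns the sorted permutation and the list of (i, j) inversions applied.
    """
    perm = list(perm)
    steps = []
    for i in range(len(perm)):
        target = i + 1
        # Positions before i already hold 1..i, so target lies at or after i.
        j = next(p for p in range(i, len(perm)) if abs(perm[p]) == target)
        if j != i:
            perm = invert(perm, i, j)
            steps.append((i, j))
        if perm[i] < 0:
            perm = invert(perm, i, i)
            steps.append((i, i))
    return perm, steps


if __name__ == "__main__":
    result, steps = naive_sort_by_inversions([3, -1, 2])
    print(result, steps)
```

The point of the sketch is only the semantics of an inversion on signed permutations; the distance and sorting algorithms discussed in the abstract rely on much more structure than this greedy placement.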
Publisher: Springer Science and Business Media LLC
Date: 10-2013
Publisher: Springer Berlin Heidelberg
Date: 2008
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2020
Publisher: Imperial College Press, distributed by World Scientific Publishing Co.
Date: 07-2006
Publisher: Springer Science and Business Media LLC
Date: 12-09-2016
DOI: 10.1038/NG.3662
Abstract: G-quadruplex (G4) structural motifs have been linked to transcription, replication and genome instability and are implicated in cancer and other diseases. However, it is crucial to demonstrate the bona fide formation of G4 structures within an endogenous chromatin context. Herein we address this through the development of G4 ChIP-seq, an antibody-based G4 chromatin immunoprecipitation and high-throughput sequencing approach. We find ∼10,000 G4 structures in human chromatin, predominantly in regulatory, nucleosome-depleted regions. G4 structures are enriched in the promoters and 5' UTRs of highly transcribed genes, particularly in genes related to cancer and in somatic copy number amplifications, such as MYC. Strikingly, de novo and enhanced G4 formation are associated with increased transcriptional activity, as shown by HDAC inhibitor-induced chromatin relaxation and observed in immortalized as compared to normal cellular states. Our findings show that regulatory, nucleosome-depleted chromatin and elevated transcription shape the endogenous human G4 DNA landscape.
Publisher: American Chemical Society (ACS)
Date: 09-08-2018
DOI: 10.1021/ACS.ANALCHEM.8B02258
Abstract: Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) is now widely used to characterize bacterial samples for clinical diagnosis, food safety control, environmental monitoring, and so on. However, existing standard approaches are only applied to analyze single colonies purified by plate culture, which limits the approaches to cultivable bacteria and makes the overall procedure time-consuming. In this work, we propose a new framework to analyze MALDI-TOF spectra of bacterial mixtures and to directly characterize each component without purification procedures. The framework is a combination of a synthetic mixture model based on a non-negative linear combination of candidate reference spectra and a statistical assessment by in silico generated spectra via a jackknife resampling. Ninety-seven model bacterial mixture samples and 8 cocultured blind-coded bacterial mixture samples, containing up to 6 strains in varied ratios in each sample, together with a reference database containing the mass spectra of 1081 strains, were used to validate the framework. High sensitivity (>80%, with error rate 60% for balanced quaternary and pentabasic mixtures, and 48%-71% for asymmetric situation, with error rate <5%. The work can facilitate rapid and reliable characterization of bacterial mixtures without purification procedures, which is of practical value in clinical diagnosis, food safety control, environmental monitoring, and so on. The framework can be further applied to many other spectroscopy-based analytics to interpret spectra from mixed samples.
Publisher: Springer Berlin Heidelberg
Date: 2011
Publisher: Springer Berlin Heidelberg
Date: 2014
Start Date: 2013
End Date: 2014
Funder: Swiss National Science Foundation
Start Date: 2014
End Date: 2016
Funder: Swiss National Science Foundation
Start Date: 2015
End Date: 2018
Funder: University Grants Committee
Start Date: 2022
End Date: 12-2024
Amount: $637,955.00
Funder: Australian Research Council