ORCID Profile
0000-0002-4628-7938
Current Organisation
University of Tasmania Foundation
In Research Link Australia (RLA), "Research Topics" refer to ANZSRC FOR and SEO codes. These topics are either sourced from ANZSRC FOR and SEO codes listed in researchers' related grants or generated by a large language model (LLM) based on their publications.
Biological Mathematics | Molecular Evolution | Applied Mathematics | Evolutionary Impacts of Climate Change | Statistical Theory | Phylogeny and Comparative Analysis | Plant Physiology | Stochastic Analysis and Modelling | Statistics | Other Environmental Sciences | Intellectual Property Law | Environmental Sciences not elsewhere classified | Evolutionary Biology | Plant Biology | Speciation and Extinction | Population, Ecological and Evolutionary Genetics | Biological Adaptation
Expanding Knowledge in the Biological Sciences | Expanding Knowledge in the Mathematical Sciences | Effects of Climate Change and Variability on Antarctic and Sub-Antarctic Environments (excl. Social Impacts) | Expanding Knowledge in the Agricultural and Veterinary Sciences | Expanding Knowledge in Law and Legal Studies | Flora, Fauna and Biodiversity at Regional or Larger Scales
Publisher: Springer Science and Business Media LLC
Date: 22-04-2017
DOI: 10.1007/S00285-017-1129-2
Abstract: Recently there has been renewed interest in phylogenetic inference methods based on phylogenetic invariants, alongside the related Markov invariants. Broadly speaking, both these approaches give rise to polynomial functions of sequence site patterns that, in expectation value, either vanish for particular evolutionary trees (in the case of phylogenetic invariants) or have well understood transformation properties (in the case of Markov invariants). While both approaches have been valued for their intrinsic mathematical interest, it is not clear how they relate to each other, and to what extent they can be used as practical tools for inference of phylogenetic trees. In this paper, by focusing on the special case of binary sequence data and quartets of taxa, we are able to view these two different polynomial-based approaches within a common framework. To motivate the discussion, we present three desirable statistical properties that we argue any invariant-based phylogenetic method should satisfy: (1) sensible behaviour under reordering of input sequences; (2) stability as the taxa evolve independently according to a Markov process; and (3) explicit dependence on the assumption of a continuous-time process. Motivated by these statistical properties, we develop and explore several new phylogenetic inference methods. In particular, we develop a statistically bias-corrected version of the Markov invariants approach which satisfies all three properties. We also extend previous work by showing that the phylogenetic invariants can be implemented in such a way as to satisfy property (3). A simulation study shows that, in comparison to other methods, our new proposed approach based on bias-corrected Markov invariants is extremely powerful for phylogenetic inference. The binary case is of particular theoretical interest as, in this case only, the Markov invariants can be expressed as linear combinations of the phylogenetic invariants. A wider implication of this is that, for models with more than two states (for example, DNA sequence alignments with four-state models), we find that methods which rely on phylogenetic invariants are incapable of satisfying all three of the stated statistical properties. This is because in these cases the relevant Markov invariants belong to a class of polynomials independent from the phylogenetic invariants.
Publisher: Oxford University Press (OUP)
Date: 19-03-2004
Publisher: Elsevier BV
Date: 07-2008
DOI: 10.1016/J.YMPEV.2008.02.023
Abstract: A previous study of the relationships amongst three subgroups of the Austral Asplenium ferns found conflicting signal between the two chloroplast loci investigated. Because organelle genomes like those of chloroplasts and mitochondria are thought to be non-recombining, with a single evolutionary history, we sequenced four additional chloroplast loci with the expectation that this would resolve these relationships. Instead, the conflict was only magnified. Although tree-building analyses favoured one of the three possible trees, one of the alternative trees actually had one more supporting site (six versus five) and received greater support in spectral and neighbor-net analyses. Simulations suggested that chance alone was unlikely to produce strong support for two of the possible trees and none for the third. Likelihood permutation tests indicated that the concatenated chloroplast sequence data appeared to have experienced recombination. However, recombination between the chloroplast genomes of different species would be highly atypical, and corollary supporting observations, like chloroplast heteroplasmy, are lacking. Wider taxon sampling clarified the composition of the Austral group, but the conflicting signal meant analyses (e.g., morphological evolution, biogeographic) conditional on a well-supported phylogeny could not be performed.
Publisher: Oxford University Press (OUP)
Date: 02-2007
DOI: 10.1080/10635150601167013
Abstract: Inferring species phylogenies is an important part of understanding molecular evolution. Even so, it is well known that an accurate phylogenetic tree reconstruction for a single gene does not necessarily correspond to the species phylogeny. One commonly accepted strategy to cope with this problem is to sequence many genes; the way in which to analyze the resulting collection of genes is somewhat more contentious. Supermatrix and supertree methods can be used, although these can suppress conflicts arising from true differences in the gene trees caused by processes such as lineage sorting, horizontal gene transfer, or gene duplication and loss. In 2004, Huson et al. (IEEE/ACM Trans. Comput. Biol. Bioinformatics 1:151-158) presented the Z-closure method that can circumvent this problem by generating a supernetwork as opposed to a supertree. Here we present an alternative way for generating supernetworks called Q-imputation. In particular, we describe a method that uses quartet information to add missing taxa into gene trees. The resulting trees are subsequently used to generate consensus networks, networks that generalize strict and majority-rule consensus trees. Through simulations and application to real data sets, we compare Q-imputation to the matrix representation with parsimony (MRP) supertree method and Z-closure, and demonstrate that it provides a useful complementary tool.
Publisher: Oxford University Press (OUP)
Date: 03-2010
Publisher: Cold Spring Harbor Laboratory
Date: 09-2003
DOI: 10.1101/GR.1024903
Abstract: The ALS (agglutinin-like sequence) gene family encodes proteins that play a role in adherence of the yeast Candida albicans to endothelial and epithelial cells. The proteins are proposed as virulence factors for this important fungal pathogen of humans. We analyzed 66 C. albicans strains, representing a worldwide collection of 266 infection-causing isolates, and discovered 60 alleles of the ALS7 open reading frame (ORF). Differences between alleles were largely caused by rearrangements of repeat elements in the so-called tandem repeat domain (21 different types occurred) and the VASES region (19 different types). C. albicans is diploid, and combinations of ALS7 alleles generated 49 different genotypes. ALS7 expression was detected in samples isolated directly from five oral candidosis patients. ORFs in the opposite direction contained within the ALS7 ORF were also transcribed in all strains tested. Isolates representing a more pathogenic general-purpose genotype (GPG) cluster of strains tended to have more tandem repeats than other strains. Two types of VASES regions were largely exclusive to GPG strains; the remaining types were largely exclusive to noncluster strains. Our results provide evidence that ALS7 is a hypermutable contingency locus and important for the success of C. albicans as an opportunistic pathogen of humans.
Publisher: Springer Science and Business Media LLC
Date: 30-01-2023
DOI: 10.1007/S11538-023-01120-Z
Abstract: The algebraic properties of flattenings and subflattenings provide direct methods for identifying edges in the true phylogeny (and by extension the complete tree) using pattern counts from a sequence alignment. The relatively small number of possible internal edges among a set of taxa (compared to the number of binary trees) makes these methods attractive; however, more could be done to evaluate their effectiveness for inferring phylogenetic trees. This is the case particularly for subflattenings, and the work we present here makes progress in this area. We introduce software for constructing and evaluating subflattenings for splits, utilising a number of methods to make computing subflattenings more tractable. We then present the results of simulations we have performed in order to compare the effectiveness of subflattenings to that of flattenings in terms of split score distributions, and susceptibility to possible biases. We find that subflattenings perform similarly to flattenings in terms of the distribution of split scores on the trees we examined, but may be less affected by bias arising from both split size/balance and long branch attraction. These insights are useful for developing effective algorithms to utilise these tools for the purpose of inferring phylogenetic trees.
Publisher: American Chemical Society (ACS)
Date: 13-04-2016
Abstract: Wastewater-based epidemiology is increasingly being used as a tool to monitor drug use trends. To minimize costs, studies have typically monitored a small number of days. However, cycles of drug use may display weekly and seasonal trends that affect the accuracy of monthly or annual drug use estimates based on a limited number of samples. This study aimed to rationalize sampling methods for minimizing the number of samples required while maximizing information about temporal trends. A range of sampling strategies were examined: (i) targeted days (e.g., weekends), (ii) completely random or stratified random sampling, and (iii) a number of sampling strategies informed by known weekly cycles in drug use data. Using a time-series approach, analysis was performed for four drugs (MDMA, methamphetamine, cocaine, methadone) collected through a continuous sampling program over 14 months. Results showed, for drugs with weekly cycles (MDMA, methamphetamine and cocaine in this sample), sampling strategies which made use of those weekly cycles required fewer samples to obtain similar information as sampling 5 days per week and had better accuracy than stratified random sampling techniques.
Publisher: Cold Spring Harbor Laboratory
Date: 27-08-2018
DOI: 10.1101/400648
Abstract: Molecular phylogenetics plays a key role in comparative genomics and has an increasingly significant impact on science, industry, government, public health, and society. In this opinion paper, we posit that the current phylogenetic protocol is missing two critical steps, and that their absence allows model misspecification and confirmation bias to unduly influence our phylogenetic estimates. Based on the potential offered by well-established but under-used procedures, such as assessment of phylogenetic assumptions and tests of goodness-of-fit, we introduce a new phylogenetic protocol that will reduce confirmation bias and increase the accuracy of phylogenetic estimates. To the memory of Rossiter H. Crozier (1943-2009), an evolutionary biologist, who, with his great generosity and wide-reaching inquisitiveness, inspired students and scientists in Australia, and abroad.
Publisher: Wiley
Date: 25-02-2020
Publisher: Elsevier BV
Date: 2012
DOI: 10.1016/J.YMPEV.2011.09.028
Abstract: Neotropical reef fish communities are species-poor compared to those of the Indo-West Pacific. An exception to that pattern is the blenny clade Chaenopsidae, one of only three rocky and coral reef fish families largely endemic to the Neotropics. Within the chaenopsids, the genus Acanthemblemaria is the most species-rich and is characterized by elaborate spinous processes on the skull. Here we construct a species tree using five nuclear markers and compare the results to those from Bayesian and parsimony phylogenetic analyses of 60 morphological characters. The sequence-based species tree conflicted with the morphological phylogenies for Acanthemblemaria, primarily due to the convergence of a suite of characters describing the distribution of spines on the head. However, we were able to resolve some of these conflicts by performing phylogenetic analyses on suites of characters not associated with head spines. By using the species tree as a guide, we used a quantitative method to identify suites of correlated morphological characters that, together, produce the distinctive skull phenotypes found in these fishes. A time calibrated phylogeny with nearly complete taxon sampling provided divergence time estimates that recovered a mid-Miocene origin for the genus, with a temporally and geographically complex pattern of speciation both before and after the closure of the Isthmus of Panama. Some sister taxa are broadly sympatric, but many occur in allopatry. The ability to infer the geography of speciation in Acanthemblemaria is complicated by extinctions, incomplete knowledge of their present geographic ranges and by wide-spread taxa that likely represent cryptic species complexes.
Publisher: Elsevier BV
Date: 02-2005
Publisher: Wiley
Date: 09-10-2023
DOI: 10.1111/BRV.12905
Abstract: Proteins form arguably the most significant link between genotype and phenotype. Understanding the relationship between protein sequence and structure, and applying this knowledge to predict function, is difficult. One way to investigate these relationships is by considering the space of protein folds and how one might move from fold to fold through similarity, or potential evolutionary relationships. The many individual characterisations of fold space presented in the literature can tell us a lot about how well the current Protein Data Bank represents protein fold space, how convergence and divergence may affect protein evolution, how proteins affect the whole of which they are part, and how proteins themselves function. A synthesis of these different approaches and viewpoints seems the most likely way to further our knowledge of protein structure evolution and thus facilitate improved protein structure design and prediction.
Publisher: Oxford University Press (OUP)
Date: 31-07-2019
Abstract: Molecular sequence data that have evolved under the influence of heterotachous evolutionary processes are known to mislead phylogenetic inference. We introduce the General Heterogeneous evolution On a Single Topology (GHOST) model of sequence evolution, implemented under a maximum-likelihood framework in the phylogenetic program IQ-TREE (www.iqtree.org). Simulations show that using the GHOST model, IQ-TREE can accurately recover the tree topology, branch lengths, and substitution model parameters from heterotachously evolved sequences. We investigate the performance of the GHOST model on empirical data by sampling phylogenomic alignments of varying lengths from a plastome alignment. We then carry out inference under the GHOST model on a phylogenomic data set composed of 248 genes from 16 taxa, where we find the GHOST model concurs with the currently accepted view, placing turtles as a sister lineage of archosaurs, in contrast to results obtained using traditional variable rates-across-sites models. Finally, we apply the model to a data set composed of a sodium channel gene of 11 fish taxa, finding that the GHOST model is able to elucidate a subtle component of the historical signal, linked to the previously established convergent evolution of the electric organ in two geographically distinct lineages of electric fish. We compare inference under the GHOST model to partitioning by codon position and show that, owing to the minimization of model constraints, the GHOST model offers unique biological insights when applied to empirical data.
Publisher: Oxford University Press (OUP)
Date: 19-07-2006
Publisher: Oxford University Press (OUP)
Date: 02-2017
DOI: 10.1093/GBE/EVW290
Publisher: Informa UK Limited
Date: 26-06-2014
DOI: 10.1080/00480169.2014.914861
Abstract: To quantify the numbers of live cattle, sheep and poultry imported into New Zealand and, where possible, their country of origin from 1860 to 1979. Information on the origin and number of live animal importations into New Zealand was collected for cattle, sheep and poultry for the period 1868-1979 from the annual reports compiled by the New Zealand Registrar General's Office, Government Statistician's Office, Census and Statistics Office, Census and Statistics Department, Customs Department and Department of Statistics. Census data from 1851 to 1871 were also used to estimate the livestock population during this period. The number of animals imported and the mean population for each species in a decade were determined, and the major countries of origin were identified. A large number of cattle (53,384) and sheep (604,525) were imported in the 1860s, and then there was a marked reduction in importations. Live poultry were imported in relatively small numbers (20,701) from 1880 to 1939, then 1,564,330 live poultry were imported between 1960 and 1979. Australia was the predominant country of origin for sheep between 1868 and 1959 (51,347/60,918 84.3%) and of cattle between 1868 and 1979 (10,080/15,157 66.5%). Only 6,712 (11.0%) sheep and 3,909 (25.8%) cattle were imported from the United Kingdom over the same periods, and even fewer from other countries. The collated data and historical reports show that from 1860 to 1979 Australia has been the main source of livestock introduced into New Zealand. The pattern of importation showed that large numbers of cattle and sheep were initially imported in the 1860s, probably in response to rapid agricultural expansion. Thereafter importations continued at much reduced numbers. In contrast, relatively small numbers of poultry were introduced until the 1960s when large numbers were imported as part of the development of a modern high-production industry. 
The overall pattern for both cattle and sheep was of a bottleneck event, as initially a relatively limited number of animals arrived from outside populations, followed by population expansion with ongoing but limited immigration (admixture). Investigation into the genetic population structure of New Zealand's cattle and sheep, as well as their host-associated microorganisms, could reflect the impact of these early historical events.
Publisher: Springer Science and Business Media LLC
Date: 31-01-2017
Publisher: Elsevier BV
Date: 11-2004
Publisher: Springer Science and Business Media LLC
Date: 12-2014
Publisher: Elsevier BV
Date: 11-2022
DOI: 10.1016/J.YMPEV.2022.107566
Abstract: We consider a subfunctionalisation model of gene family evolution. A family of n genes that perform z functions is represented by an n×z binary matrix Y
Publisher: Oxford University Press (OUP)
Date: 04-2003
DOI: 10.1080/10635150390192771
Abstract: We conducted a simulation study of the phylogenetic methods UPGMA, neighbor joining, maximum parsimony, and maximum likelihood for a five-taxon tree under a molecular clock. The parameter space included a small region where maximum parsimony is inconsistent, so we tested inconsistency correction for parsimony and distance correction for neighbor joining. As expected, corrected parsimony was consistent. For these data, maximum likelihood with the clock assumption outperformed each of the other methods tested. The distance-based methods performed marginally better than did maximum parsimony and maximum likelihood without the clock assumption. Data correction was generally detrimental to accuracy, especially for short sequence lengths. We identified another region of the parameter space where, although consistent for a given method, some incorrect trees were each selected with up to twice the frequency of the correct (generating) tree for sequences of bounded length. These incorrect trees are those where the outgroup has been incorrectly placed. In addition to this problem, the placement of the outgroup sequence can have a confounding effect on the ingroup tree, whereby the ingroup is correct when using the ingroup sequences alone, but with the inclusion of the outgroup the ingroup tree becomes incorrect.
Publisher: Wiley
Date: 22-07-2013
DOI: 10.1002/MBO3.102
Publisher: Oxford University Press (OUP)
Date: 08-2005
Abstract: Long-branch attraction is a well-known source of systematic error that can mislead phylogenetic methods; it is frequently invoked post hoc, upon recovering a different tree from the one expected based on prior evidence. We demonstrate that methods that do not force the data onto a single tree, such as spectral analysis, Neighbor-Net, and consensus networks, can be used to detect conflicting signals within the data, including those caused by long-branch attraction. We illustrate this approach using a set of taxa from three unambiguously monophyletic families within the Pelecaniformes: the darters, the cormorants and shags, and the gannets and boobies. These three families are universally acknowledged as forming a monophyletic group, but the relationship between the families remains contentious. Using sequence data from three mitochondrial genes (12S, ATPase 6, and ATPase 8) we demonstrate that the relationship between these three families is difficult to resolve because they are separated by a short internal branch and there are conflicting signals due to long-branch attraction, which are confounded with nonhomogeneous sequence evolution across the different genes. Spectral analysis, Neighbor-Net, and consensus networks reveal conflicting signals regarding the placement of one of the darters, with support found for darter monophyly, but also support for a conflicting grouping with the outgroup, pelicans. Furthermore, parsimony and maximum-likelihood analyses produced different trees, with one of the two most parsimonious trees not supporting the monophyly of the darters. Monte Carlo simulations, however, were not sensitive enough to reveal long-branch attraction unless the branches are longer than those actually observed. These results indicate that spectral analysis, Neighbor-Net, and consensus networks offer a powerful approach to detecting and understanding the source of conflicting signals within phylogenetic data.
Publisher: Public Library of Science (PLoS)
Date: 13-11-2019
Publisher: Wiley
Date: 04-2018
DOI: 10.1111/JBI.13214
Publisher: Elsevier BV
Date: 06-2017
DOI: 10.1016/J.JTBI.2017.04.015
Abstract: Accurate estimation of evolutionary distances between taxa is important for many phylogenetic reconstruction methods. Distances can be estimated using a range of different evolutionary models, from single nucleotide polymorphisms to large-scale genome rearrangements. Corresponding corrections for genome rearrangement distances fall into 3 categories: empirical computational studies, Bayesian/MCMC approaches, and combinatorial approaches. Here, we introduce a maximum likelihood estimator for the inversion distance between a pair of genomes, using a group-theoretic approach to modelling inversions introduced recently. This MLE functions as a corrected distance: in particular, we show that because of the way sequences of inversions interact with each other, it is quite possible for minimal distance and MLE distance to differently order the distances of two genomes from a third. The second aspect tackles the problem of accounting for the symmetries of circular arrangements. While, generally, a frame of reference is locked, and all computation made accordingly, this work incorporates the action of the dihedral group so that distance estimates are free from any a priori frame of reference. The philosophy of accounting for symmetries can be applied to any existing correction method, for which examples are offered.
Publisher: Informa UK Limited
Date: 31-10-2019
Publisher: Elsevier BV
Date: 09-2003
DOI: 10.1016/S1055-7903(03)00053-8
Abstract: This contribution addresses two questions: which alignment patterns are causing non-monophyly of the Asellota and what is the phylogenetic history of this group? The Asellota are small benthic crustaceans occurring in most aquatic habitats. In view of the complex morphological apomorphies known for this group, monophyly of the Asellota has never been questioned. Using ssu rDNA sequences of outgroups and of 16 asellote species from fresh water, littoral marine habitats and from deep-sea localities, the early divergence between the lineages in fresh water and in the ocean, and the monophyly of the deep-sea taxon Munnopsidae are confirmed. Relative substitution rates of freshwater species are much lower than in other isopod species, rates being highest in some littoral marine genera (Carpias and Jaera). Furthermore, more sequence sites are variable in marine than in freshwater species; the latter conserve outgroup character states. Monophyly is recovered with parsimony methods, but not with distance and maximum likelihood analyses, which tear apart the marine from the freshwater species. The information content of alignments was studied with spectra of supporting positions. The scarcity of signal (=apomorphic nucleotides) supporting monophyly of the Asellota is attributed to a short stem-line of this group or to erosion of signal in fast evolving marine species. Parametric bootstrapping in combination with spectra indicates that a tree model cannot explain the data and that monophyly of the Asellota should not be rejected even though many topologies do not recover this taxon.
Publisher: Springer Science and Business Media LLC
Date: 19-07-2006
Publisher: Public Library of Science (PLoS)
Date: 15-03-2010
Publisher: Oxford University Press (OUP)
Date: 10-2000
DOI: 10.1093/OXFORDJOURNALS.MOLBEV.A026252
Abstract: Maximum likelihood (ML) is a widely used criterion for selecting optimal evolutionary trees. However, the nature of the likelihood surface for trees is still not sufficiently understood, especially with regard to the frequency of multiple optima. Here, we initiate an analytic study for identifying sequences that generate multiple optima. We concentrate on the problem of optimizing edge weights for a given tree or trees (as opposed to searching through the space of all trees). We report a new approach to computing ML directly, which we have used to find large families of sequences that have multiple optima, including sequences with a continuum of optimal points. Such data sets are best supported by different (two or more) phylogenies that vary significantly in their timings of evolutionary events. Some standard biological processes can lead to data with multiple optima, and consequently the field needs further investigation. Our results imply that hill-climbing techniques as currently implemented in various software packages cannot guarantee that one will find the global ML point, even if it is unique.
Publisher: Springer Science and Business Media LLC
Date: 06-2012
Publisher: Oxford University Press (OUP)
Date: 27-09-2003
DOI: 10.1093/BIOINFORMATICS/BTG1062
Abstract: We introduce a mechanism for analytically deriving upper bounds on the maximum likelihood for genetic sequence data on sets of phylogenies. A simple ‘partition’ bound is introduced for general models. Tighter bounds are developed for the simplest model of evolution, the two state symmetric model of nucleotide substitution under the molecular clock. This follows earlier theoretical work which has been restricted to this model by analytic complexity. A weakness of current numerical computation is that reported ‘maximum likelihood’ results cannot be guaranteed, both for a specified tree (because of the possibility of multiple maxima) or over the full tree space (as the computation is intractable for large sets of trees). The bounds we develop here can be used to conclusively eliminate large proportions of tree space in the search for the maximum likelihood tree. This is vital in the development of a branch and bound search strategy for identifying the maximum likelihood tree. We report the results from a simulation study of approximately 106 data sets generated on clock-like trees of five leaves. In each trial a likelihood value of one specific instance of a parameterised tree is compared to the bound determined for each of the 105 possible rooted binary trees. The proportion of trees that are eliminated from the search for the maximum likelihood tree ranged from 92% to almost 98%, indicating a computational speed-up factor of between 12 and 44. Contact: m.hendy@massey.ac.nz
Publisher: Oxford University Press (OUP)
Date: 17-04-2014
DOI: 10.1093/MNRAS/STU464
Publisher: Oxford University Press (OUP)
Date: 20-12-2012
Abstract: The relationships of the 3 major clades of winged insects (Ephemeroptera, Odonata, and Neoptera) are still unclear. Many morphologists favor a clade Metapterygota (Odonata + Neoptera), but Chiastomyaria (Ephemeroptera + Neoptera) or Palaeoptera (Ephemeroptera + Odonata) has also been supported in some older and more recent studies. A possible explanation for the difficulties in resolving these relationships is concerted convergence: the convergent evolution of entire character complexes under the same or similar selective pressures. In this study, we analyze possible instances of this phenomenon in the context of head structures of Ephemeroptera, Odonata, and Neoptera. We apply a recently introduced formal approach to detect the occurrence of concerted convergence. We found that characters of the tentorium and mandibles in particular, but also some other head structures, have apparently not evolved independently, and thus can cause artifacts in tree reconstruction. Our subsequent analyses, which exclude character sets that may be affected by concerted convergence, corroborate the Palaeoptera concept. We show that the analysis of homoplasy and its influence on tree inference can be formally improved with important consequences for the identification of incompatibilities between data sets. Our results suggest that modified weighting (or exclusion of characters) in cases of formally identified correlated cliques of characters may improve morphology-based tree reconstruction.
Publisher: The Royal Society
Date: 07-06-2003
Publisher: Oxford University Press (OUP)
Date: 18-05-2018
Publisher: Elsevier BV
Date: 09-2022
DOI: 10.1016/J.JSB.2022.107870
Abstract: Discovery of new folds in the Protein Data Bank (PDB) has all but ceased. This could be viewed as evidence that all existing protein folds have been documented. Sampling bias has, however, been presented as an alternative explanation. Furthermore, although we may know of all protein folds that do exist, we may not have documented all protein folds that could exist. While addressing completeness in the context of entire protein structures is extremely difficult, they can be simplified in a number of ways. One such simplification is presented: considering protein structures as a series of α helices and β sheets and analysing the geometric relationships between these successive secondary structure elements (SSEs) through torsion angles, lengths and distances. We aimed to find out whether all substructures that could be formed by triplets of these successive SSEs were represented in the PDB. When SSEs were defined with the assignment program Promotif, a gap was identified in the represented torsion angles of helix-strand-strand substructures. This was not present when SSEs were defined with an alternative assignment program with a smaller minimum SSE length, DSSP. We also looked at representing proteins as one-dimensional sequences of SSE types and searched for underrepresented motifs. Completely absent motifs occurred more often than expected at random. If a gap in SSE substructure space exists that could be filled or if a physically possible SSE motif is absent, associated gaps in protein structure space are implied, meaning that the PDB as we know it may not be complete.
Publisher: Oxford University Press (OUP)
Date: 19-01-2015
Publisher: Western Sydney University
Date: 2019
Publisher: Informa UK Limited
Date: 26-03-2020
Publisher: Springer Science and Business Media LLC
Date: 06-05-2020
DOI: 10.1038/S41598-020-64647-4
Abstract: The assumptions underpinning ancestral state reconstruction are violated in many evolutionary systems, especially for traits under directional selection. However, the accuracy of ancestral state reconstruction for non-neutral traits is poorly understood. To investigate the accuracy of ancestral state reconstruction methods, trees and binary characters were simulated under the BiSSE (Binary State Speciation and Extinction) model using a wide range of character-state-dependent rates of speciation, extinction and character-state transition. We used maximum parsimony (MP), BiSSE and two-state Markov (Mk2) models to reconstruct ancestral states. Under each method, error rates increased with node depth, true number of state transitions, and rates of state transition and extinction, exceeding 30% for the deepest 10% of nodes and highest rates of extinction and character-state transition. Where rates of character-state transition were asymmetrical, error rates were greater when the rate away from the ancestral state was largest. Preferential extinction of species with the ancestral character state also led to higher error rates. BiSSE outperformed Mk2 in all scenarios where either speciation or extinction was state dependent and outperformed MP under most conditions. MP outperformed Mk2 in most scenarios except when the rates of character-state transition and/or extinction were highly asymmetrical and the ancestral state was unfavoured.
Publisher: Oxford University Press (OUP)
Date: 14-05-2010
Abstract: A phylogenetic tree comprising clades with high bootstrap values or other strong measures of statistical support is usually interpreted as providing a good estimate of the true phylogeny. Convergent evolution acting on groups of characters in concert, however, can lead to highly supported but erroneous phylogenies. Identifying such groups of phylogenetically misleading characters is obviously desirable. Here we present a procedure that uses an independent data source to identify sets of characters that have undergone concerted convergent evolution. We examine the problematic case of the cormorants and shags, for which trees constructed using osteological and molecular characters both have strong statistical support and yet are fundamentally incongruent. We find that the osteological characters can be separated into those that fit the phylogenetic history implied by the molecular data set and those that do not. Moreover, these latter nonfitting osteological characters are internally consistent and form groups of mutually compatible characters or "cliques," which are significantly larger than cliques of shuffled characters. We suggest, therefore, that these cliques of characters are the result of similar selective pressures and are a signature of concerted convergence.
Publisher: Cambridge University Press (CUP)
Date: 2016
DOI: 10.1017/PASA.2016.13
Abstract: We applied three statistical classification techniques, namely linear discriminant analysis (LDA), logistic regression, and random forests, to three astronomical datasets associated with searches for interstellar masers. We compared the performance of these methods in identifying whether specific mid-infrared or millimetre continuum sources are likely to have associated interstellar masers. We also discuss the interpretability of the results of each classification technique. Non-parametric methods have the potential to make accurate predictions when there are complex relationships between critical parameters. We found that for the small datasets the parametric methods logistic regression and LDA performed best; for the largest dataset the non-parametric method of random forests performed with comparable accuracy to parametric techniques, rather than any significant improvement. This suggests that, at least for the specific examples investigated here, accuracy of the predictions obtained is not being limited by the use of parametric models. We also found that for LDA, transformation of the data to match a normal distribution led to a significant improvement in accuracy. The different classification techniques had significant overlap in their predictions; further astronomical observations will enable the accuracy of these predictions to be tested.
Publisher: Oxford University Press (OUP)
Date: 13-10-2004
Abstract: We report that for population data, where sequences are very similar to one another, it is often possible to use a two-pronged (MinMax Squeeze) approach to prove that a tree is the shortest possible under the parsimony criterion. Such population data can be in a range where parsimony is a maximum likelihood estimator. This is in sharp contrast to the case with species data, where sequences are much further apart and the problem of guaranteeing an optimal phylogenetic tree is known to be computationally prohibitive for realistic numbers of species, irrespective of whether likelihood or parsimony is the optimality criterion. The Squeeze uses both an upper bound (the length of the shortest tree known) and a lower bound derived from partitions of the columns (the length of the shortest tree possible). If the two bounds meet, the shortest known tree is thus proven to be a shortest possible tree. The implementation is first tested on simulated data sets and then applied to 53 complete human mitochondrial genomes. The shortest possible trees for those data have several significant improvements from the published tree. Namely, a pair of Australian lineages comes deeper in the tree (in agreement with archaeological data), and the non-African part of the tree shows greater agreement with the geographical distribution of lineages.
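The two-bound idea in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the upper bound is the Fitch parsimony length of a known tree, and the lower bound uses only the most trivial partition of the columns (one column per block, so each column contributes its number of distinct states minus one). The function names `fitch_site`, `tree_length`, `column_lower_bound` and `squeeze`, and the nested-tuple tree encoding, are hypothetical.

```python
def fitch_site(node, seqs, site):
    """Return (state set, change count) for one column on a rooted binary tree.

    A tree is a nested 2-tuple of taxon names, e.g. (("a", "b"), ("c", "d")).
    """
    if isinstance(node, str):
        return {seqs[node][site]}, 0
    left_states, left_cost = fitch_site(node[0], seqs, site)
    right_states, right_cost = fitch_site(node[1], seqs, site)
    common = left_states & right_states
    if common:
        return common, left_cost + right_cost
    # Disjoint child sets force one extra substitution at this node.
    return left_states | right_states, left_cost + right_cost + 1


def tree_length(tree, seqs):
    """Upper bound: parsimony length of a known tree (Fitch algorithm)."""
    n_sites = len(next(iter(seqs.values())))
    return sum(fitch_site(tree, seqs, s)[1] for s in range(n_sites))


def column_lower_bound(seqs):
    """Lower bound from the trivial partition: each column needs at least
    (number of distinct states - 1) changes on any tree."""
    return sum(len(set(col)) - 1 for col in zip(*seqs.values()))


def squeeze(tree, seqs):
    """Return (lower, upper, proven); proven is True when the bounds meet."""
    lower = column_lower_bound(seqs)
    upper = tree_length(tree, seqs)
    return lower, upper, lower == upper


if __name__ == "__main__":
    seqs = {"a": "AAG", "b": "AAG", "c": "CAG", "d": "CTG"}
    print(squeeze((("a", "b"), ("c", "d")), seqs))
```

When the bounds differ, the tree is not shown to be suboptimal; the gap only means this particular lower bound is too weak, which is why coarser partitions of the columns are worth trying.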
Publisher: Elsevier BV
Date: 07-2009
DOI: 10.1016/J.MEEGID.2009.01.007
Abstract: Candida albicans is a major opportunistic pathogen of humans. Previous work has demonstrated the existence of a general-purpose genotype (GPG; equivalent to clade 1 as defined by multi-locus sequence typing data) that is more frequent than other genotypes as an agent of human disease and commensal colonization. We undertook a genomic screen which indicated that a large number of mutations differentiate GPG strains from other strains and that such mutations are scattered throughout the genome. GPG-specific mutations are non-synonymous more frequently than expected by chance, and are not randomly distributed across functional and structural gene categories. Our analysis has identified three categories of genes in which GPG-specific mutations are over-represented, namely genes for which expression changes during the yeast-hyphal transition, genes for which expression changes as a result of exposure to antifungal agents and repeat-containing ORFs. Although we have no direct evidence that the individual polymorphisms identified confer selective advantages to GPG strains, the results support our contention that the high prevalence of GPG strains is not merely due to genetic drift but that GPG strains have reached a high prevalence because they possess a multitude of fitness-enhancing traits. They also indicate that the distribution of genes marked by GPG-specific mutations across functional and structural categories could identify physiological traits that are of particular importance to the success of GPG strains in their interactions with the human host.
Publisher: Oxford University Press (OUP)
Date: 06-2005
Abstract: Determining the phylogenetic relationships among the major lines of angiosperms is a long-standing problem, yet the uncertainty as to the phylogenetic affinity of these lines persists. While a number of studies have suggested that the ANITA (Amborella-Nymphaeales-Illiciales-Trimeniales-Aristolochiales) grade is basal within angiosperms, studies of complete chloroplast genome sequences also suggested an alternative tree, wherein the line leading to the grasses branches first among the angiosperms. To improve taxon sampling in the existing chloroplast genome data, we sequenced the chloroplast genome of the monocot Acorus calamus. We generated a concatenated alignment (89,436 positions for 15 taxa), encompassing almost all sequences usable for phylogeny reconstruction within spermatophytes. The data still contain support for both the ANITA-basal and grasses-basal hypotheses. Using simulations we can show that were the ANITA-basal hypothesis true, parsimony (and distance-based methods with many models) would be expected to fail to recover it. The self-evident explanation for this failure appears to be a long-branch attraction (LBA) between the clade of grasses and the out-group. However, this LBA cannot explain the discrepancies observed between tree topology recovered using the maximum likelihood (ML) method and the topologies recovered using the parsimony and distance-based methods when grasses are deleted. Furthermore, the fact that neither maximum parsimony nor distance methods consistently recover the ML tree, when according to the simulations they would be expected to, when the out-group (Pinus) is deleted, suggests that either the generating tree is not correct or the best symmetric model is misspecified (or both). We demonstrate that the tree recovered under ML is extremely sensitive to model specification and that the best symmetric model is misspecified. Hence, we remain agnostic regarding phylogenetic relationships among basal angiosperm lineages.
Publisher: American Association for the Advancement of Science (AAAS)
Date: 22-03-2002
Abstract: Well-preserved subfossil bones of Adélie penguins, Pygoscelis adeliae, underlie existing and abandoned nesting colonies in Antarctica. These bones, dating back to more than 7000 years before the present, harbor some of the best-preserved ancient DNA yet discovered. From 96 radiocarbon-aged bones, we report large numbers of mitochondrial haplotypes, some of which appear to be extinct, given the 380 living birds sampled. We demonstrate DNA sequence evolution through time and estimate the rate of evolution of the hypervariable region I using a Markov chain Monte Carlo integration and a least-squares regression analysis. Our calculated rates of evolution are approximately two to seven times higher than previous indirect phylogenetic estimates.
Publisher: The Royal Society
Date: 29-01-2020
Abstract: The size of plant stomata (adjustable pores that determine the uptake of CO2 and loss of water from leaves) is considered to be evolutionarily important. This study uses fossils from the major Southern Hemisphere family Proteaceae to test whether stomatal cell size responded to Cenozoic climate change. We measured the length and abundance of guard cells (the cells forming stomata), the area of epidermal pavement cells, stomatal index and maximum stomatal conductance from a comprehensive sample of fossil cuticles of Proteaceae, and extracted published estimates of past temperature and atmospheric CO2. We developed a novel test based on stochastic modelling of trait evolution to test correlations among traits. Guard cell length increased, and stomatal density decreased significantly with decreasing palaeotemperature. However, contrary to expectations, stomata tended to be smaller and more densely packed at higher atmospheric CO2. Thus, associations between stomatal traits and palaeoclimate over the last 70 million years in Proteaceae suggest that stomatal size is significantly affected by environmental factors other than atmospheric CO2. Guard cell length, pavement cell area, stomatal density and stomatal index covaried in ways consistent with coordinated development of leaf tissues.
Publisher: Oxford University Press (OUP)
Date: 22-02-2006
Abstract: Although recent studies indicate that estimating phylogenies from alignments of concatenated genes greatly reduces the stochastic error, the potential for systematic error still remains, heightening the need for reliable methods to analyze multigene data sets. Consensus methods provide an alternative, more inclusive, approach for analyzing collections of trees arising from multiple genes. We extend a previously described consensus network method for genome-scale phylogeny (Holland, B. R., K. T. Huber, V. Moulton, and P. J. Lockhart. 2004. Using consensus networks to visualize contradictory evidence for species phylogeny. Mol. Biol. Evol. 21:1459-1461) to incorporate additional information. This additional information could come from bootstrap analysis, Bayesian analysis, or various methods to find confidence sets of trees. The new methods can be extended to include edge weights representing genetic distance. We use three data sets to illustrate the approach: 61 genes from 14 angiosperm taxa and one gymnosperm, 106 genes from eight yeast taxa, and 46 members of a gene family from 15 vertebrate taxa.
Publisher: American Society for Microbiology
Date: 02-2009
DOI: 10.1128/AEM.01979-08
Abstract: In many countries relatively high notification rates of campylobacteriosis are observed in children under 5 years of age. Few studies have considered the role that environmental exposure plays in the epidemiology of these cases. Wild birds inhabit parks and playgrounds and are recognized carriers of Campylobacter, and young children are at greater risk of ingesting infective material due to their frequent hand-mouth contact. We investigated wild-bird fecal contamination in playgrounds in parks in a New Zealand city. A total of 192 samples of fresh and dried fecal material were cultured to determine the presence of Campylobacter spp. Campylobacter jejuni isolates were also characterized by pulsed-field gel electrophoresis (PFGE) and multilocus sequence typing (MLST), and the profiles obtained were compared with those of human isolates. C. jejuni was isolated from 12.5% of the samples. MLST identified members of clonal complexes ST-45, ST-682, and ST-177; all of these complexes have been recovered from wild birds in Europe. PFGE of ST-45 isolates resulted in profiles indistinguishable from those of isolates obtained from human cases in New Zealand. Members of the ST-177 and ST-682 complexes have been found in starlings (Sturnus vulgaris) in the United Kingdom, and these birds were common in playgrounds investigated in New Zealand in this study. We suggest that feces from wild birds in playgrounds could contribute to the occurrence of campylobacteriosis in preschool children. Further, the C. jejuni isolates obtained in this study belonged to clonal complexes associated with wild-bird populations in the northern hemisphere and could have been introduced into New Zealand in imported wild garden birds in the 19th century.
Publisher: Public Library of Science (PLoS)
Date: 09-03-2010
Publisher: Springer Science and Business Media LLC
Date: 24-07-2020
Publisher: Elsevier BV
Date: 12-2015
Publisher: Oxford University Press (OUP)
Date: 12-2002
DOI: 10.1093/OXFORDJOURNALS.MOLBEV.A004030
Abstract: A method is described that allows the assessment of treelikeness of phylogenetic distance data before tree estimation. This method is related to statistical geometry as introduced by Eigen, Winkler-Oswatitsch, and Dress (1988 [Proc. Natl. Acad. Sci. USA. 85:5913-5917]), and in essence, displays a measure for treelikeness of quartets in terms of a histogram that we call a delta plot. This allows identification of nontreelike data and analysis of noisy data sets arising from processes such as, for example, parallel evolution, recombination, or lateral gene transfer. In addition to an overall assessment of treelikeness, individual taxa can be ranked by reference to the treelikeness of the quartets to which they belong. Removal of taxa on the basis of this ranking results in an increase in accuracy of tree estimation. Recombinant data sets are simulated, and the method is shown to be capable of identifying single recombinant taxa on the basis of distance information alone, provided the parents of the recombinant sequence are sufficiently divergent and the mixture of tree histories is not strongly skewed toward a single tree. Delta plots and taxon rankings are applied to three biological data sets using distances derived from sequence alignment, gene order, and fragment length polymorphism.
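The quartet measure behind a delta plot can be illustrated with a short sketch. The abstract does not state the formula, so the definition used here is an assumption based on the commonly cited form: for each quartet, the three pairwise distance sums are sorted as d_s ≤ d_m ≤ d_l and delta = (d_l − d_m) / (d_l − d_s), with delta taken as 0 when all three sums are equal. A value of 0 indicates a perfectly treelike quartet. The function names `quartet_delta` and `delta_scores` are hypothetical, and `d` is assumed to be a symmetric distance matrix given as a list of lists.

```python
from itertools import combinations


def quartet_delta(d, x, y, u, v):
    """Delta for one quartet: 0 for treelike distances, up to 1 otherwise."""
    d_s, d_m, d_l = sorted([
        d[x][y] + d[u][v],
        d[x][u] + d[y][v],
        d[x][v] + d[y][u],
    ])
    if d_l == d_s:
        return 0.0  # all three sums equal (assumed convention)
    return (d_l - d_m) / (d_l - d_s)


def delta_scores(d):
    """Return (delta for every quartet, mean delta per taxon).

    The first list is what a delta-plot histogram would be drawn from;
    the second supports ranking taxa by the treelikeness of their quartets.
    """
    n = len(d)
    totals = [0.0] * n
    counts = [0] * n
    deltas = []
    for quartet in combinations(range(n), 4):
        delta = quartet_delta(d, *quartet)
        deltas.append(delta)
        for taxon in quartet:
            totals[taxon] += delta
            counts[taxon] += 1
    means = [totals[i] / counts[i] if counts[i] else 0.0 for i in range(n)]
    return deltas, means


if __name__ == "__main__":
    # Additive distances on the tree ((0,1),(2,3)) with unit branch lengths.
    tree_d = [
        [0, 2, 3, 3],
        [2, 0, 3, 3],
        [3, 3, 0, 2],
        [3, 3, 2, 0],
    ]
    print(delta_scores(tree_d))
```

Taxa with the highest mean delta are the candidates for removal in the ranking step the abstract describes.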
Publisher: Oxford University Press (OUP)
Date: 06-2008
DOI: 10.1080/10635150802044037
Abstract: The amplified fragment length polymorphism (AFLP) technique is an increasingly popular component of the phylogenetic toolbox, particularly for plant species. Technological advances in capillary electrophoresis now allow very precise estimates of DNA fragment mobility and amplitude, and current AFLP software allows greater control of data scoring and the production of the binary character matrix. However, for AFLP to become a useful modern tool for large data sets, improvements to automated scoring are required. We design a procedure that can be used to optimize AFLP scoring parameters to improve phylogenetic resolution and demonstrate it for two AFLP scoring programs (GeneMapper and GeneMarker). In general, we found that there was a trade-off between getting more characters of lower quality and fewer characters of high quality. Conservative settings that gave the least error did not give the best phylogenetic resolution, as too many useful characters were discarded. For example, in GeneMapper, we found that bin width was a crucial parameter, and that although reducing bin width from 1.0 to 0.5 base pairs increased the error rate, it nevertheless improved resolution due to the increased number of informative characters. For our 30-taxon data sets, moving from default to optimized parameter settings gave between 3 and 11 extra internal edges with >50% bootstrap support, in the best case increasing the number of resolved edges from 14 to 25 out of a possible 27. Nevertheless, improvements to current AFLP software packages are needed to (1) make use of replicate profiles to calibrate the data and perform error calculations and (2) perform tests to optimize scoring parameters in a rigorous and automated way. This is true not only when AFLP data are used for phylogenetics, but also for other applications, including linkage mapping and population genetics.
Publisher: Elsevier BV
Date: 2012
DOI: 10.1016/J.YMPEV.2011.09.011
Abstract: For the predominantly southern hemisphere plant group Styphelioideae (Ericaceae), published sequence datasets of five markers are now available for all except one of the 38 recognised genera. However, several markers are highly incomplete; therefore, missing data is problematic for producing a genus-level phylogeny. We explore the relative utility of supertree and supermatrix approaches for addressing this challenge, and examine the effects of missing data on tree topology and resolution. Although the supertree approach returned a more conservative hypothesis, overall, both supermatrix and supertree analyses concurred in the topologies they returned. Using multiple genes and a dataset of variably complete taxa we found improved support for the monophyly and position of the tribes and genus-level relationships. However, there was mixed support for the Richeeae tribe appearing one node basal to the Cosmelieae tribe or vice versa. It is probable that this will only be resolved through further sequencing. Our study supports previous findings that the amount of data is more critical than the completeness of the dataset in estimating well-resolved trees. Our results suggest that a "serendipitous" scaffolding approach that includes a mixture of well and poorly sequenced taxa can lead to robust phylogenetic hypotheses.
Publisher: Springer Science and Business Media LLC
Date: 21-11-2008
Abstract: Commonly used phylogenetic models assume a homogeneous evolutionary process throughout the tree. It is known that these homogeneous models are often too simplistic, and that with time some properties of the evolutionary process can change (due to selection or drift). In particular, as constraints on sequences evolve, the proportion of variable sites can vary between lineages. This affects the ability of phylogenetic methods to correctly estimate phylogenetic trees, especially for long timescales. To date there is no phylogenetic model that allows for change in the proportion of variable sites, and the degree to which this affects phylogenetic reconstruction is unknown. We present LineageSpecificSeqgen, an extension to the seq-gen program that allows generation of sequences with both changes in the proportion of variable sites and changes in the rate at which sites switch between being variable and invariable. In contrast to seq-gen and its derivatives to date, we interpret branch lengths as the mean number of substitutions per variable site, as opposed to the mean number of substitutions per site (which is averaged over all sites, including invariable sites). This allows specification of the substitution rates of variable sites, independently of the proportion of invariable sites. LineageSpecificSeqgen allows simulation of DNA and amino acid sequence alignments under a lineage-specific evolutionary process. The program can be used to test current models of evolution on sequences that have undergone lineage-specific evolution. It facilitates the development of both new methods to identify such processes in real data, and means to account for such processes. The program is available at: awcmee.massey.ac.nz/downloads.htm .
Publisher: Springer Science and Business Media LLC
Date: 28-07-2020
Publisher: Elsevier BV
Date: 10-2010
DOI: 10.1016/J.YMPEV.2010.06.003
Abstract: We examine the effects of isolation over both ancient and contemporary timescales on evolutionary diversification and speciation patterns of springtail species in circum-Antarctica, with special focus on members of the genus Cryptopygus (Collembola, Isotomidae). We employ phylogenetic analysis of mitochondrial DNA (cox1), and ribosomal DNA (18S and 28S) genes in the programmes MrBayes and RAxML. Our aims are twofold: (1) we evaluate existing taxonomy in light of previous work which found dubious taxonomic classification in several taxa based on cox1 analysis; (2) we evaluate the biogeographic origin of our chosen suite of springtail species based on dispersal/vicariance scenarios, the magnitude of genetic divergence among lineages and the age and accessibility of potential habitat. The dubious taxonomic characterisation of Cryptopygus species highlighted previously is confirmed by our multi-gene phylogenetic analyses. Specifically, according to the current taxonomy, Cryptopygus antarcticus subspecies are not completely monophyletic and neither are Cryptopygus species in general. We show that distribution patterns among species/lineages are both dispersal- and vicariance-driven. Episodes of colonisation appear to have occurred frequently, the routes of which may have followed currents in the Southern Ocean. In several cases, the estimated divergence dates among species correspond well with the timing of terrestrial habitat availability. We conclude that these isotomid springtails have a varied and diverse evolutionary history in the circum-Antarctic that consists of both ancient and recent elements and is reflected in a dynamic contemporary fauna.
Publisher: Oxford University Press (OUP)
Date: 23-08-2007
Abstract: There are many examples of groups (such as birds, bees, mammals, multicellular animals, and flowering plants) that have undergone a rapid radiation. In such cases, where there is a combination of short internal and long external branches, correctly estimating and rooting phylogenetic trees is known to be a difficult problem. In this simulation study, we tested the performances of different phylogenetic methods at estimating a tree that models a rapid radiation. We found that maximum likelihood, corrected and uncorrected neighbor-joining, and corrected and uncorrected parsimony, all suffer from biases toward specific tree topologies. In addition, we found that using a single-taxon outgroup to root a tree frequently disrupts an otherwise correct ingroup phylogeny. Moreover, for uncorrected parsimony, we found cases where several individual trees (in which the outgroup was placed incorrectly) were selected more frequently than the correct tree. Even for parameter settings where the correct tree was selected most frequently when using extremely long sequences, for sequences of up to 60,000 nucleotides the incorrectly rooted trees were each selected more frequently than the correct tree. For all the cases tested here, tree estimation using a two-taxon outgroup was more accurate than when using a single-taxon outgroup. However, the ingroup was most accurately recovered when no outgroup was used.
Publisher: Wiley
Date: 19-03-2014
DOI: 10.1111/JZO.12133
Publisher: The Royal Society
Date: 11-2014
Abstract: The Tasmanian devil (Sarcophilus harrisii) was widespread in Australia during the Late Pleistocene but is now endemic to the island of Tasmania. Low genetic diversity combined with the spread of devil facial tumour disease have raised concerns for the species’ long-term survival. Here, we investigate the origin of low genetic diversity by inferring the species' demographic history using temporal sampling with summary statistics, full-likelihood and approximate Bayesian computation methods. Our results show extensive population declines across Tasmania correlating with environmental changes around the last glacial maximum and following unstable climate related to increased ‘El Niño–Southern Oscillation’ activity.
Publisher: Springer Science and Business Media LLC
Date: 11-12-2017
DOI: 10.1038/S41467-017-02220-W
Abstract: Identifying factors responsible for the emergence and evolution of social complexity is an outstanding challenge in evolutionary biology. Here we report results from a phylogenetic comparative analysis of over 1000 species of squamate reptile, nearly 100 of which exhibit facultative forms of group living, including prolonged parent–offspring associations. We show that the evolution of social groupings among adults and juveniles is overwhelmingly preceded by the evolution of live birth across multiple independent origins of both traits. Furthermore, the results suggest that live bearing has facilitated the emergence of social groups that remain stable across years, similar to forms of sociality observed in other vertebrates. These results suggest that live bearing has been a fundamentally important precursor in the evolutionary origins of group living in the squamates.
Publisher: Oxford University Press (OUP)
Date: 22-03-2012
Publisher: Oxford University Press (OUP)
Date: 21-03-2012
Publisher: Institute of Mathematical Statistics
Date: 03-2015
DOI: 10.1214/14-AOAS795
Publisher: Oxford University Press (OUP)
Date: 27-03-2011
DOI: 10.1093/BIOINFORMATICS/BTR147
Abstract: Motivation: Despite trends towards maximum likelihood and Bayesian criteria, maximum parsimony (MP) remains an important criterion for evaluating phylogenetic trees. Because exact MP search is NP-complete, the computational effort needed to find provably optimal trees skyrockets with increasing numbers of taxa, limiting analyses to around 25–30 taxa. This is, in part, because currently available programs fail to take advantage of parallelism. Results: We present XMP, a new program for finding exact MP trees that comes in both serial and parallel versions. The serial version is faster in nearly all tests than existing software. The parallel version uses a work-stealing algorithm to scale to hundreds of CPUs on a distributed-memory multiprocessor with high efficiency. An optimized SSE2 inner loop provides additional speedup for Pentium 4 and later CPUs. Availability: C source code and several binary versions are freely available from www.massey.ac.nz/~wtwhite/xmp. The parallel version requires an MPI implementation, such as the freely available MPICH2. Contact: w.t.white@massey.ac.nz; barbara.holland@utas.edu.au Supplementary information: Supplementary data are available at Bioinformatics online.
Publisher: Oxford University Press (OUP)
Date: 13-07-2007
Abstract: It is well known that molecular data "saturates" with increasing sequence divergence (thereby losing phylogenetic information) and that in addition the accumulation of misleading information due to chance similarities or to systematic bias may accompany saturation as well. Exploratory data analysis methods that can quantify the extent of signal loss or convergence for a given data set are scarce. Such methods are needed because genomics delivers very long sequence alignments spanning substantial phylogenetic depth, where site saturation may be compounded by systematic biases or other alternative signals. Here we introduce the Treeness Triangle (TT) graph, in which signals detectable by Hadamard (spectral) analysis are summed into 3 categories: those supporting 1) external and 2) internal branches in the optimal tree, in addition to 3) the residuals (potential internal branches not present in the optimal tree). These 3 values are plotted in a standard ternary coordinate system. The approach is illustrated with simulated and real data sets, the latter from complete chloroplast genomes, where potential problems of paralogy or lateral gene acquisition can be excluded. The TT uncovers the divergence-dependent loss of phylogenetic signal as subsets of chloroplast genomes are investigated that span increasingly deeper evolutionary timescales. The rate of signal loss (or signal retention) varies with the gene and/or the method of analysis.
Publisher: Oxford University Press (OUP)
Date: 06-03-2013
DOI: 10.1093/GBE/EVT032
Publisher: Springer Science and Business Media LLC
Date: 14-09-2022
DOI: 10.1007/S11538-022-01072-W
Abstract: Phylogenetic trees describe relationships between extant species, but beyond that their shape and their relative branch lengths can provide information on broader evolutionary processes of speciation and extinction. However, currently many of the most widely used macro-evolutionary models make predictions about the shapes of phylogenetic trees that differ considerably from what is observed in empirical phylogenies. Here, we propose a flexible and biologically plausible macroevolutionary model for phylogenetic trees where times to speciation or extinction events are drawn from a Coxian phase-type (PH) distribution. First, we show that different choices of parameters in our model lead to a range of tree balances as measured by Aldous’ β statistic. In particular, we demonstrate that it is possible to find parameters that correspond well to empirical tree balance. Next, we provide a natural extension of the β statistic to sets of trees. This extension produces less biased estimates of β compared to using the median β values from individual trees. Furthermore, we derive a likelihood expression for the probability of observing an edge-weighted tree under a model with speciation but no extinction. Finally, we illustrate the application of our model by performing both absolute and relative goodness-of-fit tests for two large empirical phylogenies (squamates and angiosperms) that compare models with Coxian PH distributed times to speciation with models that assume exponential or Weibull distributed waiting times. In our numerical analysis, we found that, in most cases, models assuming a Coxian PH distribution provided the best fit.
Publisher: American Society for Microbiology
Date: 07-2003
DOI: 10.1128/JVI.77.13.7202-7213.2003
Abstract: To clarify the origin and evolution of the primate lentiviruses (PLVs), which include human immunodeficiency virus types 1 and 2 as well as their simian relatives, simian immunodeficiency viruses (SIVs), isolated from several host species, we investigated the phylogenetic relationships among the six supposedly nonrecombinant PLV lineages for which the full genome sequences are available. Employing bootscanning as an exploratory tool, we located several regions in the PLV genome that seem to have uncertain or conflicting phylogenetic histories. Phylogeny reconstruction based on distance and maximum-likelihood algorithms followed by a number of statistical tests confirms the existence of at least five putative recombinant fragments in the PLV genome with different clustering patterns. Split decomposition analysis also shows that phylogenetic relationships among PLVs may be better represented by network-based graphs, such as the ones produced by SplitsTree. Our findings not only imply that the six so-called pure PLV lineages have in fact mosaic genomes but also make more unlikely the hypothesis of cospeciation of SIVs and their simian hosts.
Publisher: Oxford University Press (OUP)
Date: 08-2007
Publisher: Oxford University Press (OUP)
Date: 16-03-2015
Abstract: We assess phylogenetic patterns of hybridization in the speciose, ecologically and economically important genus Eucalyptus, in order to better understand the evolution of reproductive isolation. Eucalyptus globulus pollen was applied to 99 eucalypt species, mainly from the large commercially important subgenus, Symphyomyrtus. In the 64 species that produce seeds, hybrid compatibility was assessed at two stages, hybrid-production (at approximately 1 month) and hybrid-survival (at 9 months), and compared with phylogenies based on 8,350 genome-wide DArT (Diversity Arrays Technology) markers. Model fitting was used to assess the relationship between compatibility and genetic distance, and whether or not the strength of incompatibility "snowballs" with divergence. There was a decline in compatibility with increasing genetic distance between species. Hybridization was common within two closely related clades (one including E. globulus), but rare between E. globulus and species in two phylogenetically distant clades. Of three alternative models tested (linear, slowdown, and snowball), we found consistent support for a snowball model, indicating that the strength of incompatibility accelerates relative to genetic distance. Although we can only speculate about the genetic basis of this pattern, it is consistent with a Dobzhansky-Muller-model prediction that incompatibilities should snowball with divergence due to negative epistasis. Different rates of compatibility decline in the hybrid-production and hybrid-survival measures suggest that early-acting postmating barriers developed first and are stronger than later-acting barriers. We estimated that complete reproductive isolation can take up to 21-31 My in Eucalyptus. Practical implications for hybrid eucalypt breeding and genetic risk assessment in Australia are discussed.
Publisher: Springer Science and Business Media LLC
Date: 09-09-2015
DOI: 10.1007/S00294-015-0516-8
Abstract: The yeast Candida albicans, a commensal colonizer and occasional pathogen of humans, has a rudimentary mating ability. However, mating is a cumbersome process that has never been observed outside the laboratory, and the population structure of the species is predominantly clonal. Here we discuss recent findings that indicate that mating ability is under selection in C. albicans, i.e. that it is a biologically relevant process. C. albicans strains can only mate after they have sustained genetic damage. We propose that the rescue of such damaged strains by mating may be the primary reason why mating ability is under selection.
Publisher: Oxford University Press (OUP)
Date: 02-2005
DOI: 10.1080/10635150590906055
Abstract: Many phylogenetic methods produce large collections of trees as opposed to a single tree, which allows the exploration of support for various evolutionary hypotheses. However, to be useful, the information contained in large collections of trees should be summarized; frequently this is achieved by constructing a consensus tree. Consensus trees display only those signals that are present in a large proportion of the trees. However, by their very nature consensus trees require that any conflicts between the trees are necessarily disregarded. We present a method that extends the notion of consensus trees to allow the visualization of conflicting hypotheses in a consensus network. We demonstrate the utility of this method in highlighting differences amongst maximum likelihood bootstrap values and Bayesian posterior probabilities in the placental mammal phylogeny, and also in comparing the phylogenetic signal contained in amino acid versus nucleotide characters for hexapod monophyly.
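One way to picture the step from a consensus tree to a consensus network is to count how often each split (bipartition of the taxa) occurs across the collection and keep those above a frequency threshold. The sketch below assumes each tree is already given as a set of splits, with every split written as a `frozenset` of the taxa on one fixed side; the function name `frequent_splits` and the `threshold` parameter are illustrative choices, not names from the paper, and drawing the resulting network is not covered.

```python
from collections import Counter


def frequent_splits(trees, threshold):
    """Return {split: frequency} for splits found in more than `threshold` of the trees.

    trees     -- iterable of trees, each a collection of splits (frozensets of taxa)
    threshold -- minimum proportion of trees, exclusive (hypothetical parameter)
    """
    trees = list(trees)
    # Count each split at most once per tree.
    counts = Counter(split for tree in trees for split in set(tree))
    n = len(trees)
    return {s: c / n for s, c in counts.items() if c / n > threshold}
```

A threshold of 0.5 keeps only majority splits, which are always mutually compatible and so fit on a single tree; lowering it lets conflicting splits through, and it is these that a network rather than a tree is needed to display.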
Publisher: Public Library of Science (PLoS)
Date: 03-06-2009
Publisher: Oxford University Press (OUP)
Date: 26-03-2016
DOI: 10.1093/GBE/EVW065
Publisher: Springer Science and Business Media LLC
Date: 2013
Publisher: Springer Science and Business Media LLC
Date: 17-05-2005
Abstract: Micro-biological research relies on the use of model organisms that act as representatives of their species or subspecies; these are frequently well-characterized laboratory strains. However, it has often become apparent that the model strain initially chosen does not represent important features of the species. For micro-organisms, the diversity of their genomes is such that even the best possible choice of initial strain for sequencing may not assure that the genome obtained adequately represents the species. To acquire information about a species' genome as efficiently as possible, we require a method to choose strains for analysis on the basis of how well they represent the species. We develop the Best Total Coverage (BTC) method for selecting one or more representative model organisms from a group of interest, given that rough genetic distances between the members of the group are known. Software implementing a "greedy" version of the method can be used with large data sets; its effectiveness is tested using both constructed and biological data sets. In both the simulated and biological examples the greedy-BTC method outperformed random selection of model organisms, and for two biological examples it outperformed selection of model strains based on phylogenetic structure. Although the method was designed with microbial species in mind, and is tested here on three microbial data sets, it will also be applicable to other types of organism.
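The abstract does not define the coverage score, so the following is only a sketch of what a greedy selection from a distance matrix can look like, not the BTC method itself. It assumes, purely for illustration, that a set of representatives is scored by the summed distance from every strain to its nearest chosen representative, and that strains are added one at a time to reduce that sum as much as possible. The name `greedy_representatives` and this objective are assumptions.

```python
def greedy_representatives(dist, k):
    """Greedily pick up to k representatives from a symmetric distance matrix.

    Assumed objective (not taken from the paper): minimise the sum, over all
    strains, of the distance to the nearest chosen representative.
    """
    n = len(dist)
    chosen = []
    nearest = [float("inf")] * n  # distance from each strain to its closest pick
    for _ in range(min(k, n)):
        best, best_cost = None, float("inf")
        for c in range(n):
            if c in chosen:
                continue
            # Total cost if candidate c were added to the current picks.
            cost = sum(min(nearest[j], dist[c][j]) for j in range(n))
            if cost < best_cost:
                best, best_cost = c, cost
        chosen.append(best)
        nearest = [min(nearest[j], dist[best][j]) for j in range(n)]
    return chosen
```

Each round scans every remaining candidate against every strain, so selecting k representatives costs on the order of k·n² distance lookups, which is what keeps a greedy approach usable on large data sets.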
Publisher: Public Library of Science (PLoS)
Date: 03-2016
Publisher: Springer Science and Business Media LLC
Date: 2008
Publisher: Elsevier
Date: 2017
Publisher: Springer Science and Business Media LLC
Date: 28-11-2020
DOI: 10.1007/S00239-019-09918-Z
Abstract: The underlying structure of the canonical amino acid substitution matrix (aaSM) is examined by considering stepwise improvements in the differential recognition of amino acids according to their chemical properties during the branching history of the two aminoacyl-tRNA synthetase (aaRS) superfamilies. The evolutionary expansion of the genetic code is described by a simple parameterization of the aaSM, in which (i) the number of distinguishable amino acid types, (ii) the matrix dimension and (iii) the number of parameters, each increases by one for each bifurcation in an aaRS phylogeny. Parameterized matrices corresponding to trees in which the size of an amino acid sidechain is the only discernible property behind its categorization as a substrate, exclusively for a Class I or II aaRS, provide a significantly better fit to empirically determined aaSM than trees with random bifurcation patterns. A second split between polar and nonpolar amino acids in each Class effects a vastly greater further improvement. The earliest Class-separated epochs in the phylogenies of the aaRS reflect these enzymes' capability to distinguish tRNAs through the recognition of acceptor stem identity elements via the minor (Class I) and major (Class II) helical grooves, which is how the ancient operational code functioned. The advent of tRNA recognition using the anticodon loop supports the evolution of the optimal map of amino acid chemistry found in the later genetic code, an essentially digital categorization, in which polarity is the major functional property, compensating for the unrefined, haphazard differentiation of amino acids achieved by the operational code.
Publisher: Oxford University Press (OUP)
Date: 17-11-2012
Abstract: We investigate distances on binary (presence/absence) data in the context of a Dollo process, where a trait can only arise once on a phylogenetic tree but may be lost many times. We introduce a novel distance, the Additive Dollo Distance (ADD), that applies to data generated under a Dollo model and show that it has some useful theoretical properties including an intriguing link to the LogDet paralinear distance. Simulations of Dollo data are used to compare a number of binary distances including ADD, LogDet, a restriction-site-based distance, and some simple, but to our knowledge previously unstudied, variations on common binary distances. The simulations suggest that ADD outperforms other distances on Dollo data. Interestingly, we found that the LogDet distance performs poorly in the context of a Dollo process; this may have implications for its use in connection with conditioned genome reconstruction. We apply the ADD to two Diversity Arrays Technology data sets, one that broadly covers Eucalyptus species and one that focuses on the Eucalyptus series Adnataria. We also reanalyze gene family presence/absence data from bacterial genomes obtained from the COG database and compare the results with previous phylogenies estimated using the conditioned genome reconstruction approach. The results for these case studies are largely congruent with previous studies, in some cases giving more phylogenetic resolution.
Publisher: Oxford University Press (OUP)
Date: 08-11-2012
Abstract: In their 2008 and 2009 articles, Sumner and colleagues introduced the "squangles", a small set of Markov invariants for phylogenetic quartets. The squangles are consistent with the general Markov (GM) model and can be used to infer quartets without the need to explicitly estimate all parameters. As the GM model is inhomogeneous and hence nonstationary, the squangles are expected to perform well compared with standard approaches when there are changes in base composition among species. However, the GM model assumes constant rates across sites, so the squangles should be confounded by data generated with invariant sites or other forms of rate-variation across sites. Here we implement the squangles in a least-squares setting that returns quartets weighted by either confidence or internal edge lengths, and we show how these weighted quartets can be used as input into a variety of supertree and supernetwork methods. For the first time, we quantitatively investigate the robustness of the squangles to breaking of the constant rates-across-sites assumption on both simulated and real data sets and we suggest a modification that improves the performance of the squangles in the presence of invariant sites. Our conclusion is that the squangles provide a novel tool for phylogenetic estimation that is complementary to methods that explicitly account for rate-variation across sites, but rely on homogeneous (and hence stationary) models.
Publisher: Springer Science and Business Media LLC
Date: 06-10-2011
DOI: 10.1007/S11538-011-9691-Z
Abstract: It is known that the Kimura 3ST model of sequence evolution on phylogenetic trees can be extended quite naturally to arbitrary split systems. However, this extension relies heavily on mathematical peculiarities of the associated Hadamard transformation, and providing an analogous augmentation of the general Markov model has thus far been elusive. In this paper, we rectify this shortcoming by showing how to extend the general Markov model on trees to include incompatible edges and even further to more general network models. This is achieved by exploring the algebra of the generators of the continuous-time Markov chain together with the “splitting” operator that generates the branching process on phylogenetic trees. For simplicity, we proceed by discussing the two state case and then show that our results are easily extended to more states with little complication. Intriguingly, upon restriction of the two state general Markov model to the parameter space of the binary symmetric model, our extension is indistinguishable from the Hadamard approach only on trees; as soon as any incompatible splits are introduced the two approaches give rise to differing probability distributions with disparate structure. Through exploration of a simple example, we give an argument that our extension to more general networks has desirable properties that the previous approaches do not share. In particular, our construction allows for convergent evolution of previously divergent lineages, a property that is of significant interest for biological applications.
Publisher: Oxford University Press (OUP)
Date: 27-07-2017
DOI: 10.1093/AOB/MCX086
Publisher: Wiley
Date: 09-2013
DOI: 10.1111/ANZS.12035
Start Date: 2015
End Date: 2017
Funder: Australian Research Council
Start Date: 2011
End Date: 2014
Funder: Australian Research Council
Start Date: 2018
End Date: 2020
Funder: Australian Research Council
Start Date: 2010
End Date: 2014
Funder: Australian Research Council
Start Date: 2016
End Date: 2018
Funder: Australian Research Council
Start Date: 07-2018
End Date: 09-2023
Amount: $317,329.00
Funder: Australian Research Council
Start Date: 06-2016
End Date: 12-2019
Amount: $357,700.00
Funder: Australian Research Council
Start Date: 2011
End Date: 2017
Amount: $532,376.00
Funder: Australian Research Council
Start Date: 2015
End Date: 03-2018
Amount: $310,700.00
Funder: Australian Research Council
Start Date: 12-2020
End Date: 12-2027
Amount: $35,000,000.00
Funder: Australian Research Council
Start Date: 01-2012
End Date: 01-2015
Amount: $730,000.00
Funder: Australian Research Council