ARDC Research Link Australia

ORCID Profile
0000-0002-0580-6324

Current Organisation
Australian National University

Research Topics

In Research Link Australia (RLA), "Research Topics" refers to ANZSRC Field of Research (FoR) and Socio-Economic Objective (SEO) codes. These topics are either taken from the FoR and SEO codes listed in a researcher's related grants or generated by a large language model (LLM) from their publications.

ANZSRC Field of Research (FoR)

Mycology | Genomics and transcriptomics | Bioinformatics and computational biology

ANZSRC Socio-Economic Objective (SEO)

Publications

Publication

Mixture Models of Nucleotide Sequence Evolution that Account for Heterogeneity in the Substitution Process Across Sites and Across Lineages

Publisher: Oxford University Press (OUP)

Date: 12-07-2014

DOI: 10.1093/SYSBIO/SYU036

Abstract: Molecular phylogenetic studies of homologous sequences of nucleotides often assume that the underlying evolutionary process was globally stationary, reversible, and homogeneous (SRH), and that a model of evolution with one or more site-specific and time-reversible rate matrices (e.g., the GTR rate matrix) is enough to accurately model the evolution of data over the whole tree. However, an increasing body of data suggests that evolution under these conditions is an exception, rather than the norm. To address this issue, several non-SRH models of molecular evolution have been proposed, but they either ignore heterogeneity in the substitution process across sites (HAS) or assume it can be modeled accurately using the Γ distribution. As an alternative to these models of evolution, we introduce a family of mixture models that approximate HAS without the assumption of an underlying predefined statistical distribution. This family of mixture models is combined with non-SRH models of evolution that account for heterogeneity in the substitution process across lineages (HAL). We also present two algorithms for searching model space and identifying an optimal model of evolution that is less likely to over- or underparameterize the data. The performance of the two new algorithms was evaluated using alignments of nucleotides with 10 000 sites simulated under complex non-SRH conditions on a 25-tipped tree. The algorithms were found to be very successful, identifying the correct HAL model with a 75% success rate (the average success rate for assigning rate matrices to the tree's 48 edges was 99.25%) and, for the correct HAL model, identifying the correct HAS model with a 98% success rate. Finally, parameter estimates obtained under the correct HAL-HAS model were found to be accurate and precise.
The merits of our new algorithms were illustrated with an analysis of 42 337 second codon sites extracted from a concatenation of 106 alignments of orthologous genes encoded by the nuclear genomes of Saccharomyces cerevisiae, S. paradoxus, S. mikatae, S. kudriavzevii, S. castellii, S. kluyveri, S. bayanus, and Candida albicans. Our results show that second codon sites in the ancestral genome of these species contained 49.1% invariable sites, 39.6% variable sites belonging to one rate category (V1), and 11.3% variable sites belonging to a second rate category (V2). The ancestral nucleotide content was found to differ markedly across these three sets of sites, and the evolutionary processes operating at the variable sites were found to be non-SRH and best modeled by a combination of eight edge-specific rate matrices (four for V1 and four for V2). The number of substitutions per site at the variable sites also differed markedly, with sites belonging to V1 evolving slower than those belonging to V2 along the lineages separating the seven species of Saccharomyces. Finally, sites belonging to V1 appeared to have ceased evolving along the lineages separating S. cerevisiae, S. paradoxus, S. mikatae, S. kudriavzevii, and S. bayanus, implying that they might have become so selectively constrained that they could be considered invariable sites in these species.
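The mixture idea above can be illustrated with a deliberately simplified sketch. It assumes two aligned sequences and the stationary, reversible Jukes-Cantor (JC69) model in place of the paper's non-SRH, edge-specific rate matrices, so it shows only how an invariable class and two variable classes (V1, V2) combine into one site likelihood. The class weights reuse the proportions reported above purely as illustrative numbers, and the branch lengths are hypothetical.

```python
import math

def jc69_joint(a: str, b: str, t: float) -> float:
    """Probability of base a in one sequence and base b in the other when they are
    separated by total branch length t (expected substitutions per site) under JC69."""
    e = math.exp(-4.0 * t / 3.0)
    p_transition = 0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e
    return 0.25 * p_transition  # 0.25 = equilibrium frequency of base a under JC69

def mixture_site_likelihood(a: str, b: str, p_inv: float, classes) -> float:
    """Site likelihood under a mixture of an invariable class (weight p_inv) and
    variable classes given as (weight, branch_length) pairs."""
    # Invariable sites can only produce constant patterns.
    lik = p_inv * (0.25 if a == b else 0.0)
    for weight, t in classes:
        lik += weight * jc69_joint(a, b, t)
    return lik

def pair_log_likelihood(seq1: str, seq2: str, p_inv: float, classes) -> float:
    """Log-likelihood of two aligned sequences, treating sites as independent."""
    return sum(math.log(mixture_site_likelihood(a, b, p_inv, classes))
               for a, b in zip(seq1, seq2))

# Illustrative only: weights echo the proportions above; branch lengths are hypothetical.
P_INV = 0.491
CLASSES = [(0.396, 0.05), (0.113, 0.5)]  # (weight, branch length) for V1 and V2
```

Because each class is a proper probability distribution over site patterns and the weights sum to 1, the mixture likelihoods over all 16 two-sequence patterns also sum to 1.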

Publication

Mitochondrial DNA and trade data support multiple origins of Helicoverpa armigera (Lepidoptera, Noctuidae) in Brazil

Publisher: Springer Science and Business Media LLC

Date: 28-03-2017

DOI: 10.1038/SREP45302

Abstract: The Old World bollworm Helicoverpa armigera is now established in Brazil, but efforts to identify incursion origin(s) and pathway(s) have met with limited success due to the patchiness of available data. Using international agricultural/horticultural commodity trade data and mitochondrial DNA (mtDNA) cytochrome oxidase I (COI) and cytochrome b (Cyt b) gene markers, we inferred the origins and incursion pathways into Brazil. We detected 20 mtDNA haplotypes from six Brazilian states, eight of which were new to our 97 global COI-Cyt b haplotype database. Direct sequence matches indicated five Brazilian haplotypes had Asian, African, and European origins. We identified 45 parsimony-informative sites and multiple substitutions per site within the concatenated (945 bp) nucleotide dataset, implying that probabilistic phylogenetic analysis methods are needed. High diversity and signatures of uniquely shared haplotypes with diverse localities, combined with the trade data, suggested multiple incursions and introduction origins in Brazil. Increasing agricultural/horticultural trade activities between the Old and New Worlds represent a significant biosecurity risk factor. Identifying pest origins will enable resistance profiling that reflects countries of origin to be included when developing a resistance management strategy, while identifying incursion pathways will improve biosecurity protocols and risk analysis at biosecurity hotspots including national ports.
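The count of parsimony-informative sites quoted above follows a standard definition: a column is informative when at least two character states each occur in at least two sequences. A minimal Python sketch of that count is below; treating `-`, `?` and `N` as missing data is an assumption of the sketch, not something the abstract specifies.

```python
from collections import Counter

def parsimony_informative_sites(alignment, missing="-?N"):
    """Return 0-based column indices that are parsimony-informative: at least two
    different character states each occurring in at least two sequences.
    Characters in `missing` are ignored (an assumption of this sketch)."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same length")
    informative = []
    for col in range(length):
        counts = Counter(seq[col].upper() for seq in alignment
                         if seq[col].upper() not in missing)
        # Informative only if two or more states are each shared by >= 2 sequences.
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            informative.append(col)
    return informative
```

For example, `parsimony_informative_sites(["ACGTA", "ACGTT", "ATGCA", "ATGCT"])` flags columns 1, 3 and 4, while a column with one shared state plus singletons is not counted.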

Publication

SOAP3: ultra-fast GPU-based parallel alignment tool for short reads

Publisher: Oxford University Press (OUP)

Date: 28-01-2012

DOI: 10.1093/BIOINFORMATICS/BTS061

Abstract: Summary: SOAP3 is the first short-read alignment tool that leverages the multiprocessors in a graphics processing unit (GPU) to achieve a drastic improvement in speed. We adapted the compressed full-text index (BWT) used by SOAP2 in view of the advantages and disadvantages of GPUs. When tested with millions of Illumina HiSeq 2000 100-bp reads, SOAP3 takes < 30 s to align a million read pairs onto the human reference genome and is at least 7.5 and 20 times faster than BWA and Bowtie, respectively. For aligning reads with up to four mismatches, SOAP3 aligns slightly more reads than BWA and Bowtie; this is because SOAP3, unlike BWA and Bowtie, is not heuristic-based and always reports all answers. Availability: SOAP3 is available at www.cs.hku.hk/2bwt-tools/soap3 and soap.genomics.org.cn/soap3.html. Contact: liruiqiang@gmail.com, twlam@cs.hku.hk
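SOAP3 builds on the BWT-based compressed index of SOAP2. The sketch below is a CPU-only, exact-match illustration of how such an index answers a query by backward search; it builds the BWT naively from sorted rotations, recomputes occurrence counts on the fly, and does not reproduce SOAP3's GPU kernels or its mismatch handling. The `$` sentinel is assumed not to occur in the text.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations (quadratic; fine for a demo).
    A '$' sentinel, assumed absent from text and smaller than all letters, is appended."""
    s = text + "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)

def count_occurrences(text: str, pattern: str) -> int:
    """Count exact occurrences of pattern in text using FM-index backward search."""
    last = bwt(text)
    # C[c]: number of characters (including the sentinel) smaller than c.
    C = {}
    total = 0
    for c in sorted(set(last)):
        C[c] = total
        total += last.count(c)

    def occ(c: str, i: int) -> int:
        # Occurrences of c in last[:i]; a real index precomputes these counts.
        return last[:i].count(c)

    lo, hi = 0, len(last)  # current range of sorted rotations
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo = C[c] + occ(c, lo)
        hi = C[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo
```

For instance, `bwt("banana")` gives `"annb$aa"` and `count_occurrences("banana", "ana")` gives 2. A production index precomputes the occurrence counts, and samples suffix-array positions to report match locations, rather than rescanning the BWT string.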

Publication

HaploJuice: Accurate haplotype assembly from a pool of sequences with known relative concentrations

Publisher: Cold Spring Harbor Laboratory

Date: 25-04-2018

DOI: 10.1101/307025

Abstract: Pooling techniques, where multiple sub-samples are mixed in a single sample, are widely used to take full advantage of high-throughput DNA sequencing. Recently, Ranjard et al. [1] proposed a pooling strategy without the use of barcodes. Three sub-samples were mixed in different known proportions (i.e. 62.5%, 25% and 12.5%), and a method was developed to use these proportions to reconstruct the three haplotypes effectively. HaploJuice provides an alternative haplotype reconstruction algorithm for Ranjard et al.'s pooling strategy. HaploJuice significantly increases the accuracy by first identifying the empirical proportions of the three mixed sub-samples and then assembling the haplotypes using a dynamic programming approach. HaploJuice was evaluated against five different assembly algorithms, Hmmfreq [1], ShoRAH [2], SAVAGE [3], PredictHaplo [4] and QuRe [5]. Using simulated and real data sets, HaploJuice reconstructed the true sequences with the highest coverage and the lowest error rate. HaploJuice achieves high accuracy in haplotype reconstruction, making Ranjard et al.'s pooling strategy more efficient, feasible, and applicable, with the benefit of reducing the sequencing cost.
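The choice of 62.5%, 25% and 12.5% matters because every non-empty subset of the three sub-samples then has a different summed proportion (12.5%, 25%, 37.5%, 62.5%, 75%, 87.5% or 100%). The sketch below shows, under the simplifying assumptions of error-free reads and exactly the nominal proportions, how an allele's observed frequency at a variable site points to the sub-samples carrying it. It is not HaploJuice's algorithm, which first estimates the empirical proportions and then assembles haplotypes by dynamic programming.

```python
from itertools import combinations

# Nominal mixing proportions of the three sub-samples, as given in the abstract.
PROPORTIONS = (0.625, 0.25, 0.125)

def subset_sums(proportions):
    """Map each non-empty subset of sub-sample indices to its summed proportion."""
    sums = {}
    for k in range(1, len(proportions) + 1):
        for subset in combinations(range(len(proportions)), k):
            sums[subset] = sum(proportions[i] for i in subset)
    return sums

def carriers_from_frequency(observed_freq, proportions=PROPORTIONS):
    """Guess which sub-samples carry an allele from its observed read frequency
    by picking the subset whose expected frequency is closest (assumes no errors)."""
    sums = subset_sums(proportions)
    return min(sums, key=lambda s: abs(sums[s] - observed_freq))
```

An allele seen in about 37% of reads, for example, maps to `(1, 2)`: the 25% and 12.5% sub-samples.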

Publication

Phylogenomics resolves the timing and pattern of insect evolution

Publisher: American Association for the Advancement of Science (AAAS)

Date: 07-11-2014

DOI: 10.1126/SCIENCE.1257570

Abstract: Insects are the most diverse group of animals, with the largest number of species. However, many of the evolutionary relationships between insect species have been controversial and difficult to resolve. Misof et al. performed a phylogenomic analysis of protein-coding genes from all major insect orders and close relatives, resolving the placement of taxa. The authors used this resolved phylogenetic tree together with fossil analysis to date the origin of insects to ~479 million years ago and to resolve long-controversial subjects in insect phylogeny. Science, this issue p. 763

Publication

An assembly-free method of phylogeny reconstruction using short-read sequences from pooled samples without barcodes

Publisher: Cold Spring Harbor Laboratory

Date: 09-04-2021

DOI: 10.1101/2021.04.09.439138

Abstract: A current strategy for obtaining haplotype information from several individuals involves short-read sequencing of pooled amplicons, where fragments from each individual are identified by a unique DNA barcode. In this paper, we report a new method to recover the phylogeny of haplotypes from short-read sequences obtained using pooled amplicons from a mixture of individuals, without barcoding. The method, AFPhyloMix, accepts an alignment of the mixture of reads against a reference sequence, obtains the single-nucleotide-polymorphism (SNP) patterns along the alignment, and constructs the phylogenetic tree according to the SNP patterns. AFPhyloMix adopts a Bayesian model of inference to estimate the phylogeny of the haplotypes and their relative frequencies, given that the number of haplotypes is known. In our simulations, AFPhyloMix achieved at least 80% accuracy at recovering the phylogenies and frequencies of the constituent haplotypes, for mixtures with up to 15 haplotypes. AFPhyloMix also worked well on a real data set of kangaroo mitochondrial DNA sequences.
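One way to see why SNP frequencies in a barcode-free pool carry phylogenetic signal: under an infinite-sites assumption, a mutation on a branch is shared by exactly the haplotypes below it, so its expected allele frequency in the reads equals the summed relative frequency of that clade. The sketch below computes those expected frequencies for a small rooted tree with hypothetical haplotype frequencies; it does not implement AFPhyloMix's Bayesian inference.

```python
from typing import Dict, Union

# A tree is either a haplotype name (leaf) or a tuple of subtrees.
Tree = Union[str, tuple]

def clade_frequencies(tree: Tree, freqs: Dict[str, float]) -> Dict[frozenset, float]:
    """For every branch, return the expected frequency (in the pooled reads) of a
    mutation arising on that branch, keyed by the clade of haplotypes below it.
    Assumes infinite sites, error-free reads and known haplotype frequencies."""
    result: Dict[frozenset, float] = {}

    def visit(node: Tree) -> frozenset:
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(visit(child) for child in node))
        result[leaves] = sum(freqs[h] for h in leaves)
        return leaves

    root = visit(tree)
    del result[root]  # no branch sits above the root
    return result

# Hypothetical example: three haplotypes at frequencies 0.5, 0.3 and 0.2.
example = clade_frequencies((("A", "B"), "C"), {"A": 0.5, "B": 0.3, "C": 0.2})
```

In the example, the branch above (A, B) and the branch above C describe the same split of the haplotypes, giving derived-allele frequencies of 0.8 and 0.2 respectively.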

Publication

An Efficient Alignment Algorithm for Searching Simple Pseudoknots over Long Genomic Sequence

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 11-2012

DOI: 10.1109/TCBB.2012.104

Publication

SOAP3-dp: Fast, Accurate and Sensitive GPU-Based Short Read Aligner

Publisher: Public Library of Science (PLoS)

Date: 31-05-2013

DOI: 10.1371/JOURNAL.PONE.0065632

Publication

Effective Machine-Learning Assembly For Next-Generation Sequencing With Very Low Coverage

Publisher: Cold Spring Harbor Laboratory

Date: 16-08-2018

DOI: 10.1101/393116

Abstract: In short-read DNA sequencing experiments, read coverage is a key parameter for successfully assembling the reads and reconstructing the sequence of the input DNA. When coverage is very low, reconstructing the original sequence from the reads can be difficult because of uncovered gaps. Reference-guided assembly can then improve these assemblies. However, when the available reference is phylogenetically distant from the sequencing reads, the mapping rate of the reads can be extremely low. Some recent improvements in read mapping approaches aim to modify the reference dynamically according to the reads. Such approaches can significantly improve the alignment rate of the reads onto distant references, but the processing of insertions and deletions remains challenging. Here, we introduce a dynamic programming algorithm to update the reference sequence according to previously aligned reads. Substitutions, insertions and deletions are performed in the reference sequence dynamically. We evaluate this approach by assembling a western-grey kangaroo mitochondrial amplicon. Our results show that more reads can be aligned and that, where classic approaches fail to recover the correct length, this method produces assemblies of length comparable to the truth while limiting the error rate. Our method allows us to assemble the first full mitochondrial genome for the western-grey kangaroo. Finally, we discuss how the core algorithm of this method could be improved and combined with other approaches to analyse larger genomic sequences.
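The reference-updating step relies on dynamic programming. As a generic illustration (not the authors' algorithm), the sketch below computes a unit-cost edit-distance alignment between a reference window and a read, then traces back the substitutions, insertions and deletions that would turn the window into the read. How edits from many overlapping reads are combined before the reference is actually changed is left out, and the `(op, ref_pos, base)` tuple format is hypothetical.

```python
def align_edits(ref: str, read: str):
    """Unit-cost global alignment of a read to a reference window.
    Returns (distance, ops); each op is a hypothetical (kind, ref_pos, base) tuple
    with kind in {'match', 'sub', 'ins', 'del'}. For 'ins', ref_pos is the
    reference index the base is inserted before."""
    n, m = len(ref), len(read)
    # dp[i][j]: minimum number of edits turning ref[:i] into read[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i
    for j in range(1, m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == read[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j - 1] + cost,  # match or substitution
                           dp[i - 1][j] + 1,         # delete a reference base
                           dp[i][j - 1] + 1)         # insert a read base
    # Trace back one optimal path from the bottom-right cell.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (ref[i - 1] != read[j - 1]):
            kind = "match" if ref[i - 1] == read[j - 1] else "sub"
            ops.append((kind, i - 1, read[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append(("del", i - 1, ref[i - 1]))
            i -= 1
        else:
            ops.append(("ins", i, read[j - 1]))
            j -= 1
    ops.reverse()
    return dp[n][m], ops
```

Applying the returned ops (keeping every non-deletion base in order) reproduces the read; `align_edits("kitten", "sitting")` reports a distance of 3.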

Publication

Computational identification of protein binding sites on RNAs using high-throughput RNA structure-probing data

Publisher: Oxford University Press (OUP)

Date: 27-12-2013

DOI: 10.1093/BIOINFORMATICS/BTT757

Abstract: Motivation: High-throughput sequencing has been used to probe RNA structures, by treating RNAs with reagents that preferentially cleave or mark certain nucleotides according to their local structures, followed by sequencing of the resulting fragments. The data produced contain valuable information for studying various RNA properties. Results: We developed methods for statistically modeling these structure-probing data and extracting structural features from them. We show that the extracted features can be used to predict RNA ‘zipcodes’ in yeast, regions bound by the She complex in asymmetric localization. The prediction accuracy was better than using raw RNA probing data or sequence features. We further demonstrate the use of the extracted features in identifying binding sites of RNA binding proteins from whole-transcriptome global photoactivatable-ribonucleoside-enhanced cross-linking and immunopurification (gPAR-CLIP) data. Availability: The source code of our implemented methods is available at yiplab.cse.cuhk.edu.hk robrna/. Contact: kevinyip@cse.cuhk.edu.hk Supplementary information: Supplementary data are available at Bioinformatics online.

Publication

RNASAlign: RNA Structural Alignment System

Publisher: Oxford University Press (OUP)

Date: 08-06-2011

DOI: 10.1093/BIOINFORMATICS/BTR338

Abstract: Motivation: Structural alignment of RNA has been found to be a useful computational technique for identifying non-coding RNAs (ncRNAs). However, existing tools do not handle structures with pseudoknots. Although algorithms exist that can handle structural alignment for different types of pseudoknots, no software tools are available, and users have to determine the type of pseudoknot themselves to select the appropriate algorithm; this limits the use of structural alignment in identifying novel ncRNAs. Results: We implemented the first web server, RNASAlign, which can automatically identify the pseudoknot type of a secondary structure and perform structural alignment of a folded RNA with every region of a target DNA/RNA sequence. Regions with high similarity scores and low e-values, together with the detailed alignments, are reported to the user. Experiments on more than 350 ncRNA families show that RNASAlign is effective. Availability: www.bio8.cs.hku.hk/RNASAlign. Contact: smyiu@cs.hku.hk
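RNASAlign first has to recognise whether, and how, a query structure is pseudoknotted. The sketch below covers only the simplest part of that step: it parses a dot-bracket string and reports whether any two base pairs cross (i < k < j < l). Using `[]`, `{}` and `<>` for extra helices is an assumed convention, and the sketch does not classify pseudoknot types.

```python
def base_pairs(structure: str):
    """Parse dot-bracket notation that may use (), [], {} and <> for different
    helices (an assumed convention); return sorted (i, j) pairs with i < j."""
    closers = {")": "(", "]": "[", "}": "{", ">": "<"}
    stacks = {opener: [] for opener in closers.values()}
    pairs = []
    for pos, ch in enumerate(structure):
        if ch in stacks:
            stacks[ch].append(pos)
        elif ch in closers:
            stack = stacks[closers[ch]]
            if not stack:
                raise ValueError(f"unbalanced '{ch}' at position {pos}")
            pairs.append((stack.pop(), pos))
    if any(stacks.values()):
        raise ValueError("unbalanced structure")
    return sorted(pairs)

def has_pseudoknot(structure: str) -> bool:
    """True if two base pairs (i, j) and (k, l) cross, i.e. i < k < j < l."""
    pairs = base_pairs(structure)
    return any(i < k < j < l for (i, j) in pairs for (k, l) in pairs)
```

For example, `has_pseudoknot("((..[[..))..]]")` is `True`, while `has_pseudoknot("((..))..[[..]]")` is `False` because the two helices are nested side by side rather than interleaved.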

Publication

Structural Alignment of RNA with Triple Helix Structure

Publisher: Mary Ann Liebert Inc

Date: 04-2012

DOI: 10.1089/CMB.2010.0052

Abstract: Structural alignment is useful in identifying members of ncRNA families. Existing tools are all based on the secondary structures of the molecules. There is evidence that tertiary interactions (the interaction between a single-stranded nucleotide and a base pair) in triple helix structures are critical to some functions of ncRNAs. In this article, we address the problem of structural alignment of RNAs with a triple helix. We provide a formal definition that captures a simplified model of a triple helix structure, then develop an O(mn^3)-time algorithm to align a query sequence of length m with a known triple helix structure against a target sequence of length n with an unknown structure. The resulting algorithm is shown to be useful in identifying ncRNA members in a simulated genome.

Publication

Memory efficient algorithms for structural alignment of RNAs with pseudoknots

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 2012

DOI: 10.1109/TCBB.2011.66

Publication

HaploJuice: accurate haplotype assembly from a pool of sequences with known relative concentrations

Publisher: Springer Science and Business Media LLC

Date: 22-10-2018

DOI: 10.1186/S12859-018-2424-7

Publication

Promoter-sharing by different genes in human genome - CPNE1 and RBM12 gene pair as an example

Publisher: Springer Science and Business Media LLC

Date: 2008

DOI: 10.1186/1471-2164-9-456

Publication

An assembly-free method of phylogeny reconstruction using short-read sequences from pooled samples without barcodes

Publisher: Public Library of Science (PLoS)

Date: 13-09-2021

DOI: 10.1371/JOURNAL.PCBI.1008949

Abstract: A current strategy for obtaining haplotype information from several individuals involves short-read sequencing of pooled amplicons, where fragments from each individual are identified by a unique DNA barcode. In this paper, we report a new method to recover the phylogeny of haplotypes from short-read sequences obtained using pooled amplicons from a mixture of individuals, without barcoding. The method, AFPhyloMix, accepts an alignment of the mixture of reads against a reference sequence, obtains the single-nucleotide-polymorphism (SNP) patterns along the alignment, and constructs the phylogenetic tree according to the SNP patterns. AFPhyloMix adopts a Bayesian inference model to estimate the phylogeny of the haplotypes and their relative abundances, given that the number of haplotypes is known. In our simulations, AFPhyloMix achieved at least 80% accuracy at recovering the phylogenies and relative abundances of the constituent haplotypes, for mixtures with up to 15 haplotypes. AFPhyloMix also worked well on a real data set of kangaroo mitochondrial DNA sequences.

Publication

pgHMA: Application of the heteroduplex mobility assay analysis in phylogenetics and population genetics

Publisher: Wiley

Date: 05-10-2022

DOI: 10.1111/1755-0998.13508

Abstract: The heteroduplex mobility assay (HMA) has proven to be a robust tool for the detection of genetic variation. Here, we describe a simple and rapid application of the HMA by microfluidic capillary electrophoresis, for phylogenetics and population genetic analyses (pgHMA). We show how commonly applied techniques in phylogenetics and population genetics have equivalents with pgHMA: phylogenetic reconstruction with bootstrapping, skyline plots, and mismatch distribution analysis. We assess the performance and accuracy of pgHMA by comparing the results obtained against those obtained using standard methods of analyses applied to sequencing data. The resulting comparisons demonstrate that: (a) there is a significant linear relationship (R² = 0.992) between heteroduplex mobility and genetic distance, (b) phylogenetic trees obtained by HMA and nucleotide sequences present nearly identical topologies, (c) clades with high pgHMA parametric bootstrap support also have high bootstrap support on nucleotide phylogenies, (d) skyline plots estimated from the UPGMA trees of HMA and Bayesian trees of nucleotide data reveal similar trends, especially for the median trend estimate of effective population size, and (e) optimized mismatch distributions of HMA are closely fitted to the mismatch distributions of nucleotide sequences. In summary, pgHMA is an easily applied method for approximating phylogenetic diversity and population trends.
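On the sequence side, the mismatch distributions compared above are histograms of pairwise differences between sequences. A minimal sketch for aligned sequences follows; it does not model heteroduplex mobility, and it counts every differing character (including gaps) as one difference, which is a simplifying assumption.

```python
from collections import Counter
from itertools import combinations

def mismatch_distribution(sequences):
    """Histogram of pairwise differences among aligned sequences:
    maps number of differing sites -> number of sequence pairs."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    diffs = Counter()
    for a, b in combinations(sequences, 2):
        # Count sites where the two sequences differ (gaps included).
        diffs[sum(x != y for x, y in zip(a, b))] += 1
    return dict(sorted(diffs.items()))
```

For three aligned sequences `"AAAA"`, `"AAAT"` and `"AATT"`, two pairs differ at one site and one pair at two sites, giving `{1: 2, 2: 1}`.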

Publication

Adjacent Nucleotide Dependence in ncRNA and Order-1 SCFG for ncRNA Identification

Publisher: Public Library of Science (PLoS)

Date: 28-09-2010

DOI: 10.1371/JOURNAL.PONE.0012848

Publication

ModelFinder: fast model selection for accurate phylogenetic estimates

Publisher: Springer Science and Business Media LLC

Date: 08-05-2017

DOI: 10.1038/NMETH.4285

Publication

Refining orthologue groups at the transcript level

Publisher: Springer Science and Business Media LLC

Date: 12-2010

DOI: 10.1186/1471-2164-11-S4-S11

Abstract: Orthologues are genes in different species that are related through divergent evolution from a common ancestor and are expected to have similar functions. Many databases have been created to describe orthologous genes based on existing sequence data. However, alternative splicing (in eukaryotes) is usually disregarded in the determination of orthologue groups, and the functional consequences of alternative splicing have not been considered. Most multi-exon genes can encode multiple protein isoforms, which often have different functions and can be disease-related. Extending the definition of orthologue groups to take account of alternative splicing and the functional differences it causes requires further examination. A subset of the orthologous gene groups between human and mouse was selected from the InParanoid database for this study. Each orthologue group was divided into sub-clusters, at the transcript level, using a method based on the sequence similarity of the isoforms. Transcript-based sub-clusters were verified by functional signatures of the cluster members in the InterPro database. Functional similarity was higher within than between transcript-based sub-clusters of a defined orthologous group. In certain cases, cancer-related isoforms of a gene could be distinguished from other isoforms of the gene. Predictions of intrinsic disorder in protein regions were also correlated with the isoform sub-clusters within an orthologue group. Sub-clustering of orthologue groups at the transcript level is an important step towards more accurately defining functionally equivalent orthologue groups. This work appears to be the first effort to refine orthologous groupings of genes based on the consequences of alternative splicing on function. Further investigation and refinement of the methodology to classify and verify isoform sub-clusters is needed, particularly to extend the technique to more distantly related species.
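The abstract states only that orthologue groups were sub-clustered by isoform sequence similarity. As one hypothetical way to do that, the sketch below scores isoforms by the Jaccard similarity of their k-mer sets and merges any pair above a threshold with single linkage (union-find). The similarity measure, `k = 3`, the 0.5 threshold and the toy sequences are all assumptions, not the study's settings.

```python
from itertools import combinations

def kmers(seq: str, k: int = 3) -> set:
    """Set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two k-mer sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def sub_cluster(isoforms: dict, threshold: float = 0.5, k: int = 3):
    """Single-linkage sub-clustering: isoforms whose k-mer Jaccard similarity is
    at least `threshold` (hypothetical default) end up in the same sub-cluster."""
    names = list(isoforms)
    parent = {n: n for n in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    profiles = {n: kmers(isoforms[n], k) for n in names}
    for a, b in combinations(names, 2):
        if jaccard(profiles[a], profiles[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two sub-clusters
    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(sorted(c) for c in clusters.values())
```

With two near-identical toy isoforms and one unrelated one, the first two are merged into one sub-cluster and the third stays on its own.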

Related Organisations

Organisation

The University Of Hong Kong

Location: Hong Kong

Organisation

Australian National University

Location: Australia

Related Funding Activities

Grant

Discovery Projects - Grant ID: DP230100941

Start Date: 07-2023

End Date: 06-2026

Amount: $448,456.00

Funder: Australian Research Council

Thomas Wong

Researcher