ORCID Profile
0000-0002-1735-2630
Current Organisations
Wellcome Sanger Institute
University of Cambridge
Publisher: IEEE
Date: 10-2016
DOI: 10.1109/SMC.2015.42
Publisher: Oxford University Press (OUP)
Date: 28-10-2018
DOI: 10.1093/BIOINFORMATICS/BTX691
Abstract: Targeted sequencing using capture probes has become increasingly popular in clinical applications due to its scalability and cost-effectiveness. The approach also allows for higher sequencing coverage of the targeted regions, resulting in greater statistical power for analysis. However, because of the dynamics of the hybridization process, it is difficult to evaluate the efficiency of the probe design prior to the experiments, which are time-consuming and costly. We developed CapSim, a software package for simulation of targeted sequencing. Given a genome sequence and a set of probes, CapSim simulates the fragmentation, the dynamics of probe hybridization and the sequencing of the captured fragments on Illumina and PacBio sequencing platforms. The simulated data can be used for evaluating the performance of the analysis pipeline, as well as the efficiency of the probe design. Parameters of the various stages in the sequencing process can also be evaluated in order to optimize the experiments. CapSim is publicly available under a BSD license at github.com/Devika1/capsim. Supplementary data are available at Bioinformatics online.
Publisher: University of Queensland Library
Date: 2019
Publisher: Cold Spring Harbor Laboratory
Date: 17-06-2019
DOI: 10.1101/673251
Abstract: Tandem repeats (TRs) are highly prone to variation in copy number due to their repetitive and unstable nature, which makes them a major source of genomic variation between individuals. However, population variation of TRs has not been widely explored due to the limitations of existing tools, which are either low-throughput or restricted to a small subset of TRs. Here, we used a SureSelect targeted sequencing approach combined with Nanopore sequencing to overcome these limitations. We achieved an average of 3062-fold target enrichment on a panel of 142 TR loci, generating an average of 97X sequence coverage on 7 samples utilizing 2 MinION flow-cells with 200ng of input DNA per sample. We identified a subset of 110 TR loci with length less than 2kb and GC content greater than 25%, for which we achieved an average genotyping rate of 75%, increasing to 91% for the highest-coverage sample. Alleles estimated from targeted long-read sequencing were concordant with gold standard PCR sizing analysis and, moreover, highly correlated with alleles estimated from whole-genome long-read sequencing. We demonstrate a targeted long-read sequencing approach that enables simultaneous analysis of hundreds of TRs with accuracy comparable to PCR sizing analysis. Our approach can feasibly be scaled to more targets and more samples, facilitating large-scale analysis of TRs.
Publisher: Cold Spring Harbor Laboratory
Date: 14-02-2017
DOI: 10.1101/108365
Abstract: The majority of human chromosome ends remain incompletely assembled due to their highly repetitive structure. In this study, we use BioNano data to anchor and extend chromosome ends from two European trios as well as two unrelated Asian genomes. BioNano-assembled chromosome ends are structurally divergent from the reference genome, including both missing sequence (10%) and extensions (22%). These extensions are heritable and in some cases divergent between Asian and European samples. Six ninths of the extension sequence in NA12878 can be confirmed and filled by nanopore data. We identify two sequence families in these sequences which have undergone substantial duplication in multiple primate lineages. We show that these sequence families have arisen from progenitor interstitial sequence on the ancestral primate chromosome 7. Comparison of chromosome end sequences from 15 species revealed that missing chromosome-end sequence matches the corresponding phylogenetic relationships, and revealed an average chromosome extension rate of 0.0020 bp per chromosome per year.
Publisher: Cold Spring Harbor Laboratory
Date: 22-12-2020
DOI: 10.1101/2020.12.22.423893
Abstract: SARS-CoV-2 uses subgenomic (sg)RNA to produce viral proteins for replication and immune evasion. We applied long-read RNA and cDNA sequencing to in vitro human and primate infection models to study transcriptional dynamics. Transcription-regulating sequence (TRS)-dependent sgRNA was upregulated earlier in infection than TRS-independent sgRNA. An abundant class of TRS-independent sgRNA consisting of a portion of ORF1ab containing nsp1 joined to ORF10 and 3’UTR was upregulated at 48 hours post-infection in human cell lines. We identified double-junction sgRNA containing both TRS-dependent and independent junctions. We found multiple sites at which the SARS-CoV-2 genome is consistently more modified than sgRNA, and that sgRNA modifications are stable across transcript clusters, host cells and time since infection. Our work highlights the dynamic nature of the SARS-CoV-2 transcriptome during its replication cycle. Our results are available via an interactive web-app at coinlab.mdhs.unimelb.edu.au/.
Publisher: F1000 Research Ltd
Date: 05-09-2018
DOI: 10.12688/GATESOPENRES.12856.1
Abstract: Background: The chloroplast (cp) genome is an important resource for studying plant diversity and phylogeny. Assembly of cp genomes from next-generation sequencing data is complicated by the presence of two large inverted repeats contained in the cp DNA. Methods: We constructed a complete circular cp genome assembly for the hexaploid sweetpotato using extremely low coverage ( ×) Oxford Nanopore whole-genome sequencing (WGS) data coupled with Illumina sequencing data for polishing. Results: The sweetpotato cp genome of 161,274 bp contains 152 genes, of which 96 are protein-coding genes, 8 are rRNA genes and 48 are tRNA genes. Using the cp genome assembly as a reference, we constructed complete cp genome assemblies for a further 17 sweetpotato cultivars from East Africa and an I. triloba line using Illumina WGS data. Analysis of the sweetpotato cp genomes demonstrated the presence of two distinct subpopulations in East Africa. Phylogenetic analysis of the cp genomes of the species from the Convolvulaceae Ipomoea section Batatas revealed that the most closely related diploid wild species of the hexaploid sweetpotato is I. trifida. Conclusions: Nanopore long reads are helpful in constructing cp genome assemblies, especially in resolving the two long inverted repeats. We are generally able to extract cp sequences from WGS data of sufficiently high coverage for assembly of cp genomes. The cp genomes can be used to investigate the population structure and the phylogenetic relationships of sweetpotato.
Publisher: Oxford University Press (OUP)
Date: 16-12-2022
DOI: 10.1093/BIOINFORMATICS/BTAC808
Abstract: We present YaHS, a user-friendly command-line tool for the construction of chromosome-scale scaffolds from Hi-C data. It can be run with a single-line command, requires minimal input from users (an assembly file and an alignment file) which is compatible with similar tools and provides assembly results in multiple formats, thereby enabling rapid, robust and scalable construction of high-quality genome assemblies with high accuracy and contiguity. YaHS is implemented in C and licensed under the MIT License. The source code, documentation and tutorial are available at anger-tol/yahs. Supplementary data are available at Bioinformatics online.
Publisher: IEEE
Date: 06-2012
Publisher: Springer Science and Business Media LLC
Date: 02-11-2018
DOI: 10.1038/S41467-018-06983-8
Abstract: Sweetpotato [Ipomoea batatas (L.) Lam.] is a globally important staple food crop, especially for sub-Saharan Africa. Agronomic improvement of sweetpotato has lagged behind other major food crops due to a lack of genomic and genetic resources and inherent challenges in breeding a heterozygous, clonally propagated polyploid. Here, we report the genome sequences of its two diploid relatives, I. trifida and I. triloba, and show that these high-quality genome assemblies are robust references for hexaploid sweetpotato. Comparative and phylogenetic analyses reveal insights into the ancient whole-genome triplication history of Ipomoea and evolutionary relationships within the Batatas complex. Using resequencing data from 16 genotypes widely used in African breeding programs, genes and alleles associated with carotenoid biosynthesis in storage roots are identified, which may enable efficient breeding of varieties with high provitamin A content. These resources will facilitate genome-enabled breeding in this important food security crop.
Publisher: Cold Spring Harbor Laboratory
Date: 22-03-2017
DOI: 10.1101/119271
Abstract: The assembly of whole-chromosome pseudomolecules for plant genomes remains challenging due to polyploidy and high repeat content. We developed an approach for constructing complete pseudomolecules for polyploid species using genotyping-by-sequencing data from outcrossing mapping populations coupled with high-coverage whole-genome sequence data of a reference genome. Our approach combines de novo assembly with linkage mapping to arrange scaffolds into pseudomolecules. We show that the method is able to reconstruct simulated chromosomes for both diploid and tetraploid genomes. Comparisons to three existing genetic mapping tools show that our method outperforms the other methods in accuracy of both grouping and ordering, and is robust to the presence of substantial amounts of missing data and genotyping errors. We applied our method to three real datasets, including a diploid Ipomoea trifida population and two tetraploid potato mapping populations. The linkage maps show significant concordance with the reference chromosomes. We resolved seven assembly errors in the published Ipomoea trifida genome assembly and anchored an unplaced scaffold in the published potato genome.
Publisher: F1000 Research Ltd
Date: 02-09-2020
DOI: 10.12688/F1000RESEARCH.25693.1
Abstract: Background: Tandem repeats (TRs) are highly prone to variation in copy number due to their repetitive and unstable nature, which makes them a major source of genomic variation between individuals. However, population variation of TRs has not been widely explored due to the limitations of existing approaches, which are either low-throughput or restricted to a small subset of TRs. Here, we demonstrate a targeted sequencing approach combined with Nanopore sequencing to overcome these limitations. Methods: We selected 142 TR targets and enriched these regions using the Agilent SureSelect target enrichment approach with only 200 ng of input DNA. We barcoded the enriched products and sequenced them on an Oxford Nanopore MinION sequencer. We used VNTRTyper and Tandem-genotypes to genotype TRs from the long-read sequencing data. Gold standard PCR sizing analysis was used to validate genotyping results from the targeted sequencing data. Results: We achieved an average of 3062-fold target enrichment on a panel of 142 TR loci, generating an average of 97X coverage per sample with 200 ng of input DNA per sample. For targets with length less than 2 kb and GC content greater than 25%, we successfully genotyped an average of 75% of targets, and the genotyping rate increased to 91% for the highest-coverage sample. Alleles estimated from targeted long-read sequencing were concordant with gold standard PCR sizing analysis and highly correlated with alleles estimated from whole-genome long-read sequencing. Conclusions: We demonstrate a targeted long-read sequencing approach that enables simultaneous analysis of hundreds of TRs with accuracy comparable to PCR sizing analysis. Our approach can feasibly be scaled to more targets and more samples, facilitating large-scale analysis of TRs.
Publisher: Cold Spring Harbor Laboratory
Date: 09-06-2022
DOI: 10.1101/2022.06.09.495093
Abstract: We present YaHS, a user-friendly command-line tool for construction of chromosome-scale scaffolds from Hi-C data. It can be run with a single-line command, requires minimal input from users (an assembly file and an alignment file) which is compatible with similar tools, and provides assembly results in multiple formats, thereby enabling rapid, robust and scalable construction of high-quality genome assemblies with high accuracy and contiguity. YaHS is implemented in C and licensed under the MIT License. The source code, documentation and tutorial are available at -zhou/yahs.
Publisher: Elsevier BV
Date: 2015
Publisher: Ovid Technologies (Wolters Kluwer Health)
Date: 2017
Publisher: Springer Science and Business Media LLC
Date: 09-11-2018
DOI: 10.1038/S41598-018-34774-0
Abstract: The majority of human chromosome ends remain incompletely assembled due to their highly repetitive structure. In this study, we use BioNano data to anchor and extend chromosome ends from two European trios as well as two unrelated Asian genomes. At least 11 BioNano-assembled chromosome ends are structurally divergent from the reference genome, including both missing sequence and extensions. These extensions are heritable and in some cases divergent between Asian and European samples. Six out of nine predicted extension sequences from NA12878 can be confirmed and filled by nanopore data. We identify two multi-kilobase sequence families, both enriched more than 100-fold in extension sequence (p-values < 1e-5), whose origins can be traced to interstitial sequence on ancestral primate chromosome 7. Extensive sub-telomeric duplication of these families has occurred in the human lineage subsequent to divergence from chimpanzees.
Publisher: Elsevier BV
Date: 05-2021
Location: Australia
Location: United Kingdom of Great Britain and Northern Ireland
Location: United Kingdom of Great Britain and Northern Ireland
No related grants have been discovered for Chenxi Zhou.