ORCID Profile
0000-0002-7572-6354
Current Organisations
Nanjing Medical University, German Cancer Research Center
Publisher: Springer Science and Business Media LLC
Date: 04-2013
Publisher: Springer Science and Business Media LLC
Date: 09-10-2023
Publisher: Cold Spring Harbor Laboratory
Date: 26-03-2020
DOI: 10.1101/2020.03.25.008433
Abstract: Adult bone marrow harbors a mosaic of hematopoietic stem cell (HSC) clones of embryonic origin, and recent work suggests that such clones may have coherent lineage fates. To probe under physiological conditions whether HSC clones with different fates are transcriptionally distinct, we developed PolyloxExpress – a Cre recombinase-dependent DNA substrate for in situ barcoding that allows parallel readout of barcodes and transcriptomes in single cells. We describe differentiation-inactive, multilineage and lineage-restricted HSC clones, find that they reside in distinct regions of the transcriptional landscape of hematopoiesis, and identify corresponding gene signatures. All clone types contain proliferating HSCs, indicating that differentiation-inactive HSCs can undergo symmetric self-renewal. Our work establishes an approach for studying determinants of stem cell fate in vivo and provides molecular evidence for fate coherence of HSC clones.
Publisher: EMBO
Date: 07-2016
Abstract: Transcription initiated at alternative sites can produce mRNA isoforms with different 5'UTRs, which are potentially subjected to differential translational regulation. However, the prevalence of such isoform-specific translational control across mammalian genomes is currently unknown. By combining polysome profiling with high-throughput mRNA 5' end sequencing, we directly measured the translational status of mRNA isoforms with distinct start sites. Among 9,951 genes expressed in mouse fibroblasts, we identified 4,153 that showed significant initiation at multiple sites, of which 745 genes exhibited significant isoform-divergent translation. Systematic analyses of the isoform-specific translation revealed that isoforms with longer 5'UTRs tended to translate less efficiently. Further investigation of cis-elements within 5'UTRs not only provided novel insights into the regulation by known sequence features, but also led to the discovery of novel regulatory sequence motifs. Quantitative models integrating all these features explained over half of the variance in the observed isoform-divergent translation. Overall, our study demonstrated the extensive translational regulation by usage of alternative transcription start sites and offered comprehensive understanding of translational regulation by diverse sequence features embedded in 5'UTRs.
Publisher: Springer Science and Business Media LLC
Date: 18-05-2011
Publisher: Elsevier BV
Date: 10-2022
DOI: 10.1016/J.CELREP.2022.111553
Abstract: Tumor microenvironments (TMEs) require co-operation of innate and adaptive immune cells, which influence tumor progression and immunotherapy. Caspase-activated gasdermins facilitate tumor death and promote anti-tumor immunity. How pyroptosis in immune cells affects the TME remains unclear. TME expression of gasdermin D (GSDMD) is highly expressed in antigen-presenting cells (APCs) and correlates with immune checkpoint signatures. Through conditional deletion of GSDMD, we demonstrate that GSDMD in TME APCs restricts anti-tumor immunity during PD-L1 inhibition. Loss of GSDMD in APCs enhances interferon-stimulated genes (ISGs), thereby promoting CD8
Publisher: Elsevier BV
Date: 09-2017
Publisher: Elsevier BV
Date: 09-2020
Publisher: Elsevier BV
Date: 02-2023
Publisher: Springer Science and Business Media LLC
Date: 21-03-2018
DOI: 10.1038/S41467-018-03544-X
Abstract: Cleavage of transfer (t)RNA and ribosomal (r)RNA are critical and conserved steps of translational control for cells to overcome varied environmental stresses. However, enzymes that are responsible for this event have not been fully identified in high eukaryotes. Here, we report a mammalian tRNA/rRNA-targeting endoribonuclease: SLFN13, a member of the Schlafen family. Structural study reveals a unique pseudo-dimeric U-pillow-shaped architecture of the SLFN13 N′-domain that may clamp base-paired RNAs. SLFN13 is able to digest tRNAs and rRNAs in vitro, and the endonucleolytic cleavage dissevers 11 nucleotides from the 3′-terminus of tRNA at the acceptor stem. The cytoplasmically localised SLFN13 inhibits protein synthesis in 293T cells. Moreover, SLFN13 restricts HIV replication in a nucleolytic activity-dependent manner. According to these observations, we term SLFN13 RNase S13. Our study provides insights into the modulation of translational machinery in high eukaryotes, and sheds light on the functional mechanisms of the Schlafen family.
Publisher: Frontiers Media SA
Date: 23-08-2022
Publisher: Oxford University Press (OUP)
Date: 14-11-2019
DOI: 10.1093/NAR/GKZ1038
Abstract: Gene expression is precisely controlled in a stage and cell-type-specific manner, largely through the interaction between cis-regulatory elements and their associated trans-acting factors. Where these components aggregate in promoters and enhancers, they are able to cooperate to modulate chromatin structure and support the engagement in long-range 3D superstructures that shape the dynamics of a cell's genomic architecture. Recently, the term ‘super-enhancer’ has been introduced to describe a hyper-active regulatory domain comprising a complex array of sequence elements that work together to control the key gene networks involved in cell identity. Here, we survey the unique characteristics of super-enhancers compared to other enhancer types and summarize the recent advances in our understanding of their biological role in gene regulation. In particular, we discuss their capacity to attract the formation of phase-separated condensates, and capacity to generate three-dimensional genome structures that precisely activate their target genes. We also propose a multi-stage transition model to explain the evolutionary pressure driving the development of super-enhancers in complex organisms, and highlight the potential for involvement in tumorigenesis. Finally, we discuss more broadly the role of super-enhancers in human health disorders and related potential in therapeutic interventions.
Publisher: Springer Science and Business Media LLC
Date: 20-05-2019
DOI: 10.1038/S41596-019-0163-5
Abstract: Fate mapping is a powerful genetic tool for linking stem or progenitor cells with their progeny, and hence for defining cell lineages in vivo. The resolution of fate mapping depends on the numbers of distinct markers that are introduced in the beginning into stem or progenitor cells; ideally, numbers should be sufficiently large to allow the tracing of output from individual cells. Highly diverse genetic barcodes can serve this purpose. We recently developed an endogenous genetic barcoding system, termed Polylox. In Polylox, random DNA recombination can be induced by transient activity of Cre recombinase in a 2.1-kb-long artificial recombination substrate that has been introduced into a defined locus in mice (Rosa26
Publisher: Frontiers Media SA
Date: 18-10-2021
DOI: 10.3389/FMOLB.2021.727614
Abstract: Oocyte maturation is the foundation for developing healthy individuals of mammals. Upon germinal vesicle breakdown, oocyte meiosis resumes and the synthesis of new transcripts ceases. To quantitatively profile the transcriptomic dynamics after meiotic resumption throughout the oocyte maturation, we generated transcriptome sequencing data with individual mouse oocytes at three main developmental stages: germinal vesicle (GV), metaphase I (MI), and metaphase II (MII). When clustering the sequenced oocytes, results showed that isoform-level expression analysis outperformed gene-level analysis, indicating isoform expression provided extra information that was useful in distinguishing oocyte stages. Comparing transcriptomes of the oocytes at the GV stage and the MII stage, in addition to identification of differentially expressed genes (DEGs), we detected many differentially expressed transcripts (DETs), some of which came from genes that were not identified as DEGs. When breaking down the isoform-level changes into alternative RNA processing events, we found the main source of isoform composition changes was the alternative usage of polyadenylation sites. With detailed analysis focusing on the alternative usage of 3′-UTR isoforms, we identified, out of 3,810 tested genes, 512 (13.7%) exhibiting significant switches of 3′-UTR isoforms during the process of mouse oocyte maturation. Altogether, our data and analyses suggest the importance of examining isoform abundance changes during oocyte maturation, and further investigation of the pervasive 3′-UTR isoform switches in the transition may deepen our understanding on the molecular mechanisms underlying mammalian early development.
Publisher: Elsevier BV
Date: 06-2011
DOI: 10.1016/J.BBRC.2011.05.005
Abstract: High-throughput RNA sequencing (RNA-seq) technology provides a revolutionary approach to studying splicing events de novo. However, identifying splice junctions with high sensitivity and specificity remains a challenge. In the present study, we proposed a new tool named SeqSaw to detect splice junctions with or without the canonical GT-AG splicing signal. SeqSaw was applied to two ENCODE RNA-seq datasets and also compared with two existing methods. It was shown that the proposed method obtained better results on finding novel splice junctions. Experiments also revealed that the current sequencing depth has not yet reached saturation to detect novel transcripts. Moreover, by comparing the number of supporting reads, we demonstrated that many un-annotated splicing events can be tissue specific.
Publisher: Springer Science and Business Media LLC
Date: 22-01-2021
Publisher: Springer Science and Business Media LLC
Date: 18-09-2017
Publisher: Elsevier BV
Date: 04-2013
DOI: 10.1016/J.GENE.2012.11.045
Abstract: Recent study revealed that most human genes have alternative splicing and can produce multiple isoforms of transcripts. Differences in the relative abundance of the isoforms of a gene can have significant biological consequences. Identifying genes that are differentially spliced between two groups of RNA-sequencing samples is an important basic task in the study of transcriptomes with next-generation sequencing technology. We use the negative binomial (NB) distribution to model sequencing reads on exons, and propose a NB-statistic to detect differentially spliced genes between two groups of samples by comparing read counts on all exons. The method opens a new exon-based approach instead of isoform-based approach for the task. It does not require information about isoform composition, nor need the estimation of isoform expression. Experiments on simulated data and real RNA-seq data of human kidney and liver samples illustrated the method's good performance and applicability. It can also detect previously unknown alternative splicing events, and highlight exons that are most likely differentially spliced between the compared samples. We developed an NB-statistic method that can detect differentially spliced genes between two groups of samples without using a prior knowledge on the annotation of alternative splicing. It does not need to infer isoform structure or to estimate isoform expression. It is a useful method designed for comparing two groups of RNA-seq samples. Besides identifying differentially spliced genes, the method can highlight on the exons that contribute the most to the differential splicing. We developed a software tool called DSGseq for the presented method available at bioinfo.au.tsinghua.edu.cn/software/DSGseq.
Publisher: Public Library of Science (PLoS)
Date: 27-04-2012
Publisher: Elsevier
Date: 2014
Publisher: Springer Science and Business Media LLC
Date: 16-08-2017
DOI: 10.1038/NATURE23653
Publisher: Elsevier BV
Date: 04-2014
Publisher: Elsevier BV
Date: 02-2020
Publisher: Oxford University Press (OUP)
Date: 24-10-2009
DOI: 10.1093/BIOINFORMATICS/BTP612
Abstract: Summary: High-throughput RNA sequencing (RNA-seq) is rapidly emerging as a major quantitative transcriptome profiling platform. Here, we present DEGseq, an R package to identify differentially expressed genes or isoforms for RNA-seq data from different samples. In this package, we integrated three existing methods, and introduced two novel methods based on MA-plot to detect and visualize gene expression difference. Availability: The R package and a quick-start vignette is available at bioinfo.au.tsinghua.edu.cn/software/degseq Contact: xwwang@tsinghua.edu.cn zhangxg@tsinghua.edu.cn Supplementary information: Supplementary data are available at Bioinformatics online.
Publisher: Elsevier BV
Date: 11-2009
DOI: 10.1016/J.JTBI.2009.07.022
Abstract: MicroRNAs (miRNAs) are important post-transcriptional regulators that repress gene expression by binding to the 3'UTRs of their target mRNAs. There are two main outcomes for the transcripts targeted by miRNAs: mRNA degradation and translational repression. It is still unclear what factors determine whether a target transcript is degraded or translationally repressed. In this study, we collected two classes of genes that are targeted by miR-1, miR-155, miR-16, miR-30a, and let-7b and built new computational models with machine-learning methods to predict the fates of target genes based on sequence features. The prediction results indicate that the sequence context of the miRNA binding site at the 3'UTR of a target gene plays an important role in determining how an miRNA regulates the expression of its target. Further analysis shows that four out of the five studied miRNAs probably share similar regulatory mechanisms on their target genes.
Publisher: The Royal Society of Chemistry
Date: 14-12-2020
DOI: 10.1039/9781788019958-00117
Abstract: Next-generation sequencing is a fast-developing field that accelerates the pace of functional genomics. In precision medicine, it enables quick and precise identification of causal mutations and dramatically improves clinical outcome. In this chapter, we will review the next-generation-sequencing-based technologies and strategies for detection of disease-associated mutations and identification of novel biomarkers that can be used in precision medicine. We will cover topics in detection of genomic mutations in protein coding regions and non-coding regulatory elements, detection of circulating tumor DNA, and studies of human and microbiome interactions, as well as applications of bioinformatics in biomarker detection and identification.
Publisher: Springer Science and Business Media LLC
Date: 04-12-2019
DOI: 10.1038/S41467-019-13520-8
Abstract: Pandemic influenza A virus (IAV) outbreaks occur when strains from animal reservoirs acquire the ability to infect and spread among humans. The molecular basis of this species barrier is incompletely understood. Here we combine metabolic pulse labeling and quantitative proteomics to monitor protein synthesis upon infection of human cells with a human- and a bird-adapted IAV strain and observe striking differences in viral protein synthesis. Most importantly, the matrix protein M1 is inefficiently produced by the bird-adapted strain. We show that impaired production of M1 from bird-adapted strains is caused by increased splicing of the M segment RNA to alternative isoforms. Strain-specific M segment splicing is controlled by the 3′ splice site and functionally important for permissive infection. In silico and biochemical evidence shows that avian-adapted M segments have evolved different conserved RNA structure features than human-adapted sequences. Thus, we identify M segment RNA splicing as a viral host range determinant.
Publisher: American Association for the Advancement of Science (AAAS)
Date: 16-12-2022
DOI: 10.1126/SCIIMMUNOL.ABQ2061
Abstract: Emergency hematopoiesis is a concerted response aimed toward enhanced protection from infection, involving multiple cell types and developmental stages across the immune system. Despite its importance, the underlying molecular regulation remains poorly understood. The deubiquitinase USP22 regulates the levels of monoubiquitinated histone H2B (H2Bub1), which is associated with activation of interferon responses upon viral infection. Here, we show that in the absence of infection or inflammation, mice lacking Usp22 in all hematopoietic cells display profound systemic emergency hematopoiesis, evident by increased hematopoietic stem cell proliferation, myeloid bias, and extramedullary hematopoiesis. Functionally, loss of Usp22 results in elevated phagocytosis by neutrophilic granulocytes and enhanced innate protection against Listeria monocytogenes infection. At the molecular level, we found this state of emergency hematopoiesis associated with transcriptional signatures of myeloid priming, enhanced mitochondrial respiration, and innate and adaptive immunity and inflammation. Augmented expression of many inflammatory genes was linked to elevated locus-specific H2Bub1 levels. Collectively, these results demonstrate the existence of a tunable epigenetic state that promotes systemic emergency hematopoiesis in a cell-autonomous manner to enhance innate protection, identifying potential paths toward immune enhancement.
Publisher: EMBO
Date: 08-2015
DOI: 10.15252/MSB.156240
Abstract: Translational regulation is mediated through the interaction between diffusible trans-factors and cis-elements residing within mRNA transcripts. In contrast to extensively studied transcriptional regulation, cis-regulation on translation remains underexplored. Using deep sequencing-based transcriptome and polysome profiling, we globally profiled allele-specific translational efficiency for the first time in an F1 hybrid mouse. Out of 7,156 genes with reliable quantification of both alleles, we found 1,008 (14.1%) exhibiting significant allelic divergence in translational efficiency. Systematic analysis of sequence features of the genes with biased allelic translation revealed that local RNA secondary structure surrounding the start codon and proximal out-of-frame upstream AUGs could affect translational efficiency. Finally, we observed that the cis-effect was quantitatively comparable between transcriptional and translational regulation. Such effects in the two regulatory processes were more frequently compensatory, suggesting that the regulation at the two levels could be coordinated in maintaining robustness of protein expression.
Publisher: Frontiers Media SA
Date: 02-2023
DOI: 10.3389/FENDO.2023.1131256
Abstract: Well-controlled metabolism is the prerequisite for optimal oocyte development. To date, numerous studies have focused mainly on the utilization of exogenous substrates by oocytes, whereas the underlying mechanism of intrinsic regulation during meiotic maturation is less characterized. Herein, we performed an integrated analysis of parallel metabolomics and transcriptomics by isolating porcine oocytes at three time points, cooperatively depicting the global picture of the metabolic patterns during maturation. In particular, we identified the novel metabolic features during porcine oocyte meiosis, such as the fall in bile acids, the active one-carbon metabolism and a progressive decline in nucleotide metabolism. Collectively, the current study not only provides a comprehensive multiple omics data resource, but also may facilitate the discovery of molecular biomarkers that could be used to predict and improve oocyte quality.
Publisher: Springer Science and Business Media LLC
Date: 2011
Publisher: Oxford University Press (OUP)
Date: 31-01-2014
Publisher: American Society of Hematology
Date: 12-10-2023
Publisher: Springer Science and Business Media LLC
Date: 20-02-2019
Publisher: Elsevier BV
Date: 12-2017
Publisher: Elsevier BV
Date: 06-2020
Publisher: Springer Science and Business Media LLC
Date: 22-06-2020
DOI: 10.1038/S41592-020-0866-0
Abstract: We have developed CRISPR-assisted RNA-protein interaction detection method (CARPID), which leverages CRISPR-CasRx-based RNA targeting and proximity labeling to identify binding proteins of specific long non-coding RNAs (lncRNAs) in the native cellular context. We applied CARPID to the nuclear lncRNA XIST, and it captured a list of known interacting proteins and multiple previously uncharacterized binding proteins. We generalized CARPID to explore binders of the lncRNAs DANCR and MALAT1, revealing the method's wide applicability in identifying RNA-binding proteins.
Publisher: Oxford University Press (OUP)
Date: 17-12-2010
DOI: 10.1093/BIOINFORMATICS/BTQ696
Abstract: Motivation: RNA-Seq technology based on next-generation sequencing provides the unprecedented ability of studying transcriptomes at high resolution and accuracy, and the potential of measuring expression of multiple isoforms from the same gene at high precision. Solved by maximum likelihood estimation, isoform expression can be inferred in RNA-Seq using statistical models based on the assumption that sequenced reads are distributed uniformly along transcripts. Modification of the model is needed when considering situations where RNA-Seq data do not follow uniform distribution. Results: We proposed two curves, the global bias curve (GBC) and the local bias curves (LBCs), to describe the non-uniformity of read distributions for all genes in a transcriptome and for each gene, respectively. Incorporating the bias curves into the uniform read distribution (URD) model, we introduced non-URD (N-URD) models to infer isoform expression levels. On a series of systematic simulation studies, the proposed models outperform the original model in recovering major isoforms and the expression ratio of alternative isoforms. We also applied the new model to real RNA-Seq datasets and found that its inferences on expression ratios of alternative isoforms are more reasonable. The experiments indicate that incorporating N-URD information can improve the accuracy in modeling and inferring isoform expression in RNA-Seq. Contact: zhangxg@tsinghua.edu.cn Supplementary information: Supplementary data are available at Bioinformatics online.
Publisher: Springer Science and Business Media LLC
Date: 11-2019
DOI: 10.1038/S41467-019-13037-0
Abstract: Gene annotation is a critical resource in genomics research. Many computational approaches have been developed to assemble transcriptomes based on high-throughput short-read sequencing, however, only with limited accuracy. Here, we combine next-generation and third-generation sequencing to reconstruct a full-length transcriptome in the rat hippocampus, which is further validated using independent 5´ and 3´-end profiling approaches. In total, we detect 28,268 full-length transcripts (FLTs), covering 6,380 RefSeq genes and 849 unannotated loci. Based on these FLTs, we discover co-occurring alternative RNA processing events. Integrating with polysome profiling and ribosome footprinting data, we predict isoform-specific translational status and reconstruct an open reading frame (ORF)-eome. Notably, a high proportion of the predicted ORFs are validated by mass spectrometry-based proteomics. Moreover, we identify isoforms with subcellular localization pattern in neurons. Collectively, our data advance our knowledge of RNA and protein isoform diversity in the rat brain and provide a rich resource for functional studies.
Publisher: Elsevier BV
Date: 10-2016
DOI: 10.1016/J.CELL.2016.09.015
Abstract: Do young and old protein molecules have the same probability to be degraded? We addressed this question using metabolic pulse-chase labeling and quantitative mass spectrometry to obtain degradation profiles for thousands of proteins. We find that >10% of proteins are degraded non-exponentially. Specifically, proteins are less stable in the first few hours of their life and stabilize with age. Degradation profiles are conserved and similar in two cell types. Many non-exponentially degraded (NED) proteins are subunits of complexes that are produced in super-stoichiometric amounts relative to their exponentially degraded (ED) counterparts. Within complexes, NED proteins have larger interaction interfaces and assemble earlier than ED subunits. Amplifying genes encoding NED proteins increases their initial degradation. Consistently, decay profiles can predict protein level attenuation in aneuploid cells. Together, our data show that non-exponential degradation is common, conserved, and has important consequences for complex formation and regulation of protein abundance.
Publisher: Public Library of Science (PLoS)
Date: 06-02-2015
Publisher: Imperial College Press (distributed by World Scientific Publishing Co.)
Date: 09-2007
Publisher: China Science Publishing & Media Ltd.
Date: 09-2010
Publisher: Cold Spring Harbor Laboratory
Date: 08-10-2018
DOI: 10.1101/438176
Abstract: A century ago, influenza A virus (IAV) infection caused the 1918 flu pandemic and killed an estimated 20-40 million people. Pandemic IAV outbreaks occur when strains from animal reservoirs acquire the ability to infect and spread among humans. The molecular details of this species barrier are incompletely understood. We combined metabolic pulse labeling and quantitative shotgun proteomics to globally monitor protein synthesis upon infection of human cells with a human- and a bird-adapted IAV strain. While production of host proteins was remarkably similar, we observed striking differences in the kinetics of viral protein synthesis over the course of infection. Most importantly, the matrix protein M1 was inefficiently produced by the bird-adapted strain at later stages. We show that impaired production of M1 from bird-adapted strains is caused by increased splicing of the M segment RNA to alternative isoforms. Experiments with reporter constructs and recombinant influenza viruses revealed that strain-specific M segment splicing is controlled by the 3′ splice site and functionally important for permissive infection. Independent in silico evidence shows that avian-adapted M segments have evolved different conserved RNA structure features than human-adapted sequences. Thus, our data identify M segment RNA splicing as a viral determinant of host range.
Publisher: Oxford University Press (OUP)
Date: 17-02-2014
DOI: 10.1093/BIOINFORMATICS/BTU090
Abstract: Summary: SeqGSEA is an open-source Bioconductor package for the functional integration of differential expression and splicing analysis in RNA-Seq data. SeqGSEA implements an analysis pipeline, which first computes differential splicing and differential expression scores, followed by integrating them into a per-gene score that quantifies each gene’s association with a phenotype of interest, and finally executes gene set enrichment analysis in a cutoff-free manner to achieve biological insights. SeqGSEA accounts for biological variability and determines the statistical significance of gene pathways and networks using subject permutation, and thus requires at least five samples per group. Real applications show that SeqGSEA detects more biologically meaningful gene sets without biases toward long or highly expressed genes. SeqGSEA can be set up to run in parallel to reduce the analysis time. Availability and implementation: The SeqGSEA package with a vignette is available at ackages/release/bioc/html/SeqGSEA.html. Contact: Murray.Carins@newcastle.edu.au Supplementary information: Supplementary data are available at Bioinformatics online.
Publisher: Springer Science and Business Media LLC
Date: 02-12-2022
DOI: 10.1186/S40779-022-00434-8
Abstract: The application of single-cell RNA sequencing (scRNA-seq) in biomedical research has advanced our understanding of the pathogenesis of disease and provided valuable insights into new diagnostic and therapeutic strategies. With the expansion of capacity for high-throughput scRNA-seq, including clinical samples, the analysis of these huge volumes of data has become a daunting prospect for researchers entering this field. Here, we review the workflow for typical scRNA-seq data analysis, covering raw data processing and quality control, basic data analysis applicable for almost all scRNA-seq data sets, and advanced data analysis that should be tailored to specific scientific questions. While summarizing the current methods for each analysis step, we also provide an online repository of software and wrapped-up scripts to support the implementation. Recommendations and caveats are pointed out for some specific analysis tasks and approaches. We hope this resource will be helpful to researchers engaging with scRNA-seq, in particular for emerging clinical applications.
Publisher: World Scientific Pub Co Pte Lt
Date: 12-2010
DOI: 10.1142/S0219720010005178
Abstract: Due to its unprecedented high-resolution and detailed information, RNA-seq technology based on next-generation high-throughput sequencing significantly boosts the ability to study transcriptomes. The estimation of genes' transcript abundance levels or gene expression levels has always been an important question in research on the transcriptional regulation and gene functions. On the basis of the concept of Reads Per Kilo-base per Million reads (RPKM), taking the union-intersection genes (UI-based) and summing up inferred isoform abundance (isoform-based) are the two current strategies to estimate gene expression levels, but produce different estimations. In this paper, we made the first attempt to compare the two strategies' performances through a series of simulation studies. Our results showed that the isoform-based method gives not only more accurate estimation but also has less uncertainty than the UI-based strategy. If taking into account the non-uniformity of read distribution, the isoform-based method can further reduce estimation errors. We applied both strategies to real RNA-seq datasets of technical replicates, and found that the isoform-based strategy also displays a better performance. For a more accurate estimation of gene expression levels from RNA-seq data, even if the abundance levels of isoforms are not of interest, it is still better to first infer the isoform abundance and sum them up to get the expression level of a gene as a whole.
Publisher: Elsevier BV
Date: 02-2023
Publisher: Royal Society of Chemistry (RSC)
Date: 2015
DOI: 10.1039/C4MB00711E
Abstract: Reference gene-based normalization of expression profiles secures consistent differential expression analysis between samples of different phenotypes or biological conditions, and facilitates comparison between experimental batches.
No related grants have been discovered for Xi Wang.