ORCID Profile
0000-0002-9207-0385
Current Organisations
Beijing Institute of Technology, Monash University, Alfred Health
Publisher: Cold Spring Harbor Laboratory
Date: 11-03-2020
DOI: 10.1101/2020.03.09.984468
Abstract: Gene expression atlases have transformed our understanding of the development, composition and function of human tissues. New technologies promise improved cellular or molecular resolution, and have led to the identification of new cell types, or better defined cell states. But as new technologies emerge, information derived on old platforms becomes obsolete. We demonstrate that it is possible to combine a large number of different profiling experiments summarised from dozens of laboratories and representing hundreds of donors, to create an integrated molecular map of human tissue. As an example, we combine 850 samples from 38 platforms to build an integrated atlas of human blood cells. We achieve robust and unbiased cell type clustering using a variance partitioning method, selecting genes with low platform bias relative to biological variation. Other than an initial rescaling, no other transformation to the primary data is applied through batch correction or renormalisation. Additional data, including single-cell datasets, can be projected for comparison, classification and annotation. The resulting atlas provides a multi-scaled approach to visualise and analyse the relationships between sets of genes and blood cell lineages, including the maturation and activation of leukocytes in vivo and in vitro. In allowing for data integration across hundreds of studies, we address a key reproducibility challenge which is faced by any new technology. This allows us to draw on the deep phenotypes and functional annotations that accompany traditional profiling methods, and provide important context to the high cellular resolution of single cell profiling. Here, we have implemented the blood atlas in the open access Stemformatics.org platform, drawing on its extensive collection of curated transcriptome data. The method is simple, scalable and amenable for rapid deployment in other biological systems or computational workflows.
Recursive approach to generating a multi-scaled atlas. Top panel: The method integrates data from all cell types in the Stemformatics database, and shows clear division of samples into global categories of stromal, pluripotent or blood (inset) cell types. Bottom panel: Integration of only the blood cell subsets provides a blood atlas. Projection of external samples (green) onto the blood atlas. Samples are coloured by curated annotations derived from the original studies, and can be viewed at Stemformatics.org
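The gene selection step described in the abstract above (keeping genes with low platform bias relative to biological variation) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a simple one-way decomposition in which the platform share of a gene's variance is the between-platform sum of squares divided by the total sum of squares. The function names, the toy data and the `max_fraction=0.2` cutoff are all hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def platform_variance_fraction(values, platforms):
    """Share of one gene's total variance explained by platform
    (between-platform sum of squares / total sum of squares)."""
    grand_mean = fmean(values)
    total_ss = sum((v - grand_mean) ** 2 for v in values)
    if total_ss == 0:
        return 0.0  # a constant gene has no variance to attribute to platform

    # Group the expression values by the platform each sample came from.
    groups = defaultdict(list)
    for value, platform in zip(values, platforms):
        groups[platform].append(value)

    between_ss = sum(
        len(group) * (fmean(group) - grand_mean) ** 2
        for group in groups.values()
    )
    return between_ss / total_ss


def select_low_platform_bias_genes(expression, platforms, max_fraction=0.2):
    """Keep genes whose platform share of variance is at most max_fraction.
    The cutoff is an assumed value for illustration only."""
    return [
        gene
        for gene, values in expression.items()
        if platform_variance_fraction(values, platforms) <= max_fraction
    ]


# Toy data: four samples, two from each of two hypothetical platforms.
platforms = ["A", "A", "B", "B"]
expression = {
    "gene_biological": [1.0, 5.0, 1.0, 5.0],  # varies within each platform
    "gene_platform": [1.0, 1.0, 5.0, 5.0],    # varies only between platforms
}
selected = select_low_platform_bias_genes(expression, platforms)
print(selected)
```

In this toy case only `gene_biological` is retained, because all of the variation in `gene_platform` tracks the platform label. A full variance partitioning analysis would typically also model the biological factor explicitly, for example with a mixed model.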
Publisher: Oxford University Press (OUP)
Date: 06-2020
DOI: 10.1093/GIGASCIENCE/GIAA064
Abstract: Diseases are complex phenotypes often arising as an emergent property of a non-linear network of genetic and epigenetic interactions. To translate this resulting state into a causal relationship with a subset of regulatory features, many experiments deploy an array of laboratory assays from multiple modalities. Often, each of these resulting datasets is large, heterogeneous, and noisy. Thus, it is non-trivial to unify these complex datasets into an interpretable phenotype. Although recent methods address this problem with varying degrees of success, they are constrained by their scopes or limitations. Therefore, an important gap in the field is the lack of a universal data harmonizer with the capability to arbitrarily integrate multi-modal datasets. In this review, we perform a critical analysis of methods with the explicit aim of harmonizing data, as opposed to case-specific integration. This revealed that matrix factorization, latent variable analysis, and deep learning are potent strategies. Finally, we describe the properties of an ideal universal data harmonization framework. A sufficiently advanced universal harmonizer has major medical implications: (1) identifying dysregulated biological pathways responsible for a disease is a powerful diagnostic tool; (2) investigating these pathways further allows the biological community to better understand a disease’s mechanisms; and (3) precision medicine also benefits from developments in this area, particularly in the context of the growing field of selective epigenome editing, which can suppress or induce a desired phenotype.
Publisher: F1000 Research Limited
Date: 2018
Publisher: F1000 Research Limited
Date: 2018
Publisher: Monash University
Date: 2019
Publisher: F1000Research
Date: 2016
Publisher: Wiley
Date: 18-11-2013
Publisher: F1000 Research Limited
Date: 2020
Publisher: Zenodo
Date: 2021
Publisher: F1000 Research Limited
Date: 2021
Publisher: Cold Spring Harbor Laboratory
Date: 06-2023
DOI: 10.1101/2023.05.31.542682
Abstract: The emerging field of Genome-NLP (Natural Language Processing) aims to analyse biological sequence data using machine learning (ML), offering significant advancements in data-driven diagnostics. Three key challenges exist in Genome-NLP. First, long biomolecular sequences require “tokenisation” into smaller subunits, which is non-trivial since many biological “words” remain unknown. Second, ML methods are highly nuanced, reducing interoperability and usability. Third, comparing models and reproducing results are difficult due to the large volume and poor quality of biological data. To tackle these challenges, we developed the first automated Genome-NLP workflow that integrates feature engineering and ML techniques. The workflow is designed to be species and sequence agnostic. In this workflow: (a) We introduce a new transformer-based model for genomes called genomicBERT, which empirically tokenises sequences while retaining biological context. This approach minimises manual preprocessing, reduces vocabulary sizes, and effectively handles out-of-vocabulary “words”. (b) We enable the comparison of ML model performance even in the absence of raw data. To facilitate widespread adoption and collaboration, we have made genomicBERT available as part of the publicly accessible conda package called genomeNLP. We have successfully demonstrated the application of genomeNLP on multiple case studies, showcasing its effectiveness in the field of Genome-NLP. We provide a comprehensive classification of genomic data tokenisation and representation approaches for ML applications along with their pros and cons. We infer k-mers directly from the data and handle out-of-vocabulary words. At the same time, we achieve a significantly reduced vocabulary size compared to the conventional k-mer approach, reducing the computational complexity drastically. Our method is agnostic to species or biomolecule type as it is data-driven.
We enable comparison of trained model performance without requiring original input data, metadata or hyperparameter settings. We present the first publicly available, high-level toolkit that infers the grammar of genomic data directly through artificial neural networks. Preprocessing, hyperparameter sweeps, cross validations, metrics and interactive visualisations are automated but can be adjusted by the user as needed.
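For context, the conventional fixed-length k-mer approach that the abstract compares against can be sketched as follows. This is an illustration of the baseline only, not of genomicBERT or the genomeNLP package: the function names, the `<unk>` placeholder for out-of-vocabulary tokens and the toy sequences are assumptions made for this example.

```python
def kmer_tokenise(sequence, k=3, stride=1):
    """Split a sequence into overlapping fixed-length k-mers (sliding window)."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, stride)]


def build_vocabulary(sequences, k=3):
    """Assign an integer id to every k-mer seen in the training sequences.
    Id 0 is reserved for the hypothetical out-of-vocabulary token '<unk>'."""
    vocab = {"<unk>": 0}
    for sequence in sequences:
        for token in kmer_tokenise(sequence, k):
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(sequence, vocab, k=3):
    """Map a sequence to token ids, sending unseen k-mers to '<unk>'."""
    return [vocab.get(token, vocab["<unk>"]) for token in kmer_tokenise(sequence, k)]


# Toy example: build a vocabulary from one sequence, then encode another.
vocab = build_vocabulary(["ACGTAC"], k=3)
ids = encode("ACGTTT", vocab, k=3)
print(vocab)
print(ids)
```

With a four-letter DNA alphabet, a fixed-k vocabulary can grow to as many as 4^k entries, which is one reason a data-driven tokeniser that learns a smaller vocabulary can reduce computational cost.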
Publisher: Zenodo
Date: 2022
Publisher: Oxford University Press (OUP)
Date: 08-11-2019
DOI: 10.1093/NAR/GKY1064
Publisher: F1000 Research Ltd
Date: 06-07-2021
DOI: 10.12688/F1000RESEARCH.53453.1
Abstract: Data from multiple omics layers of a biological system is growing in quantity, heterogeneity and dimensionality. Simultaneous multi-omics data integration is a growing field of research as it has strong potential to unlock information on previously hidden biological relationships leading to early diagnosis, prognosis and expedited treatments. Many tools for multi-omics data integration are being developed. However, these tools are often restricted to highly specific experimental designs, and types of omics data. While some general methods do exist, they require specific data formats and experimental conditions. A major limitation in the field is a lack of a single or multi-omics pipeline which can accept data in an unrefined, information-rich form pre-integration and subsequently generate output for further investigation. There is an increasing demand for a generic multi-omics pipeline to facilitate general-purpose data exploration and analysis of heterogeneous data. Therefore, we present our R multiomics pipeline as an easy to use and flexible pipeline that takes unrefined multi-omics data as input, sample information and user-specified parameters to generate a list of output plots and data tables for quality control and downstream analysis. We have demonstrated application of the pipeline on two separate COVID-19 case studies. We enabled limited checkpointing where intermediate output is staged to allow continuation after errors or interruptions in the pipeline and generate a script for reproducing the analysis to improve reproducibility. A seamless integration with the mixOmics R package is achieved, as the R data object can be loaded and manipulated with mixOmics functions. Our pipeline can be installed as an R package or from the git repository, and is accompanied by detailed documentation with walkthroughs on two case studies. The pipeline is also available as Docker and Singularity containers.
Publisher: F1000 Research Ltd
Date: 02-08-2023
DOI: 10.12688/F1000RESEARCH.53453.2
Abstract: Data from multiple omics layers of a biological system is growing in quantity, heterogeneity and dimensionality. Simultaneous multi-omics data integration is of immense interest to researchers as it has potential to unlock previously hidden biomolecular relationships leading to early diagnosis, prognosis, and expedited treatments. Many tools for multi-omics data integration are developed. However, these tools are often restricted to highly specific experimental designs, types of omics data, and specific data formats. A major limitation of the field is the lack of a pipeline that can accept data in unrefined form to preserve maximum biology in an individual dataset prior to integration. We fill this gap by developing a flexible, generic multi-omics pipeline called multiomics, to facilitate general-purpose data exploration and analysis of heterogeneous data. The pipeline takes unrefined multi-omics data as input, sample information and user-specified parameters to generate a list of output plots and data tables for quality control and downstream analysis. We have demonstrated its application on a sepsis case study. We enabled limited checkpointing functionality where intermediate output is staged to allow continuation after errors or interruptions in the pipeline and generate a script for reproducing the analysis to improve reproducibility. Our pipeline can be installed as an R package or manually from the git repository, and is accompanied by detailed documentation with walkthroughs on three case studies.
Publisher: Springer Science and Business Media LLC
Date: 2020
DOI: 10.1038/S41556-019-0445-8
Abstract: Defining the ontogeny of the human adaptive immune system during embryogenesis has implications for understanding childhood diseases including leukaemias and autoimmune conditions. Using RAG1:GFP human pluripotent stem cell reporter lines, we examined human T-cell genesis from pluripotent-stem-cell-derived haematopoietic organoids. Under conditions favouring T-cell development, RAG1+ cells progressively upregulated a cohort of recognized T-cell-associated genes, arresting development at the CD4+CD8+ stage. Sort and re-culture experiments showed that early RAG1+ cells also possessed B-cell, myeloid and erythroid potential. Flow cytometry and single-cell-RNA-sequencing data showed that early RAG1+ cells co-expressed the endothelial/haematopoietic progenitor markers CD34, VECAD and CD90, whereas imaging studies identified RAG1+ cells within CD31+ endothelial structures that co-expressed SOX17+ or the endothelial marker CAV1. Collectively, these observations provide evidence for a wave of human T-cell development that originates directly from haemogenic endothelium via a RAG1+ intermediate with multilineage potential.
Publisher: Oxford University Press (OUP)
Date: 04-07-2016
DOI: 10.1002/STEM.2419
Abstract: Stromal support is critical for lung homeostasis and the maintenance of an effective epithelial barrier. Despite this, previous studies have found a positive association between the number of mesenchymal stromal cells (MSCs) isolated from the alveolar compartment and human lung diseases associated with epithelial dysfunction. We hypothesised that bronchoalveolar lavage derived MSCs (BAL-MSCs) are dysfunctional and distinct from resident lung tissue MSCs (LT-MSCs). In this study, we comprehensively interrogated the phenotype and transcriptome of human BAL-MSCs and LT-MSCs. We found that MSCs were rarely recoverable from the alveolar space in healthy humans, but could be readily isolated from lung transplant recipients by bronchoalveolar lavage. BAL-MSCs exhibited a CD90Hi, CD73Hi, CD45Neg, CD105Lo immunophenotype and were bipotent, lacking adipogenic potential. In contrast, MSCs were readily recoverable from healthy human lung tissue and were CD90Hi or Lo, CD73Hi, CD45Neg, CD105Int and had full tri-lineage potential. Transcriptional profiling of the two populations confirmed their status as bona fide MSCs and revealed a high degree of similarity between each other and the archetypal bone-marrow MSC. A total of 105 genes were differentially expressed, 76 of which were increased in BAL-MSCs, including genes involved in fibroblast activation, extracellular matrix deposition and tissue remodelling. Finally, we found the fibroblast markers collagen 1A1 and α-smooth muscle actin were increased in BAL-MSCs. Our data suggests that in healthy humans, lung MSCs reside within the tissue, but in disease can differentiate to acquire a profibrotic phenotype and migrate from their in-tissue niche into the alveolar space.
Publisher: Public Library of Science (PLoS)
Date: 28-09-2020
Publisher: F1000 Research Limited
Date: 2019
Publisher: F1000 Research Limited
Date: 2019
Publisher: F1000 Research Limited
Date: 2020
Publisher: F1000 Research Limited
Date: 2018
Publisher: Springer Science and Business Media LLC
Date: 18-03-2023
DOI: 10.1038/S41467-023-37200-W
Abstract: Even in the setting of optimal resuscitation in high-income countries, severe sepsis and septic shock have a mortality of 20–40%, with antibiotic resistance dramatically increasing this mortality risk. To develop a reference dataset enabling the identification of common bacterial targets for therapeutic intervention, we applied a standardized genomic, transcriptomic, proteomic and metabolomic technological framework to multiple clinical isolates of four sepsis-causing pathogens: Escherichia coli, Klebsiella pneumoniae species complex, Staphylococcus aureus and Streptococcus pyogenes. Exposure to human serum generated a sepsis molecular signature containing global increases in fatty acid and lipid biosynthesis and metabolism, consistent with cell envelope remodelling and nutrient adaptation for osmoprotection. In addition, acquisition of cholesterol was identified across the bacterial species. This detailed reference dataset has been established as an open resource to support discovery and translational research.
Publisher: EMBO
Date: 13-09-2022
Publisher: Royal Society of Chemistry (RSC)
Date: 2013
DOI: 10.1039/C3EE41139G
Publisher: EMBO
Date: 13-03-2023
Abstract: During development, the lymphatic vasculature forms as a second network derived chiefly from blood vessels. The transdifferentiation of embryonic venous endothelial cells (VECs) into lymphatic endothelial cells (LECs) is a key step in this process. Specification, differentiation and maintenance of LEC fate are all driven by the transcription factor Prox1, yet the downstream mechanisms remain to be elucidated. We here present a single‐cell transcriptomic atlas of lymphangiogenesis in zebrafish, revealing new markers and hallmarks of LEC differentiation over four developmental stages. We further profile single‐cell transcriptomic and chromatin accessibility changes in zygotic prox1a mutants that are undergoing a LEC‐VEC fate shift. Using maternal and zygotic prox1a/prox1b mutants, we determine the earliest transcriptomic changes directed by Prox1 during LEC specification. This work altogether reveals new downstream targets and regulatory regions of the genome controlled by Prox1 and presents evidence that Prox1 specifies LEC fate primarily by limiting blood vascular and haematopoietic fate. This extensive single‐cell resource provides new mechanistic insights into the enigmatic role of Prox1 and the control of LEC differentiation in development.
Publisher: Royal Society of Chemistry (RSC)
Date: 2015
DOI: 10.1039/C4TA06070A
Abstract: The attractive intermolecular interactions between PIM-1 and polycyclic aromatic hydrocarbons were used to produce films with higher CO2/N2 gas sorption selectivity and reduced ageing of permeability.
Publisher: MDPI AG
Date: 08-06-2021
DOI: 10.3390/NCRNA7020033
Abstract: Phenotypes are driven by regulated gene expression, which in turn are mediated by complex interactions between diverse biological molecules. Protein–DNA interactions such as histone and transcription factor binding are well studied, along with RNA–RNA interactions in short RNA silencing of genes. In contrast, lncRNA-protein interaction (LPI) mechanisms are comparatively unknown, likely directed by the difficulties in studying LPI. However, LPI are emerging as key interactions in epigenetic mechanisms, playing a role in development and disease. Their importance is further highlighted by their conservation across kingdoms. Hence, interest in LPI research is increasing. We therefore review the current state of the art in lncRNA-protein interactions. We specifically surveyed recent computational methods and databases which researchers can exploit for LPI investigation. We discovered that algorithm development is heavily reliant on a few generic databases containing curated LPI information. Additionally, these databases house information at gene-level as opposed to transcript-level annotations. We show that early methods predict LPI using molecular docking, have limited scope and are slow, creating a data processing bottleneck. Recently, machine learning has become the strategy of choice in LPI prediction, likely due to the rapid growth in machine learning infrastructure and expertise. While many of these methods have notable limitations, machine learning is expected to be the basis of modern LPI prediction algorithms.
Publisher: PeerJ
Date: 24-03-2016
DOI: 10.7717/PEERJ.1845
Abstract: Mesenchymal stromal cells (MSC) are widely used for the study of mesenchymal tissue repair, and increasingly adopted for cell therapy, despite the lack of consensus on the identity of these cells. In part this is due to the lack of specificity of MSC markers. Distinguishing MSC from other stromal cells such as fibroblasts is particularly difficult using standard analysis of surface proteins, and there is an urgent need for improved classification approaches. Transcriptome profiling is commonly used to describe and compare different cell types; however, efforts to identify specific markers of rare cellular subsets may be confounded by the small sample sizes of most studies. Consequently, it is difficult to derive reproducible, and therefore useful markers. We addressed the question of MSC classification with a large integrative analysis of many public MSC datasets. We derived a sparse classifier (The Rohart MSC test) that accurately distinguished MSC from non-MSC samples with % accuracy on an internal training set of 635 samples from 41 studies derived on 10 different microarray platforms. The classifier was validated on an external test set of 1,291 samples from 65 studies derived on 15 different platforms, with % accuracy. The genes that contribute to the MSC classifier formed a protein-interaction network that included known MSC markers. Further evidence of the relevance of this new MSC panel came from the high number of Mendelian disorders associated with mutations in more than 65% of the network. These result in mesenchymal defects, particularly impacting on skeletal growth and function. The Rohart MSC test is a simple in silico test that accurately discriminates MSC from fibroblasts, other adult stem/progenitor cell types or differentiated stromal cells. It has been implemented in the www.stemformatics.org resource, to assist researchers wishing to benchmark their own MSC datasets or data from the public domain.
The code is available from the CRAN repository and all data used to generate the MSC test is available to download via the Gene Expression Omnibus or the Stemformatics resource.
Location: China
No related grants have been discovered for Tyrone Chen.