ORCID Profile
0000-0002-5403-7998
Current Organisation
Walter and Eliza Hall Institute of Medical Research
Publisher: Wiley
Date: 05-2005
Abstract: The relaxin-like peptide family consists of relaxin-1, relaxin-2, and relaxin-3 and the insulin-like peptides (INSL)-3, INSL4, INSL5, and INSL6 (human relaxin-2 is equivalent to relaxin-1 in other species). Evolution of this family has been contentious. We therefore sought to clarify the issue by performing phylogenetic analysis of all relaxin-like peptides from the genomic databases available. Surprisingly, the phylogeny, combined with previous biologic characterizations, suggests that although relaxin's original function was likely in the brain, its reproductive role was acquired just prior to the divergence of amphibians. This phylogeny also illuminates inconsistencies in relaxin evolution in invertebrates, chickens, and cows.
Publisher: Springer Science and Business Media LLC
Date: 10-2002
Publisher: Proceedings of the National Academy of Sciences
Date: 17-04-2003
Abstract: In the visual system, differential gene expression underlies development of the anterior–posterior and dorsal–ventral axes. Here we present the results of a microarray screen to identify genes differentially expressed in the developing retina. We assayed gene expression in nasal (anterior), temporal (posterior), dorsal, and ventral embryonic mouse retina. We used a statistical method to estimate gene expression between different retina regions. Genes were clustered according to their expression pattern and were ranked within each cluster. We identified groups of genes expressed in gradients or with restricted patterns of expression as verified by in situ hybridization. A common theme for the identified genes is the differential expression in the dorsal-ventral axis. By analyzing gene expression patterns, we provide insight into the molecular organization of the developing retina.
Publisher: Springer Science and Business Media LLC
Date: 2004
DOI: 10.1007/S00239-003-2520-8
Abstract: We studied the substitution patterns in 7661 well-conserved human-mouse alignments corresponding to the intergenic regions of human chromosome 22. Alignments with a high average GC content tend to have a higher human GC content than mouse GC content, indicating a lack of stationarity. Segmenting the alignments into four groups of GC content and fitting the general reversible substitution model (REV) separately gave significantly better fits than the overall fit and the levels of fit are close to that expected under an REV model. In addition, most of the fitted rate matrices are not of the HKY type but are remarkably strand-symmetric, and we constructed a number of substitution matrices that should be useful for genomic DNA sequence alignment. We did not find obvious signs of temporal inhomogeneity in the substitution rates and concluded that the conserved intergenic regions in human chromosome 22 and mouse appear to have evolved from their common ancestors via a process that is approximately reversible and strand-symmetric, assuming site homogeneity and independence.
Publisher: Oxford University Press (OUP)
Date: 12-02-2004
DOI: 10.1093/BIOINFORMATICS/BTG410
Abstract: Motivation: The defining feature of oligonucleotide expression arrays is the use of several probes to assay each targeted transcript. This is a bonanza for the statistical geneticist, who can create probeset summaries with specific characteristics. There are now several methods available for summarizing probe level data from the popular Affymetrix GeneChips, but it is difficult to identify the best method for a given inquiry. Results: We have developed a graphical tool to evaluate summaries of Affymetrix probe level data. Plots and summary statistics offer a picture of how an expression measure performs in several important areas. This picture facilitates the comparison of competing expression measures and the selection of methods suitable for a specific investigation. The key is a benchmark data set consisting of a dilution study and a spike-in study. Because the truth is known for these data, we can identify statistical features of the data for which the expected outcome is known in advance. Those features highlighted in our suite of graphs are justified by questions of biological interest and motivated by the presence of appropriate data. Availability: In conjunction with the release of a graphics toolbox as part of the Bioconductor project (www.bioconductor.org), a webtool is available at affycomp.biostat.jhsph.edu. Supplemental material is available at www.biostat.jhsph.edu/~ririzarr apers/suppaffycomp.pdf
Publisher: Springer Science and Business Media LLC
Date: 08-2005
Abstract: This study assessed the possibility to build a prognosis predictor, based on microarray gene expression measures, in stage II and III colon cancer patients. Tumour (T) and non-neoplastic mucosa (NM) mRNA samples from 18 patients (nine with a recurrence, nine with no recurrence) were profiled using the Affymetrix HGU133A GeneChip. The k-nearest neighbour method was used for prognosis prediction using T and NM gene expression measures. Six-fold cross-validation was applied to select the number of neighbours and the number of informative genes to include in the predictors. Based on this information, one T-based and one NM-based predictor were proposed and their accuracies were estimated by double cross-validation. In six-fold cross-validation, the lowest numbers of informative genes giving the lowest numbers of false predictions (two out of 18) were 30 and 70 with the T and NM gene expression measures, respectively. A 30-gene T-based predictor and a 70-gene NM-based predictor were then built, with estimated accuracies of 78 and 83%, respectively. This study suggests that one can build an accurate prognosis predictor for stage II and III colon cancer patients, based on gene expression measures, and one can use either tumour or non-neoplastic mucosa for this purpose.
Publisher: Cold Spring Harbor Laboratory
Date: 18-02-2018
DOI: 10.1101/267450
Abstract: New approaches to lineage tracking allow the study of cell differentiation over many generations of cells during development in multicellular organisms. Understanding the variability observed in these lineage trees requires new statistical methods. Whereas invariant cell lineages, such as that for the nematode Caenorhabditis elegans, can be described using a lineage map, defined as the fixed pattern of phenotypes overlaid onto the binary tree structure, the variability of cell lineages from higher organisms makes it impossible to draw a single lineage map. Here, we introduce lineage variability maps which describe the pattern of second-order variation throughout the lineage tree. These maps can be undirected graphs of the partial correlations between every lineal position or directed graphs showing the dynamics of bifurcated patterns in each subtree. By using the symmetry invariance of a binary tree to develop a generalized spectral analysis for cell lineages, we show how to infer these graphical models for lineages of any depth from sample sizes of only a few pedigrees. When tested on pedigrees from C. elegans expressing a marker for pharyngeal differentiation potential, the maps recover essential features of the known lineage map. When applied to highly-variable pedigrees monitoring cell size in T lymphocytes, the maps show how most of the phenotype is set by the founder naive T cell. Lineage variability maps thus elevate the concept of the lineage map to the population level, addressing questions about the potency and dynamics of cell lineages and providing a way to quantify the progressive restriction of cell fate with increasing depth in the tree. Multicellular organisms develop from a single fertilized egg by sequential cell divisions. The progeny from these divisions adopt different traits that are transmitted and modified through many generations.
By tracking how cell traits change with each successive cell division throughout the family, or lineage, tree, it has been possible to understand where and how these modifications are controlled at the single-cell level, thereby addressing questions about, for example, the developmental origin of tissues, the sources of differentiation in immune cells, or the relationship between primary tumors and metastases. Such lineages often show large variability, with apparently identical founder cells giving rise to different patterns of descendants. Fundamental scientific questions, such as about the range of possible cell types a cell can give rise to, are often about this variability. To characterize this variation, and thus understand the lineage at the population level, we introduce lineage variability maps. Using data from worm and mammalian cell lineages we show how these maps provide quantifiable answers to questions about any developing lineage, such as the potency of founder cells and the progressive restriction of cell fate at each stage in the tree.
Publisher: Cold Spring Harbor Laboratory
Date: 02-10-2020
DOI: 10.1101/2020.09.30.306795
Abstract: Cancers can vary greatly in their transcriptomes. In contrast to alterations in specific genes or pathways, differences in tumor cell total mRNA content have not been comprehensively assessed. Technical and analytical challenges have impeded examination of total mRNA expression at scale across cancers. To address this, we developed a model for quantifying tumor-specific total mRNA expression (TmS) from bulk sequencing data, which performs transcriptomic deconvolution while adjusting for mixed genomes. We used single-cell RNA sequencing data to demonstrate total mRNA expression as a feature of tumor phenotype. We estimated and validated TmS in 5,015 patients across 15 cancer types identifying significant inter-individual variability. At a pan-cancer level, high TmS is associated with increased risk of disease progression and death. Cancer type-specific patterns of genetic alterations, intra-tumor genetic heterogeneity, as well as pan-cancer trends in metabolic dysregulation and hypoxia contribute to TmS. Taken together, our results suggest that measuring cell-type specific total mRNA expression offers a broader perspective of tracking cancer transcriptomes, which has important biological and clinical implications.
Publisher: EMBO
Date: 06-2020
Publisher: Research Square Platform LLC
Date: 16-02-2023
DOI: 10.21203/RS.3.RS-2140339/V1
Abstract: T cell receptor repertoires can be profiled using next generation sequencing (NGS) to measure and monitor adaptive dynamical changes in response to disease and other perturbations. Genomic DNA-based bulk sequencing is cost-effective but necessitates multiplex target amplification using multiple primer pairs with highly variable amplification efficiencies. Here, we utilize an equimolar primer mixture and propose a single statistical normalization step that efficiently corrects for amplification bias post sequencing. Using samples analyzed by both our open protocol and a commercial solution, we show high concordance between bulk clonality metrics. This approach is an inexpensive and open-source alternative to commercial solutions.
Publisher: Elsevier BV
Date: 10-2021
Publisher: Proceedings of the National Academy of Sciences
Date: 13-10-2005
Abstract: Chronic microbial infections are associated with fibrotic and inflammatory reactions known as granulomas showing similarities to wound-healing and tissue repair processes. We have previously mapped three leishmaniasis susceptibility loci, designated lmr1, -2, and -3, which exert their effect independently of T cell immune responses. Here, we show that the wound repair response is critically important for the rapid cure in murine cutaneous leishmaniasis caused by Leishmania major. Mice congenic for leishmaniasis resistance loci, which cured their lesions more rapidly than their susceptible parents, also expressed differentially genes involved in tissue repair, laid down more ordered collagen fibers, and healed punch biopsy wounds more rapidly. Fibroblast monolayers from these mice repaired in vitro wounds faster, and this process was accelerated by supernatants from infected macrophages. Because these effects are independent of T cell-mediated immunity, we conclude that the rate of wound healing is likely to be an important component of innate immunity involved in resistance to cutaneous leishmaniasis.
Publisher: Hindawi Limited
Date: 10-2004
DOI: 10.1017/S0016672304007086
Abstract: Selective genotyping concerns the genotyping of a portion of individuals chosen on the basis of their phenotypic values. Often individuals are selected for genotyping from the high and low extremes of the phenotypic distribution. This procedure yields savings in cost and time by decreasing the total number of individuals genotyped. Previous work by Darvasi et al. (1993) has shown that the power to detect a QTL by genotyping 40–50% of a population is roughly equivalent to genotyping the entire sample. However, these power studies have not accounted for different strategies of analysing the data when phenotypes of individuals in the middle are excluded, nor have they investigated the genome-wide type I error rate under these different strategies or different selection percentages. Further, these simulation studies have not considered markers over the entire genome. In this paper, we present simulation studies of power for the maximum likelihood approach to QTL mapping by Lander & Botstein (1989) in the context of selective genotyping. We calculate the power of selectively genotyping the individuals from the middle of the phenotypic distribution when performing QTL mapping over the whole mouse genome.
Publisher: American Association for Cancer Research (AACR)
Date: 04-04-2023
DOI: 10.1158/2326-6066.22543860.V1
Abstract: List of the tools and packages used in the manuscript
Publisher: American Association for Cancer Research (AACR)
Date: 04-04-2023
DOI: 10.1158/2326-6066.22543863.V1
Abstract: Supplementary data containing 25 Supp. figures that support the results of the paper.
Publisher: Proceedings of the National Academy of Sciences
Date: 30-07-2002
Abstract: There is a great difference in susceptibility to v-abl transgene-induced plasmacytoma between the BALB/cAn and the relatively resistant C57BL/6J mouse strains. We have used the Mapmaker/SURVIVOR algorithm to analyze genome-wide scans on over 800 transgenic F2 hybrid mice, and have mapped at least six loci on chromosomes 2, 4, 11, 17, and 18 that modify tumor-related morbidity. As in human multiple myeloma, males were found to be more prone to plasmacytomagenesis. Different loci influence tumor susceptibility in male and female mice. Survival in females may be largely controlled by a pair of interacting loci on chromosomes 2 and 17.
Publisher: Springer Science and Business Media LLC
Date: 20-05-2005
DOI: 10.1007/S00439-005-1296-X
Abstract: Primary open-angle glaucoma (POAG) is one of the leading causes of blindness in the world. It is a clinically variable group of diseases with the majority of cases presenting as the late onset adult type. Several chromosomal loci have been implicated in disease aetiology, but causal mutations have only been identified in a small proportion of glaucoma. We have previously described a large six-generation Tasmanian family with POAG exhibiting genetic heterogeneity. In this family, approximately one third of affected individuals presented with a glutamine-368-STOP (Q368STOP) mutation in the myocilin gene. We now use a Markov Chain Monte Carlo (MCMC) method to identify a second disease region in this family on the short arm of chromosome 3. This disease locus was initially mapped to the marker D3S1298 and a subsequent minimum disease region of 9 cM between markers D3S1298 and D3S1289 was identified through additional mapping. The region did not overlap with any previously described locus for POAG. Using a multiplicative relative risk model, we identified a positive association between this region and the Q368STOP mutation of myocilin on chromosome 1 in affected individuals. These findings provide evidence of a new autosomal dominant glaucoma locus on the short arm of chromosome 3.
Publisher: Springer Science and Business Media LLC
Date: 2005
Publisher: American Chemical Society (ACS)
Date: 09-07-2004
DOI: 10.1021/AC049717L
Abstract: Extensive prefractionation is now considered to be a necessary prerequisite for the comprehensive analysis of complex proteomes where the dynamic range of protein abundances can vary from approximately 10(6) for cells to approximately 10(10) for tissues such as blood. Here, we describe a high-resolution 2D protein separation system that uses a continuous free-flow electrophoresis (FFE) device to fractionate complex protein mixtures by solution-phase isoelectric focusing (IEF) into 96 well-defined pools, each separated by approximately 0.02-0.10 pH unit depending on the gradient created, followed by rapid (approximately 6 min per analysis) reversed-phase high-performance liquid chromatography (RP-HPLC) of each FFE pool. Fractionated proteins are readily visualized in a virtual 2D format using software that plots protein loci, pI in the first dimension and relative hydrophobicity (i.e., RP-HPLC retention time) in the second dimension. By coupling a diode-array detector in line with a multiwavelength fluorescence detector, separated proteins can be monitored in the RP-HPLC eluent by both UV absorbance and intrinsic fluorescence simultaneously from a single experiment. Triplicate analyses of standard proteins using a pH 3-10 gradient conducted over a 3-day period revealed a high system reproducibility with a SD of 0.57 (0.05 pH unit) within the FFE pools and 0.003 (0.18 s) for protein retention times in the second-dimension RP-HPLC step. In addition, we demonstrate that the FFE-IEF/RP-HPLC separation strategy can also be applied to complex mixtures of low molecular weight compounds such as peptides. With the facile ability to measure the pH of the isoelectric focused pools, peptide pI values can be estimated and used to qualify peptide identifications made using either MS/MS sequencing approaches or pI discriminated peptide mass fingerprinting. The calculated peak capacity of this 2D liquid-based FFE-IEF/RP-HPLC system is 6720.
Publisher: Oxford University Press (OUP)
Date: 09-2019
DOI: 10.1093/GIGASCIENCE/GIZ106
Abstract: Single-cell RNA-seq (scRNA-seq) profiling has revealed remarkable variation in transcription, suggesting that expression of many genes at the single-cell level is intrinsically stochastic and noisy. Yet, on the cell population level, a subset of genes traditionally referred to as housekeeping genes (HKGs) are found to be stably expressed in different cell and tissue types. It is therefore critical to question whether stably expressed genes (SEGs) can be identified on the single-cell level, and if so, how can their expression stability be assessed? We have previously proposed a computational framework for ranking expression stability of genes in single cells for scRNA-seq data normalization and integration. In this study, we perform detailed evaluation and characterization of SEGs derived from this framework. Here, we show that gene expression stability indices derived from the early human and mouse development scRNA-seq datasets and the "Mouse Atlas" dataset are reproducible and conserved across species. We demonstrate that SEGs identified from single cells based on their stability indices are considerably more stable than HKGs defined previously from cell populations across diverse biological systems. Our analyses indicate that SEGs are inherently more stable at the single-cell level and their characteristics reminiscent of HKGs, suggesting their potential role in sustaining essential functions in individual cells. SEGs identified in this study have immediate utility both for understanding variation and stability of single-cell transcriptomes and for practical applications such as scRNA-seq data normalization. Our framework for calculating gene stability index, "scSEGIndex," is incorporated into the scMerge Bioconductor R package (sydneybiox.github.io/scMerge/reference/scSEGIndex.html) and can be used for identifying genes with stable expression in scRNA-seq datasets.
Publisher: Elsevier BV
Date: 12-2002
Publisher: Wiley
Date: 05-2005
Abstract: Currently, four relaxin peptide family receptors are known: LGR7 is the relaxin receptor, although it also interacts specifically with relaxin-3; LGR8 is the insulin-like factor 3 (INSL3) receptor; and GPCR135 or the somatostatin- and angiotensin-like peptide receptor (SALPR) and GPCR142 are both specific relaxin-3 receptors. Because these receptors coevolved together with their relaxin ligands, phylogenetic analysis of these sequences can provide insight into peptide-receptor interactions and even predict interacting partners for INSL4, INSL5, and INSL6, the receptors for which are unknown.
Publisher: Research Square Platform LLC
Date: 20-04-2021
DOI: 10.21203/RS.3.RS-432815/V1
Abstract: Tissue-resident memory T cells (TRM) provide immune defence against local infection and can inhibit cancer progression. However, it is unclear to what extent chronic inflammation impacts TRM activation and how the immune pressure exerted by TRM affects developing tumours in humans. We performed deep profiling of lung cancers arising in never-smokers (NS) and ever-smokers (ES), finding evidence of enhanced TRM immunosurveillance in ES lung. Only tumours arising in ES patients underwent clonal immune escape, even when evaluating cancers with similar tumour mutational burden to NS patients, suggesting that the timing of immune pressure exerted by TRM is a critical factor in the evolution of tumour immune evasion. Tumours grown in T cell quiescent NS lungs displayed little evidence of immune evasion and had fewer neoantigens with low diversity, paradoxically making them amenable to treatment with agonist of the costimulatory molecule, ICOS. These data demonstrate local environmental insults enhance TRM immunosurveillance of human tissue, shape the evolution of tumour immunogenicity and that this interplay informs effective immunotherapeutic modalities.
Publisher: Oxford University Press (OUP)
Date: 11-08-2004
Publisher: American Association for Cancer Research (AACR)
Date: 04-04-2023
DOI: 10.1158/2326-6066.22543857.V1
Abstract: Table S2 is the list of studies identifying genes associated with Exh/Res programs. Table S3 is a list of signatures used for scoring tumour data and performing survival analysis
Publisher: American Association for Cancer Research (AACR)
Date: 19-08-2021
DOI: 10.1158/2326-6066.CIR-21-0137
Abstract: Immunotherapy success in colorectal cancer is mainly limited to patients whose tumors exhibit high microsatellite instability (MSI). However, there is variability in treatment outcomes within this group, which is in part driven by the frequency and characteristics of tumor-infiltrating immune cells. Indeed, the presence of specific infiltrating immune-cell subsets has been shown to correlate with immunotherapy response and is in many cases prognostic of treatment outcome. Tumor-infiltrating lymphocytes (TIL) can undergo distinct differentiation programs, acquiring features of tissue-residency or exhaustion, a process during which T cells upregulate inhibitory receptors, such as PD-1, and lose functionality. Although residency and exhaustion programs of CD8+ T cells are relatively well studied, these programs have only recently been appreciated in CD4+ T cells and remain largely unknown in tumor-infiltrating natural killer (NK) cells. In this study, we used single-cell RNA sequencing (RNA-seq) data to identify signatures of residency and exhaustion in colorectal cancer–infiltrating lymphocytes, including CD8+, CD4+, and NK cells. We then tested these signatures in independent single-cell data from tumor and normal tissue–infiltrating immune cells. Furthermore, we used versions of these signatures designed for bulk RNA-seq data to explore tumor-intrinsic mutations associated with residency and exhaustion from TCGA data. Finally, using two independent transcriptomic datasets from patients with colon adenocarcinoma, we showed that combinations of these signatures, in particular combinations of NK-cell activity signatures, together with tumor-associated signatures, such as TGFβ signaling, were associated with distinct survival outcomes in patients with colon adenocarcinoma.
Publisher: Cold Spring Harbor Laboratory
Date: 21-02-2017
DOI: 10.1101/110387
Abstract: The identification of genomic rearrangements, particularly in cancers, with high sensitivity and specificity using massively parallel sequencing remains a major challenge. Here, we describe the Genome Rearrangement IDentification Software Suite (GRIDSS), a high-speed structural variant (SV) caller that performs efficient genome-wide break-end assembly prior to variant calling using a novel positional de Bruijn graph assembler. By combining assembly, split read and read pair evidence using a probabilistic scoring, GRIDSS achieves high sensitivity and specificity on simulated, cell line and patient tumour data, recently winning SV sub-challenge #5 of the ICGC-TCGA DREAM Somatic Mutation Calling Challenge. On human cell line data, GRIDSS halves the false discovery rate compared to other recent methods. GRIDSS identifies non-template sequence insertions, micro-homologies and large imperfect homologies, and supports multi-sample analysis. GRIDSS is freely available at github.com/PapenfussLab/gridss.
Publisher: Research Square Platform LLC
Date: 11-06-2021
DOI: 10.21203/RS.3.RS-600171/V1
Abstract: Cancers can vary greatly in their transcriptomes. In contrast to alterations in specific genes or pathways, differences in tumor cell total mRNA content have not been comprehensively assessed. Technical and analytical challenges have impeded examination of total mRNA expression at scale across cancers. To address this, we developed a model for quantifying tumor-specific total mRNA expression (TmS) from bulk sequencing data, which performs transcriptomic deconvolution while adjusting for mixed genomes. We used single-cell RNA sequencing data to demonstrate total mRNA expression as a feature of tumor phenotype. We estimated and validated TmS in 5,015 patients across 15 cancer types identifying significant inter-individual variability. At a pan-cancer level, high TmS is associated with increased risk of disease progression and death. Cancer type-specific patterns of genetic alterations, intra-tumor genetic heterogeneity, as well as pan-cancer trends in metabolic dysregulation and hypoxia contribute to TmS. Taken together, our results suggest that measuring cell-type specific total mRNA expression offers a broader perspective of tracking cancer transcriptomes, which has important biological and clinical implications.
Publisher: Association for Research in Vision and Ophthalmology (ARVO)
Date: 04-2003
DOI: 10.1167/IOVS.02-0622
Abstract: To generate a profile of genes expressed in the retina, RPE, and choroid after laser treatment and to identify genes that may contribute to the beneficial effects of laser photocoagulation in the treatment of angiogenic retinal diseases. Argon laser irradiation was delivered to the left eye of normal C57BL/6J mice (n = 30), with the right eye serving as the control in each animal. Three days after laser treatment, mice were culled, eyes enucleated, and the retinas dissected and pooled into respective groups. The total RNA of replicate samples was extracted, and expression profiles were obtained by microarray analysis. Data comparisons between control and treated samples were performed and statistically analyzed. Data revealed that the expression of 265 known genes and expressed sequence tags (ESTs) changed after laser treatment. Of those, 25 were found to be upregulated. These genes represented a number of biological processes, including photoreceptor metabolism, synaptic function, structural proteins, and adhesion molecules. Thus angiotensin II type 2 receptor (Agtr2), a potential candidate in the inhibition of VEGF-induced angiogenesis, was upregulated, whereas potential modulators of endothelial cell function, permeability factors, and VEGF inducers, such as FGF-14, FGF-16, IL-1beta, calcitonin receptor-like receptor (CRLR), and plasminogen activator inhibitor-2 (PAI2), were downregulated. In this study, genes were identified that both explain and contribute to the beneficial effects of laser photocoagulation in the treatment of angiogenic retinal diseases. The molecular insights into the therapeutic effects of laser photocoagulation may provide a basis for future therapeutic strategies.
Publisher: Springer Science and Business Media LLC
Date: 17-08-2021
DOI: 10.1038/S41467-021-25210-5
Abstract: Liquid chromatography-mass spectrometry-based metabolomics studies are increasingly applied to large population cohorts, which run for several weeks or even years in data acquisition. This inevitably introduces unwanted intra- and inter-batch variations over time that can overshadow true biological signals and thus hinder potential biological discoveries. To date, normalisation approaches have struggled to mitigate the variability introduced by technical factors whilst preserving biological variance, especially for protracted acquisitions. Here, we propose a study design framework with an arrangement for embedding biological sample replicates to quantify variance within and between batches and a workflow that uses these replicates to remove unwanted variation in a hierarchical manner (hRUV). We use this design to produce a dataset of more than 1000 human plasma samples run over an extended period of time. We demonstrate significant improvement of hRUV over existing methods in preserving biological signals whilst removing unwanted variation for large scale metabolomics studies. Our tools not only provide a strategy for large scale data normalisation, but also provide guidance on the design strategy for large omics studies.
Publisher: Rockefeller University Press
Date: 08-09-2003
DOI: 10.1084/JEM.20030085
Abstract: Antibodies capable of inhibiting the invasion of Plasmodium merozoites into erythrocytes are present in individuals that are clinically immune to the malaria parasite. Those targeting the 19-kD COOH-terminal domain of the major merozoite surface protein (MSP)-119 are a major component of this inhibitory activity. However, it has been difficult to assess the overall relevance of such antibodies to antiparasite immunity. Here we use an allelic replacement approach to generate a rodent malaria parasite (Plasmodium berghei) that expresses a human malaria (Plasmodium falciparum) form of MSP-119. We show that mice made semi-immune to this parasite line generate high levels of merozoite inhibitory antibodies that are specific for P. falciparum MSP-119. Importantly, protection from homologous blood stage challenge in these mice correlated with levels of P. falciparum MSP-119–specific inhibitory antibodies, but not with titres of total MSP-119–specific immunoglobulins. We conclude that merozoite inhibitory antibodies generated in response to infection can play a significant role in suppressing parasitemia in vivo. This study provides a strong impetus for the development of blood stage vaccines designed to generate invasion inhibitory antibodies and offers a new animal model to trial P. falciparum MSP-119 vaccines.
Publisher: Springer Science and Business Media LLC
Date: 15-09-2022
DOI: 10.1038/S41587-022-01440-W
Abstract: Accurate identification and effective removal of unwanted variation is essential to derive meaningful biological results from RNA sequencing (RNA-seq) data, especially when the data come from large and complex studies. Using RNA-seq data from The Cancer Genome Atlas (TCGA), we examined several sources of unwanted variation and demonstrate here how these can significantly compromise various downstream analyses, including cancer subtype identification, association between gene expression and survival outcomes and gene co-expression analysis. We propose a strategy, called pseudo-replicates of pseudo-samples (PRPS), for deploying our recently developed normalization method, called removing unwanted variation III (RUV-III), to remove the variation caused by library size, tumor purity and batch effects in TCGA RNA-seq data. We illustrate the value of our approach by comparing it to the standard TCGA normalizations on several TCGA RNA-seq datasets. RUV-III with PRPS can be used to integrate and normalize other large transcriptomic datasets coming from multiple laboratories or platforms.
Publisher: Springer Science and Business Media LLC
Date: 2005
Publisher: American Chemical Society (ACS)
Date: 03-10-2003
DOI: 10.1021/AC034616T
Abstract: A database of 5500 unique peptide tandem mass spectra acquired in an ion trap mass spectrometer was assembled for peptides derived from proteins digested with trypsin. Peptides were identified initially from their tandem mass spectra by the SEQUEST algorithm and subsequently validated manually. Two different statistical methods were used to identify sequence-dependent fragmentation patterns that could be used to improve fragmentation models incorporated into current peptide sequencing and database search algorithms. The currently accepted "mobile proton" model was expanded to derive a new classification scheme for peptide mass spectra, the "relative proton mobility" scale, which considers peptide ion charge state and amino acid composition to categorize peptide mass spectra into peptide ions containing "nonmobile", "partially mobile", or "mobile" protons. Quantitation of amide bond fragmentation, both N- and C-terminal to any given amino acid, as well as the positional effect of an amino acid in a peptide and peptide length on such fragmentation, has been determined. Peptide bond cleavage propensities, both positive (i.e., enhanced) and negative (i.e., suppressed), were determined and ranked in order of their cleavage preferences as primary, secondary, or tertiary cleavage effects. For example, primary positive cleavage effects were observed for Xaa-Pro and Asp-Xaa bond cleavage for mobile and nonmobile peptide ion categories, respectively. We also report specific pairwise interactions (e.g., Asn-Gly) that result in enhanced amide bond cleavages analogous to those observed in solution-phase chemistry. Peptides classified as nonmobile gave low or insignificant scores, below reported MS/MS score thresholds (cutoff filters), indicating that incorporation of the relative proton mobility scale classification would lead to improvements in current MS/MS scoring functions.
Publisher: Springer Science and Business Media LLC
Date: 15-05-2023
DOI: 10.1038/S41467-023-37822-0
Abstract: Spatial proteomics technologies have revealed an underappreciated link between the location of cells in tissue microenvironments and the underlying biology and clinical features, but there is significant lag in the development of downstream analysis methods and benchmarking tools. Here we present SPIAT (spatial image analysis of tissues), a spatial-platform agnostic toolkit with a suite of spatial analysis algorithms, and spaSim (spatial simulator), a simulator of tissue spatial data. SPIAT includes multiple colocalization, neighborhood and spatial heterogeneity metrics to characterize the spatial patterns of cells. Ten spatial metrics of SPIAT are benchmarked using simulated data generated with spaSim. We show how SPIAT can uncover cancer immune subtypes correlated with prognosis in cancer and characterize cell dysfunction in diabetes. Our results suggest SPIAT and spaSim as useful tools for quantifying spatial patterns, identifying and validating correlates of clinical outcomes and supporting method development.
Publisher: Springer Science and Business Media LLC
Date: 13-06-2022
DOI: 10.1038/S41587-022-01342-X
Abstract: Single-cell RNA sequencing studies have suggested that total mRNA content correlates with tumor phenotypes. Technical and analytical challenges, however, have so far impeded at-scale pan-cancer examination of total mRNA content. Here we present a method to quantify tumor-specific total mRNA expression (TmS) from bulk sequencing data, taking into account tumor transcript proportion, purity and ploidy, which are estimated through transcriptomic/genomic deconvolution. We estimate and validate TmS in 6,590 patient tumors across 15 cancer types, identifying significant inter-tumor variability. Across cancers, high TmS is associated with increased risk of disease progression and death. TmS is influenced by cancer-specific patterns of gene alteration and intra-tumor genetic heterogeneity as well as by pan-cancer trends in metabolic dysregulation. Taken together, our results indicate that measuring cell-type-specific total mRNA expression in tumor cells predicts tumor phenotypes and clinical outcomes.
Publisher: Cold Spring Harbor Laboratory
Date: 26-11-2017
DOI: 10.1101/225177
Abstract: Dropout is a common phenomenon in single-cell RNA-seq (scRNA-seq) data, and when left unaddressed affects the validity of the statistical analyses. Despite this, few current methods for differential expression (DE) analysis of scRNA-seq data explicitly model the dropout process. We develop DECENT, a DE method for scRNA-seq data that explicitly models the dropout process and performs statistical analyses on the inferred pre-dropout counts. We demonstrate using simulated and real datasets the superior performance of DECENT compared to existing methods. DECENT does not require spike-in data, but spike-ins can be used to improve performance when available. The method is implemented in a publicly-available R package.
Publisher: Proceedings of the National Academy of Sciences
Date: 10-08-2004
Abstract: How olfactory sensory neurons converge on spatially invariant glomeruli in the olfactory bulb is largely unknown. In one model, olfactory sensory neurons interact with spatially restricted guidance cues in the bulb that orient and guide them to their target. Identifying differentially expressed molecules in the olfactory bulb has been extremely difficult, however, hindering a molecular analysis of convergence. Here, we describe several such genes that have been identified in a screen that compiled microarray data to create a three-dimensional model of gene expression within the mouse olfactory bulb. The expression patterns of these identified genes form the basis of a nascent spatial map of differential gene expression in the bulb.
Publisher: Springer Science and Business Media LLC
Date: 23-08-2021
DOI: 10.1038/S41590-021-01004-1
Abstract: Tissue-resident memory T (T
Publisher: Oxford University Press (OUP)
Date: 22-05-2019
DOI: 10.1093/NAR/GKZ433
Abstract: The Nanostring nCounter gene expression assay uses molecular barcodes and single molecule imaging to detect and count hundreds of unique transcripts in a single reaction. These counts need to be normalized to adjust for the amount of sample, variations in assay efficiency and other factors. Most users adopt the normalization approach described in the nSolver analysis software, which involves background correction based on the observed values of negative control probes, a within-sample normalization using the observed values of positive control probes and normalization across samples using reference (housekeeping) genes. Here we present a new normalization method, Removing Unwanted Variation-III (RUV-III), which makes vital use of technical replicates and suitable control genes. We also propose an approach using pseudo-replicates when technical replicates are not available. The effectiveness of RUV-III is illustrated on four different datasets. We also offer suggestions on the design and analysis of studies involving this technology.
Publisher: American Association for Cancer Research (AACR)
Date: 04-04-2023
DOI: 10.1158/2326-6066.22543854
Abstract: An excel file with several sheets, representing supp. tables S4-S10 in the paper.
Publisher: American Association for Cancer Research (AACR)
Date: 04-04-2023
DOI: 10.1158/2326-6066.22543857
Abstract: Table S2 is the list of studies identifying genes associated with Exh/Res programs. Table S3 is a list of signatures used for scoring tumour data and performing survival analysis.
Publisher: American Association for Cancer Research (AACR)
Date: 04-04-2023
DOI: 10.1158/2326-6066.C.6550434.V1
Abstract: Immunotherapy success in colorectal cancer is mainly limited to patients whose tumors exhibit high microsatellite instability (MSI). However, there is variability in treatment outcomes within this group, which is in part driven by the frequency and characteristics of tumor-infiltrating immune cells. Indeed, the presence of specific infiltrating immune-cell subsets has been shown to correlate with immunotherapy response and is in many cases prognostic of treatment outcome. Tumor-infiltrating lymphocytes (TIL) can undergo distinct differentiation programs, acquiring features of tissue-residency or exhaustion, a process during which T cells upregulate inhibitory receptors, such as PD-1, and lose functionality. Although residency and exhaustion programs of CD8+ T cells are relatively well studied, these programs have only recently been appreciated in CD4+ T cells and remain largely unknown in tumor-infiltrating natural killer (NK) cells. In this study, we used single-cell RNA sequencing (RNA-seq) data to identify signatures of residency and exhaustion in colorectal cancer–infiltrating lymphocytes, including CD8+, CD4+, and NK cells. We then tested these signatures in independent single-cell data from tumor and normal tissue–infiltrating immune cells. Furthermore, we used versions of these signatures designed for bulk RNA-seq data to explore tumor-intrinsic mutations associated with residency and exhaustion from TCGA data. Finally, using two independent transcriptomic datasets from patients with colon adenocarcinoma, we showed that combinations of these signatures, in particular combinations of NK-cell activity signatures, together with tumor-associated signatures, such as TGFβ signaling, were associated with distinct survival outcomes in patients with colon adenocarcinoma.
Publisher: Springer Science and Business Media LLC
Date: 26-07-2022
DOI: 10.1038/S41590-022-01273-4
Abstract: Tissue-resident memory T cells (T
Publisher: American Society for Microbiology
Date: 08-2002
DOI: 10.1128/IAI.70.8.4510-4522.2002
Abstract: About 2.5 million people die of Plasmodium falciparum malaria every year. Fatalities are associated with systemic and organ-specific inflammation initiated by a parasite toxin. Recent studies show that glycosylphosphatidylinositol (GPI) functions as the dominant parasite toxin in the context of infection. GPIs also serve as membrane anchors for several of the most important surface antigens of parasite invasive stages. GPI anchoring is a complex posttranslational modification produced through the coordinated action of a multicomponent biosynthetic pathway. Here we present eight new genes of P. falciparum selected for encoding homologs of proteins essential for GPI synthesis: PIG-A, PIG-B, PIG-M, PIG-O, GPI1, GPI8, GAA-1, and DPM1. We describe the experimentally verified mRNA and predicted amino acid sequences and in situ localization of the gene products to the parasite endoplasmic reticulum. Moreover, we show preliminary evidence for the PIG-L and PIG-C genes. The biosynthetic pathway of the malaria parasite GPI offers potential targets for drug development and may be useful for studying parasite cell biology and the molecular basis for the pathophysiology of parasitic diseases.
Publisher: Life Science Alliance, LLC
Date: 09-06-2020
Abstract: At least 200 single-nucleotide polymorphisms (SNPs) are associated with multiple sclerosis (MS) risk. A key function that could mediate SNP-encoded MS risk is their regulatory effects on gene expression. We performed microarrays using RNA extracted from purified immune cell types from 73 untreated MS cases and 97 healthy controls and then performed Cis expression quantitative trait loci mapping studies using additive linear models. We describe MS risk expression quantitative trait loci associations for 129 distinct genes. By extending these models to include an interaction term between genotype and phenotype, we identify MS risk SNPs with opposing effects on gene expression in cases compared with controls, namely, rs2256814 MYT1 in CD4 cells (q = 0.05) and rs12087340 RF00136 in monocyte cells (q = 0.04). The rs703842 SNP was also associated with a differential effect size on the expression of the METTL21B gene in CD8 cells of MS cases relative to controls (q = 0.03). Our study provides a detailed map of MS risk loci that function by regulating gene expression in cell types relevant to MS.
Publisher: Proceedings of the National Academy of Sciences
Date: 07-08-2023
Abstract: Cellular omics such as single-cell genomics, proteomics, and microbiomics allow the characterization of tissue and microbial community composition, which can be compared between conditions to identify biological drivers. This strategy has been critical to revealing markers of disease progression, such as cancer and pathogen infection. A dedicated statistical method for differential variability analysis is lacking for cellular omics data, and existing methods for differential composition analysis do not model some compositional data properties, suggesting there is room to improve model performance. Here, we introduce sccomp, a method for differential composition and variability analyses that jointly models data count distribution, compositionality, group-specific variability, and proportion mean–variability association, being aware of outliers. sccomp provides a comprehensive analysis framework that offers realistic data simulation and cross-study knowledge transfer. Here, we demonstrate that mean–variability association is ubiquitous across technologies, highlighting the inadequacy of the very popular Dirichlet-multinomial distribution. We show that sccomp accurately fits experimental data, significantly improving performance over state-of-the-art algorithms. Using sccomp, we identified differential constraints and composition in the microenvironment of primary breast cancer.
Publisher: American Association for Cancer Research (AACR)
Date: 04-04-2023
DOI: 10.1158/2326-6066.C.6550434
Abstract: Immunotherapy success in colorectal cancer is mainly limited to patients whose tumors exhibit high microsatellite instability (MSI). However, there is variability in treatment outcomes within this group, which is in part driven by the frequency and characteristics of tumor-infiltrating immune cells. Indeed, the presence of specific infiltrating immune-cell subsets has been shown to correlate with immunotherapy response and is in many cases prognostic of treatment outcome. Tumor-infiltrating lymphocytes (TIL) can undergo distinct differentiation programs, acquiring features of tissue-residency or exhaustion, a process during which T cells upregulate inhibitory receptors, such as PD-1, and lose functionality. Although residency and exhaustion programs of CD8+ T cells are relatively well studied, these programs have only recently been appreciated in CD4+ T cells and remain largely unknown in tumor-infiltrating natural killer (NK) cells. In this study, we used single-cell RNA sequencing (RNA-seq) data to identify signatures of residency and exhaustion in colorectal cancer–infiltrating lymphocytes, including CD8+, CD4+, and NK cells. We then tested these signatures in independent single-cell data from tumor and normal tissue–infiltrating immune cells. Furthermore, we used versions of these signatures designed for bulk RNA-seq data to explore tumor-intrinsic mutations associated with residency and exhaustion from TCGA data. Finally, using two independent transcriptomic datasets from patients with colon adenocarcinoma, we showed that combinations of these signatures, in particular combinations of NK-cell activity signatures, together with tumor-associated signatures, such as TGFβ signaling, were associated with distinct survival outcomes in patients with colon adenocarcinoma.
Publisher: Research Square Platform LLC
Date: 04-02-2021
DOI: 10.21203/RS.3.RS-156243/V1
Abstract: Liquid chromatography-mass spectrometry-based metabolomics studies are increasingly applied to large population cohorts, which run for several weeks or even years in data acquisition. This inevitably introduces unwanted intra- and inter-batch variations over time that can overshadow true biological signals and thus hinder potential biological discoveries. To date, normalisation approaches have struggled to mitigate the variability introduced by technical factors whilst preserving biological variance, especially for protracted acquisitions. Here, we propose a study design framework with an arrangement for embedding biological sample replicates to quantify variance within and between batches and a novel workflow that uses these replicates to remove unwanted variation in a hierarchical (hRUV) manner. We use this design to produce a dataset of more than 1,000 human plasma samples run over an extended period of time. We demonstrate significant improvement of hRUV over existing methods in preserving biological signals whilst removing unwanted variation for large scale metabolomics studies. Our novel tools not only provide a strategy for large scale data normalization, but also provide guidance on the design strategy for large omics studies.
Publisher: Oxford University Press (OUP)
Date: 12-02-2004
DOI: 10.1093/BIOINFORMATICS/BTH088
Abstract: Summary: Modern experimental techniques, such as DNA microarrays, usually produce a long list of genes that are potentially interesting in the analyzed process. In order to gain biological understanding from this type of data, it is necessary to analyze the functional annotations of all genes in this list. The Gene-Ontology (GO) database provides a useful tool to annotate and analyze the functions of a large number of genes. Here, we introduce a tool that utilizes this information to obtain an understanding of which annotations are typical for the analyzed list of genes. This program automatically obtains the GO annotations from a database and generates statistics of which annotations are overrepresented in the analyzed list of genes. This results in a list of GO terms sorted by their specificity. Availability: Our program GOstat is accessible via the Internet at gostat.wehi.edu.au
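Illustrative sketch (not part of the harvested record): the overrepresentation idea in the abstract above can be shown with a one-sided hypergeometric test. This is an assumption chosen for illustration; the abstract does not say which statistic GOstat computes, and the function names, the toy annotation table and the absence of multiple-testing correction are all hypothetical simplifications.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which carry the annotation (assumed test, not confirmed as GOstat's)."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def overrepresented_terms(gene_list, annotations, background):
    """Rank annotation terms by how strongly they are enriched in gene_list.

    annotations maps a term name to the set of genes carrying that term.
    Returns (term, p_value) pairs sorted from most to least specific.
    """
    background = set(background)
    selected = set(gene_list) & background
    results = []
    for term, genes in annotations.items():
        annotated = set(genes) & background
        hits = len(annotated & selected)
        if hits == 0:
            continue  # a term with no hits cannot be overrepresented
        p = hypergeom_upper_tail(hits, len(annotated), len(selected), len(background))
        results.append((term, p))
    return sorted(results, key=lambda pair: pair[1])


# Hypothetical toy data: ten background genes and two made-up terms.
background = [f"g{i}" for i in range(10)]
annotations = {
    "term_A": {"g0", "g1", "g2"},
    "term_B": {"g0", "g5", "g6", "g7", "g8", "g9"},
}
ranked = overrepresented_terms(["g0", "g1", "g2"], annotations, background)
```

In this toy case term_A ranks first because all three of its genes appear in the analyzed list, whereas term_B is hit only once despite annotating six genes.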
Publisher: eLife Sciences Publications, Ltd
Date: 07-09-2020
DOI: 10.7554/ELIFE.59630
Abstract: Mass cytometry (CyTOF) is a technology that has revolutionised single-cell biology. By detecting over 40 proteins on millions of single cells, CyTOF allows the characterisation of cell subpopulations in unprecedented detail. However, most CyTOF studies require the integration of data from multiple CyTOF batches usually acquired on different days and possibly at different sites. To date, the integration of CyTOF datasets remains a challenge due to technical differences arising in multiple batches. To overcome this limitation, we developed an approach called CytofRUV for analysing multiple CyTOF batches, which includes an R-Shiny application with diagnostic plots. CytofRUV can correct for batch effects and integrate data from large numbers of patients and conditions across batches, to confidently compare cellular changes and correlate these with clinically relevant outcomes.
Publisher: eLife Sciences Publications, Ltd
Date: 09-2020
Publisher: American Association for Cancer Research (AACR)
Date: 04-04-2023
DOI: 10.1158/2326-6066.22543863
Abstract: Supplementary data containing 25 Supp. figures that support the results of the paper.
Publisher: Cold Spring Harbor Laboratory
Date: 02-11-2017
Abstract: The identification of genomic rearrangements with high sensitivity and specificity using massively parallel sequencing remains a major challenge, particularly in precision medicine and cancer research. Here, we describe a new method for detecting rearrangements, GRIDSS (Genome Rearrangement IDentification Software Suite). GRIDSS is a multithreaded structural variant (SV) caller that performs efficient genome-wide break-end assembly prior to variant calling using a novel positional de Bruijn graph-based assembler. By combining assembly, split read, and read pair evidence using a probabilistic scoring, GRIDSS achieves high sensitivity and specificity on simulated, cell line, and patient tumor data, recently winning SV subchallenge #5 of the ICGC-TCGA DREAM8.5 Somatic Mutation Calling Challenge. On human cell line data, GRIDSS halves the false discovery rate compared to other recent methods while matching or exceeding their sensitivity. GRIDSS identifies nontemplate sequence insertions, microhomologies, and large imperfect homologies, estimates a quality score for each breakpoint, stratifies calls into high or low confidence, and supports multisample analysis.
Publisher: Elsevier BV
Date: 12-2003
DOI: 10.1016/S1046-2023(03)00155-5
Abstract: Normalization means to adjust microarray data for effects which arise from variation in the technology rather than from biological differences between the RNA samples or between the printed probes. This paper describes normalization methods based on the fact that dye balance typically varies with spot intensity and with spatial position on the array. Print-tip loess normalization provides a well-tested general purpose normalization method which has given good results on a wide range of arrays. The method may be refined by using quality weights for individual spots. The method is best combined with diagnostic plots of the data which display the spatial and intensity trends. When diagnostic plots show that biases still remain in the data after normalization, further normalization steps such as plate-order normalization or scale-normalization between the arrays may be undertaken. Composite normalization may be used when control spots are available which are known to be not differentially expressed. Variations on loess normalization include global loess normalization and two-dimensional normalization. Detailed commands are given to implement the normalization techniques using freely available software.
Publisher: American Association for Cancer Research (AACR)
Date: 04-04-2023
DOI: 10.1158/2326-6066.22543860
Abstract: List of the tools and packages used in the manuscript
Publisher: Oxford University Press (OUP)
Date: 22-02-2019
DOI: 10.1093/NAR/GKZ107
Publisher: Cold Spring Harbor Laboratory
Date: 17-10-2018
DOI: 10.1101/445924
Abstract: Systematic variation in the methylation of cytosines at CpG sites plays a critical role in early development of humans and other mammals. Of particular interest are regions of differential methylation between parental alleles, as these often dictate monoallelic gene expression, resulting in parent of origin specific control of the embryonic transcriptome and subsequent development, in a phenomenon known as genomic imprinting. Using long-read nanopore sequencing we show that, with an average genomic coverage of approximately ten, it is possible to determine both the level of methylation of CpG sites and the haplotype from which each read arises. The long-read property is exploited to characterise, using novel methods, both methylation and haplotype for reads that have reduced basecalling precision compared to Sanger sequencing. We validate the analysis both through comparison of nanopore-derived methylation patterns with those from Reduced Representation Bisulfite Sequencing data and through comparison with previously reported data. Our analysis successfully identifies known imprinting control regions as well as some novel differentially methylated regions which, due to their proximity to hitherto unknown monoallelically expressed genes, may represent new imprinting control regions.
Publisher: Society for Neuroscience
Date: 24-03-2004
DOI: 10.1523/JNEUROSCI.5051-03.2004
Abstract: In an effort to understand the complexity of genomic responses within selectively vulnerable regions after experimental brain injury, we examined whether single apoptotic neurons from both the CA3 and dentate differed from those in an uninjured brain. The mRNA from individual active caspase 3(+)/terminal deoxynucleotidyl transferase-mediated biotinylated UTP nick end labeling [TUNEL(–)] and active caspase 3(+)/TUNEL(+) pyramidal and granule neurons in brain-injured mice were amplified and compared with those from nonlabeled neurons in uninjured brains. Gene analysis revealed that overall expression of mRNAs increased with activation of caspase 3 and decreased to below uninjured levels with TUNEL reactivity. Cell type specificity of the apoptotic response was observed with both regionally distinct expression of mRNAs and differences in those mRNAs that were maximally regulated. Immunohistochemical analysis for two of the most highly differentially expressed genes (prion and Sos2) demonstrated a correlation between the observed differential gene expression after traumatic brain injury and corresponding protein translation.
Publisher: Oxford University Press (OUP)
Date: 15-02-2003
DOI: 10.1093/NAR/GNG015
Abstract: High density oligonucleotide array technology is widely used in many areas of biomedical research for quantitative and highly parallel measurements of gene expression. Affymetrix GeneChip arrays are the most popular. In this technology each gene is typically represented by a set of 11-20 pairs of probes. In order to obtain expression measures it is necessary to summarize the probe level data. Using two extensive spike-in studies and a dilution study, we developed a set of tools for assessing the effectiveness of expression measures. We found that the performance of the current version of the default expression measure provided by Affymetrix Microarray Suite can be significantly improved by the use of probe level summaries derived from empirically motivated statistical models. In particular, improvements in the ability to detect differentially expressed genes are demonstrated.
Publisher: Oxford University Press (OUP)
Date: 27-06-2022
DOI: 10.1093/NAR/GKAC486
Abstract: Normalization of single cell RNA-seq data remains a challenging task. The performance of different methods can vary greatly between datasets when unwanted factors and biology are associated. Most normalization methods also only remove the effects of unwanted variation for the cell embedding but not from gene-level data typically used for differential expression (DE) analysis to identify marker genes. We propose RUV-III-NB, a method that can be used to remove unwanted variation from both the cell embedding and gene-level counts. Using pseudo-replicates, RUV-III-NB explicitly takes into account potential association with biology when removing unwanted variation. The method can be used for both UMI or read counts and returns adjusted counts that can be used for downstream analyses such as clustering, DE and pseudotime analyses. Using published datasets with different technological platforms, kinds of biology and levels of association between biology and unwanted variation, we show that RUV-III-NB manages to remove library size and batch effects, strengthen biological signals, improve DE analyses, and lead to results exhibiting greater concordance with independent datasets of the same kind. The performance of RUV-III-NB is consistent and is not sensitive to the number of factors assumed to contribute to the unwanted variation.
Publisher: Springer Science and Business Media LLC
Date: 30-07-2020
DOI: 10.1038/S41467-020-17641-3
Abstract: Reproducible research is the bedrock of experimental science. To enable the deployment of large-scale proteomics, we assess the reproducibility of mass spectrometry (MS) over time and across instruments and develop computational methods for improving quantitative accuracy. We perform 1560 data independent acquisition (DIA)-MS runs of eight samples containing known proportions of ovarian and prostate cancer tissue and yeast, or control HEK293T cells. Replicates are run on six mass spectrometers operating continuously with varying maintenance schedules over four months, interspersed with ~5000 other runs. We utilise negative controls and replicates to remove unwanted variation and enhance biological signal, outperforming existing methods. We also design a method for reducing missing values. Integrating these computational modules into a pipeline (ProNorM), we mitigate variation among instruments over time and accurately predict tissue proportions. We demonstrate how to improve the quantitative analysis of large-scale DIA-MS data, providing a pathway toward clinical proteomics.
Publisher: MDPI AG
Date: 11-09-2023
Publisher: Cold Spring Harbor Laboratory
Date: 19-12-2019
DOI: 10.1101/2019.12.18.881870
Abstract: RNA-Seq allows the study of both gene expression changes and transcribed mutations, providing a highly effective way to gain insight into cancer biology. When planning the sequencing of a large cohort of samples, library size is a fundamental factor affecting both the overall cost and the quality of the results. While several studies analyse the effect that library size has on differential expression analyses, sensitivity analysis for variant detection has received far less attention. We simulated shallower sequencing depths by downsampling 45 AML samples that are part of the Leucegene project, which were originally sequenced at high depth. We compared the sensitivity of six methods of recovering validated mutations on the same samples. The methods compared are a combination of three popular callers (MuTect, VarScan, and VarDict) and two filtering strategies. We observed an incremental loss in sensitivity when simulating libraries of 80M, 50M, 40M, 30M and 20M fragments, with the largest loss detected with less than 30M fragments (below 90%). The sensitivity in recovering indels varied markedly between callers, with VarDict showing the highest sensitivity (60%). Single nucleotide variant sensitivity is relatively consistent across methods, apart from MuTect, whose default filters need adjustment when using RNA-Seq. We also analysed 136 RNA-Seq samples from the TCGA-LAML cohort, assessing the change in sensitivity between the initial libraries (average 59M fragments) and after downsampling to 40M fragments. When considering single nucleotide variants in recurrently mutated myeloid genes we found a comparable performance, with a 3% average loss in sensitivity using 40M fragments. Between 30M and 40M fragments are needed to recover 90%-95% of the initial variants on recurrently mutated myeloid genes. To extend this result to another cancer type, an exploration of the characteristics of its mutations and gene expression patterns is suggested.
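Illustrative sketch (not part of the harvested record): the two quantities the abstract above relies on, a library reduced to a target number of fragments and the sensitivity for validated mutations, can be written down in a few lines. This assumes uniform random sampling of fragments without replacement and defines sensitivity as the fraction of validated mutations that are recovered; the function names and toy identifiers are hypothetical, and the study's actual tooling is not described in the abstract.

```python
import random


def downsample_fragments(fragment_ids, target_size, seed=0):
    """Keep target_size fragments chosen uniformly at random without replacement
    (assumed sampling scheme, seeded so the draw is repeatable)."""
    fragment_ids = list(fragment_ids)
    if target_size > len(fragment_ids):
        raise ValueError("target size exceeds the original library size")
    rng = random.Random(seed)
    return rng.sample(fragment_ids, target_size)


def sensitivity(called_variants, validated_variants):
    """Fraction of validated variants that appear among the called variants."""
    validated = set(validated_variants)
    if not validated:
        raise ValueError("at least one validated variant is required")
    return len(validated & set(called_variants)) / len(validated)


# Hypothetical toy library of 100 fragments reduced to 40.
subset = downsample_fragments(range(100), 40)

# Hypothetical variant labels: two of four validated variants are recovered.
recovered = sensitivity({"v1", "v2", "vX"}, {"v1", "v2", "v3", "v4"})
```

A caller would then be run on each downsampled library, and the resulting sensitivities compared across target sizes.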
Publisher: American Association for the Advancement of Science (AAAS)
Date: 12-01-2017
Publisher: Cold Spring Harbor Laboratory
Date: 12-06-2003
DOI: 10.1101/GR.1048803
Abstract: DNA microarrays produced by deposition (or 'spotting') of a single long oligonucleotide probe for each gene may be an attractive alternative to other types of arrays. We produced spotted oligonucleotide arrays using two large collections of ∼70-mer probes, and used these arrays to analyze gene expression in two dissimilar human RNA samples. These samples were also analyzed using arrays produced by in situ synthesis of sets of multiple short (25-mer) oligonucleotides for each gene (Affymetrix GeneChips). We compared expression measurements for 7344 genes that were represented in both long oligonucleotide probe collections and the in situ-synthesized 25-mer arrays. We found strong correlations (r = 0.8–0.9) between relative gene expression measurements made with spotted long oligonucleotide probes and in situ-synthesized 25-mer probe sets. Spotted long oligonucleotide arrays were suitable for use with both unamplified cDNA and amplified RNA targets, and are a cost-effective alternative for many functional genomics applications. Most previously reported evaluations of microarray technologies have focused on expression measurements made on a relatively small number of genes. The approach described here involves far more gene expression measurements and provides a useful method for comparing existing and emerging techniques for genome-scale expression analysis.
Publisher: Springer Science and Business Media LLC
Date: 26-11-2019
DOI: 10.1038/S41467-019-13266-3
Abstract: The disproportionately high prevalence of male cancer is poorly understood. We tested for sex-disparity in the functional integrity of the major tumor suppressor p53 in sporadic cancers. Our bioinformatics analyses expose three novel levels of p53 impact on sex-disparity in 12 non-reproductive cancer types. First, TP53 mutation is more frequent in these cancers among US males than females, with poorest survival correlating with its mutation. Second, numerous X-linked genes are associated with p53, including vital genomic regulators. Males are at unique risk from alterations of their single copies of these genes. High expression of X-linked negative regulators of p53 in wild-type TP53 cancers corresponds with reduced survival. Third, females exhibit an exceptional incidence of non-expressed mutations among p53-associated X-linked genes. Our data indicate that poor survival in males is contributed by high frequencies of TP53 mutations and an inability to shield against deregulated X-linked genes that engage in p53 networks.
Publisher: Elsevier BV
Date: 11-2003
Publisher: Cold Spring Harbor Laboratory
Date: 04-2022
DOI: 10.1101/2022.03.30.486449
Abstract: T cell receptor (TCR) repertoires can be profiled using next generation sequencing (NGS) to monitor dynamical changes in response to disease and other perturbations. Several strategies for profiling TCRs have been recently developed with different benefits and drawbacks. Genomic DNA-based bulk sequencing, however, remains the most cost-effective method to profile TCRs. The major disadvantage of this method is the need for multiplex target amplification with a large set of primer pairs with potentially very different amplification efficiencies. One approach addressing this problem is by iteratively adjusting the concentrations of the primers based on their efficiencies, and then computationally correcting any remaining bias. Yet there are no standard, publicly available protocols to process and analyze raw sequencing data generated by this method. Here, we utilize an equimolar primer mixture and propose a single statistical normalization step that efficiently corrects for amplification bias post sequencing. Using samples analyzed by both approaches, we show that the concordance between bulk clonality metrics obtained from using the commercial kits and that developed herein is high. Therefore, we suggest the method presented here as an inexpensive and non-commercial alternative for measuring and monitoring adaptive dynamics in TCR clonotype repertoire.
Publisher: American Association for Cancer Research (AACR)
Date: 04-04-2023
DOI: 10.1158/2326-6066.22543854.V1
Abstract: An excel file with several sheets, representing supp. tables S4-S10 in the paper.
Publisher: Cold Spring Harbor Laboratory
Date: 21-04-2021
DOI: 10.1101/2021.04.20.440373
Abstract: Tissue-resident memory T cells (TRM) provide immune defence against local infection and can inhibit cancer progression. However, it is unclear to what extent chronic inflammation impacts TRM activation and how the immune pressure exerted by TRM affects developing tumours in humans. We performed deep profiling of lung cancers arising in never-smokers (NS) and ever-smokers (ES), finding evidence of enhanced TRM immunosurveillance in ES lung. Only tumours arising in ES patients underwent clonal immune escape, even when evaluating cancers with similar tumour mutational burden to NS patients, suggesting that the timing of immune pressure exerted by TRM is a critical factor in the evolution of tumour immune evasion. Tumours grown in T cell quiescent NS lungs displayed little evidence of immune evasion and had fewer neoantigens with low diversity, paradoxically making them amenable to treatment with agonist of the costimulatory molecule, ICOS. These data demonstrate local environmental insults enhance TRM immunosurveillance of human tissue, shape the evolution of tumour immunogenicity and that this interplay informs effective immunotherapeutic modalities.
Publisher: Springer Science and Business Media LLC
Date: 27-07-2020
DOI: 10.1038/S42003-020-1111-1
Abstract: Gene expression data obtained in large studies hold great promises for discovering disease signatures or subtypes through data analysis. It is also prone to technical variation, whose removal is essential to avoid spurious discoveries. Because this variation is not always known and can be confounded with biological signals, its removal is a challenging task. Here we provide a step-wise procedure and comprehensive analysis of the MINDACT microarray dataset. The MINDACT trial enrolled 6693 breast cancer patients and prospectively validated the gene expression signature MammaPrint for outcome prediction. The study also yielded a full-transcriptome microarray for each tumor. We show for the first time in such a large dataset how technical variation can be removed while retaining expected biological signals. Because of its unprecedented size, we hope the resulting adjusted dataset will be an invaluable tool to discover or test gene expression signatures and to advance our understanding of breast cancer.
Publisher: Public Library of Science (PLoS)
Date: 12-02-2019
Location: Australia
No related grants have been discovered for Terence Speed.