ARDC Research Link Australia

Publication

Supplementary Tables 1-3 from A Genome-Wide Gene-Based Gene–Environment Interaction Study of Breast Cancer in More than 90,000 Women

Publisher: American Association for Cancer Research (AACR)

Date: 04-04-2023

Abstract: Supplementary Table 1. Sample size of each participating study, by case-control status and genotype platform. Supplementary Table 2. Association between fourteen environmental factors and the risk of breast cancer. Supplementary Table 3. Interactions between genes and fourteen environmental factors.

Publication

A junction coverage compatibility score to quantify the reliability of transcript abundance estimates and annotation catalogs

Publisher: Life Science Alliance, LLC

Date: 17-01-2019

DOI: 10.26508/LSA.201800175

Abstract: Most methods for statistical analysis of RNA-seq data take a matrix of abundance estimates for some type of genomic features as their input, and consequently the quality of any obtained results is directly dependent on the quality of these abundances. Here, we present the junction coverage compatibility score, which provides a way to evaluate the reliability of transcript-level abundance estimates and the accuracy of transcript annotation catalogs. It works by comparing the observed number of reads spanning each annotated splice junction in a genomic region to the predicted number of junction-spanning reads, inferred from the estimated transcript abundances and the genomic coordinates of the corresponding annotated transcripts. We show that although most genes show good agreement between the observed and predicted junction coverages, there is a small set of genes that do not. Genes with poor agreement are found regardless of the method used to estimate transcript abundances, and the corresponding transcript abundances should be treated with care in any downstream analyses.

Publication

matchRanges: Generating null hypothesis genomic ranges via covariate-matched sampling

Publisher: Cold Spring Harbor Laboratory

Date: 06-08-2022

DOI: 10.1101/2022.08.05.502985

Abstract: Deriving biological insights from genomic data commonly requires comparing attributes of selected genomic loci to a null set of loci. The selection of this null set is non-trivial, as it requires careful consideration of potential covariates, a problem that is exacerbated by the non-uniform distribution of genomic features including genes, enhancers, and transcription factor binding sites. Propensity score-based covariate matching methods allow selection of null sets from a pool of possible items while controlling for multiple covariates; however, existing packages do not operate on genomic data classes and can be slow for large data sets, making them difficult to integrate into genomic workflows. To address this, we developed matchRanges, a propensity score-based covariate matching method for the efficient and convenient generation of matched null ranges from a set of background ranges within the Bioconductor framework.

Publication

A junction coverage compatibility score to quantify the reliability of transcript abundance estimates and annotation catalogs

Publisher: Cold Spring Harbor Laboratory

Date: 28-07-2018

DOI: 10.1101/378539

Abstract: Most methods for statistical analysis of RNA-seq data take a matrix of abundance estimates for some type of genomic features as their input, and consequently the quality of any obtained results is directly dependent on the quality of these abundances. Here, we present the junction coverage compatibility (JCC) score, which provides a way to evaluate the reliability of transcript-level abundance estimates as well as the accuracy of transcript annotation catalogs. It works by comparing the observed number of reads spanning each annotated splice junction in a genomic region to the predicted number of junction-spanning reads, inferred from the estimated transcript abundances and the genomic coordinates of the corresponding annotated transcripts. We show that while most genes show good agreement between the observed and predicted junction coverages, there is a small set of genes that do not. Genes with poor agreement are found regardless of the method used to estimate transcript abundances, and the corresponding transcript abundances should be treated with care in any downstream analyses.

Publication

Supplementary Figure 1 from A Genome-Wide Gene-Based Gene–Environment Interaction Study of Breast Cancer in More than 90,000 Women

Publisher: American Association for Cancer Research (AACR)

Date: 04-04-2023

DOI: 10.1158/2767-9764.22544801

Abstract: Quantile-Quantile plot (Q-Q plot) of the aMiSTi p-values for each set of the GxE interactions.

Publication

Observation weights to unlock bulk RNA-seq tools for zero inflation and single-cell applications

Publisher: Cold Spring Harbor Laboratory

Date: 18-01-2018

DOI: 10.1101/250126

Abstract: Dropout events in single-cell transcriptome sequencing (scRNA-seq) cause many transcripts to go undetected and induce an excess of zero read counts, leading to power issues in differential expression (DE) analysis. This has triggered the development of bespoke scRNA-seq DE methods to cope with zero inflation. Recent evaluations, however, have shown that dedicated scRNA-seq tools provide no advantage compared to traditional bulk RNA-seq tools. We introduce a weighting strategy, based on a zero-inflated negative binomial (ZINB) model, that identifies excess zero counts and generates gene and cell-specific weights to unlock bulk RNA-seq DE pipelines for zero-inflated data, boosting performance for scRNA-seq.

Publication

Supplementary Figure 1 from A Genome-Wide Gene-Based Gene–Environment Interaction Study of Breast Cancer in More than 90,000 Women

Publisher: American Association for Cancer Research (AACR)

Date: 04-04-2023

DOI: 10.1158/2767-9764.22544801.V1

Abstract: Quantile-Quantile plot (Q-Q plot) of the aMiSTi p-values for each set of the GxE interactions.

Publication

Supplementary Information from A Genome-Wide Gene-Based Gene–Environment Interaction Study of Breast Cancer in More than 90,000 Women

Publisher: American Association for Cancer Research (AACR)

Date: 04-04-2023

DOI: 10.1158/2767-9764.22544798

Abstract: Funding and acknowledgements.

Publication

A Genome-Wide Gene-Based Gene–Environment Interaction Study of Breast Cancer in More than 90,000 Women

Publisher: American Association for Cancer Research (AACR)

Date: 08-04-2022

DOI: 10.1158/2767-9764.CRC-21-0119

Abstract: Genome-wide association studies (GWAS) have identified more than 200 susceptibility loci for breast cancer, but these variants explain less than a fifth of the disease risk. Although gene–environment interactions have been proposed to account for some of the remaining heritability, few studies have empirically assessed this. We obtained genotype and risk factor data from 46,060 cases and 47,929 controls of European ancestry from population-based studies within the Breast Cancer Association Consortium (BCAC). We built gene expression prediction models for 4,864 genes with a significant (P < 0.01) heritable component using the transcriptome and genotype data from the Genotype-Tissue Expression (GTEx) project. We leveraged predicted gene expression information to investigate the interactions between gene-centric genetic variation and 14 established risk factors in association with breast cancer risk, using a mixed-effects score test. After adjusting for number of tests using Bonferroni correction, no interaction remained statistically significant. The strongest interaction observed was between the predicted expression of the C13orf45 gene and age at first full-term pregnancy (P_GXE = 4.44 × 10^−6). In this transcriptome-informed genome-wide gene–environment interaction study of breast cancer, we found no strong support for the role of gene expression in modifying the associations between established risk factors and breast cancer risk. Our study suggests a limited role of gene–environment interactions in breast cancer risk.

Publication

CTCF: an R/bioconductor data package of human and mouse CTCF binding sites

Publisher: Oxford University Press (OUP)

Date: 2022

DOI: 10.1093/BIOADV/VBAC097

Abstract: CTCF (CCCTC-binding factor) is an 11-zinc-finger DNA binding protein which regulates much of the eukaryotic genome’s 3D structure and function. The diversity of CTCF binding motifs has led to a fragmented landscape of CTCF binding data. We collected position weight matrices of CTCF binding motifs and defined strand-oriented CTCF binding sites in the human and mouse genomes, including the recent Telomere to Telomere and mm39 assemblies. We included selected experimentally determined and predicted CTCF binding sites, such as CTCF-bound cis-regulatory elements from SCREEN ENCODE. We recommend filtering strategies for CTCF binding motifs and demonstrate that liftOver is a viable alternative to convert CTCF coordinates between assemblies. Our comprehensive data resource and usage recommendations can serve to harmonize and strengthen the reproducibility of genomic studies utilizing CTCF binding data. ackages/CTCF. Companion website: dozmorovlab.github.io/CTCF/ Code to reproduce the analyses: ozmorovlab/CTCF.dev. Supplementary data are available at Bioinformatics Advances online.

Publication

Tximeta: Reference sequence checksums for provenance identification in RNA-seq

Publisher: Public Library of Science (PLoS)

Date: 25-02-2020

DOI: 10.1371/JOURNAL.PCBI.1007664

Publication

Data from A Genome-Wide Gene-Based Gene–Environment Interaction Study of Breast Cancer in More than 90,000 Women

Publisher: American Association for Cancer Research (AACR)

Date: 04-04-2023

DOI: 10.1158/2767-9764.C.6550751.V1

Abstract: Genome-wide association studies (GWAS) have identified more than 200 susceptibility loci for breast cancer, but these variants explain less than a fifth of the disease risk. Although gene–environment interactions have been proposed to account for some of the remaining heritability, few studies have empirically assessed this. We obtained genotype and risk factor data from 46,060 cases and 47,929 controls of European ancestry from population-based studies within the Breast Cancer Association Consortium (BCAC). We built gene expression prediction models for 4,864 genes with a significant (P < 0.01) heritable component using the transcriptome and genotype data from the Genotype-Tissue Expression (GTEx) project. We leveraged predicted gene expression information to investigate the interactions between gene-centric genetic variation and 14 established risk factors in association with breast cancer risk, using a mixed-effects score test. After adjusting for number of tests using Bonferroni correction, no interaction remained statistically significant. The strongest interaction observed was between the predicted expression of the C13orf45 gene and age at first full-term pregnancy (P_GXE = 4.44 × 10^−6). In this transcriptome-informed genome-wide gene–environment interaction study of breast cancer, we found no strong support for the role of gene expression in modifying the associations between established risk factors and breast cancer risk. Our study suggests a limited role of gene–environment interactions in breast cancer risk.

Publication

The tidyomics ecosystem: Enhancing omic data analyses

Publisher: Cold Spring Harbor Laboratory

Date: 13-09-2023

DOI: 10.1101/2023.09.10.557072

Publication

bootRanges: Flexible generation of null sets of genomic ranges for hypothesis testing

Publisher: Cold Spring Harbor Laboratory

Date: 05-09-2022

DOI: 10.1101/2022.09.02.506382

Abstract: bootRanges provides fast functions for generation of bootstrapped genomic ranges representing the null sets in enrichment analysis. We show that shuffling or permutation schemes may result in overly narrow null distributions of test statistics, while creating new range sets with a block bootstrap preserves local genomic correlation structure and generates more reliable null distributions. It can also be used in more complex analyses, such as assessing correlations between cis-regulatory elements (CREs) and genes across cell types or providing optimized thresholds, e.g. log fold change (logFC) from differential analysis. The bootRanges functions are available in the R/Bioconductor package nullranges at ackages/nullranges.

Publication

Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences

Publisher: F1000 Research Ltd

Date: 29-02-2016

DOI: 10.12688/F1000RESEARCH.7563.2

Abstract: High-throughput sequencing of cDNA (RNA-seq) is used extensively to characterize the transcriptome of cells. Many transcriptomic studies aim at comparing either abundance levels or the transcriptome composition between given conditions, and as a first step, the sequencing reads must be used as the basis for abundance quantification of transcriptomic features of interest, such as genes or transcripts. Various quantification approaches have been proposed, ranging from simple counting of reads that overlap given genomic regions to more complex estimation of underlying transcript abundances. In this paper, we show that gene-level abundance estimates and statistical inference offer advantages over transcript-level analyses, in terms of performance and interpretability. We also illustrate that the presence of differential isoform usage can lead to inflated false discovery rates in differential gene expression analyses on simple count matrices but that this can be addressed by incorporating offsets derived from transcript-level abundance estimates. We also show that the problem is relatively minor in several real data sets. Finally, we provide an R package (tximport) to help users integrate transcript-level abundance estimates from common quantification pipelines into count-based statistical inference engines.

Publication

Tximeta: reference sequence checksums for provenance identification in RNA-seq

Publisher: Cold Spring Harbor Laboratory

Date: 25-09-2019

DOI: 10.1101/777888

Abstract: Correct annotation metadata is critical for reproducible and accurate RNA-seq analysis. When files are shared publicly or among collaborators with incorrect or missing annotation metadata, it becomes difficult or impossible to reproduce bioinformatic analyses from raw data. It also makes it more difficult to locate the transcriptomic features, such as transcripts or genes, in their proper genomic context, which is necessary for overlapping expression data with other datasets. We provide a solution in the form of an R/Bioconductor package tximeta that performs numerous annotation and metadata gathering tasks automatically on behalf of users during the import of transcript quantification files. The correct reference transcriptome is identified via a hashed checksum stored in the quantification output, and key transcript databases are downloaded and cached locally. The computational paradigm of automatically adding annotation metadata based on reference sequence checksums can greatly facilitate genomic workflows, by helping to reduce overhead during bioinformatic analyses, preventing costly bioinformatic mistakes, and promoting computational reproducibility. The tximeta package is available at ackages/tximeta.

Publication

Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences

Publisher: F1000 Research Ltd

Date: 30-12-2015

DOI: 10.12688/F1000RESEARCH.7563.1

Abstract: High-throughput sequencing of cDNA (RNA-seq) is used extensively to characterize the transcriptome of cells. Many transcriptomic studies aim at comparing either abundance levels or the transcriptome composition between given conditions, and as a first step, the sequencing reads must be used as the basis for abundance quantification of transcriptomic features of interest, such as genes or transcripts. Several different quantification approaches have been proposed, ranging from simple counting of reads that overlap given genomic regions to more complex estimation of underlying transcript abundances. In this paper, we show that gene-level abundance estimates and statistical inference offer advantages over transcript-level analyses, in terms of performance and interpretability. We also illustrate that while the presence of differential isoform usage can lead to inflated false discovery rates in differential expression analyses on simple count matrices and transcript-level abundance estimates improve the performance in simulated data, the difference is relatively minor in several real data sets. Finally, we provide an R package (tximport) to help users integrate transcript-level abundance estimates from common quantification pipelines into count-based statistical inference engines.

Publication

Data from A Genome-Wide Gene-Based Gene–Environment Interaction Study of Breast Cancer in More than 90,000 Women

Publisher: American Association for Cancer Research (AACR)

Date: 04-04-2023

DOI: 10.1158/2767-9764.C.6550751

Abstract: Genome-wide association studies (GWAS) have identified more than 200 susceptibility loci for breast cancer, but these variants explain less than a fifth of the disease risk. Although gene–environment interactions have been proposed to account for some of the remaining heritability, few studies have empirically assessed this. We obtained genotype and risk factor data from 46,060 cases and 47,929 controls of European ancestry from population-based studies within the Breast Cancer Association Consortium (BCAC). We built gene expression prediction models for 4,864 genes with a significant (P < 0.01) heritable component using the transcriptome and genotype data from the Genotype-Tissue Expression (GTEx) project. We leveraged predicted gene expression information to investigate the interactions between gene-centric genetic variation and 14 established risk factors in association with breast cancer risk, using a mixed-effects score test. After adjusting for number of tests using Bonferroni correction, no interaction remained statistically significant. The strongest interaction observed was between the predicted expression of the C13orf45 gene and age at first full-term pregnancy (P_GXE = 4.44 × 10^−6). In this transcriptome-informed genome-wide gene–environment interaction study of breast cancer, we found no strong support for the role of gene expression in modifying the associations between established risk factors and breast cancer risk. Our study suggests a limited role of gene–environment interactions in breast cancer risk.

Publication

Supplementary Information from A Genome-Wide Gene-Based Gene–Environment Interaction Study of Breast Cancer in More than 90,000 Women

Publisher: American Association for Cancer Research (AACR)

Date: 04-04-2023

DOI: 10.1158/2767-9764.22544798.V1

Abstract: Funding and acknowledgements.

Publication

Supplementary Tables 1-3 from A Genome-Wide Gene-Based Gene–Environment Interaction Study of Breast Cancer in More than 90,000 Women

Publisher: American Association for Cancer Research (AACR)

Date: 04-04-2023

DOI: 10.1158/2767-9764.22544795.V1

Abstract: Supplementary Table 1. Sample size of each participating study, by case-control status and genotype platform. Supplementary Table 2. Association between fourteen environmental factors and the risk of breast cancer. Supplementary Table 3. Interactions between genes and fourteen environmental factors.

Michael Love

Researcher

Publications

Supplementary Tables 1-3 from A Genome-Wide Gene-Based Gene–Environment Interaction Study of Breast Cancer in More than 90,000 Women

A junction coverage compatibility score to quantify the reliability of transcript abundance estimates and annotation catalogs

matchRanges: Generating null hypothesis genomic ranges via covariate-matched sampling

A junction coverage compatibility score to quantify the reliability of transcript abundance estimates and annotation catalogs

Supplementary Figure 1 from A Genome-Wide Gene-Based Gene–Environment Interaction Study of Breast Cancer in More than 90,000 Women

Observation weights to unlock bulk RNA-seq tools for zero inflation and single-cell applications

Supplementary Figure 1 from A Genome-Wide Gene-Based Gene–Environment Interaction Study of Breast Cancer in More than 90,000 Women

Supplementary Information from A Genome-Wide Gene-Based Gene–Environment Interaction Study of Breast Cancer in More than 90,000 Women

A Genome-Wide Gene-Based Gene–Environment Interaction Study of Breast Cancer in More than 90,000 Women

CTCF: an R/bioconductor data package of human and mouse CTCF binding sites

Tximeta: Reference sequence checksums for provenance identification in RNA-seq

Data from A Genome-Wide Gene-Based Gene–Environment Interaction Study of Breast Cancer in More than 90,000 Women

The tidyomics ecosystem: Enhancing omic data analyses

bootRanges: Flexible generation of null sets of genomic ranges for hypothesis testing

Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences

Tximeta: reference sequence checksums for provenance identification in RNA-seq

Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences

Data from A Genome-Wide Gene-Based Gene–Environment Interaction Study of Breast Cancer in More than 90,000 Women

Supplementary Information from A Genome-Wide Gene-Based Gene–Environment Interaction Study of Breast Cancer in More than 90,000 Women

Supplementary Tables 1-3 from A Genome-Wide Gene-Based Gene–Environment Interaction Study of Breast Cancer in More than 90,000 Women

Related Organisations

University Of North Carolina At Chapel Hill

Stanford University

Dana Farber Cancer Institute

Freie Universität Berlin

Harvard TH Chan School Of Public Health

Related Funding Activities
