ARDC Research Link Australia

Publication

Decision Support for Disability Employment using Counterfactual Survival Analysis

Publisher: IEEE

Date: 17-12-2022

DOI: 10.1109/BIGDATA55660.2022.10021126

Publication

Data-driven discovery of causal interactions

Publisher: Springer Science and Business Media LLC

Date: 12-01-2019

DOI: 10.1007/S41060-018-0168-0

Publication

Identifying Cancer Subtypes from miRNA-TF-mRNA Regulatory Networks and Expression Data

Publisher: Public Library of Science (PLoS)

Date: 04-2016

DOI: 10.1371/JOURNAL.PONE.0152792

Publication

Identifying preeclampsia-associated genes using a control theory method

Publisher: Oxford University Press (OUP)

Date: 28-04-2022

DOI: 10.1093/BFGP/ELAC006

Abstract: Preeclampsia is a pregnancy-specific disease that can have serious effects on the health of both mothers and their offspring. Predicting which women will develop preeclampsia in early pregnancy with high accuracy will allow for improved management. The clinical symptoms of preeclampsia are well recognized; however, the precise molecular mechanisms leading to the disorder are poorly understood. This is compounded by the heterogeneous nature of preeclampsia onset, timing and severity. Indeed a multitude of poorly defined causes including genetic components implicates etiologic factors, such as immune maladaptation, placental ischemia and increased oxidative stress. Large datasets generated by microarray and next-generation sequencing have enabled the comprehensive study of preeclampsia at the molecular level. However, computational approaches to simultaneously analyze the preeclampsia transcriptomic and network data and identify clinically relevant information are currently limited. In this paper, we proposed a control theory method to identify potential preeclampsia-associated genes based on both transcriptomic and network data. First, we built a preeclampsia gene regulatory network and analyzed its controllability. We then defined two types of critical preeclampsia-associated genes that play important roles in the constructed preeclampsia-specific network. Benchmarking against differential expression, betweenness centrality and hub analysis, we demonstrated that the proposed method may offer novel insights compared with other standard approaches. Next, we investigated subtype-specific genes for early and late onset preeclampsia. This control theory approach could contribute to a further understanding of the molecular mechanisms contributing to preeclampsia.

Publication

What is the Most Effective Intervention to Increase Job Retention for this Disabled Worker?

Publisher: ACM

Date: 14-08-2022

DOI: 10.1145/3534678.3539026

Publication

Identifying direct miRNA–mRNA causal regulatory relationships in heterogeneous data

Publisher: Elsevier BV

Date: 12-2014

DOI: 10.1016/J.JBI.2014.08.005

Abstract: Discovering the regulatory relationships between microRNAs (miRNAs) and mRNAs is an important problem that interests many biologists and medical researchers. A number of computational methods have been proposed to infer miRNA-mRNA regulatory relationships, and are mostly based on the statistical associations between miRNAs and mRNAs discovered in observational data. The miRNA-mRNA regulatory relationships identified by these methods can be both direct and indirect regulations. However, differentiating direct regulatory relationships from indirect ones is important for biologists in experimental designs. In this paper, we present a causal discovery based framework (called DirectTarget) to infer direct miRNA-mRNA causal regulatory relationships in heterogeneous data, including expression profiles of miRNAs and mRNAs, and miRNA target information. DirectTarget is applied to the Epithelial to Mesenchymal Transition (EMT) datasets. The validation by experimentally confirmed target databases suggests that the proposed method can effectively identify direct miRNA-mRNA regulatory relationships. To explore the upstream regulators of miRNA regulation, we further identify the causal feedforward patterns (CFFPs) of TF-miRNA-mRNA to provide insights into the miRNA regulation in EMT. DirectTarget has the potential to be applied to other datasets to elucidate the direct miRNA-mRNA causal regulatory relationships and to explore the regulatory patterns.

Publication

Identifying miRNA synergism using multiple-intervention causal inference

Publisher: Springer Science and Business Media LLC

Date: 12-2019

DOI: 10.1186/S12859-019-3215-5

Abstract: Studying multiple microRNAs (miRNAs) synergism in gene regulation could help to understand the regulatory mechanisms of complicated human diseases caused by miRNAs. Several existing methods have been presented to infer miRNA synergism. Most of the current methods assume that miRNAs with shared targets at the sequence level are working synergistically. However, it is unclear if miRNAs with shared targets are working in concert to regulate the targets or they individually regulate the targets at different time points or different biological processes. A standard method to test the synergistic activities is to knock-down multiple miRNAs at the same time and measure the changes in the target genes. However, this approach may not be practical as we would have too many sets of miRNAs to test. In this paper, we present a novel framework called miRsyn for inferring miRNA synergism by using a causal inference method that mimics the multiple-intervention experiments, e.g. knocking-down multiple miRNAs, with observational data. Our results show that several miRNA-miRNA pairs that have shared targets at the sequence level are not working synergistically at the expression level. Moreover, the identified miRNA synergistic network is small-world and biologically meaningful, and a number of miRNA synergistic modules are significantly enriched in breast cancer. Our further analyses also reveal that most of synergistic miRNA-miRNA pairs show the same expression patterns. The comparison results indicate that the proposed multiple-intervention causal inference method performs better than the single-intervention causal inference method in identifying miRNA synergistic network. Taken together, the results imply that miRsyn is a promising framework for identifying miRNA synergism, and it could enhance the understanding of miRNA synergism in breast cancer.

Publication

Ancestral Instrument Method for Causal Inference without Complete Knowledge

Publisher: International Joint Conferences on Artificial Intelligence Organization

Date: 07-2022

DOI: 10.24963/IJCAI.2022/671

Abstract: Unobserved confounding is the main obstacle to causal effect estimation from observational data. Instrumental variables (IVs) are widely used for causal effect estimation when there exist latent confounders. With the standard IV method, when a given IV is valid, unbiased estimation can be obtained, but the validity requirement on a standard IV is strict and untestable. Conditional IVs have been proposed to relax the requirement of standard IVs by conditioning on a set of observed variables (known as a conditioning set for a conditional IV). However, the criterion for finding a conditioning set for a conditional IV needs a directed acyclic graph (DAG) representing the causal relationships of both observed and unobserved variables. This makes it challenging to discover a conditioning set directly from data. In this paper, by leveraging maximal ancestral graphs (MAGs) for causal inference with latent variables, we study the graphical properties of ancestral IVs, a type of conditional IVs using MAGs, and develop the theory to support data-driven discovery of the conditioning set for a given ancestral IV in data under the pretreatment variable assumption. Based on the theory, we develop an algorithm for unbiased causal effect estimation with a given ancestral IV and observational data. Extensive experiments on synthetic and real-world datasets demonstrate the performance of the algorithm in comparison with existing IV methods.

Publication

Recommending Personalized Interventions to Increase Employability of Disabled Jobseekers

Publisher: Springer International Publishing

Date: 2022

DOI: 10.1007/978-3-031-05981-0_8

Publication

Mining Causal Association Rules

Publisher: IEEE

Date: 12-2013

DOI: 10.1109/ICDMW.2013.88

Publication

miRLAB: An R Based Dry Lab for Exploring miRNA-mRNA Regulatory Relationships

Publisher: Public Library of Science (PLoS)

Date: 30-12-2015

DOI: 10.1371/JOURNAL.PONE.0145386

Publication

Gene selection for optimal prediction of cell position in tissues from single-cell transcriptomics data

Publisher: Life Science Alliance, LLC

Date: 24-09-2020

DOI: 10.26508/LSA.202000867

Abstract: Single-cell RNA-sequencing (scRNAseq) technologies are rapidly evolving. Although very informative, in standard scRNAseq experiments, the spatial organization of the cells in the tissue of origin is lost. Conversely, spatial RNA-seq technologies designed to maintain cell localization have limited throughput and gene coverage. Mapping scRNAseq to genes with spatial information increases coverage while providing spatial location. However, methods to perform such mapping have not yet been benchmarked. To fill this gap, we organized the DREAM Single-Cell Transcriptomics challenge focused on the spatial reconstruction of cells from the Drosophila embryo from scRNAseq data, leveraging as silver standard genes with in situ hybridization data from the Berkeley Drosophila Transcription Network Project reference atlas. The 34 participating teams used diverse algorithms for gene selection and location prediction, while being able to correctly localize clusters of cells. Selection of predictor genes was essential for this task. Predictor genes showed a relatively high expression entropy, high spatial clustering and included prominent developmental genes such as gap and pair-rule genes and tissue markers. Application of the top 10 methods to a zebrafish embryo dataset yielded performance and statistical properties of the selected genes similar to those in the Drosophila data. This suggests that methods developed in this challenge are able to extract generalizable properties of genes that are useful to accurately reconstruct the spatial arrangement of cells in tissues.

Publication

Measurement Invariance of the Self-Description Questionnaire II in a Chinese Sample

Publisher: Hogrefe Publishing Group

Date: 04-2016

DOI: 10.1027/1015-5759/A000242

Abstract: Studies on the construct validity of the Self-Description Questionnaire II (SDQII) have not compared the factor structure between the English and Chinese versions of the SDQII. By using rigorous multiple group comparison procedures based upon confirmatory factor analysis (CFA) of measurement invariance, the present study examined the responses of Australian high school students (N = 302) and Chinese high school students (N = 322) using the English and Chinese versions of the SDQII, respectively. CFA provided strong evidence that the factor structure (factor loading and item intercept) of the Chinese version of the SDQII in comparison to responses to the English version of the SDQII is invariant; therefore, it allows researchers to confidently utilize both the English and Chinese versions of the SDQII with Chinese and Australian samples separately and cross-culturally.

Publication

PAN: Personalized Annotation-Based Networks for the Prediction of Breast Cancer Relapse

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 11-2021

DOI: 10.1109/TCBB.2021.3076422

Publication

Data-Driven Causal Effect Estimation Based on Graphical Causal Modelling: A Survey

Publisher: arXiv

Date: 2022

DOI: 10.48550/ARXIV.2208.09590

Publication

Personalized Interventions to Increase the Employment Success of People With Disability

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 2023

DOI: 10.1109/TBDATA.2023.3291547

Publication

The winning methods for predicting cellular position in the DREAM single-cell transcriptomics challenge

Publisher: Oxford University Press (OUP)

Date: 25-08-2021

DOI: 10.1093/BIB/BBAA181

Abstract: Predicting cell locations is important since with the understanding of cell locations, we may estimate the function of cells and their integration with the spatial environment. Thus, the DREAM challenge on single-cell transcriptomics required participants to predict the locations of single cells in the Drosophila embryo using single-cell transcriptomic data. We have developed over 50 pipelines by combining different ways of preprocessing the RNA-seq data, selecting the genes, predicting the cell locations and validating predicted cell locations, resulting in the winning methods which were ranked second in sub-challenge 1, first in sub-challenge 2 and third in sub-challenge 3. In this paper, we present an R package, SCTCwhatateam, which includes all the methods we developed and the Shiny web application to facilitate the research on single-cell spatial reconstruction. All the data and the example use cases are available in the Supplementary data.

Publication

Uncovering the roles of microRNAs/lncRNAs in characterising breast cancer subtypes and prognosis

Publisher: Springer Science and Business Media LLC

Date: 04-06-2021

DOI: 10.1186/S12859-021-04215-3

Abstract: Accurate prognosis and identification of cancer subtypes at molecular level are important steps towards effective and personalised treatments of breast cancer. To this end, many computational methods have been developed to use gene (mRNA) expression data for breast cancer subtyping and prognosis. Meanwhile, microRNAs (miRNAs) and long non-coding RNAs (lncRNAs) have been extensively studied in the last 2 decades and their associations with breast cancer subtypes and prognosis have been evidenced. However, it is not clear whether using miRNA and/or lncRNA expression data helps improve the performance of gene expression based subtyping and prognosis methods, and this raises challenges as to how and when to use these data and methods in practice. In this paper, we conduct a comparative study of 35 methods, including 12 breast cancer subtyping methods and 23 breast cancer prognosis methods, on a collection of 19 independent breast cancer datasets. We aim to uncover the roles of miRNAs and lncRNAs in breast cancer subtyping and prognosis from the systematic comparison. In addition, we created an R package, CancerSubtypesPrognosis, including all the 35 methods to facilitate the reproducibility of the methods and streamline the evaluation. The experimental results show that integrating miRNA expression data helps improve the performance of the mRNA-based cancer subtyping methods. However, miRNA signatures are not as good as mRNA signatures for breast cancer prognosis. In general, lncRNA expression data does not help improve the mRNA-based methods in both cancer subtyping and cancer prognosis. These results suggest that the prognostic roles of miRNA/lncRNA signatures in the improvement of breast cancer prognosis needs to be further verified.

Publication

Data Mining

Publisher: Springer Singapore

Date: 2019

DOI: 10.1007/978-981-15-1699-3

Publication

A novel framework for inferring condition-specific TF and miRNA co-regulation of protein–protein interactions

Publisher: Elsevier BV

Date: 02-2016

DOI: 10.1016/J.GENE.2015.11.023

Abstract: Recent studies have shown that transcription factors (TFs) and microRNAs (miRNAs), while independently regulating their downstream targets, collaborate with each other to regulate gene expression. However, their synergistic roles in protein-protein interactions (PPIs) remain mostly unknown. In this paper, we present a novel framework (called CoRePPI) for inferring TF and miRNA co-regulation of PPIs. Particularly, CoRePPI is aimed at discovering the co-regulation specific to a condition of interest, by using heterogeneous data, including miRNA and messenger RNA (mRNA) expression profiles, putative miRNA targets, TF targets and PPIs. CoRePPI firstly finds the network motifs indicating the co-regulation of PPIs by TFs and miRNAs in tumor and normal conditions separately. Then by identifying the differential motifs found in one condition but not in the other, it builds the networks consisting of TFs, miRNAs and their co-regulated PPIs specific to different conditions respectively. To validate CoRePPI, we apply it to the Pan-Cancer dataset which includes the expression profiles of 12 cancer types from TCGA. Through network topology analysis, we found that the tumor and normal CoRePPI networks are scale-free. Furthermore, the results of differential and intersected network analysis between the tumor and normal CoRePPI networks suggest that only a small fraction of the regulatory relationships between TFs and miRNAs are conserved in both conditions, but they co-regulate different downstream PPIs in tumor and normal conditions; and in different conditions the majority of the regulatory relationships between TFs and miRNAs are different, although they may regulate the same PPIs in their respective conditions. The CoRePPI sub-networks constructed for the three types of cancers (breast cancer, lung cancer and ovarian cancer) are all scale-free, and the intersection of these CoRePPI sub-networks can be utilized as the biomarker CoRePPI sub-network of the three types of cancers. The PPI enrichment analyses of the tumor and normal CoRePPI networks suggest that the co-regulating TFs and miRNAs are significantly associated with the specific biological processes, diseases and pathways. In addition, compared with the two non-condition-specific approaches, the tumor CoRePPI network is found to have the most enriched cancer-related PPIs. Altogether, the results uncover the combined regulatory patterns of TFs and miRNAs on the PPIs, and may provide new insights for research in cancer-associated TFs and miRNAs.

Publication

How do the existing fairness metrics and unfairness mitigation algorithms contribute to ethical learning analytics?

Publisher: Wiley

Date: 12-04-2022

DOI: 10.1111/BJET.13217

Abstract: With the widespread use of learning analytics (LA), ethical concerns about fairness have been raised. Research shows that LA models may be biased against students of certain demographic subgroups. Although fairness has gained significant attention in the broader machine learning (ML) community in the last decade, it is only recently that attention has been paid to fairness in LA. Furthermore, the decision on which unfairness mitigation algorithm or metric to use in a particular context remains largely unknown. On this premise, we performed a comparative evaluation of some selected unfairness mitigation algorithms regarded in the fair ML community to have shown promising results. Using 3‐year program dropout data from an Australian university, we comparatively evaluated how the unfairness mitigation algorithms contribute to ethical LA by testing for some hypotheses across fairness and performance metrics. Interestingly, our results show how data bias does not always necessarily result in predictive bias. Perhaps not surprisingly, our test for fairness‐utility tradeoff shows how ensuring fairness does not always lead to a drop in utility. Indeed, our results show that ensuring fairness might lead to enhanced utility under specific circumstances. Our findings may, to some extent, guide fairness algorithm and metric selection for a given context. What is already known about this topic: LA is increasingly being used to leverage actionable insights about students and drive student success. LA models have been found to make discriminatory decisions against certain student demographic subgroups, therefore raising ethical concerns. Fairness in education is nascent. Only a few works have examined fairness in LA and consequently followed up with ensuring fair LA models. What this paper adds: A juxtaposition of unfairness mitigation algorithms across the entire LA pipeline showing how they compare and how each of them contributes to fair LA. Ensuring ethical LA does not always lead to a dip in performance. Sometimes, it actually improves performance as well. Fairness in LA has only focused on some form of outcome equality; however, equality of outcome may be possible only when the playing field is levelled. Implications for practice and/or policy: Based on desired notion of fairness and which segment of the LA pipeline is accessible, a fairness‐minded decision maker may be able to decide which algorithm to use in order to achieve their ethical goals. LA practitioners can carefully aim for more ethical LA models without trading significant utility by selecting algorithms that find the right balance between the two objectives. Fairness enhancing technologies should be cautiously used as guides, not final decision makers. Human domain experts must be kept in the loop to handle the dynamics of transcending fair LA beyond equality to equitable LA.

Publication

A pseudotemporal causality approach to identifying miRNA–mRNA interactions during biological processes

Publisher: Oxford University Press (OUP)

Date: 18-10-2021

DOI: 10.1093/BIOINFORMATICS/BTAA899

Abstract: microRNAs (miRNAs) are important gene regulators and they are involved in many biological processes, including cancer progression. Therefore, correctly identifying miRNA–mRNA interactions is a crucial task. To this end, a huge number of computational methods has been developed, but they mainly use the data at one snapshot and ignore the dynamics of a biological process. The recent development of single cell data and the booming of the exploration of cell trajectories using ‘pseudotime’ concept have inspired us to develop a pseudotime-based method to infer the miRNA–mRNA relationships characterizing a biological process by taking into account the temporal aspect of the process. We have developed a novel approach, called pseudotime causality, to find the causal relationships between miRNAs and mRNAs during a biological process. We have applied the proposed method to both single cell and bulk sequencing datasets for Epithelia to Mesenchymal Transition, a key process in cancer metastasis. The evaluation results show that our method significantly outperforms existing methods in finding miRNA–mRNA interactions in both single cell and bulk data. The results suggest that utilizing the pseudotemporal information from the data helps reveal the gene regulation in a biological process much better than using the static information. R scripts and datasets can be found at github.com/AndresMCB/PTC. Supplementary data are available at Bioinformatics online.

Publication

Intervention Recommendation for Improving Disability Employment

Publisher: IEEE

Date: 10-12-2020

DOI: 10.1109/BIGDATA50022.2020.9378350

Publication

Effective Outlier Detection based on Bayesian Network and Proximity

Publisher: IEEE

Date: 12-2018

DOI: 10.1109/BIGDATA.2018.8622230

Publication

Local Search for Efficient Causal Effect Estimation

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 09-2023

DOI: 10.1109/TKDE.2022.3218131

Publication

Discovering Ancestral Instrumental Variables for Causal Inference From Observational Data

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 2023

DOI: 10.1109/TNNLS.2023.3262848

Publication

Recommending the Most Effective Intervention to Improve Employment for Job Seekers with Disability

Publisher: ACM

Date: 14-08-2021

DOI: 10.1145/3447548.3467095

Publication

LMSM: A modular approach for identifying lncRNA related miRNA sponge modules in breast cancer

Publisher: Public Library of Science (PLoS)

Date: 23-04-2020

DOI: 10.1371/JOURNAL.PCBI.1007851

Publication

The KDD'23 Workshop on Causal Discovery, Prediction and Decision (CDPD 2023)

Publisher: ACM

Date: 04-08-2023

DOI: 10.1145/3580305.3599204

Publication

Identifying miRNA synergism using multiple-intervention causal inference

Publisher: Cold Spring Harbor Laboratory

Date: 28-05-2019

DOI: 10.1101/652180

Abstract: Studying multiple microRNAs (miRNAs) synergism in gene regulation could help to understand the regulatory mechanisms of complicated human diseases caused by miRNAs. Several existing methods have been presented to infer miRNA synergism. Most of the current methods assume that miRNAs with shared targets at the sequence level are working synergistically. However, it is unclear if miRNAs with shared targets are working in concert to regulate the targets or they individually regulate the targets at different time points or different biological processes. A standard method to test the synergistic activities is to knock-down multiple miRNAs at the same time and measure the changes in the target genes. However, this approach may not be practical as we would have too many sets of miRNAs to test. In this paper, we present a novel framework called miRsyn for inferring miRNA synergism by using a causal inference method that mimics the multiple-intervention experiments, e.g. knocking-down multiple miRNAs, with observational data. Our results show that several miRNA-miRNA pairs that have shared targets at the sequence level are not working synergistically at the expression level. Moreover, the identified miRNA synergistic network is small-world and biologically meaningful, and a number of miRNA synergistic modules are significantly enriched in breast cancer. Our further analyses also reveal that most of synergistic miRNA-miRNA pairs show the same expression patterns. The comparison results indicate that the proposed multiple-intervention causal inference method performs better than the single-intervention causal inference method in identifying miRNA synergistic network. Taken together, the results imply that miRsyn is a promising framework for identifying miRNA synergism, and it could enhance the understanding of miRNA synergism in breast cancer.

Publication

Efficient polygenic risk scores for biobank scale data by exploiting phenotypes from inferred relatives

Publisher: Springer Science and Business Media LLC

Date: 17-06-2020

DOI: 10.1038/S41467-020-16829-X

Abstract: Polygenic risk scores are emerging as a potentially powerful tool to predict future phenotypes of target individuals, typically using unrelated individuals, thereby devaluing information from relatives. Here, for 50 traits from the UK Biobank data, we show that a design of 5,000 individuals with first-degree relatives of target individuals can achieve a prediction accuracy similar to that of around 220,000 unrelated individuals (mean prediction accuracy = 0.26 vs. 0.24, mean fold-change = 1.06 (95% CI: 0.99-1.13), P-value = 0.08), despite a 44-fold difference in sample size. For lifestyle traits, the prediction accuracy with 5,000 individuals including first-degree relatives of target individuals is significantly higher than that with 220,000 unrelated individuals (mean prediction accuracy = 0.22 vs. 0.16, mean fold-change = 1.40 (1.17-1.62), P-value = 0.025). Our findings suggest that polygenic prediction integrating family information may help to accelerate precision health and clinical intervention.

Publication

Toward Unique and Unbiased Causal Effect Estimation From Data With Hidden Variables

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 09-2023

DOI: 10.1109/TNNLS.2021.3133337

Publication

Mining combined causes in large data sets

Publisher: Elsevier BV

Date: 2016

DOI: 10.1016/J.KNOSYS.2015.10.018

Publication

Accurate data-driven prediction does not mean high reproducibility

Publisher: Springer Science and Business Media LLC

Date: 17-01-2020

DOI: 10.1038/S42256-019-0140-2

Publication

Identifying miRNA-mRNA regulatory relationships in breast cancer with invariant causal prediction

Publisher: Springer Science and Business Media LLC

Date: 15-03-2019

DOI: 10.1186/S12859-019-2668-X

Publication

NIBNA: a network-based node importance approach for identifying breast cancer drivers

Publisher: Oxford University Press (OUP)

Date: 02-03-2021

DOI: 10.1093/BIOINFORMATICS/BTAB145

Abstract: Identifying meaningful cancer driver genes in a cohort of tumors is a challenging task in cancer genomics. Although existing studies have identified known cancer drivers, most of them focus on detecting coding drivers with mutations. It is acknowledged that non-coding drivers can regulate driver mutations to promote cancer growth. In this work, we propose a novel node importance-based network analysis (NIBNA) framework to detect coding and non-coding cancer drivers. We hypothesize that cancer drivers are crucial to the formation of community structures in cancer network, and removing them from the network greatly perturbs the network structure thereby critically affecting the functioning of the network. NIBNA detects cancer drivers using a three-step process: first, a condition-specific network is built by incorporating gene expression data and gene networks; second, the community structures in the network are estimated; and third, a centrality-based metric is applied to compute node importance. We apply NIBNA to the BRCA dataset, and it outperforms existing state-of-the-art methods in detecting coding cancer drivers. NIBNA also predicts 265 miRNA drivers, and the majority of these drivers have been validated in literature. Further, we apply NIBNA to detect cancer subtype-specific drivers, and several predicted drivers have been validated to be associated with cancer subtypes. Lastly, we evaluate NIBNA’s performance in detecting epithelial–mesenchymal transition drivers, and we confirmed 8 coding and 13 miRNA drivers in the list of known genes. The source code can be accessed at andarsc/NIBNA. Supplementary data are available at Bioinformatics online.

Publication

From miRNA regulation to miRNA-TF co-regulation: computational approaches and challenges

Publisher: Oxford University Press (OUP)

Date: 12-07-2014

DOI: 10.1093/BIB/BBU023

Abstract: microRNAs (miRNAs) are important gene regulators. They control a wide range of biological processes and are involved in several types of cancers. Thus, exploring miRNA functions is important for diagnostics and therapeutics. To date, there are few feasible experimental techniques for discovering miRNA regulatory mechanisms. Alternatively, predictions of miRNA-mRNA regulatory relationships by computational methods have increasingly achieved promising results. Computational approaches are proving their ability as effective tools in reducing the number of biological experiments that must be conducted and to assist with the design of the experiments. In this review, we categorize and review different computational approaches to identify miRNA activities and functions, including the co-regulation of miRNAs and transcription factors. Our main focuses are on the recent approaches that use multiple data types for exploring miRNA functions. We discuss the remaining challenges in the evaluation and selection of models based on the results from a case study. Finally, we analyse the remaining challenges of each computational approach and suggest some future research directions.

Publication

Exploring cell-specific miRNA regulation with single-cell miRNA-mRNA co-sequencing data

Publisher: Springer Science and Business Media LLC

Date: 02-12-2021

DOI: 10.1186/S12859-021-04498-6

Abstract: Existing computational methods for studying miRNA regulation are mostly based on bulk miRNA and mRNA expression data. However, bulk data only allows the analysis of miRNA regulation regarding a group of cells, rather than the miRNA regulation unique to individual cells. Recent advance in single-cell miRNA-mRNA co-sequencing technology has opened a way for investigating miRNA regulation at single-cell level. However, as currently single-cell miRNA-mRNA co-sequencing data is just emerging and only available at small-scale, there is a strong need of novel methods to exploit existing single-cell data for the study of cell-specific miRNA regulation. In this work, we propose a new method, CSmiR (Cell-Specific miRNA regulation) to combine single-cell miRNA-mRNA co-sequencing data and putative miRNA-mRNA binding information to identify miRNA regulatory networks at the resolution of individual cells. We apply CSmiR to the miRNA-mRNA co-sequencing data in 19 K562 single-cells to identify cell-specific miRNA-mRNA regulatory networks for understanding miRNA regulation in each K562 single-cell. By analyzing the obtained cell-specific miRNA-mRNA regulatory networks, we observe that the miRNA regulation in each K562 single-cell is unique. Moreover, we conduct detailed analysis on the cell-specific miRNA regulation associated with the miR-17/92 family as a case study. The comparison results indicate that CSmiR is effective in predicting cell-specific miRNA targets. Finally, through exploring cell–cell similarity matrix characterized by cell-specific miRNA regulation, CSmiR provides a novel strategy for clustering single-cells and helps to understand cell–cell crosstalk. To the best of our knowledge, CSmiR is the first method to explore miRNA regulation at a single-cell resolution level, and we believe that it can be a useful method to enhance the understanding of cell-specific miRNA regulation.

Publication

CBNA: A control theory based method for identifying coding and non-coding cancer drivers

Publisher: Public Library of Science (PLoS)

Date: 02-12-2019

DOI: 10.1371/JOURNAL.PCBI.1007538

Publication

Dynamic cancer drivers: a causal approach for cancer driver discovery based on bio-pathological trajectories

Publisher: Oxford University Press (OUP)

Date: 19-09-2022

DOI: 10.1093/BFGP/ELAC030

Abstract: The traditional way for discovering genes which drive cancer (namely cancer drivers) neglects the dynamic information of cancer development, even though it is well known that cancer progresses dynamically. To enhance cancer driver discovery, we expand cancer driver concept to dynamic cancer driver as a gene driving one or more bio-pathological transitions during cancer progression. Our method refers to the fact that cancer should not be considered as a single process but a compendium of altered biological processes causing the disease to develop over time. Reciprocally, different drivers of cancer can potentially be discovered by analysing different bio-pathological pathways. We propose a novel approach for causal inference of genes driving one or more core processes during cancer development (i.e. dynamic cancer driver). We use the concept of pseudotime for inferring the latent progression of samples along a biological transition during cancer and identifying a critical event when such a process is significantly deviated from normal to carcinogenic. We infer driver genes by assessing the causal effect they have on the process after such a critical event. We have applied our method to single-cell and bulk sequencing datasets of breast cancer. The evaluation results show that our method outperforms well-recognized cancer driver inference methods. These results suggest that including information of the underlying dynamics of cancer improves the inference process (in comparison with using static data), and allows us to discover different sets of driver genes from different processes in cancer. R scripts and datasets can be found at github.com/AndresMCB/DynamicCancerDriver

Publication

pDriver : A novel method for unravelling personalised coding and miRNA cancer drivers.

Publisher: Oxford University Press (OUP)

Date: 27-04-2021

DOI: 10.1093/BIOINFORMATICS/BTAB262

Abstract: Unravelling cancer driver genes is important in cancer research. Although computational methods have been developed to identify cancer drivers, most of them detect cancer drivers at population level. However, two patients who have the same cancer type and receive the same treatment may have different outcomes because each patient has a different genome and their disease might be driven by different driver genes. Therefore new methods are being developed for discovering cancer drivers at individual level, but existing personalized methods only focus on coding drivers while microRNAs (miRNAs) have been shown to drive cancer progression as well. Thus, novel methods are required to discover both coding and miRNA cancer drivers at individual level. We propose the novel method, pDriver, to discover personalized cancer drivers. pDriver includes two stages: (i) constructing gene networks for each cancer patient and (ii) discovering cancer drivers for each patient based on the constructed gene networks. To demonstrate the effectiveness of pDriver, we have applied it to five TCGA cancer datasets and compared it with the state-of-the-art methods. The result indicates that pDriver is more effective than other methods. Furthermore, pDriver can also detect miRNA cancer drivers and most of them have been confirmed to be associated with cancer by literature. We further analyze the predicted personalized drivers for breast cancer patients and the result shows that they are significantly enriched in many GO processes and KEGG pathways involved in breast cancer. pDriver is available at vvhoang Driver. Supplementary data are available at Bioinformatics online.

Publication

Computational methods for identifying miRNA sponge interactions

Publisher: Oxford University Press (OUP)

Date: 05-06-2016

DOI: 10.1093/BIB/BBW042

Abstract: Recent findings show that coding genes are not the only targets that miRNAs interact with. In fact, there is a pool of different RNAs competing with each other to attract miRNAs for interactions, thus acting as competing endogenous RNAs (ceRNAs). The ceRNAs indirectly regulate each other via the titration mechanism, i.e. the increasing concentration of a ceRNA will decrease the number of miRNAs that are available for interacting with other targets. The cross-talks between ceRNAs, i.e. their interactions mediated by miRNAs, have been identified as the drivers in many disease conditions, including cancers. In recent years, some computational methods have emerged for identifying ceRNA-ceRNA interactions. However, there remain great challenges and opportunities for developing computational methods to provide new insights into ceRNA regulatory mechanisms. In this paper, we review the publicly available databases of ceRNA-ceRNA interactions and the computational methods for identifying ceRNA-ceRNA interactions (also known as miRNA sponge interactions). We also conduct a comparison study of the methods with a breast cancer dataset. Our aim is to provide a current snapshot of the advances of the computational methods in identifying miRNA sponge interactions and to discuss the remaining challenges.

Publication

Stable breast cancer prognosis

Publisher: Cold Spring Harbor Laboratory

Date: 15-09-2021

DOI: 10.1101/2021.09.13.460002

Abstract: Predicting breast cancer prognosis helps improve the treatment and management of the disease. In the last decades, many prediction models have been developed for breast cancer prognosis based on transcriptomic data. A common assumption made by these models is that the test and training data follow the same distribution. However, in practice, due to the heterogeneity of breast cancer and the different environments (e.g. hospitals) where data are collected, the distribution of the test data may shift from that of the training data. For example, new patients likely have different breast cancer stage distribution from those in the training dataset. Thus these existing methods may not provide stable prediction performance for breast cancer prognosis in situations with the shift of data distribution. In this paper, we present a novel stable prediction method for reliable breast cancer prognosis under data distribution shift. Our model, known as Deep Global Balancing Cox regression (DGBCox), is based on the causal inference theory. In DGBCox, firstly high-dimensional gene expression data is transferred to latent network-based representations by a deep auto-encoder neural network. Then after balancing the latent representations using a proposed causality-based approach, causal latent features are selected for breast cancer prognosis. Causal features have persistent relationships with survival outcomes even under distribution shift across different environments according to the causal inference theory. Therefore, the proposed DGBCox method is robust and stable for breast cancer prognosis. We apply DGBCox to 12 test datasets from different breast cancer studies. The results show that DGBCox outperforms benchmark methods in terms of both prediction accuracy and stability. We also propose a permutation importance algorithm to rank the genes in the DGBCox model.
The top 50 ranked genes suggest that the cell cycle and the organelle organisation could be the most relevant biological processes for stable breast cancer prognosis. Various prediction models have been proposed for breast cancer prognosis. The prediction models usually train on a dataset and predict the survival outcomes of patients in new test datasets. The majority of these models share a common assumption that the test and training data follow the same distribution. However, as breast cancer is a heterogeneous disease, the assumption may be violated in practice. In this study, we propose a novel method for reliable breast cancer prognosis when the test data distribution shifts from that of the training data. The proposed model has been trained on one dataset and applied to twelve test datasets from different breast cancer studies. In comparison with the benchmark methods in breast cancer prognosis, our model shows better prediction accuracy and stability. The top 50 important genes in our model provide clues to the relationship between several biological mechanisms and clinical outcomes of breast cancer. Our proposed method in breast cancer can potentially be adapted to apply to other cancer types.

Publication

miRSM: an R package to infer and analyse miRNA sponge modules in heterogeneous data

Publisher: Informa UK Limited

Date: 06-04-2021

DOI: 10.1080/15476286.2021.1905341

Publication

A general framework for causal classification

Publisher: Springer Science and Business Media LLC

Date: 03-2021

DOI: 10.1007/S41060-021-00249-1

Publication

GraphDTA: predicting drug–target binding affinity with graph neural networks

Publisher: Oxford University Press (OUP)

Date: 24-10-2021

DOI: 10.1093/BIOINFORMATICS/BTAA921

Abstract: The development of new drugs is costly, time consuming and often accompanied with safety issues. Drug repurposing can avoid the expensive and lengthy process of drug development by finding new uses for already approved drugs. In order to repurpose drugs effectively, it is useful to know which proteins are targeted by which drugs. Computational models that estimate the interaction strength of new drug–target pairs have the potential to expedite drug repurposing. Several models have been proposed for this task. However, these models represent the drugs as strings, which is not a natural way to represent molecules. We propose a new model called GraphDTA that represents drugs as graphs and uses graph neural networks to predict drug–target affinity. We show that graph neural networks not only predict drug–target affinity better than non-deep learning models, but also outperform competing deep learning methods. Our results confirm that deep learning models are appropriate for drug–target binding affinity prediction, and that representing drugs as graphs can lead to further improvements. The proposed models are implemented in Python. Related data, pre-trained models and source code are publicly available at hinng/GraphDTA. All scripts and data needed to reproduce the post hoc statistical analysis are available from 0.5281/zenodo.3603523. Supplementary data are available at Bioinformatics online.

Publication

Multi-Source Causal Feature Selection

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 09-2020

DOI: 10.1109/TPAMI.2019.2908373

Publication

Assessment of network module identification across complex diseases

Publisher: Springer Science and Business Media LLC

Date: 30-08-2019

DOI: 10.1038/S41592-019-0509-5

Publication

Identifying miRNA synergistic regulatory networks in heterogeneous human data via network motifs

Publisher: Royal Society of Chemistry (RSC)

Date: 2016

DOI: 10.1039/C5MB00562K

Abstract: We present a causality based framework called mirSRN to infer miRNA synergism in human molecular systems.

Publication

The KDD 2022 Workshop on Causal Discovery (CD2022)

Publisher: ACM

Date: 14-08-2022

DOI: 10.1145/3534678.3542890

Publication

miRspongeR: an R/Bioconductor package for the identification and analysis of miRNA sponge interaction networks and modules

Publisher: Springer Science and Business Media LLC

Date: 10-05-2019

DOI: 10.1186/S12859-019-2861-Y

Publication

Sufficient dimension reduction for average causal effect estimation

Publisher: Springer Science and Business Media LLC

Date: 20-04-2022

DOI: 10.1007/S10618-022-00832-5

Abstract: A large number of covariates can have a negative impact on the quality of causal effect estimation since confounding adjustment becomes unreliable when the number of covariates is large relative to the number of samples. Propensity score is a common way to deal with a large covariate set, but the accuracy of propensity score estimation (normally done by logistic regression) is also challenged by the large number of covariates. In this paper, we prove that a large covariate set can be reduced to a lower dimensional representation which captures the complete information for adjustment in causal effect estimation. The theoretical result enables effective data-driven algorithms for causal effect estimation. Supported by the result, we develop an algorithm that employs a supervised kernel dimension reduction method to learn a lower dimensional representation from the original covariate space, and then utilises nearest neighbour matching in the reduced covariate space to impute the counterfactual outcomes to avoid the large sized covariate set problem. The proposed algorithm is evaluated on two semisynthetic and three real-world datasets and the results show the effectiveness of the proposed algorithm.

Publication

Multi-Group Transfer Learning on Multiple Latent Spaces for Text Classification

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 2020

DOI: 10.1109/ACCESS.2020.2984571

Publication

Nonparametric Sparse Matrix Decomposition for Cross-View Dimensionality Reduction

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 08-2017

DOI: 10.1109/TMM.2017.2683258

Publication

Time to infer miRNA sponge modules

Publisher: Wiley

Date: 03-08-2021

DOI: 10.1002/WRNA.1686

Abstract: Inferring competing endogenous RNA (ceRNA) or microRNA (miRNA) sponge modules is a challenging and meaningful task for revealing ceRNA regulation mechanism at the module level. Modules in this context refer to groups of miRNA sponges which have mutual competitions and act as functional units for achieving biological processes. The recent development of computational methods based on heterogeneous data provides a novel way to discern the competitive effects of miRNA sponges on human complex diseases. This article aims to provide a comprehensive perspective of miRNA sponge module discovery methods. We first review the publicly available databases of cancer‐related miRNA sponges, as the miRNA sponges involved in human cancers contribute to the discovery of cancer‐associated modules. Then we review the existing computational methods for inferring miRNA sponge modules. Furthermore, we conduct an assessment on the performance of the module discovery methods with the pan‐cancer dataset, and the comparison study indicates that it is useful to infer biologically meaningful miRNA sponge modules by directly mapping heterogeneous data to the competitive modules. Finally, we discuss the future directions and associated challenges in developing in silico methods to infer miRNA sponge modules. This article is categorized under: RNA Interactions with Proteins and Other Molecules > Small Molecule‐RNA Interactions; Regulatory RNAs/RNAi/Riboswitches > Regulatory RNAs

Publication

The KDD 2021 Workshop on Causal Discovery (CD2021)

Publisher: ACM

Date: 14-08-2021

DOI: 10.1145/3447548.3469462

Publication

A novel single-cell based method for breast cancer prognosis

Publisher: Public Library of Science (PLoS)

Date: 24-08-2020

DOI: 10.1371/JOURNAL.PCBI.1008133

Publication

A Fast PC Algorithm for High Dimensional Causal Discovery with Multi-Core PCs

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 09-2019

DOI: 10.1109/TCBB.2016.2591526

Publication

Estimating heterogeneous treatment effects by balancing heterogeneity and fitness

Publisher: Cold Spring Harbor Laboratory

Date: 29-05-2018

DOI: 10.1101/333278

Abstract: Estimating heterogeneous treatment effects is an important problem in many medical and biological applications since treatments may have different effects on the prognoses of different patients. Recently, several recursive partitioning methods have been proposed to identify subgroups that respond differently to a treatment, and they rely on a fitness criterion to minimize the error between the estimated treatment effects and the unobservable true effects. In this paper, we propose that a heterogeneity criterion, which maximizes the differences of treatment effects among the subgroups, also needs to be considered. Moreover, we show that better performance can be achieved when the fitness and the heterogeneity criteria are considered simultaneously. Selecting the optimal splitting points then becomes a multi-objective problem; however, a solution that is optimal in both aspects is often not available. To solve this problem, we propose a multi-objective splitting procedure to balance both criteria. The proposed procedure is computationally efficient and fits naturally into the existing recursive partitioning framework. Experimental results show that the proposed multi-objective approach performs consistently better than existing ones. The effects of a treatment are often not the same for different individuals with different gene expressions. Learning to predict the heterogeneous treatment effects from clinical and expression data is an important step towards personalized medical treatment. Existing computational methods are not ideal for the task because they do not address the interpretability of the model and do not consider the limited sample sizes in biological and medical applications. Our method addresses these issues and achieves superior performance in analyzing the treatment effects of radiotherapy on breast cancer patients.

Publication

Computational Methods for Predicting Autism Spectrum Disorder from Gene Expression Data

Publisher: Springer International Publishing

Date: 2020

DOI: 10.1007/978-3-030-65390-3_31

Publication

Inferring microRNA–mRNA causal regulatory relationships from expression data

Publisher: Oxford University Press (OUP)

Date: 30-01-2013

DOI: 10.1093/BIOINFORMATICS/BTT048

Abstract: Motivation: microRNAs (miRNAs) are known to play an essential role in the post-transcriptional gene regulation in plants and animals. Currently, several computational approaches have been developed with a shared aim to elucidate miRNA–mRNA regulatory relationships. Although these existing computational methods discover the statistical relationships, such as correlations and associations between miRNAs and mRNAs at data level, such statistical relationships are not necessarily the real causal regulatory relationships that would ultimately provide useful insights into the causes of gene regulations. The standard method for determining causal relationships is randomized controlled perturbation experiments. In practice, however, such experiments are expensive and time consuming. Our motivation for this study is to discover the miRNA–mRNA causal regulatory relationships from observational data. Results: We present a causality discovery-based method to uncover the causal regulatory relationship between miRNAs and mRNAs, using expression profiles of miRNAs and mRNAs without taking into consideration the previous target information. We apply this method to the epithelial-to-mesenchymal transition (EMT) datasets and validate the computational discoveries by a controlled biological experiment for the miR-200 family. A significant portion of the regulatory relationships discovered in data is consistent with those identified by experiments. In addition, the top genes that are causally regulated by miRNAs are highly relevant to the biological conditions of the datasets. The results indicate that the causal discovery method effectively discovers miRNA regulatory relationships in data. Although computational predictions may not completely replace intervention experiments, the accurate and reliable discoveries in data are cost effective for the design of miRNA experiments and the understanding of miRNA–mRNA regulatory relationships. 
Availability: The R scripts are in the Supplementary material. Contact: thuc_duy.le@mymail.unisa.edu.au or jiuyong.li@unisa.edu.au Supplementary information: Supplementary data are available at Bioinformatics online.

Publication

miRBaseConverter: an R/Bioconductor package for converting and retrieving miRNA name, accession, sequence and family information in different versions of miRBase

Publisher: Springer Science and Business Media LLC

Date: 12-2018

DOI: 10.1186/S12859-018-2531-5

Publication

Extensive transcriptional responses are co-ordinated by microRNAs as revealed by Exon-Intron Split Analysis (EISA)

Publisher: Oxford University Press (OUP)

Date: 02-08-2019

DOI: 10.1093/NAR/GKZ664

Abstract: Epithelial–mesenchymal transition (EMT) has been a subject of intense scrutiny as it facilitates metastasis and alters drug sensitivity. Although EMT-regulatory roles for numerous miRNAs and transcription factors are known, their functions can be difficult to disentangle, in part due to the difficulty in identifying direct miRNA targets from complex datasets and in deciding how to incorporate ‘indirect’ miRNA effects that may, or may not, represent biologically relevant information. To better understand how miRNAs exert effects throughout the transcriptome during EMT, we employed Exon–Intron Split Analysis (EISA), a bioinformatic technique that separates transcriptional and post-transcriptional effects through the separate analysis of RNA-Seq reads mapping to exons and introns. We find that in response to the manipulation of miRNAs, a major effect on gene expression is transcriptional. We also find extensive co-ordination of transcriptional and post-transcriptional regulatory mechanisms during both EMT and mesenchymal to epithelial transition (MET) in response to TGF-β or miR-200c respectively. The prominent transcriptional influence of miRNAs was also observed in other datasets where miRNA levels were perturbed. This work cautions against a narrow approach that is limited to the analysis of direct targets, and demonstrates the utility of EISA to examine complex regulatory networks involving both transcriptional and post-transcriptional mechanisms.

Publication

Computational methods for cancer driver discovery: A survey

Publisher: Ivyspring International Publisher

Date: 2021

DOI: 10.7150/THNO.52670

Publication

Stabilising Job Survival Analysis for Disability Employment Services in Unseen Environments

Publisher: ACM

Date: 04-08-2023

DOI: 10.1145/3580305.3599908

Publication

DriverGroup: A novel method for identifying driver gene groups

Publisher: Oxford University Press (OUP)

Date: 12-2020

DOI: 10.1093/BIOINFORMATICS/BTAA797

Abstract: Identifying cancer driver genes is a key task in cancer informatics. Most existing methods are focused on individual cancer drivers which regulate biological processes leading to cancer. However, the effect of a single gene may not be sufficient to drive cancer progression. Here, we hypothesize that there are driver gene groups that work in concert to regulate cancer, and we develop a novel computational method to detect those driver gene groups. We develop a novel method named DriverGroup to detect driver gene groups by using gene expression and gene interaction data. The proposed method has three stages: (i) constructing the gene network, (ii) discovering critical nodes of the constructed network and (iii) identifying driver gene groups based on the discovered critical nodes. Before evaluating the performance of DriverGroup in detecting cancer driver groups, we firstly assess its performance in detecting the influence of gene groups, a key step of DriverGroup. The application of DriverGroup to DREAM4 data demonstrates that it is more effective than other methods in detecting the regulation of gene groups. We then apply DriverGroup to the BRCA dataset to identify driver groups for breast cancer. The identified driver groups are promising as several group members are confirmed to be related to cancer in literature. We further use the predicted driver groups in survival analysis and the results show that the survival curves of patient subpopulations classified using the predicted driver groups are significantly differentiated, indicating the usefulness of DriverGroup. DriverGroup is available at vvhoang/DriverGroup Supplementary data are available at Bioinformatics online.

Publication

Predicting miRNA Targets by Integrating Gene Regulatory Knowledge with Expression Profiles

Publisher: Public Library of Science (PLoS)

Date: 11-04-2016

DOI: 10.1371/JOURNAL.PONE.0152860

Publication

Causal heterogeneity discovery by bottom-up pattern search for personalised decision making

Publisher: Springer Science and Business Media LLC

Date: 02-08-2023

DOI: 10.1007/S10489-022-03860-2

Abstract: In personalised decision making, evidence is required to determine whether an action (treatment) is suitable for an individual. Such evidence can be obtained by modelling treatment effect heterogeneity in subgroups. The existing interpretable modelling methods take a top-down approach to search for subgroups with heterogeneous treatment effects and they may miss the most specific and relevant context for an individual. In this paper, we design a Treatment effect pattern (TEP) to represent treatment effect heterogeneity in data. To achieve an interpretable presentation of TEPs, we use a local causal structure around the outcome to explicitly show how those important variables are used in modelling. We also derive a formula for unbiasedly estimating the Conditional Average Causal Effect (CATE) using the local structure in our problem setting. In the discovery process, we aim at minimising heterogeneity within each subgroup represented by a pattern. We propose a bottom-up search algorithm to discover the most specific patterns fitting individual circumstances the best for personalised decision making. Experiments show that the proposed method models treatment effect heterogeneity better than three other existing tree based methods in synthetic and real world data sets.

Publication

Divide and Conquer: Targeted Adversary Detection using Proximity and Dependency

Publisher: IEEE

Date: 12-2021

DOI: 10.1109/ICKG52313.2021.00026

Publication

Ensemble Methods for MiRNA Target Prediction from Expression Data

Publisher: Public Library of Science (PLoS)

Date: 26-06-2015

DOI: 10.1371/JOURNAL.PONE.0131627

Publication

miRspongeR 2.0: an enhanced R package for exploring miRNA sponge regulation

Publisher: Oxford University Press (OUP)

Date: 2022

DOI: 10.1093/BIOADV/VBAC063

Abstract: MicroRNA (miRNA) sponges influence the capability of miRNA-mediated gene silencing by competing for shared miRNA response elements and play significant roles in many physiological and pathological processes. It has been proved that computational or dry-lab approaches are useful to guide wet-lab experiments for uncovering miRNA sponge regulation. However, all of the existing tools only allow the analysis of miRNA sponge regulation regarding a group of samples, rather than the miRNA sponge regulation unique to individual samples. Furthermore, most existing tools do not allow parallel computing for the fast identification of miRNA sponge regulation. Here, we present an enhanced version of our R/Bioconductor package, miRspongeR 2.0. Compared with the original version introduced in 2019, this package extends the resolution of miRNA sponge regulation from the multi-sample level to the single-sample level. Moreover, it supports the identification of miRNA sponge networks using parallel computing, and the construction of sample–sample correlation networks. It also provides more computational methods to infer miRNA sponge regulation and expands the ground truth for validation. With these new features, we anticipate that miRspongeR 2.0 will further accelerate the research on miRNA sponges with higher resolution and more utilities. ackages/miRspongeR/. Supplementary data are available at Bioinformatics Advances online.

Publication

From Observational Studies to Causal Rule Mining

Publisher: Association for Computing Machinery (ACM)

Date: 24-11-2015

DOI: 10.1145/2746410

Abstract: Randomised controlled trials (RCTs) are the most effective approach to causal discovery, but in many circumstances it is impossible to conduct RCTs. Therefore, observational studies based on passively observed data are widely accepted as an alternative to RCTs. However, in observational studies, prior knowledge is required to generate the hypotheses about the cause-effect relationships to be tested, and hence they can only be applied to problems with available domain knowledge and a handful of variables. In practice, many datasets are of high dimensionality, which leaves observational studies out of the opportunities for causal discovery from such a wealth of data sources. In another direction, many efficient data mining methods have been developed to identify associations among variables in large datasets. The problem is that causal relationships imply associations, but the reverse is not always true. However, we can see the synergy between the two paradigms here. Specifically, association rule mining can be used to deal with the high-dimensionality problem, whereas observational studies can be utilised to eliminate noncausal associations. In this article, we propose the concept of causal rules (CRs) and develop an algorithm for mining CRs in large datasets. We use the idea of retrospective cohort studies to detect CRs based on the results of association rule mining. Experiments with both synthetic and real-world datasets have demonstrated the effectiveness and efficiency of CR mining. In comparison with the commonly used causal discovery methods, the proposed approach generally is faster and has better or competitive performance in finding correct or sensible causes. It is also capable of finding a cause consisting of multiple variables—a feature that other causal discovery methods do not possess.

Publication

Inferring microRNA and transcription factor regulatory networks in heterogeneous data

Publisher: Springer Science and Business Media LLC

Date: 11-03-2013

DOI: 10.1186/1471-2105-14-92

Publication

Building fair predictive models

Publisher: Springer International Publishing

Date: 2020

DOI: 10.1007/978-3-030-64984-5_17

Publication

Inferring and analyzing module-specific lncRNA–mRNA causal regulatory networks in human cancer

Publisher: Oxford University Press (OUP)

Date: 02-2018

DOI: 10.1093/BIB/BBY008

Abstract: It is known that noncoding RNAs (ncRNAs) cover ∼98% of the transcriptome, but do not encode proteins. Among ncRNAs, long noncoding RNAs (lncRNAs) are a large and diverse class of RNA molecules, and are thought to be a gold mine of potential oncogenes, anti-oncogenes and new biomarkers. Although only a minority of lncRNAs is functionally characterized, it is clear that they are important regulators that modulate gene expression and are involved in many biological functions. To reveal the functions and regulatory mechanisms of lncRNAs, it is vital to understand how lncRNAs regulate their target genes for implementing specific biological functions. In this article, we review the computational methods for inferring lncRNA–mRNA interactions and the third-party databases storing lncRNA–mRNA regulatory relationships. We have found that the existing methods are based on statistical correlations between the gene expression levels of lncRNAs and mRNAs, and may not reveal gene regulatory relationships which are causal relationships. Moreover, these methods do not consider the modularity of lncRNA–mRNA regulatory networks, and thus, the networks identified are not module-specific. To address the above two issues, we propose a novel method, MSLCRN, to infer and analyze module-specific lncRNA–mRNA causal regulatory networks. We have applied it to glioblastoma multiforme, lung squamous cell carcinoma, ovarian cancer and prostate cancer, respectively. The experimental results show that MSLCRN, as an expression-based method, could be a useful complementary method to study lncRNA regulations.

Publication

miRsponge: an R/Bioconductor package for the identification and analysis of miRNA sponge interaction networks and modules

Publisher: Cold Spring Harbor Laboratory

Date: 28-12-2018

DOI: 10.1101/507749

Abstract: A microRNA (miRNA) sponge is an RNA molecule with multiple tandem miRNA response elements that can sequester miRNAs from their target mRNAs. Despite growing appreciation of the importance of miRNA sponges, our knowledge of their complex functions remains limited. Moreover, there is still a lack of miRNA sponge research tools that help researchers to quickly compare their proposed methods with other methods, apply existing methods to new datasets, or select appropriate methods for assisting in subsequent experimental design. To fill the gap, we present an R/Bioconductor package, miRsponge, for simplifying the procedure of identifying and analyzing miRNA sponge interaction networks and modules. It provides seven popular methods and an integrative method to identify miRNA sponge interactions. Moreover, it supports the validation of miRNA sponge interactions and the identification of miRNA sponge modules, as well as functional enrichment and survival analysis of miRNA sponge modules. This package enables researchers to quickly evaluate their new methods, apply existing methods to new datasets, and consequently speed up miRNA sponge research.

Publication

Estimating heterogeneous treatment effect by balancing heterogeneity and fitness

Publisher: Springer Science and Business Media LLC

Date: 12-2018

DOI: 10.1186/S12859-018-2521-7

Publication

ParallelPC: An R Package for Efficient Causal Exploration in Genomic Data

Publisher: Springer International Publishing

Date: 2018

DOI: 10.1007/978-3-030-04503-6_22

Publication

MrPC: causal structure learning in distributed systems

Publisher: Springer International Publishing

Date: 2020

DOI: 10.1007/978-3-030-63820-7_10

Publication

Discovery of Causal Rules Using Partial Association

Publisher: IEEE

Date: 12-2012

DOI: 10.1109/ICDM.2012.36

Publication

LoPAD: A Local Prediction Approach to Anomaly Detection

Publisher: Springer International Publishing

Date: 2020

DOI: 10.1007/978-3-030-47436-2_50

Publication

LncmiRSRN: identification and analysis of long non-coding RNA related miRNA sponge regulatory network in human cancer

Publisher: Oxford University Press (OUP)

Date: 28-06-2018

DOI: 10.1093/BIOINFORMATICS/BTY525

Abstract: MicroRNAs (miRNAs) are small non-coding RNAs with the length of ∼22 nucleotides. miRNAs are involved in many biological processes including cancers. Recent studies show that long non-coding RNAs (lncRNAs) are emerging as miRNA sponges, playing important roles in cancer physiology and development. Despite accumulating appreciation of the importance of lncRNAs, the study of their complex functions is still in its preliminary stage. Based on the hypothesis of competing endogenous RNAs (ceRNAs), several computational methods have been proposed for investigating the competitive relationships between lncRNAs and miRNA target messenger RNAs (mRNAs). However, when the mRNAs are released from the control of miRNAs, it remains largely unknown as to how the sponge lncRNAs influence the expression levels of the endogenous miRNA targets. We propose a novel method to construct lncRNA related miRNA sponge regulatory networks (LncmiRSRNs) by integrating matched lncRNA and mRNA expression profiles with clinical information and putative miRNA-target interactions. Using the method, we have constructed the LncmiRSRNs for four human cancers (glioblastoma multiforme, lung cancer, ovarian cancer and prostate cancer). Based on the networks, we discover that after being released from miRNA control, the target mRNAs are normally up-regulated by the sponge lncRNAs, and only a fraction of sponge lncRNA-mRNA regulatory relationships and hub lncRNAs are shared by the four cancers. Moreover, most sponge lncRNA-mRNA regulatory relationships show a rewired mode between different cancers, and a minority of sponge lncRNA-mRNA regulatory relationships conserved (appearing) in different cancers may act as a common pivot across cancers. Besides, differential and conserved hub lncRNAs may act as potential cancer drivers to influence the cancerous state in cancers. Functional enrichment and survival analysis indicate that the identified differential and conserved LncmiRSRN network modules work as functional units in biological processes, and can distinguish metastasis risks of cancers. Our analysis demonstrates the potential of integrating expression profiles, clinical information and miRNA-target interactions for investigating lncRNA regulatory mechanism. LncmiRSRN is freely available (hangjunpeng411/LncmiRSRN). Supplementary data are available at Bioinformatics online.

Publication

Inferring condition-specific miRNA activity from matched miRNA and mRNA expression data

Publisher: Oxford University Press (OUP)

Date: 23-07-2014

DOI: 10.1093/BIOINFORMATICS/BTU489

Abstract: Motivation: MicroRNAs (miRNAs) play crucial roles in complex cellular networks by binding to the messenger RNAs (mRNAs) of protein coding genes. It has been found that miRNA regulation is often condition-specific. A number of computational approaches have been developed to identify miRNA activity specific to a condition of interest using gene expression data. However, most of the methods only use the data in a single condition, and thus the activity discovered may not be unique to the condition of interest. Additionally, these methods are based on statistical associations between the gene expression levels of miRNAs and mRNAs, so they may not be able to reveal real gene regulatory relationships, which are causal relationships. Results: We propose a novel method to infer condition-specific miRNA activity by considering (i) the difference between the regulatory behavior that an miRNA has in the condition of interest and its behavior in the other conditions and (ii) the causal semantics of miRNA–mRNA relationships. The method is applied to the epithelial–mesenchymal transition (EMT) and multi-class cancer (MCC) datasets. The validation by the results of transfection experiments shows that our approach is effective in discovering significant miRNA–mRNA interactions. Functional and pathway analysis and literature validation indicate that the identified active miRNAs are closely associated with the specific biological processes, diseases and pathways. More detailed analysis of the activity of the active miRNAs implies that some active miRNAs show different regulation types in different conditions, while others have the same regulation type across conditions and differ only in the strength of regulation. Availability and implementation: The R and Matlab scripts are in the Supplementary materials. Contact: jiuyong.li@unisa.edu.au Supplementary information: Supplementary data are available at Bioinformatics online.

Thuc Duy Le

Researcher

Publications

Decision Support for Disability Employment using Counterfactual Survival Analysis

Data-driven discovery of causal interactions

Identifying Cancer Subtypes from miRNA-TF-mRNA Regulatory Networks and Expression Data

Identifying preeclampsia-associated genes using a control theory method

What is the Most Effective Intervention to Increase Job Retention for this Disabled Worker?

Identifying direct miRNA–mRNA causal regulatory relationships in heterogeneous data

Identifying miRNA synergism using multiple-intervention causal inference

Ancestral Instrument Method for Causal Inference without Complete Knowledge

Recommending Personalized Interventions to Increase Employability of Disabled Jobseekers

Mining Causal Association Rules

miRLAB: An R Based Dry Lab for Exploring miRNA-mRNA Regulatory Relationships

Gene selection for optimal prediction of cell position in tissues from single-cell transcriptomics data

Measurement Invariance of the Self-Description Questionnaire II in a Chinese Sample

PAN: Personalized Annotation-Based Networks for the Prediction of Breast Cancer Relapse

Data-Driven Causal Effect Estimation Based on Graphical Causal Modelling: A Survey

Personalized Interventions to Increase the Employment Success of People With Disability

The winning methods for predicting cellular position in the DREAM single-cell transcriptomics challenge

Uncovering the roles of microRNAs/lncRNAs in characterising breast cancer subtypes and prognosis

Data Mining

A novel framework for inferring condition-specific TF and miRNA co-regulation of protein–protein interactions

How do the existing fairness metrics and unfairness mitigation algorithms contribute to ethical learning analytics?

A pseudotemporal causality approach to identifying miRNA–mRNA interactions during biological processes

Intervention Recommendation for Improving Disability Employment

Effective Outlier Detection based on Bayesian Network and Proximity

Local Search for Efficient Causal Effect Estimation

Discovering Ancestral Instrumental Variables for Causal Inference From Observational Data

Recommending the Most Effective Intervention to Improve Employment for Job Seekers with Disability

LMSM: A modular approach for identifying lncRNA related miRNA sponge modules in breast cancer

The KDD'23 Workshop on Causal Discovery, Prediction and Decision (CDPD 2023)

Identifying miRNA synergism using multiple-intervention causal inference

Efficient polygenic risk scores for biobank scale data by exploiting phenotypes from inferred relatives

Toward Unique and Unbiased Causal Effect Estimation From Data With Hidden Variables

Mining combined causes in large data sets

Accurate data-driven prediction does not mean high reproducibility

Identifying miRNA-mRNA regulatory relationships in breast cancer with invariant causal prediction

NIBNA: a network-based node importance approach for identifying breast cancer drivers

From miRNA regulation to miRNA-TF co-regulation: computational approaches and challenges

Exploring cell-specific miRNA regulation with single-cell miRNA-mRNA co-sequencing data

CBNA: A control theory based method for identifying coding and non-coding cancer drivers

Dynamic cancer drivers: a causal approach for cancer driver discovery based on bio-pathological trajectories

pDriver: A novel method for unravelling personalised coding and miRNA cancer drivers

Computational methods for identifying miRNA sponge interactions

Stable breast cancer prognosis

miRSM: an R package to infer and analyse miRNA sponge modules in heterogeneous data

A general framework for causal classification

GraphDTA: predicting drug–target binding affinity with graph neural networks

Multi-Source Causal Feature Selection

Assessment of network module identification across complex diseases

Identifying miRNA synergistic regulatory networks in heterogeneous human data via network motifs

The KDD 2022 Workshop on Causal Discovery (CD2022)

miRspongeR: an R/Bioconductor package for the identification and analysis of miRNA sponge interaction networks and modules

Sufficient dimension reduction for average causal effect estimation

Multi-Group Transfer Learning on Multiple Latent Spaces for Text Classification

Nonparametric Sparse Matrix Decomposition for Cross-View Dimensionality Reduction

Time to infer miRNA sponge modules

The KDD 2021 Workshop on Causal Discovery (CD2021)

A novel single-cell based method for breast cancer prognosis

A Fast PC Algorithm for High Dimensional Causal Discovery with Multi-Core PCs

Estimating heterogeneous treatment effects by balancing heterogeneity and fitness

Computational Methods for Predicting Autism Spectrum Disorder from Gene Expression Data

Inferring microRNA–mRNA causal regulatory relationships from expression data

miRBaseConverter: an R/Bioconductor package for converting and retrieving miRNA name, accession, sequence and family information in different versions of miRBase

Extensive transcriptional responses are co-ordinated by microRNAs as revealed by Exon-Intron Split Analysis (EISA)

Computational methods for cancer driver discovery: A survey

Stabilising Job Survival Analysis for Disability Employment Services in Unseen Environments

DriverGroup: A novel method for identifying driver gene groups

Predicting miRNA Targets by Integrating Gene Regulatory Knowledge with Expression Profiles

Causal heterogeneity discovery by bottom-up pattern search for personalised decision making

Divide and Conquer: Targeted Adversary Detection using Proximity and Dependency

Ensemble Methods for MiRNA Target Prediction from Expression Data

miRspongeR 2.0: an enhanced R package for exploring miRNA sponge regulation

From Observational Studies to Causal Rule Mining