ARDC Research Link Australia

Publication

Privacy preserving serial publication of transactional data

Publisher: Elsevier BV

Date: 05-2019

DOI: 10.1016/J.IS.2019.01.001

Publication

A Study of the Single Point Mutation Loci in the Hepatitis B Virus Sequences via Optimal Risk and Preventive Sets with Weights

Publisher: Springer Berlin Heidelberg

Date: 2012

DOI: 10.1007/978-3-642-29253-8_39

Publication

What is the Most Effective Intervention to Increase Job Retention for this Disabled Worker?

Publisher: ACM

Date: 14-08-2022

DOI: 10.1145/3534678.3539026

Publication

Identifying miRNA synergism using multiple-intervention causal inference

Publisher: Springer Science and Business Media LLC

Date: 12-2019

DOI: 10.1186/S12859-019-3215-5

Abstract: Studying multiple microRNAs (miRNAs) synergism in gene regulation could help to understand the regulatory mechanisms of complicated human diseases caused by miRNAs. Several existing methods have been presented to infer miRNA synergism. Most of the current methods assume that miRNAs with shared targets at the sequence level are working synergistically. However, it is unclear if miRNAs with shared targets are working in concert to regulate the targets or they individually regulate the targets at different time points or different biological processes. A standard method to test the synergistic activities is to knock-down multiple miRNAs at the same time and measure the changes in the target genes. However, this approach may not be practical as we would have too many sets of miRNAs to test. In this paper, we present a novel framework called miRsyn for inferring miRNA synergism by using a causal inference method that mimics the multiple-intervention experiments, e.g. knocking-down multiple miRNAs, with observational data. Our results show that several miRNA-miRNA pairs that have shared targets at the sequence level are not working synergistically at the expression level. Moreover, the identified miRNA synergistic network is small-world and biologically meaningful, and a number of miRNA synergistic modules are significantly enriched in breast cancer. Our further analyses also reveal that most synergistic miRNA-miRNA pairs show the same expression patterns. The comparison results indicate that the proposed multiple-intervention causal inference method performs better than the single-intervention causal inference method in identifying the miRNA synergistic network. Taken together, the results imply that miRsyn is a promising framework for identifying miRNA synergism, and it could enhance the understanding of miRNA synergism in breast cancer.

Publication

Ancestral Instrument Method for Causal Inference without Complete Knowledge

Publisher: International Joint Conferences on Artificial Intelligence Organization

Date: 07-2022

DOI: 10.24963/IJCAI.2022/671

Abstract: Unobserved confounding is the main obstacle to causal effect estimation from observational data. Instrumental variables (IVs) are widely used for causal effect estimation when there exist latent confounders. With the standard IV method, when a given IV is valid, unbiased estimation can be obtained, but the validity requirement on a standard IV is strict and untestable. Conditional IVs have been proposed to relax the requirement of standard IVs by conditioning on a set of observed variables (known as a conditioning set for a conditional IV). However, the criterion for finding a conditioning set for a conditional IV needs a directed acyclic graph (DAG) representing the causal relationships of both observed and unobserved variables. This makes it challenging to discover a conditioning set directly from data. In this paper, by leveraging maximal ancestral graphs (MAGs) for causal inference with latent variables, we study the graphical properties of ancestral IVs, a type of conditional IVs using MAGs, and develop the theory to support data-driven discovery of the conditioning set for a given ancestral IV in data under the pretreatment variable assumption. Based on the theory, we develop an algorithm for unbiased causal effect estimation with a given ancestral IV and observational data. Extensive experiments on synthetic and real-world datasets demonstrate the performance of the algorithm in comparison with existing IV methods.

Publication

Efficient Outlier Detection for High-Dimensional Data

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 12-2018

DOI: 10.1109/TSMC.2017.2718220

Publication

miRLAB: An R Based Dry Lab for Exploring miRNA-mRNA Regulatory Relationships

Publisher: Public Library of Science (PLoS)

Date: 30-12-2015

DOI: 10.1371/JOURNAL.PONE.0145386

Publication

Building Diversified Multiple Trees for classification in high dimensional noisy biomedical data

Publisher: Springer Science and Business Media LLC

Date: 10-10-2017

DOI: 10.1007/S13755-017-0025-X

Publication

Spin-down evolution and radio disappearance of the magnetar PSR J1622-4950

Publisher: American Astronomical Society

Date: 05-06-2017

DOI: 10.3847/1538-4357/AA73DE

Publication

Guest Editorial: Special Issue on Causal Discovery 2017

Publisher: Springer Science and Business Media LLC

Date: 08-2018

DOI: 10.1007/S41060-018-0101-6

Publication

Authenticity and credibility aware detection of adverse drug events from social media

Publisher: Elsevier BV

Date: 12-2018

DOI: 10.1016/J.IJMEDINF.2018.09.002

Abstract: Adverse drug events (ADEs) are among the top causes of hospitalization and death. Social media is a promising open data source for the timely detection of potential ADEs. In this paper, we study the problem of detecting signals of ADEs from social media. Detecting ADEs whose drug and AE may be reported in different posts of a user leads to major concerns regarding the content authenticity and user credibility, which have not been addressed in previous studies. Content authenticity concerns whether a post mentions drugs or adverse events that are actually consumed or experienced by the writer. User credibility indicates the degree to which chronological evidence from a user's sequence of posts should be trusted in the ADE detection. We propose AC-SPASM, a Bayesian model for the authenticity and credibility aware detection of ADEs from social media. The model exploits the interaction between content authenticity, user credibility and ADE signal quality. In particular, we argue that the credibility of a user correlates with the user's consistency in reporting authentic content. We conduct experiments on a real-world Twitter dataset containing 1.2 million posts from 13,178 users. Our benchmark set contains 22 drugs and 8089 AEs. AC-SPASM recognizes authentic posts with F Our study demonstrates that taking into account the content authenticity and user credibility improves the detection of ADEs from social media. Our work generates hypotheses to reduce experts' guesswork in identifying unknown potential ADEs.

Publication

Measurement Invariance of the Self-Description Questionnaire II in a Chinese Sample

Publisher: Hogrefe Publishing Group

Date: 04-2016

DOI: 10.1027/1015-5759/A000242

Abstract: Studies on the construct validity of the Self-Description Questionnaire II (SDQII) have not compared the factor structure between the English and Chinese versions of the SDQII. By using rigorous multiple group comparison procedures based upon confirmatory factor analysis (CFA) of measurement invariance, the present study examined the responses of Australian high school students (N = 302) and Chinese high school students (N = 322) using the English and Chinese versions of the SDQII, respectively. CFA provided strong evidence that the factor structure (factor loading and item intercept) of the Chinese version of the SDQII in comparison to responses to the English version of the SDQII is invariant; it therefore allows researchers to confidently utilize both the English and Chinese versions of the SDQII with Chinese and Australian samples separately and cross-culturally.

Publication

Data-Driven Causal Effect Estimation Based on Graphical Causal Modelling: A Survey

Publisher: arXiv

Date: 2022

DOI: 10.48550/ARXIV.2208.09590

Publication

Fairmod: making predictions fair in multiple protected attributes

Publisher: Springer Science and Business Media LLC

Date: 30-10-2023

DOI: 10.1007/S10115-023-02003-4

Publication

Medical Applications of Artificial Intelligence

Publisher: CRC Press

Date: 06-11-2013

DOI: 10.1201/B15618

Publication

Personalized Interventions to Increase the Employment Success of People With Disability

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 2023

DOI: 10.1109/TBDATA.2023.3291547

Publication

Uncovering the roles of microRNAs/lncRNAs in characterising breast cancer subtypes and prognosis

Publisher: Springer Science and Business Media LLC

Date: 04-06-2021

DOI: 10.1186/S12859-021-04215-3

Abstract: Accurate prognosis and identification of cancer subtypes at the molecular level are important steps towards effective and personalised treatments of breast cancer. To this end, many computational methods have been developed to use gene (mRNA) expression data for breast cancer subtyping and prognosis. Meanwhile, microRNAs (miRNAs) and long non-coding RNAs (lncRNAs) have been extensively studied in the last 2 decades and their associations with breast cancer subtypes and prognosis have been evidenced. However, it is not clear whether using miRNA and/or lncRNA expression data helps improve the performance of gene expression based subtyping and prognosis methods, and this raises challenges as to how and when to use these data and methods in practice. In this paper, we conduct a comparative study of 35 methods, including 12 breast cancer subtyping methods and 23 breast cancer prognosis methods, on a collection of 19 independent breast cancer datasets. We aim to uncover the roles of miRNAs and lncRNAs in breast cancer subtyping and prognosis from the systematic comparison. In addition, we created an R package, CancerSubtypesPrognosis, including all the 35 methods to facilitate the reproducibility of the methods and streamline the evaluation. The experimental results show that integrating miRNA expression data helps improve the performance of the mRNA-based cancer subtyping methods. However, miRNA signatures are not as good as mRNA signatures for breast cancer prognosis. In general, lncRNA expression data does not help improve the mRNA-based methods in either cancer subtyping or cancer prognosis. These results suggest that the prognostic roles of miRNA/lncRNA signatures in the improvement of breast cancer prognosis need to be further verified.

Publication

A framework for reputation bootstrapping based on reputation utility and game theories

Publisher: IEEE

Date: 11-2011

DOI: 10.1109/TRUSTCOM.2011.45

Publication

An approximate microaggregation approach for microdata protection

Publisher: Elsevier BV

Date: 02-2012

DOI: 10.1016/J.ESWA.2011.04.223

Publication

PSR J2322−2650 – a low-luminosity millisecond pulsar with a planetary-mass companion

Publisher: Oxford University Press (OUP)

Date: 07-12-2017

DOI: 10.1093/MNRAS/STX3157

Publication

Phytophthora Database 2.0: Update and future direction

Publisher: Scientific Societies

Date: 12-2013

DOI: 10.1094/PHYTO-01-13-0023-R

Abstract: The online community resource Phytophthora database (PD) was developed to support accurate and rapid identification of Phytophthora and to help characterize and catalog the diversity and evolutionary relationships within the genus. Since its release in 2008, the sequence database has grown to cover 1 to 12 loci for ≈2,600 isolates (representing 138 described and provisional species). Sequences of multiple mitochondrial loci were added to complement nuclear loci-based phylogenetic analyses and diagnostic tool development. Key characteristics of most newly described and provisional species have been summarized. Other additions to improve the PD functionality include: (i) geographic information system tools that enable users to visualize the geographic origins of chosen isolates on a global-scale map, (ii) a tool for comparing genetic similarity between isolates via microsatellite markers to support population genetic studies, (iii) a comprehensive review of molecular diagnostics tools and relevant references, (iv) sequence alignments used to develop polymerase chain reaction-based diagnostics tools to support their utilization and new diagnostic tool development, and (v) an online community forum for sharing and preserving experience and knowledge accumulated in the global Phytophthora community. Here we present how these improvements can support users and discuss the PD's future direction.

Publication

Authenticity and credibility aware detection of adverse drug events from social media

Publisher: Elsevier BV

Date: 12-2018

DOI: 10.1016/J.IJMEDINF.2018.10.003

Abstract: Adverse drug events (ADEs) are among the top causes of hospitalization and death. Social media is a promising open data source for the timely detection of potential ADEs. In this paper, we study the problem of detecting signals of ADEs from social media. Detecting ADEs whose drug and AE may be reported in different posts of a user leads to major concerns regarding the content authenticity and user credibility, which have not been addressed in previous studies. Content authenticity concerns whether a post mentions drugs or adverse events that are actually consumed or experienced by the writer. User credibility indicates the degree to which chronological evidence from a user's sequence of posts should be trusted in the ADE detection. We propose AC-SPASM, a Bayesian model for the authenticity and credibility aware detection of ADEs from social media. The model exploits the interaction between content authenticity, user credibility and ADE signal quality. In particular, we argue that the credibility of a user correlates with the user's consistency in reporting authentic content. We conduct experiments on a real-world Twitter dataset containing 1.2 million posts from 13,178 users. Our benchmark set contains 22 drugs and 8089 AEs. AC-SPASM recognizes authentic posts with F Our study demonstrates that taking into account the content authenticity and user credibility improves the detection of ADEs from social media. Our work generates hypotheses to reduce experts' guesswork in identifying unknown potential ADEs.

Publication

Portable devices of security and privacy preservation for e-learning

Publisher: IEEE

Date: 04-2008

DOI: 10.1109/CSCWD.2008.4537121

Publication

Identifying miRNA-mRNA regulatory relationships in breast cancer with invariant causal prediction

Publisher: Springer Science and Business Media LLC

Date: 15-03-2019

DOI: 10.1186/S12859-019-2668-X

Publication

STMM: Semantic and Temporal-Aware Markov Chain Model for Mobility Prediction

Publisher: Springer International Publishing

Date: 2015

DOI: 10.1007/978-3-319-24474-7_15

Publication

Adaptive Skeleton Construction for Accurate DAG Learning

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 10-2023

DOI: 10.1109/TKDE.2023.3265015

Publication

CBNA: A control theory based method for identifying coding and non-coding cancer drivers

Publisher: Public Library of Science (PLoS)

Date: 02-12-2019

DOI: 10.1371/JOURNAL.PCBI.1007538

Publication

Computational methods for identifying miRNA sponge interactions

Publisher: Oxford University Press (OUP)

Date: 05-06-2016

DOI: 10.1093/BIB/BBW042

Abstract: Recent findings show that coding genes are not the only targets that miRNAs interact with. In fact, there is a pool of different RNAs competing with each other to attract miRNAs for interactions, thus acting as competing endogenous RNAs (ceRNAs). The ceRNAs indirectly regulate each other via the titration mechanism, i.e. the increasing concentration of a ceRNA will decrease the number of miRNAs that are available for interacting with other targets. The cross-talks between ceRNAs, i.e. their interactions mediated by miRNAs, have been identified as the drivers in many disease conditions, including cancers. In recent years, some computational methods have emerged for identifying ceRNA-ceRNA interactions. However, there remain great challenges and opportunities for developing computational methods to provide new insights into ceRNA regulatory mechanisms. In this paper, we review the publicly available databases of ceRNA-ceRNA interactions and the computational methods for identifying ceRNA-ceRNA interactions (also known as miRNA sponge interactions). We also conduct a comparison study of the methods with a breast cancer dataset. Our aim is to provide a current snapshot of the advances of the computational methods in identifying miRNA sponge interactions and to discuss the remaining challenges.

Publication

Achieving P-Sensitive K-Anonymity via Anatomy

Publisher: IEEE

Date: 2009

DOI: 10.1109/ICEBE.2009.34

Publication

pDriver: A novel method for unravelling personalised coding and miRNA cancer drivers

Publisher: Oxford University Press (OUP)

Date: 27-04-2021

DOI: 10.1093/BIOINFORMATICS/BTAB262

Abstract: Unravelling cancer driver genes is important in cancer research. Although computational methods have been developed to identify cancer drivers, most of them detect cancer drivers at population level. However, two patients who have the same cancer type and receive the same treatment may have different outcomes because each patient has a different genome and their disease might be driven by different driver genes. Therefore new methods are being developed for discovering cancer drivers at individual level, but existing personalized methods only focus on coding drivers while microRNAs (miRNAs) have been shown to drive cancer progression as well. Thus, novel methods are required to discover both coding and miRNA cancer drivers at individual level. We propose the novel method, pDriver, to discover personalized cancer drivers. pDriver includes two stages: (i) constructing gene networks for each cancer patient and (ii) discovering cancer drivers for each patient based on the constructed gene networks. To demonstrate the effectiveness of pDriver, we have applied it to five TCGA cancer datasets and compared it with the state-of-the-art methods. The result indicates that pDriver is more effective than other methods. Furthermore, pDriver can also detect miRNA cancer drivers and most of them have been confirmed to be associated with cancer by literature. We further analyze the predicted personalized drivers for breast cancer patients and the result shows that they are significantly enriched in many GO processes and KEGG pathways involved in breast cancer. pDriver is available at vvhoang Driver. Supplementary data are available at Bioinformatics online.

Publication

Discovering Functional microRNA-mRNA Regulatory Modules in Heterogeneous Data

Publisher: Springer Netherlands

Date: 19-12-2012

DOI: 10.1007/978-94-007-5590-1_14

Abstract: microRNAs (miRNAs) are small non-coding RNAs that cause mRNA degradation and translation inhibition. They are pivotal regulators of development and cellular homeostasis through their control of diverse processes. Recently, great efforts have been made to elucidate many targets that are affected by miRNAs, but the functions of most miRNAs and their precise regulatory mechanisms remain elusive. With more and more matched expression profiles of miRNAs and mRNAs having been made available, it is of great interest to utilize both expression profiles and sequence information to discover the functional regulatory networks of miRNAs and their target mRNAs for potential biological processes that they may participate in. In this chapter, we first briefly review the computational methods for discovering miRNA targets and miRNA-mRNA regulatory modules, and then focus on a method of identifying functional miRNA-mRNA regulatory modules by integrating multiple data sets from different sources.

Publication

Which Type of Classifier to Use for Networked Data, Connectivity Based or Feature Based?

Publisher: Springer International Publishing

Date: 2018

DOI: 10.1007/978-3-030-02922-7_25

Publication

A general framework for causal classification

Publisher: Springer Science and Business Media LLC

Date: 03-2021

DOI: 10.1007/S41060-021-00249-1

Publication

Construct robust rule sets for classification

Publisher: ACM

Date: 23-07-2002

DOI: 10.1145/775047.775130

Publication

Learning Causal Representations for Robust Domain Adaptation

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 2021

DOI: 10.1109/TKDE.2021.3119185

Publication

Effective Pruning for the Discovery of Conditional Functional Dependencies

Publisher: Oxford University Press (OUP)

Date: 24-06-2012

DOI: 10.1093/COMJNL/BXS082

Publication

Nonparametric Sparse Matrix Decomposition for Cross-View Dimensionality Reduction

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 08-2017

DOI: 10.1109/TMM.2017.2683258

Publication

FastOPM—A practical method for partial match of time series

Publisher: Elsevier BV

Date: 10-2022

DOI: 10.1016/J.PATCOG.2022.108808

Publication

Time to infer miRNA sponge modules

Publisher: Wiley

Date: 03-08-2021

DOI: 10.1002/WRNA.1686

Abstract: Inferring competing endogenous RNA (ceRNA) or microRNA (miRNA) sponge modules is a challenging and meaningful task for revealing ceRNA regulation mechanism at the module level. Modules in this context refer to groups of miRNA sponges which have mutual competitions and act as functional units for achieving biological processes. The recent development of computational methods based on heterogeneous data provides a novel way to discern the competitive effects of miRNA sponges on human complex diseases. This article aims to provide a comprehensive perspective of miRNA sponge module discovery methods. We first review the publicly available databases of cancer‐related miRNA sponges, as the miRNA sponges involved in human cancers contribute to the discovery of cancer‐associated modules. Then we review the existing computational methods for inferring miRNA sponge modules. Furthermore, we conduct an assessment on the performance of the module discovery methods with the pan‐cancer dataset, and the comparison study indicates that it is useful to infer biologically meaningful miRNA sponge modules by directly mapping heterogeneous data to the competitive modules. Finally, we discuss the future directions and associated challenges in developing in silico methods to infer miRNA sponge modules. This article is categorized under: RNA Interactions with Proteins and Other Molecules > Small Molecule‐RNA Interactions; Regulatory RNAs/RNAi/Riboswitches > Regulatory RNAs.

Publication

A sub-national economic complexity analysis of Australia’s states and territories

Publisher: Informa UK Limited

Date: 16-03-2018

DOI: 10.1080/00343404.2017.1283012

Publication

Motif Discovery and Phylogenetic Analysis of Hepatitis B Virus Sequences

Publisher: Springer Berlin Heidelberg

Date: 2013

DOI: 10.1007/978-3-642-29305-4_332

Publication

miRBaseConverter: an R/Bioconductor package for converting and retrieving miRNA name, accession, sequence and family information in different versions of miRBase

Publisher: Springer Science and Business Media LLC

Date: 12-2018

DOI: 10.1186/S12859-018-2531-5

Publication

Should Learning Analytics Models Include Sensitive Attributes? Explaining the Why

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 08-2023

DOI: 10.1109/TLT.2022.3226474

Publication

Anonymization by Local Recoding in Data with Attribute Hierarchical Taxonomies

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 09-2008

DOI: 10.1109/TKDE.2008.52

Publication

Secure Outsourced Frequent Pattern Mining by Fully Homomorphic Encryption

Publisher: Springer International Publishing

Date: 2015

DOI: 10.1007/978-3-319-22729-0_6

Publication

Efficient discovery of risk patterns in medical data

Publisher: Elsevier BV

Date: 2009

DOI: 10.1016/J.ARTMED.2008.07.008

Abstract: This paper studies a problem of efficiently discovering risk patterns in medical data. Risk patterns are defined by a statistical metric, relative risk, which has been widely used in epidemiological research. To avoid fruitless search in the complete exploration of risk patterns, we define optimal risk pattern set to exclude superfluous patterns, i.e. complicated patterns with lower relative risk than their corresponding simpler form patterns. We prove that mining optimal risk pattern sets conforms to an anti-monotone property that supports an efficient mining algorithm. We propose an efficient algorithm for mining optimal risk pattern sets based on this property. We also propose a hierarchical structure to present discovered patterns for easy perusal by domain experts. The proposed approach is compared with two well-known rule discovery methods, decision tree and association rule mining approaches, on benchmark data sets and applied to a real world application. The proposed method discovers more and better quality risk patterns than a decision tree approach. The decision tree method is not designed for such applications and is inadequate for pattern exploring. The proposed method does not discover a large number of uninteresting superfluous patterns as an association mining approach does. The proposed method is more efficient than an association rule mining method. A real world case study shows that the method reveals some interesting risk patterns to medical practitioners. The proposed method is an efficient approach to explore risk patterns. It quickly identifies cohorts of patients that are vulnerable to a risk outcome from a large data set. The proposed method is useful for exploratory study on large medical data to generate and refine hypotheses. The method is also useful for designing medical surveillance systems.
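
Note: the relative risk metric named in the abstract above can be illustrated with a minimal sketch. This is not the paper's mining algorithm; the function names `relative_risk` and `is_superfluous` are hypothetical, and the superfluous-pattern check is only one plausible reading of "complicated patterns with lower relative risk than their corresponding simpler form patterns".

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Relative risk = P(outcome | pattern present) / P(outcome | pattern absent)."""
    if exposed_total <= 0 or unexposed_total <= 0:
        raise ValueError("both groups must contain at least one record")
    risk_with = exposed_cases / exposed_total
    risk_without = unexposed_cases / unexposed_total
    if risk_without == 0:
        # No outcomes in the comparison group: the ratio is unbounded (or undefined).
        return float("inf") if risk_with > 0 else float("nan")
    return risk_with / risk_without


def is_superfluous(pattern_rr, simpler_pattern_rrs):
    """Assumed reading: a longer pattern is superfluous if any simpler form has a higher relative risk."""
    return any(simpler > pattern_rr for simpler in simpler_pattern_rrs)


if __name__ == "__main__":
    # Hypothetical counts: 20 of 100 patients matching a pattern have the outcome,
    # versus 10 of 100 patients who do not match it.
    print(relative_risk(20, 100, 10, 100))
    print(is_superfluous(1.5, [2.0, 1.2]))
```

Under this reading, a pattern is kept only when it raises the relative risk above every simpler form it extends, which is what allows superfluous patterns to be pruned.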

Publication

Satisfying Privacy Requirements: One Step before Anonymization

Publisher: Springer Berlin Heidelberg

Date: 2010

DOI: 10.1007/978-3-642-13657-3_21

Publication

Stabilising Job Survival Analysis for Disability Employment Services in Unseen Environments

Publisher: ACM

Date: 04-08-2023

DOI: 10.1145/3580305.3599908

Publication

Ensemble Methods for MiRNA Target Prediction from Expression Data

Publisher: Public Library of Science (PLoS)

Date: 26-06-2015

DOI: 10.1371/JOURNAL.PONE.0131627

Publication

miRspongeR 2.0: an enhanced R package for exploring miRNA sponge regulation

Publisher: Oxford University Press (OUP)

Date: 2022

DOI: 10.1093/BIOADV/VBAC063

Abstract: MicroRNA (miRNA) sponges influence the capability of miRNA-mediated gene silencing by competing for shared miRNA response elements and play significant roles in many physiological and pathological processes. It has been proved that computational or dry-lab approaches are useful to guide wet-lab experiments for uncovering miRNA sponge regulation. However, all of the existing tools only allow the analysis of miRNA sponge regulation regarding a group of samples, rather than the miRNA sponge regulation unique to individual samples. Furthermore, most existing tools do not allow parallel computing for the fast identification of miRNA sponge regulation. Here, we present an enhanced version of our R/Bioconductor package, miRspongeR 2.0. Compared with the original version introduced in 2019, this package extends the resolution of miRNA sponge regulation from the multi-sample level to the single-sample level. Moreover, it supports the identification of miRNA sponge networks using parallel computing, and the construction of sample–sample correlation networks. It also provides more computational methods to infer miRNA sponge regulation and expands the ground truth for validation. With these new features, we anticipate that miRspongeR 2.0 will further accelerate the research on miRNA sponges with higher resolution and more utilities. ackages/miRspongeR/. Supplementary data are available at Bioinformatics Advances online.

Publication

Radio light curve of the galaxy possibly associated with FRB 150418

Publisher: Oxford University Press (OUP)

Date: 02-11-2016

DOI: 10.1093/MNRAS/STW2808

Publication

From Observational Studies to Causal Rule Mining

Publisher: Association for Computing Machinery (ACM)

Date: 24-11-2015

DOI: 10.1145/2746410

Abstract: Randomised controlled trials (RCTs) are the most effective approach to causal discovery, but in many circumstances it is impossible to conduct RCTs. Therefore, observational studies based on passively observed data are widely accepted as an alternative to RCTs. However, in observational studies, prior knowledge is required to generate the hypotheses about the cause-effect relationships to be tested, and hence they can only be applied to problems with available domain knowledge and a handful of variables. In practice, many datasets are of high dimensionality, which leaves observational studies out of the opportunities for causal discovery from such a wealth of data sources. In another direction, many efficient data mining methods have been developed to identify associations among variables in large datasets. The problem is that causal relationships imply associations, but the reverse is not always true. However, we can see the synergy between the two paradigms here. Specifically, association rule mining can be used to deal with the high-dimensionality problem, whereas observational studies can be utilised to eliminate noncausal associations. In this article, we propose the concept of causal rules (CRs) and develop an algorithm for mining CRs in large datasets. We use the idea of retrospective cohort studies to detect CRs based on the results of association rule mining. Experiments with both synthetic and real-world datasets have demonstrated the effectiveness and efficiency of CR mining. In comparison with the commonly used causal discovery methods, the proposed approach generally is faster and has better or competitive performance in finding correct or sensible causes. It is also capable of finding a cause consisting of multiple variables—a feature that other causal discovery methods do not possess.

Publication

Privacy Protection for Genomic Data: Current Techniques and Challenges

Publisher: Springer Berlin Heidelberg

Date: 2010

DOI: 10.1007/978-3-642-05183-8_7

Publication

Using multiple and negative target rules to make classifiers more understandable

Publisher: Elsevier BV

Date: 10-2006

DOI: 10.1016/J.KNOSYS.2006.03.003

Publication

Training Neural Networks with Random Noise Images for Adversarial Robustness

Publisher: ACM

Date: 26-10-2021

DOI: 10.1145/3459637.3482205

Publication

LoPAD: A Local Prediction Approach to Anomaly Detection

Publisher: Springer International Publishing

Date: 2020

DOI: 10.1007/978-3-030-47436-2_50

Publication

From spin noise to systematics: stochastic processes in the first International Pulsar Timing Array data release

Publisher: Oxford University Press (OUP)

Date: 19-02-2016

DOI: 10.1093/MNRAS/STW395

Publication

Decision Support for Disability Employment using Counterfactual Survival Analysis

Publisher: IEEE

Date: 17-12-2022

DOI: 10.1109/BIGDATA55660.2022.10021126

Publication

On discovery of functional dependencies from data

Publisher: Elsevier BV

Date: 07-2013

DOI: 10.1016/J.DATAK.2013.01.008

Publication

Data-driven discovery of causal interactions

Publisher: Springer Science and Business Media LLC

Date: 12-01-2019

DOI: 10.1007/S41060-018-0168-0

Publication

Identifying direct miRNA–mRNA causal regulatory relationships in heterogeneous data

Publisher: Elsevier BV

Date: 12-2014

DOI: 10.1016/J.JBI.2014.08.005

Abstract: Discovering the regulatory relationships between microRNAs (miRNAs) and mRNAs is an important problem that interests many biologists and medical researchers. A number of computational methods have been proposed to infer miRNA-mRNA regulatory relationships, and are mostly based on the statistical associations between miRNAs and mRNAs discovered in observational data. The miRNA-mRNA regulatory relationships identified by these methods can be both direct and indirect regulations. However, differentiating direct regulatory relationships from indirect ones is important for biologists in experimental designs. In this paper, we present a causal discovery based framework (called DirectTarget) to infer direct miRNA-mRNA causal regulatory relationships in heterogeneous data, including expression profiles of miRNAs and mRNAs, and miRNA target information. DirectTarget is applied to the Epithelial to Mesenchymal Transition (EMT) datasets. The validation by experimentally confirmed target databases suggests that the proposed method can effectively identify direct miRNA-mRNA regulatory relationships. To explore the upstream regulators of miRNA regulation, we further identify the causal feedforward patterns (CFFPs) of TF-miRNA-mRNA to provide insights into the miRNA regulation in EMT. DirectTarget has the potential to be applied to other datasets to elucidate the direct miRNA-mRNA causal regulatory relationships and to explore the regulatory patterns.

Publication

Mining Informative Rule Set for Prediction

Publisher: Springer Science and Business Media LLC

Date: 03-2004

DOI: 10.1023/B:JIIS.0000012468.25883.A5

Publication

From Association Analysis to Causal Discovery

Publisher: ACM

Date: 02-12-2013

DOI: 10.1145/2542652.2542659

Publication

Validating Privacy Requirements in Large Survey Rating Data

Publisher: Springer Berlin Heidelberg

Date: 2011

DOI: 10.1007/978-3-642-20344-2_17

Publication

Randomize Adversarial Defense in a Light Way

Publisher: IEEE

Date: 17-12-2022

DOI: 10.1109/BIGDATA55660.2022.10020163

Publication

The winning methods for predicting cellular position in the DREAM single-cell transcriptomics challenge

Publisher: Oxford University Press (OUP)

Date: 25-08-2021

DOI: 10.1093/BIB/BBAA181

Abstract: Predicting cell locations is important since with the understanding of cell locations, we may estimate the function of cells and their integration with the spatial environment. Thus, the DREAM challenge on single-cell transcriptomics required participants to predict the locations of single cells in the Drosophila embryo using single-cell transcriptomic data. We have developed over 50 pipelines by combining different ways of preprocessing the RNA-seq data, selecting the genes, predicting the cell locations and validating predicted cell locations, resulting in the winning methods which were ranked second in sub-challenge 1, first in sub-challenge 2 and third in sub-challenge 3. In this paper, we present an R package, SCTCwhatateam, which includes all the methods we developed and the Shiny web application to facilitate the research on single-cell spatial reconstruction. All the data and the example use cases are available in the Supplementary data.

Publication

A novel framework for inferring condition-specific TF and miRNA co-regulation of protein–protein interactions

Publisher: Elsevier BV

Date: 02-2016

DOI: 10.1016/J.GENE.2015.11.023

Abstract: Recent studies have shown that transcription factors (TFs) and microRNAs (miRNAs), while independently regulating their downstream targets, collaborate with each other to regulate gene expression. However, their synergistic roles in protein-protein interactions (PPIs) remain mostly unknown. In this paper, we present a novel framework (called CoRePPI) for inferring TF and miRNA co-regulation of PPIs. Particularly, CoRePPI is aimed at discovering the co-regulation specific to a condition of interest, by using heterogeneous data, including miRNA and messenger RNA (mRNA) expression profiles, putative miRNA targets, TF targets and PPIs. CoRePPI firstly finds the network motifs indicating the co-regulation of PPIs by TFs and miRNAs in tumor and normal conditions separately. Then by identifying the differential motifs found in one condition but not in the other, it builds the networks consisting of TFs, miRNAs and their co-regulated PPIs specific to different conditions respectively. To validate CoRePPI, we apply it to the Pan-Cancer dataset which includes the expression profiles of 12 cancer types from TCGA. Through network topology analysis, we found that the tumor and normal CoRePPI networks are scale-free. Furthermore, the results of differential and intersected network analysis between the tumor and normal CoRePPI networks suggest that only a small fraction of the regulatory relationships between TFs and miRNAs are conserved in both conditions but they co-regulate different downstream PPIs in tumor and normal conditions and in different conditions the majority of the regulatory relationships between TFs and miRNAs are different although they may regulate the same PPIs in their respective conditions.
The CoRePPI sub-networks constructed for the three types of cancers (breast cancer, lung cancer and ovarian cancer) are all scale-free, and the intersection of these CoRePPI sub-networks can be utilized as the biomarker CoRePPI sub-network of the three types of cancers. The PPI enrichment analyses of the tumor and normal CoRePPI networks suggest that the co-regulating TFs and miRNAs are significantly associated with the specific biological processes, diseases and pathways. In addition, comparing with the two non-condition-specific approaches, the tumor CoRePPI network is found to have the most enriched cancer-related PPIs. Altogether, the results uncover the combined regulatory patterns of TFs and miRNAs on the PPIs, and may provide new insights for research in cancer-associated TFs and miRNAs.

Publication

Integrating Global and Local Feature Selection for Multi-Label Learning

Publisher: Association for Computing Machinery (ACM)

Date: 20-02-2023

DOI: 10.1145/3532190

Abstract: Multi-label learning deals with the problem where an instance is associated with multiple labels simultaneously. Multi-label data is often of high dimensionality and has many noisy, irrelevant, and redundant features. As an important machine learning task, multi-label feature selection has received considerable attention in recent years due to its promising performance in dealing with high-dimensional multi-label data. Existing multi-label feature selection methods typically select the global features which are shared by all instances in a dataset. However, these multi-label feature selection methods may be suboptimal since they do not consider the specific characteristics of instances. In this paper, we propose a novel algorithm that integrates Global and Local Feature Selection (GLFS) to exploit both the global features and a subset of discriminative features shared only locally by a subgroup of instances in a multi-label dataset. Specifically, GLFS employs linear regression and ℓ2,1-norm on the regression parameters to achieve simultaneous global and local feature selection. Moreover, the proposed algorithm has an effective mechanism for utilizing label correlations to improve the feature selection. Experiments on real-world multi-label datasets show the superiority of GLFS over the state-of-the-art multi-label feature selection methods.

Publication

Trends and Applications in Knowledge Discovery and Data Mining

Publisher: Springer International Publishing

Date: 2015

DOI: 10.1007/978-3-319-25660-3

Publication

Injecting purpose and trust into data anonymisation

Publisher: ACM

Date: 02-11-2009

DOI: 10.1145/1645953.1646166

Publication

A pseudotemporal causality approach to identifying miRNA–mRNA interactions during biological processes

Publisher: Oxford University Press (OUP)

Date: 18-10-2021

DOI: 10.1093/BIOINFORMATICS/BTAA899

Abstract: microRNAs (miRNAs) are important gene regulators and they are involved in many biological processes, including cancer progression. Therefore, correctly identifying miRNA–mRNA interactions is a crucial task. To this end, a huge number of computational methods have been developed, but they mainly use the data at one snapshot and ignore the dynamics of a biological process. The recent development of single cell data and the booming of the exploration of cell trajectories using ‘pseudotime’ concept have inspired us to develop a pseudotime-based method to infer the miRNA–mRNA relationships characterizing a biological process by taking into account the temporal aspect of the process. We have developed a novel approach, called pseudotime causality, to find the causal relationships between miRNAs and mRNAs during a biological process. We have applied the proposed method to both single cell and bulk sequencing datasets for Epithelial to Mesenchymal Transition, a key process in cancer metastasis. The evaluation results show that our method significantly outperforms existing methods in finding miRNA–mRNA interactions in both single cell and bulk data. The results suggest that utilizing the pseudotemporal information from the data helps reveal the gene regulation in a biological process much better than using the static information. R scripts and datasets can be found at github.com/AndresMCB/PTC. Supplementary data are available at Bioinformatics online.

Publication

Discovery of functional miRNA–mRNA regulatory modules with computational methods

Publisher: Elsevier BV

Date: 08-2009

DOI: 10.1016/J.JBI.2009.01.005

Abstract: The identification of miRNAs and their target mRNAs and the construction of their regulatory networks may give new insights into biological procedures. This study proposes a computational method to discover the functional miRNA-mRNA regulatory modules (FMRMs), that is, groups of miRNAs and their target mRNAs that are believed to participate cooperatively in post-transcriptional gene regulation under specific conditions. The proposed method identifies negatively regulated patterns of miRNAs and mRNAs which associate with cancer and normal conditions, respectively, in a prostate cancer data set. GO and the literature also suggest that they may relate with prostate cancer. It can potentially identify the biologically relevant chains of 'miRNA → target gene → condition'.

Publication

Effective Outlier Detection based on Bayesian Network and Proximity

Publisher: IEEE

Date: 12-2018

DOI: 10.1109/BIGDATA.2018.8622230

Publication

Recommending the Most Effective Intervention to Improve Employment for Job Seekers with Disability

Publisher: ACM

Date: 14-08-2021

DOI: 10.1145/3447548.3467095

Publication

LMSM: A modular approach for identifying lncRNA related miRNA sponge modules in breast cancer

Publisher: Public Library of Science (PLoS)

Date: 23-04-2020

DOI: 10.1371/JOURNAL.PCBI.1007851

Publication

Authorization approaches for advanced permission-role assignments

Publisher: IEEE

Date: 04-2008

DOI: 10.1109/CSCWD.2008.4536994

Publication

Utility Aware Clustering for Publishing Transactional Data

Publisher: Springer International Publishing

Date: 2017

DOI: 10.1007/978-3-319-57529-2_38

Publication

The High Time Resolution Universe Pulsar Survey – XIII. PSR J1757−1854, the most accelerated binary pulsar

Publisher: Oxford University Press (OUP)

Date: 09-01-2018

DOI: 10.1093/MNRASL/SLY003

Publication

Five new fast radio bursts from the HTRU high-latitude survey at Parkes: first evidence for two-component bursts

Publisher: Oxford University Press (OUP)

Date: 11-04-2016

DOI: 10.1093/MNRASL/SLW069

Publication

Toward Unique and Unbiased Causal Effect Estimation From Data With Hidden Variables

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 09-2023

DOI: 10.1109/TNNLS.2021.3133337

Publication

Efficient polygenic risk scores for biobank scale data by exploiting phenotypes from inferred relatives

Publisher: Springer Science and Business Media LLC

Date: 17-06-2020

DOI: 10.1038/S41467-020-16829-X

Abstract: Polygenic risk scores are emerging as a potentially powerful tool to predict future phenotypes of target individuals, typically using unrelated individuals, thereby devaluing information from relatives. Here, for 50 traits from the UK Biobank data, we show that a design of 5,000 individuals with first-degree relatives of target individuals can achieve a prediction accuracy similar to that of around 220,000 unrelated individuals (mean prediction accuracy = 0.26 vs. 0.24, mean fold-change = 1.06 (95% CI: 0.99-1.13), P-value = 0.08), despite a 44-fold difference in sample size. For lifestyle traits, the prediction accuracy with 5,000 individuals including first-degree relatives of target individuals is significantly higher than that with 220,000 unrelated individuals (mean prediction accuracy = 0.22 vs. 0.16, mean fold-change = 1.40 (1.17-1.62), P-value = 0.025). Our findings suggest that polygenic prediction integrating family information may help to accelerate precision health and clinical intervention.

Publication

Peculiar spin frequency and radio profile evolution of PSR J1119−6127 following magnetar-like X-ray bursts

Publisher: Oxford University Press (OUP)

Date: 08-2018

DOI: 10.1093/MNRAS/STY2063

Publication

The Sardinia Radio Telescope

Publisher: EDP Sciences

Date: 12-2017

DOI: 10.1051/0004-6361/201630243

Publication

The High Time Resolution Universe Pulsar Survey – XV. Completion of the intermediate-latitude survey with the discovery and timing of 25 further pulsars

Publisher: Oxford University Press (OUP)

Date: 11-02-2019

DOI: 10.1093/MNRAS/STZ401

Publication

miRSM: an R package to infer and analyse miRNA sponge modules in heterogeneous data

Publisher: Informa UK Limited

Date: 06-04-2021

DOI: 10.1080/15476286.2021.1905341

Publication

Sufficient dimension reduction for average causal effect estimation

Publisher: Springer Science and Business Media LLC

Date: 20-04-2022

DOI: 10.1007/S10618-022-00832-5

Abstract: A large number of covariates can have a negative impact on the quality of causal effect estimation since confounding adjustment becomes unreliable when the number of covariates is large relative to the number of samples. Propensity score is a common way to deal with a large covariate set, but the accuracy of propensity score estimation (normally done by logistic regression) is also challenged by the large number of covariates. In this paper, we prove that a large covariate set can be reduced to a lower dimensional representation which captures the complete information for adjustment in causal effect estimation. The theoretical result enables effective data-driven algorithms for causal effect estimation. Supported by the result, we develop an algorithm that employs a supervised kernel dimension reduction method to learn a lower dimensional representation from the original covariate space, and then utilises nearest neighbour matching in the reduced covariate space to impute the counterfactual outcomes to avoid the large sized covariate set problem. The proposed algorithm is evaluated on two semisynthetic and three real-world datasets and the results show the effectiveness of the proposed algorithm.

Publication

Discover dependencies from data - A review

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 02-2012

DOI: 10.1109/TKDE.2010.197

Publication

A relative privacy model for effective privacy preservation in transactional data

Publisher: Wiley

Date: 19-09-2018

DOI: 10.1002/CPE.4923

Publication

Supervised signal detection for adverse drug reactions in medication dispensing data

Publisher: Elsevier BV

Date: 07-2018

DOI: 10.1016/J.CMPB.2018.03.021

Abstract: Adverse drug reactions (ADRs) are one of the leading causes of morbidity and mortality and thus should be detected early to reduce consequences on health outcomes. Medication dispensing data are comprehensive sources of information about medicine uses that can be utilized for the signal detection of ADRs. Sequence symmetry analysis (SSA) has been employed in previous studies to detect signals of ADRs from medication dispensing data, but it has a moderate sensitivity and tends to miss some ADR signals. With successful applications in various areas, supervised machine learning (SML) methods are promising in detecting ADR signals. Gold standards of known ADRs and non-ADRs from previous studies create opportunities to take into account additional domain knowledge to improve ADR signal detection with SML. We assess the utility of SML as a signal detection tool for ADRs in medication dispensing data with the consideration of domain knowledge from DrugBank and MedDRA. We compare the best performing SML method with SSA. We model the ADR signal detection problem as a supervised machine learning problem by linking medication dispensing data with domain knowledge bases. Suspected ADR signals are extracted from the Australian Pharmaceutical Benefit Scheme (PBS) medication dispensing data from 2013 to 2016. We construct predictive features for each signal candidate based on its occurrences in medication dispensing data as well as its pharmacological properties. Pharmaceutical knowledge bases including DrugBank and MedDRA are employed to provide pharmacological features for a signal candidate. Given a gold standard of known ADRs and non-ADRs, SML learns to differentiate between known ADRs and non-ADRs based on their combined predictive features from linked sources, and then predicts whether a new case is a potential ADR signal. We evaluate the performance of six widely used SML methods with two gold standards of known ADRs and non-ADRs from previous studies.
On average, gradient boosting classifier achieves the sensitivity of 77%, specificity of 81%, positive predictive value of 76%, negative predictive value of 82%, area under precision-recall curve of 81%, and area under receiver operating characteristic curve of 82%, most of which are higher than in other SML methods. In particular, gradient boosting classifier has 21% higher sensitivity than and comparable specificity with SSA. Furthermore, gradient boosting classifier detects 10% more unknown potential ADR signals than SSA. Our study demonstrates that gradient boosting classifier is a promising supervised signal detection tool for ADRs in medication dispensing data to complement SSA.

Publication

A novel single-cell based method for breast cancer prognosis

Publisher: Public Library of Science (PLoS)

Date: 24-08-2020

DOI: 10.1371/JOURNAL.PCBI.1008133

Publication

Inferring functional miRNA–mRNA regulatory modules in epithelial–mesenchymal transition with a probabilistic topic model

Publisher: Elsevier BV

Date: 04-2012

DOI: 10.1016/J.COMPBIOMED.2011.12.011

Abstract: MicroRNAs (miRNAs) play important roles in gene regulatory networks. In this paper, we propose a probabilistic topic model to infer regulatory networks of miRNAs and their target mRNAs for specific biological conditions at the post-transcriptional level, so-called functional miRNA-mRNA regulatory modules (FMRMs). The probabilistic model used in this paper can effectively capture the relationship between miRNAs and mRNAs in specific cellular conditions. Furthermore, the proposed method identifies negatively and positively correlated miRNA-mRNA pairs which are associated with epithelial, mesenchymal, and other condition in EMT (epithelial-mesenchymal transition) data set, respectively. Results on EMT data sets show that the inferred FMRMs can potentially construct the biological chain of 'miRNA→mRNA→condition' at the post-transcriptional level.

Publication

Computational Methods for Predicting Autism Spectrum Disorder from Gene Expression Data

Publisher: Springer International Publishing

Date: 2020

DOI: 10.1007/978-3-030-65390-3_31

Publication

Investigating the Impact of Using IR Bands on Early Fire Smoke Detection from Landsat Imagery with a Lightweight CNN Model

Publisher: MDPI AG

Date: 25-06-2022

DOI: 10.3390/RS14133047

Abstract: Smoke plumes are the first things seen from space when wildfires occur. Thus, fire smoke detection is important for early fire detection. Deep Learning (DL) models have been used to detect fire smoke in satellite imagery for fire detection. However, previous DL-based research only considered lower spatial resolution sensors (e.g., Moderate-Resolution Imaging Spectroradiometer (MODIS)) and only used the visible (i.e., red, green, blue (RGB)) bands. To contribute towards solutions for early fire smoke detection, we constructed a six-band imagery dataset from Landsat 5 Thematic Mapper (TM) and Landsat 8 Operational Land Imager (OLI) with a 30-metre spatial resolution. The dataset consists of 1836 images in three classes, namely “Smoke”, “Clear”, and “Other_aerosol”. To prepare for potential on-board-of-small-satellite detection, we designed a lightweight Convolutional Neural Network (CNN) model named “Variant Input Bands for Smoke Detection (VIB_SD)”, which achieved competitive accuracy with the state-of-the-art model SAFA, with less than 2% of its number of parameters. We further investigated the impact of using additional Infra-Red (IR) bands on the accuracy of fire smoke detection with VIB_SD by training it with five different band combinations. The results demonstrated that adding the Near-Infra-Red (NIR) band improved prediction accuracy compared with only using the visible bands. Adding both Short-Wave Infra-Red (SWIR) bands can further improve the model performance compared with adding only one SWIR band. The case study showed that the model trained with multispectral bands could effectively detect fire smoke mixed with cloud over small geographic extents.

Publication

Opportunistic mining of top-n high utility patterns

Publisher: Elsevier BV

Date: 05-2018

DOI: 10.1016/J.INS.2018.02.035

Publication

A relative privacy model for effective privacy preservation in transactional data

Publisher: IEEE

Date: 08-2017

DOI: 10.1109/TRUSTCOM/BIGDATASE/ICESS.2017.263

Publication

Inferring miRNA sponge co-regulation of protein-protein interactions in human breast cancer

Publisher: Springer Science and Business Media LLC

Date: 08-05-2017

DOI: 10.1186/S12859-017-1672-2

Publication

Predicting miRNA Targets by Integrating Gene Regulatory Knowledge with Expression Profiles

Publisher: Public Library of Science (PLoS)

Date: 11-04-2016

DOI: 10.1371/JOURNAL.PONE.0152860

Publication

The SUrvey for Pulsars and Extragalactic Radio Bursts – I. Survey description and overview

Publisher: Oxford University Press (OUP)

Date: 17-08-2017

DOI: 10.1093/MNRAS/STX2126

Publication

Causal heterogeneity discovery by bottom-up pattern search for personalised decision making

Publisher: Springer Science and Business Media LLC

Date: 02-08-2023

DOI: 10.1007/S10489-022-03860-2

Abstract: In personalised decision making, evidence is required to determine whether an action (treatment) is suitable for an individual. Such evidence can be obtained by modelling treatment effect heterogeneity in subgroups. The existing interpretable modelling methods take a top-down approach to search for subgroups with heterogeneous treatment effects and they may miss the most specific and relevant context for an individual. In this paper, we design a Treatment effect pattern (TEP) to represent treatment effect heterogeneity in data. To achieve an interpretable presentation of TEPs, we use a local causal structure around the outcome to explicitly show how those important variables are used in modelling. We also derive a formula for unbiasedly estimating the Conditional Average Causal Effect (CATE) using the local structure in our problem setting. In the discovery process, we aim at minimising heterogeneity within each subgroup represented by a pattern. We propose a bottom-up search algorithm to discover the most specific patterns fitting individual circumstances the best for personalised decision making. Experiments show that the proposed method models treatment effect heterogeneity better than three other existing tree based methods in synthetic and real world data sets.

Publication

Divide and Conquer: Targeted Adversary Detection using Proximity and Dependency

Publisher: IEEE

Date: 12-2021

DOI: 10.1109/ICKG52313.2021.00026

Publication

Information based data anonymization for classification utility

Publisher: Elsevier BV

Date: 12-2011

DOI: 10.1016/J.DATAK.2011.07.001

Publication

Research note: cost-efficient estimates of Pinus radiata wood volumes using multitemporal LiDAR data

Publisher: Informa UK Limited

Date: 02-10-2021

DOI: 10.1080/00049158.2021.1997459

Publication

Inferring microRNA and transcription factor regulatory networks in heterogeneous data

Publisher: Springer Science and Business Media LLC

Date: 11-03-2013

DOI: 10.1186/1471-2105-14-92

Publication

Detecting potential signals of adverse drug events from prescription data

Publisher: Elsevier BV

Date: 04-2020

DOI: 10.1016/J.ARTMED.2020.101839

Publication

Phylogenomic Analysis of a 55.1-kb 19-Gene Dataset Resolves a Monophyletic

Publisher: Scientific Societies

Date: 07-2021

DOI: 10.1094/PHYTO-08-20-0330-LE

Abstract: Scientific communication is facilitated by a data-driven, scientifically sound taxonomy that considers the end-user’s needs and established successful practice. In 2013, the Fusarium community voiced near unanimous support for a concept of Fusarium that represented a clade comprising all agriculturally and clinically important Fusarium species, including the F. solani species complex (FSSC). Subsequently, this concept was challenged in 2015 by one research group who proposed dividing the genus Fusarium into seven genera, including the FSSC described as members of the genus Neocosmospora, with subsequent justification in 2018 based on claims that the 2013 concept of Fusarium is polyphyletic. Here, we test this claim and provide a phylogeny based on exonic nucleotide sequences of 19 orthologous protein-coding genes that strongly support the monophyly of Fusarium including the FSSC. We reassert the practical and scientific argument in support of a genus Fusarium that includes the FSSC and several other basal lineages, consistent with the longstanding use of this name among plant pathologists, medical mycologists, quarantine officials, regulatory agencies, students, and researchers with a stake in its taxonomy. In recognition of this monophyly, 40 species described as genus Neocosmospora were recombined in genus Fusarium, and nine others were renamed Fusarium. Here the global Fusarium community voices strong support for the inclusion of the FSSC in Fusarium, as it remains the best scientific, nomenclatural, and practical taxonomic option available.

Publication

Inferring and analyzing module-specific lncRNA–mRNA causal regulatory networks in human cancer

Publisher: Oxford University Press (OUP)

Date: 02-2018

DOI: 10.1093/BIB/BBY008

Abstract: It is known that noncoding RNAs (ncRNAs) cover ∼98% of the transcriptome, but do not encode proteins. Among ncRNAs, long noncoding RNAs (lncRNAs) are a large and diverse class of RNA molecules, and are thought to be a gold mine of potential oncogenes, anti-oncogenes and new biomarkers. Although only a minority of lncRNAs is functionally characterized, it is clear that they are important regulators to modulate gene expression and involve in many biological functions. To reveal the functions and regulatory mechanisms of lncRNAs, it is vital to understand how lncRNAs regulate their target genes for implementing specific biological functions. In this article, we review the computational methods for inferring lncRNA–mRNA interactions and the third-party databases of storing lncRNA–mRNA regulatory relationships. We have found that the existing methods are based on statistical correlations between the gene expression levels of lncRNAs and mRNAs, and may not reveal gene regulatory relationships which are causal relationships. Moreover, these methods do not consider the modularity of lncRNA–mRNA regulatory networks, and thus, the networks identified are not module-specific. To address the above two issues, we propose a novel method, MSLCRN, to infer and analyze module-specific lncRNA–mRNA causal regulatory networks. We have applied it into glioblastoma multiforme, lung squamous cell carcinoma, ovarian cancer and prostate cancer, respectively. The experimental results show that MSLCRN, as an expression-based method, could be a useful complementary method to study lncRNA regulations.

Publication

Spectral representation of DNA sequences and its application

Publisher: IEEE

Date: 09-2010

DOI: 10.1109/BICTA.2010.5645116

Publication

miRsponge: an R/Bioconductor package for the identification and analysis of miRNA sponge interaction networks and modules

Publisher: Cold Spring Harbor Laboratory

Date: 28-12-2018

DOI: 10.1101/507749

Abstract: A microRNA (miRNA) sponge is an RNA molecule with multiple tandem miRNA response elements that can sequester miRNAs from their target mRNAs. Despite growing appreciation of the importance of miRNA sponges, our knowledge of their complex functions remains limited. Moreover, there is still a lack of miRNA sponge research tools that help researchers to quickly compare their proposed methods with other methods, apply existing methods to new datasets, or select appropriate methods for assisting in subsequent experimental design. To fill the gap, we present an R/Bioconductor package, miRsponge, for simplifying the procedure of identifying and analyzing miRNA sponge interaction networks and modules. It provides seven popular methods and an integrative method to identify miRNA sponge interactions. Moreover, it supports the validation of miRNA sponge interactions and the identification of miRNA sponge modules, as well as functional enrichment and survival analysis of miRNA sponge modules. This package enables researchers to quickly evaluate their new methods, apply existing methods to new datasets, and consequently speed up miRNA sponge research.

Publication

Estimating heterogeneous treatment effect by balancing heterogeneity and fitness

Publisher: Springer Science and Business Media LLC

Date: 12-2018

DOI: 10.1186/S12859-018-2521-7

Publication

Guest editorial: special issue on causal discovery

Publisher: Springer Science and Business Media LLC

Date: 03-2017

DOI: 10.1007/S41060-016-0041-Y

Publication

Discovery of Causal Rules Using Partial Association

Publisher: IEEE

Date: 12-2012

DOI: 10.1109/ICDM.2012.36

Publication

LncmiRSRN: identification and analysis of long non-coding RNA related miRNA sponge regulatory network in human cancer

Publisher: Oxford University Press (OUP)

Date: 28-06-2018

DOI: 10.1093/BIOINFORMATICS/BTY525

Abstract: MicroRNAs (miRNAs) are small non-coding RNAs with the length of ∼22 nucleotides. miRNAs are involved in many biological processes including cancers. Recent studies show that long non-coding RNAs (lncRNAs) are emerging as miRNA sponges, playing important roles in cancer physiology and development. Despite accumulating appreciation of the importance of lncRNAs, the study of their complex functions is still in its preliminary stage. Based on the hypothesis of competing endogenous RNAs (ceRNAs), several computational methods have been proposed for investigating the competitive relationships between lncRNAs and miRNA target messenger RNAs (mRNAs). However, when the mRNAs are released from the control of miRNAs, it remains largely unknown as to how the sponge lncRNAs influence the expression levels of the endogenous miRNA targets. We propose a novel method to construct lncRNA related miRNA sponge regulatory networks (LncmiRSRNs) by integrating matched lncRNA and mRNA expression profiles with clinical information and putative miRNA-target interactions. Using the method, we have constructed the LncmiRSRNs for four human cancers (glioblastoma multiforme, lung cancer, ovarian cancer and prostate cancer). Based on the networks, we discover that after being released from miRNA control, the target mRNAs are normally up-regulated by the sponge lncRNAs, and only a fraction of sponge lncRNA-mRNA regulatory relationships and hub lncRNAs are shared by the four cancers. Moreover, most sponge lncRNA-mRNA regulatory relationships show a rewired mode between different cancers, and a minority of sponge lncRNA-mRNA regulatory relationships conserved (appearing) in different cancers may act as a common pivot across cancers. Besides, differential and conserved hub lncRNAs may act as potential cancer drivers to influence the cancerous state in cancers. 
Functional enrichment and survival analysis indicate that the identified differential and conserved LncmiRSRN network modules work as functional units in biological processes, and can distinguish metastasis risks of cancers. Our analysis demonstrates the potential of integrating expression profiles, clinical information and miRNA-target interactions for investigating lncRNA regulatory mechanism. LncmiRSRN is freely available (hangjunpeng411/LncmiRSRN). Supplementary data are available at Bioinformatics online.

Publication

Unifying Spatial, Temporal and Semantic Features for an Effective GPS Trajectory-Based Location Recommendation

Publisher: Springer International Publishing

Date: 2015

DOI: 10.1007/978-3-319-19548-3_4

Publication

Multi-label relational classification via node and label correlation

Publisher: Elsevier BV

Date: 05-2018

DOI: 10.1016/J.NEUCOM.2018.02.079

Publication

A Data-Driven Approach to Finding K for K Nearest Neighbor Matching in Average Causal Effect Estimation

Publisher: Springer Nature Singapore

Date: 2023

DOI: 10.1007/978-981-99-7254-8_56

Publication

Detecting high-quality signals of adverse drug-drug interactions from spontaneous reporting data

Publisher: Elsevier BV

Date: 12-2020

DOI: 10.1016/J.JBI.2020.103603

Publication

Cloning for privacy protection in multiple independent data publications

Publisher: ACM

Date: 24-10-2011

DOI: 10.1145/2063576.2063705

Publication

A general framework for privacy preserving data publishing

Publisher: Elsevier BV

Date: 12-2013

DOI: 10.1016/J.KNOSYS.2013.09.022

Publication

An international survey of front-end receivers and observing performance of telescopes for radio astronomy

Publisher: IOP Publishing

Date: 02-07-2019

DOI: 10.1088/1538-3873/AB1F7E

Abstract: This paper presents a survey of microwave front-end receivers installed at radio telescopes throughout the world. This unprecedented analysis was conducted as part of a review of front-end developments for Italian radio telescopes, initiated by the Italian National Institute for Astrophysics in 2016. Fifteen international radio telescopes have been selected to be representative of the instrumentation used for radio astronomical observations in the frequency domain from 300 MHz to 116 GHz. A comprehensive description of the existing receivers is presented and their characteristics are compared and discussed. The observing performances of the complete receiving chains are also presented. An overview of ongoing developments illustrates and anticipates future trends in front-end projects to meet the most ambitious scientific research goals.

Publication

Constructing and Combining Orthogonal Projection Vectors for Ordinal Regression

Publisher: Springer Science and Business Media LLC

Date: 19-01-2014

DOI: 10.1007/S11063-014-9340-2

Publication

Disentangled Representation with Causal Constraints for Counterfactual Fairness

Publisher: Springer Nature Switzerland

Date: 2023

DOI: 10.1007/978-3-031-33374-3_37

Publication

Use of Haploid Model of Candida albicans to Uncover Mechanism of Action of a Novel Antifungal Agent

Publisher: Frontiers Media SA

Date: 08-06-2018

DOI: 10.3389/FCIMB.2018.00164

Publication

Efficient discovery of de-identification policy options through a risk-utility frontier

Publisher: ACM

Date: 18-02-2013

DOI: 10.1145/2435349.2435357

Publication

Learning Markov Blankets From Multiple Interventional Data Sets

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 06-2020

DOI: 10.1109/TNNLS.2019.2927636

Publication

(α, k)-anonymous data publishing

Publisher: Springer Science and Business Media LLC

Date: 08-01-2009

DOI: 10.1007/S10844-008-0075-2

Publication

Finding Irredundant Contained Rewritings of Tree Pattern Queries Using Views

Publisher: Springer Berlin Heidelberg

Date: 2009

DOI: 10.1007/978-3-642-00672-2_12

Publication

Identifying miRNAs, targets and functions

Publisher: Oxford University Press (OUP)

Date: 22-11-2012

DOI: 10.1093/BIB/BBS075

Publication

A Role-Based Cognitive Architecture for Multi-Agent Teaming

Publisher: Springer Berlin Heidelberg

Date: 2010

DOI: 10.1007/978-3-642-13526-2_11

Publication

Multilabel Feature Selection: A Local Causal Structure Learning Approach

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 06-2023

DOI: 10.1109/TNNLS.2021.3111288

Publication

Discovering statistically non-redundant subgroups

Publisher: Elsevier BV

Date: 09-2014

DOI: 10.1016/J.KNOSYS.2014.04.030

Publication

Feature Selection for Efficient Local-to-Global Bayesian Network Structure Learning

Publisher: Association for Computing Machinery (ACM)

Date: 19-09-2023

DOI: 10.1145/3624479

Publication

Kernel Discriminant Learning for Ordinal Regression

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 06-2010

DOI: 10.1109/TKDE.2009.170

Publication

Exploring complex miRNA-mRNA interactions with Bayesian networks by splitting-averaging strategy

Publisher: Springer Science and Business Media LLC

Date: 12-2009

DOI: 10.1186/1471-2105-10-408

Publication

Data privacy against composition attack

Publisher: Springer Berlin Heidelberg

Date: 2012

DOI: 10.1007/978-3-642-29038-1_24

Publication

On the Complexity of Restricted k-anonymity Problem

Publisher: Springer Berlin Heidelberg

Date: 2008

DOI: 10.1007/978-3-540-78849-2_30

Publication

ORPSW: a new classifier for gene expression data based on optimal risk and preventive patterns

Publisher: International Academy Publishing (IAP)

Date: 03-06-2011

DOI: 10.4304/JCP.6.6.1198-1205

Publication

Local Search for Efficient Causal Effect Estimation

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 09-2023

DOI: 10.1109/TKDE.2022.3218131

Publication

Discovering Ancestral Instrumental Variables for Causal Inference From Observational Data

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 2023

DOI: 10.1109/TNNLS.2023.3262848

Publication

Development of smart data analytics tools to support wastewater treatment plant operation

Publisher: Elsevier BV

Date: 06-2018

DOI: 10.1016/J.CHEMOLAB.2018.03.006

Publication

Assessing the Causal Impact of Online Instruction due to COVID-19 on Students’ Grades and its aftermath on Grade Prediction Models

Publisher: ACM

Date: 07-09-2022

DOI: 10.1145/3524458.3547232

Publication

Comparing decision tree and optimal risk pattern mining for analysing emergency Ultra Short Stay Unit data

Publisher: IEEE

Date: 07-2008

DOI: 10.1109/ICMLC.2008.4620410

Publication

Identifying miRNA synergism using multiple-intervention causal inference

Publisher: Cold Spring Harbor Laboratory

Date: 28-05-2019

DOI: 10.1101/652180

Abstract: Studying multiple microRNAs (miRNAs) synergism in gene regulation could help to understand the regulatory mechanisms of complicated human diseases caused by miRNAs. Several existing methods have been presented to infer miRNA synergism. Most of the current methods assume that miRNAs with shared targets at the sequence level are working synergistically. However, it is unclear if miRNAs with shared targets are working in concert to regulate the targets or they individually regulate the targets at different time points or different biological processes. A standard method to test the synergistic activities is to knock-down multiple miRNAs at the same time and measure the changes in the target genes. However, this approach may not be practical as we would have too many sets of miRNAs to test. In this paper, we present a novel framework called miRsyn for inferring miRNA synergism by using a causal inference method that mimics the multiple-intervention experiments, e.g. knocking-down multiple miRNAs, with observational data. Our results show that several miRNA-miRNA pairs that have shared targets at the sequence level are not working synergistically at the expression level. Moreover, the identified miRNA synergistic network is small-world and biologically meaningful, and a number of miRNA synergistic modules are significantly enriched in breast cancer. Our further analyses also reveal that most of synergistic miRNA-miRNA pairs show the same expression patterns. The comparison results indicate that the proposed multiple-intervention causal inference method performs better than the single-intervention causal inference method in identifying miRNA synergistic network. Taken together, the results imply that miRsyn is a promising framework for identifying miRNA synergism, and it could enhance the understanding of miRNA synergism in breast cancer.

Publication

Mining combined causes in large data sets

Publisher: Elsevier BV

Date: 2016

DOI: 10.1016/J.KNOSYS.2015.10.018

Publication

A data-driven method to detect adverse drug events from prescription data

Publisher: Elsevier BV

Date: 09-2018

DOI: 10.1016/J.JBI.2018.07.013

Abstract: Drug safety issues such as Adverse Drug Events (ADEs) can cause serious consequences for the public. The clinical trials that are undertaken to assess medicine efficacy and safety prior to marketing, generally, may provide sufficient samples for discovering common ADEs. However, more samples are needed to detect infrequent and rare events. Additionally, clinical trials may not include all subgroups of patients. For these reasons, post-marketing surveillance of medicines is necessary for identifying drug safety issues. Most regulatory agencies use the Spontaneous Reporting Systems to identify associations between medicines and suspected ADEs. Data mining with effective analytical frameworks and large-scale medical data is potentially an alternative method to discover and monitor ADEs. In the present paper, we aim to detect potential ADEs from prescription data by discovering ADE associated prescription sequences. In an ADE associated prescription sequence 〈D

Publication

Accurate data-driven prediction does not mean high reproducibility

Publisher: Springer Science and Business Media LLC

Date: 17-01-2020

DOI: 10.1038/S42256-019-0140-2

Publication

From miRNA regulation to miRNA-TF co-regulation: computational approaches and challenges

Publisher: Oxford University Press (OUP)

Date: 12-07-2014

DOI: 10.1093/BIB/BBU023

Abstract: microRNAs (miRNAs) are important gene regulators. They control a wide range of biological processes and are involved in several types of cancers. Thus, exploring miRNA functions is important for diagnostics and therapeutics. To date, there are few feasible experimental techniques for discovering miRNA regulatory mechanisms. Alternatively, predictions of miRNA-mRNA regulatory relationships by computational methods have increasingly achieved promising results. Computational approaches are proving their ability as effective tools in reducing the number of biological experiments that must be conducted and to assist with the design of the experiments. In this review, we categorize and review different computational approaches to identify miRNA activities and functions, including the co-regulation of miRNAs and transcription factors. Our main focuses are on the recent approaches that use multiple data types for exploring miRNA functions. We discuss the remaining challenges in the evaluation and selection of models based on the results from a case study. Finally, we analyse the remaining challenges of each computational approach and suggest some future research directions.

Publication

Learning Conditional Instrumental Variable Representation for Causal Effect Estimation

Publisher: Springer Nature Switzerland

Date: 2023

DOI: 10.1007/978-3-031-43412-9_31

Publication

Prediction of student actions using weighted Markov models

Publisher: IEEE

Date: 12-2008

DOI: 10.1109/ITME.2008.4743842

Publication

Exploring cell-specific miRNA regulation with single-cell miRNA-mRNA co-sequencing data

Publisher: Springer Science and Business Media LLC

Date: 02-12-2021

DOI: 10.1186/S12859-021-04498-6

Abstract: Existing computational methods for studying miRNA regulation are mostly based on bulk miRNA and mRNA expression data. However, bulk data only allows the analysis of miRNA regulation regarding a group of cells, rather than the miRNA regulation unique to individual cells. Recent advance in single-cell miRNA-mRNA co-sequencing technology has opened a way for investigating miRNA regulation at single-cell level. However, as currently single-cell miRNA-mRNA co-sequencing data is just emerging and only available at small-scale, there is a strong need of novel methods to exploit existing single-cell data for the study of cell-specific miRNA regulation. In this work, we propose a new method, CSmiR (Cell-Specific miRNA regulation) to combine single-cell miRNA-mRNA co-sequencing data and putative miRNA-mRNA binding information to identify miRNA regulatory networks at the resolution of individual cells. We apply CSmiR to the miRNA-mRNA co-sequencing data in 19 K562 single-cells to identify cell-specific miRNA-mRNA regulatory networks for understanding miRNA regulation in each K562 single-cell. By analyzing the obtained cell-specific miRNA-mRNA regulatory networks, we observe that the miRNA regulation in each K562 single-cell is unique. Moreover, we conduct detailed analysis on the cell-specific miRNA regulation associated with the miR-17/92 family as a case study. The comparison results indicate that CSmiR is effective in predicting cell-specific miRNA targets. Finally, through exploring cell–cell similarity matrix characterized by cell-specific miRNA regulation, CSmiR provides a novel strategy for clustering single-cells and helps to understand cell–cell crosstalk. To the best of our knowledge, CSmiR is the first method to explore miRNA regulation at a single-cell resolution level, and we believe that it can be a useful method to enhance the understanding of cell-specific miRNA regulation.

Publication

Dynamic cancer drivers: a causal approach for cancer driver discovery based on bio-pathological trajectories

Publisher: Oxford University Press (OUP)

Date: 19-09-2022

DOI: 10.1093/BFGP/ELAC030

Abstract: The traditional way for discovering genes which drive cancer (namely cancer drivers) neglects the dynamic information of cancer development, even though it is well known that cancer progresses dynamically. To enhance cancer driver discovery, we expand cancer driver concept to dynamic cancer driver as a gene driving one or more bio-pathological transitions during cancer progression. Our method refers to the fact that cancer should not be considered as a single process but a compendium of altered biological processes causing the disease to develop over time. Reciprocally, different drivers of cancer can potentially be discovered by analysing different bio-pathological pathways. We propose a novel approach for causal inference of genes driving one or more core processes during cancer development (i.e. dynamic cancer driver). We use the concept of pseudotime for inferring the latent progression of samples along a biological transition during cancer and identifying a critical event when such a process is significantly deviated from normal to carcinogenic. We infer driver genes by assessing the causal effect they have on the process after such a critical event. We have applied our method to single-cell and bulk sequencing datasets of breast cancer. The evaluation results show that our method outperforms well-recognized cancer driver inference methods. These results suggest that including information of the underlying dynamics of cancer improves the inference process (in comparison with using static data), and allows us to discover different sets of driver genes from different processes in cancer. R scripts and datasets can be found at github.com/AndresMCB/DynamicCancerDriver

Publication

Causal Feature Selection With Dual Correction

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 2023

DOI: 10.1109/TNNLS.2022.3178075

Publication

Causal learner: A toolbox for causal structure and Markov blanket learning

Publisher: Elsevier BV

Date: 11-2022

DOI: 10.1016/J.PATREC.2022.09.021

Publication

A two-layer multi-dimensional trustworthiness metric for web service composition

Publisher: Springer Berlin Heidelberg

Date: 2013

DOI: 10.1007/978-3-642-37401-2_17

Publication

Multi-Source Causal Feature Selection

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 09-2020

DOI: 10.1109/TPAMI.2019.2908373

Publication

Assessment of network module identification across complex diseases

Publisher: Springer Science and Business Media LLC

Date: 30-08-2019

DOI: 10.1038/S41592-019-0509-5

Publication

An improvement of symbolic aggregate approximation distance measure for time series

Publisher: Elsevier BV

Date: 08-2014

DOI: 10.1016/J.NEUCOM.2014.01.045

Publication

Identifying functional miRNA–mRNA regulatory modules with correspondence latent dirichlet allocation

Publisher: Oxford University Press (OUP)

Date: 17-10-2010

DOI: 10.1093/BIOINFORMATICS/BTQ576

Abstract: Motivation: MicroRNAs (miRNAs) are small non-coding RNAs that cause mRNA degradation and translational inhibition. They are important regulators of development and cellular homeostasis through their control of diverse processes. Recently, great efforts have been made to elucidate their regulatory mechanism, but the functions of most miRNAs and their precise regulatory mechanisms remain elusive. With more and more matched expression profiles of miRNAs and mRNAs having been made available, it is of great interest to utilize both expression profiles to discover the functional regulatory networks of miRNAs and their target mRNAs for potential biological processes that they may participate in. Results: We present a probabilistic graphical model to discover functional miRNA regulatory modules at potential biological levels by integrating heterogeneous datasets, including expression profiles of miRNAs and mRNAs, with or without the prior target binding information. We applied this model to a mouse mammary dataset. It effectively captured several biological process specific modules involving miRNAs and their target mRNAs. Furthermore, without using prior target binding information, the identified miRNAs and mRNAs in each module show a large proportion of overlap with predicted miRNA target relationships, suggesting that expression profiles are crucial for both target identification and discovery of regulatory modules. Contact: bing.liu@unisa.edu.au jiuyong.li@unisa.edu.au Supplementary information: Supplementary data are available at Bioinformatics online.

Publication

Distributed Anonymization for Multiple Data Providers in a Cloud System

Publisher: Springer Berlin Heidelberg

Date: 2013

DOI: 10.1007/978-3-642-37487-6_27

Publication

Feature Fusion Using Locally Linear Embedding for Classification

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 2010

DOI: 10.1109/TNN.2009.2036363

Publication

A Robust Ensemble Classification Method Analysis

Publisher: Springer New York

Date: 2010

DOI: 10.1007/978-1-4419-5913-3_17

Abstract: Apart from the dimensionality problem, the uncertainty of Microarray data quality is another major challenge of Microarray classification. Microarray data contain various levels of noise and quite often high levels of noise, and these data lead to unreliable and low accuracy analysis as well as high dimensionality problem. In this paper, we propose a new Microarray data classification method, based on diversified multiple trees. The new method contains features that (1) make most use of the information from the abundant genes in the Microarray data and (2) use a unique diversity measurement in the ensemble decision committee. The experimental results show that the proposed classification method (DMDT) and the well-known method (CS4), which diversifies trees by using distinct tree roots, are more accurate on average than other well-known ensemble methods, including Bagging, Boosting, and Random Forests. The experiments also indicate that using diversity measurement of DMDT improves the classification accuracy of ensemble classification on Microarray data.

Publication

The High Time Resolution Universe survey – XIV. Discovery of 23 pulsars through GPU-accelerated reprocessing

Publisher: Oxford University Press (OUP)

Date: 07-12-2018

DOI: 10.1093/MNRAS/STY3328

Publication

Efficient Mining of Non-derivable Emerging Patterns

Publisher: Springer International Publishing

Date: 2015

DOI: 10.1007/978-3-319-19548-3_20

Publication

On the Effectiveness of Gene Selection for Microarray Classification Methods

Publisher: Springer Berlin Heidelberg

Date: 2010

DOI: 10.1007/978-3-642-12101-2_31

Publication

A Fast PC Algorithm for High Dimensional Causal Discovery with Multi-Core PCs

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 09-2019

DOI: 10.1109/TCBB.2016.2591526

Publication

Inferring microRNA–mRNA causal regulatory relationships from expression data

Publisher: Oxford University Press (OUP)

Date: 30-01-2013

DOI: 10.1093/BIOINFORMATICS/BTT048

Abstract: Motivation: microRNAs (miRNAs) are known to play an essential role in the post-transcriptional gene regulation in plants and animals. Currently, several computational approaches have been developed with a shared aim to elucidate miRNA–mRNA regulatory relationships. Although these existing computational methods discover the statistical relationships, such as correlations and associations between miRNAs and mRNAs at data level, such statistical relationships are not necessarily the real causal regulatory relationships that would ultimately provide useful insights into the causes of gene regulations. The standard method for determining causal relationships is randomized controlled perturbation experiments. In practice, however, such experiments are expensive and time consuming. Our motivation for this study is to discover the miRNA–mRNA causal regulatory relationships from observational data. Results: We present a causality discovery-based method to uncover the causal regulatory relationship between miRNAs and mRNAs, using expression profiles of miRNAs and mRNAs without taking into consideration the previous target information. We apply this method to the epithelial-to-mesenchymal transition (EMT) datasets and validate the computational discoveries by a controlled biological experiment for the miR-200 family. A significant portion of the regulatory relationships discovered in data is consistent with those identified by experiments. In addition, the top genes that are causally regulated by miRNAs are highly relevant to the biological conditions of the datasets. The results indicate that the causal discovery method effectively discovers miRNA regulatory relationships in data. Although computational predictions may not completely replace intervention experiments, the accurate and reliable discoveries in data are cost effective for the design of miRNA experiments and the understanding of miRNA–mRNA regulatory relationships. 
Availability: The R scripts are in the Supplementary material. Contact: thuc_duy.le@mymail.unisa.edu.au or jiuyong.li@unisa.edu.au Supplementary information: Supplementary data are available at Bioinformatics online.

Publication

Using causal discovery for feature selection in multivariate numerical time series

Publisher: Springer Science and Business Media LLC

Date: 09-07-2014

DOI: 10.1007/S10994-014-5460-1

Publication

DriverGroup: A novel method for identifying driver gene groups

Publisher: Oxford University Press (OUP)

Date: 12-2020

DOI: 10.1093/BIOINFORMATICS/BTAA797

Abstract: Identifying cancer driver genes is a key task in cancer informatics. Most existing methods are focused on individual cancer drivers which regulate biological processes leading to cancer. However, the effect of a single gene may not be sufficient to drive cancer progression. Here, we hypothesize that there are driver gene groups that work in concert to regulate cancer, and we develop a novel computational method to detect those driver gene groups. We develop a novel method named DriverGroup to detect driver gene groups by using gene expression and gene interaction data. The proposed method has three stages: (i) constructing the gene network, (ii) discovering critical nodes of the constructed network and (iii) identifying driver gene groups based on the discovered critical nodes. Before evaluating the performance of DriverGroup in detecting cancer driver groups, we firstly assess its performance in detecting the influence of gene groups, a key step of DriverGroup. The application of DriverGroup to DREAM4 data demonstrates that it is more effective than other methods in detecting the regulation of gene groups. We then apply DriverGroup to the BRCA dataset to identify driver groups for breast cancer. The identified driver groups are promising as several group members are confirmed to be related to cancer in literature. We further use the predicted driver groups in survival analysis and the results show that the survival curves of patient subpopulations classified using the predicted driver groups are significantly differentiated, indicating the usefulness of DriverGroup. DriverGroup is available at vvhoang/DriverGroup. Supplementary data are available at Bioinformatics online.

Publication

R-U policy frontiers for health data de-identification

Publisher: Oxford University Press (OUP)

Date: 23-04-2015

DOI: 10.1093/JAMIA/OCV004

Abstract: Objective: The Health Insurance Portability and Accountability Act Privacy Rule enables healthcare organizations to share de-identified data via two routes. They can either 1) show re-identification risk is small (e.g., via a formal model, such as k-anonymity) with respect to an anticipated recipient or 2) apply a rule-based policy (i.e., Safe Harbor) that enumerates attributes to be altered (e.g., dates to years). The latter is often invoked because it is interpretable, but it fails to tailor protections to the capabilities of the recipient. The paper shows rule-based policies can be mapped to a utility (U) and re-identification risk (R) space, which can be searched for a collection, or frontier, of policies that systematically trade off between these goals. Methods: We extend an algorithm to efficiently compose an R-U frontier using a lattice of policy options. Risk is proportional to the number of patients to which a record corresponds, while utility is proportional to similarity of the original and de-identified distribution. We allow our method to search 20 000 rule-based policies (out of 2^700) and compare the resulting frontier with k-anonymous solutions and Safe Harbor using the demographics of 10 U.S. states. Results: The results demonstrate the rule-based frontier 1) consists, on average, of 5000 policies, 2% of which enable better utility with less risk than Safe Harbor and 2) the policies cover a broader spectrum of utility and risk than k-anonymity frontiers. Conclusions: R-U frontiers of de-identification policies can be discovered efficiently, allowing healthcare organizations to tailor protections to anticipated needs and trustworthiness of recipients.

Publication

Satisfying Privacy Requirements Before Data Anonymization

Publisher: Oxford University Press (OUP)

Date: 17-03-2011

DOI: 10.1093/COMJNL/BXR028

Publication

Discrimination detection by causal effect estimation

Publisher: IEEE

Date: 12-2017

DOI: 10.1109/BIGDATA.2017.8258033

Publication

ParallelPC: An R Package for Efficient Causal Exploration in Genomic Data

Publisher: Springer International Publishing

Date: 2018

DOI: 10.1007/978-3-030-04503-6_22

Publication

Inferring condition-specific miRNA activity from matched miRNA and mRNA expression data

Publisher: Oxford University Press (OUP)

Date: 23-07-2014

DOI: 10.1093/BIOINFORMATICS/BTU489

Abstract: Motivation: MicroRNAs (miRNAs) play crucial roles in complex cellular networks by binding to the messenger RNAs (mRNAs) of protein coding genes. It has been found that miRNA regulation is often condition-specific. A number of computational approaches have been developed to identify miRNA activity specific to a condition of interest using gene expression data. However, most of the methods only use the data in a single condition, and thus, the activity discovered may not be unique to the condition of interest. Additionally, these methods are based on statistical associations between the gene expression levels of miRNAs and mRNAs, so they may not be able to reveal real gene regulatory relationships, which are causal relationships. Results: We propose a novel method to infer condition-specific miRNA activity by considering (i) the difference between the regulatory behavior that an miRNA has in the condition of interest and its behavior in the other conditions and (ii) the causal semantics of miRNA–mRNA relationships. The method is applied to the epithelial–mesenchymal transition (EMT) and multi-class cancer (MCC) datasets. The validation by the results of transfection experiments shows that our approach is effective in discovering significant miRNA–mRNA interactions. Functional and pathway analysis and literature validation indicate that the identified active miRNAs are closely associated with the specific biological processes, diseases and pathways. More detailed analysis of the activity of the active miRNAs implies that some active miRNAs show different regulation types in different conditions, but some have the same regulation types and their activity only differs in different conditions in the strengths of regulation. Availability and implementation: The R and Matlab scripts are in the Supplementary materials. Contact: jiuyong.li@unisa.edu.au Supplementary information: Supplementary data are available at Bioinformatics online.

Publication

PSL: An Algorithm for Partial Bayesian Network Structure Learning

Publisher: Association for Computing Machinery (ACM)

Date: 09-03-2022

DOI: 10.1145/3508071

Abstract: Learning partial Bayesian network (BN) structure is an interesting and challenging problem. When only one part of a BN structure is of interest, it is computationally expensive to use global BN structure learning algorithms, while local BN structure learning algorithms are not a favourable solution either, due to the issue of false edge orientation. To address the problem, this article first presents a detailed analysis of the false edge orientation issue with local BN structure learning algorithms and then proposes PSL, an efficient and accurate Partial BN Structure Learning (PSL) algorithm. Specifically, PSL divides V-structures in a Markov blanket (MB) into two types: Type-C V-structures and Type-NC V-structures, then it starts from the given node of interest and recursively finds both types of V-structures in the MB of the current node until all edges in the partial BN structure are oriented. To further improve the efficiency of PSL, the PSL-FS algorithm is designed by incorporating Feature Selection (FS) into PSL. Extensive experiments with six benchmark BNs validate the efficiency and accuracy of the proposed algorithms.

Publication

Identifying microRNA targets in epithelial-mesenchymal transition using joint-intervention causal inference

Publisher: ACM

Date: 07-12-2017

DOI: 10.1145/3156346.3156353

Publication

A simple yet effective data integration approach to tree-based microarray data classification

Publisher: IEEE

Date: 08-2010

DOI: 10.1109/IEMBS.2010.5626842

Publication

Top-k Similarity Matching in Large Graphs with Attributes

Publisher: Springer International Publishing

Date: 2014

DOI: 10.1007/978-3-319-05813-9_11

Publication

Publishing anonymous survey rating data

Publisher: Springer Science and Business Media LLC

Date: 26-11-2010

DOI: 10.1007/S10618-010-0208-4

Publication

Identifying Cancer Subtypes from miRNA-TF-mRNA Regulatory Networks and Expression Data

Publisher: Public Library of Science (PLoS)

Date: 04-2016

DOI: 10.1371/JOURNAL.PONE.0152792

Publication

Identifying preeclampsia-associated genes using a control theory method

Publisher: Oxford University Press (OUP)

Date: 28-04-2022

DOI: 10.1093/BFGP/ELAC006

Abstract: Preeclampsia is a pregnancy-specific disease that can have serious effects on the health of both mothers and their offspring. Predicting which women will develop preeclampsia in early pregnancy with high accuracy will allow for improved management. The clinical symptoms of preeclampsia are well recognized, however, the precise molecular mechanisms leading to the disorder are poorly understood. This is compounded by the heterogeneous nature of preeclampsia onset, timing and severity. Indeed a multitude of poorly defined causes including genetic components implicates etiologic factors, such as immune maladaptation, placental ischemia and increased oxidative stress. Large datasets generated by microarray and next-generation sequencing have enabled the comprehensive study of preeclampsia at the molecular level. However, computational approaches to simultaneously analyze the preeclampsia transcriptomic and network data and identify clinically relevant information are currently limited. In this paper, we proposed a control theory method to identify potential preeclampsia-associated genes based on both transcriptomic and network data. First, we built a preeclampsia gene regulatory network and analyzed its controllability. We then defined two types of critical preeclampsia-associated genes that play important roles in the constructed preeclampsia-specific network. Benchmarking against differential expression, betweenness centrality and hub analysis we demonstrated that the proposed method may offer novel insights compared with other standard approaches. Next, we investigated subtype specific genes for early and late onset preeclampsia. This control theory approach could contribute to a further understanding of the molecular mechanisms contributing to preeclampsia.

Publication

Access Time Oracle for Planar Graphs

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 08-2016

DOI: 10.1109/TKDE.2016.2547382

Publication

Recommending Personalized Interventions to Increase Employability of Disabled Jobseekers

Publisher: Springer International Publishing

Date: 2022

DOI: 10.1007/978-3-031-05981-0_8

Publication

Mining Causal Association Rules

Publisher: IEEE

Date: 12-2013

DOI: 10.1109/ICDMW.2013.88

Publication

L-Diversity Based Dynamic Update for Large Time-Evolving Microdata

Publisher: Springer Berlin Heidelberg

Date: 2008

DOI: 10.1007/978-3-540-89378-3_47

Publication

Spectral Representation of Protein Sequences

Publisher: American Scientific Publishers

Date: 07-2011

DOI: 10.1166/JCTN.2011.1819

Publication

Identifying miRNA sponge modules using biclustering and regulatory scores

Publisher: Springer Science and Business Media LLC

Date: 03-2017

DOI: 10.1186/S12859-017-1467-5

Publication

A Unified View of Causal and Non-causal Feature Selection

Publisher: Association for Computing Machinery (ACM)

Date: 18-04-2021

DOI: 10.1145/3436891

Abstract: In this article, we aim to develop a unified view of causal and non-causal feature selection methods. The unified view will fill in the gap in the research of the relation between the two types of methods. Based on the Bayesian network framework and information theory, we first show that causal and non-causal feature selection methods share the same objective. That is to find the Markov blanket of a class attribute, the theoretically optimal feature set for classification. We then examine the assumptions made by causal and non-causal feature selection methods when searching for the optimal feature set, and unify the assumptions by mapping them to the restrictions on the structure of the Bayesian network model of the studied problem. We further analyze in detail how the structural assumptions lead to the different levels of approximations employed by the methods in their search, which then result in the approximations in the feature sets found by the methods with respect to the optimal feature set. With the unified view, we can interpret the output of non-causal methods from a causal perspective and derive the error bounds of both types of methods. Finally, we present practical understanding of the relation between causal and non-causal methods using extensive experiments with synthetic data and various types of real-world data.

Publication

How do the existing fairness metrics and unfairness mitigation algorithms contribute to ethical learning analytics?

Publisher: Wiley

Date: 12-04-2022

DOI: 10.1111/BJET.13217

Abstract: With the widespread use of learning analytics (LA), ethical concerns about fairness have been raised. Research shows that LA models may be biased against students of certain demographic subgroups. Although fairness has gained significant attention in the broader machine learning (ML) community in the last decade, it is only recently that attention has been paid to fairness in LA. Furthermore, the decision on which unfairness mitigation algorithm or metric to use in a particular context remains largely unknown. On this premise, we performed a comparative evaluation of some selected unfairness mitigation algorithms regarded in the fair ML community to have shown promising results. Using a 3‐year program dropout data from an Australian university, we comparatively evaluated how the unfairness mitigation algorithms contribute to ethical LA by testing for some hypotheses across fairness and performance metrics. Interestingly, our results show how data bias does not always necessarily result in predictive bias. Perhaps not surprisingly, our test for fairness‐utility tradeoff shows how ensuring fairness does not always lead to drop in utility. Indeed, our results show that ensuring fairness might lead to enhanced utility under specific circumstances. Our findings may to some extent, guide fairness algorithm and metric selection for a given context.

What is already known about this topic: LA is increasingly being used to leverage actionable insights about students and drive student success. LA models have been found to make discriminatory decisions against certain student demographic subgroups—therefore, raising ethical concerns. Fairness in education is nascent. Only a few works have examined fairness in LA and consequently followed up with ensuring fair LA models.

What this paper adds: A juxtaposition of unfairness mitigation algorithms across the entire LA pipeline showing how they compare and how each of them contributes to fair LA. Ensuring ethical LA does not always lead to a dip in performance. Sometimes, it actually improves performance as well. Fairness in LA has only focused on some form of outcome equality, however equality of outcome may be possible only when the playing field is levelled.

Implications for practice and/or policy: Based on desired notion of fairness and which segment of the LA pipeline is accessible, a fairness‐minded decision maker may be able to decide which algorithm to use in order to achieve their ethical goals. LA practitioners can carefully aim for more ethical LA models without trading significant utility by selecting algorithms that find the right balance between the two objectives. Fairness enhancing technologies should be cautiously used as guides—not final decision makers. Human domain experts must be kept in the loop to handle the dynamics of transcending fair LA beyond equality to equitable LA.

Publication

SensorTree: Bursty Propagation Trees as Sensors for Protest Event Detection

Publisher: Springer International Publishing

Date: 2018

DOI: 10.1007/978-3-030-02922-7_19

Publication

Give rookies a chance: A trust-based institutional online supplier recommendation framework

Publisher: Springer Berlin Heidelberg

Date: 2012

DOI: 10.1007/978-3-642-30436-1_33

Publication

Causality-based Feature Selection

Publisher: Association for Computing Machinery (ACM)

Date: 28-09-2020

DOI: 10.1145/3409382

Abstract: Feature selection is a crucial preprocessing step in data analytics and machine learning. Classical feature selection algorithms select features based on the correlations between predictive features and the class variable and do not attempt to capture causal relationships between them. It has been shown that the knowledge about the causal relationships between features and the class variable has potential benefits for building interpretable and robust prediction models, since causal relationships imply the underlying mechanism of a system. Consequently, causality-based feature selection has gradually attracted greater attentions and many algorithms have been proposed. In this article, we present a comprehensive review of recent advances in causality-based feature selection. To facilitate the development of new algorithms in the research area and make it easy for the comparisons between new methods and existing ones, we develop the first open-source package, called CausalFS, which consists of most of the representative causality-based feature selection algorithms (available at uiy/CausalFS). Using CausalFS, we conduct extensive experiments to compare the representative algorithms with both synthetic and real-world datasets. Finally, we discuss some challenging problems to be tackled in future research.

Publication

Intervention Recommendation for Improving Disability Employment

Publisher: IEEE

Date: 10-12-2020

DOI: 10.1109/BIGDATA50022.2020.9378350

Publication

Collective behavior learning by differentiating personal preference from peer influence

Publisher: Elsevier BV

Date: 11-2018

DOI: 10.1016/J.KNOSYS.2018.06.027

Publication

An Effective Spatio-Temporal Approach for Predicting Future Semantic Locations

Publisher: Springer International Publishing

Date: 2016

DOI: 10.1007/978-3-319-46922-5_22

Publication

Introduction to the Special Section on Advances in Causal Discovery and Inference

Publisher: Association for Computing Machinery (ACM)

Date: 30-09-2019

DOI: 10.1145/3359995

Publication

Disentangled Representation for Causal Mediation Analysis

Publisher: Association for the Advancement of Artificial Intelligence (AAAI)

Date: 26-06-2023

DOI: 10.1609/AAAI.V37I9.26266

Abstract: Estimating direct and indirect causal effects from observational data is crucial to understanding the causal mechanisms and predicting the behaviour under different interventions. Causal mediation analysis is a method that is often used to reveal direct and indirect effects. Deep learning shows promise in mediation analysis, but the current methods only assume latent confounders that affect treatment, mediator and outcome simultaneously, and fail to identify different types of latent confounders (e.g., confounders that only affect the mediator or outcome). Furthermore, current methods are based on the sequential ignorability assumption, which is not feasible for dealing with multiple types of latent confounders. This work aims to circumvent the sequential ignorability assumption and applies the piecemeal deconfounding assumption as an alternative. We propose the Disentangled Mediation Analysis Variational AutoEncoder (DMAVAE), which disentangles the representations of latent confounders into three types to accurately estimate the natural direct effect, natural indirect effect and total effect. Experimental results show that the proposed method outperforms existing methods and has strong generalisation ability. We further apply the method to a real-world dataset to show its potential application.

Publication

Logics for Representing Data Mining Tasks in Inductive Databases

Publisher: Springer International Publishing

Date: 2014

DOI: 10.1007/978-3-319-08608-8_20

Publication

Combined Feature Selection and Cancer Prognosis Using Support Vector Machine Regression

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 11-2011

DOI: 10.1109/TCBB.2010.119

Publication

The SUrvey for Pulsars and Extragalactic Radio Bursts – III. Polarization properties of FRBs 160102 and 151230

Publisher: Oxford University Press (OUP)

Date: 03-05-2018

DOI: 10.1093/MNRAS/STY1137

Publication

Mining Markov Blankets Without Causal Sufficiency

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 12-2018

DOI: 10.1109/TNNLS.2018.2828982

Publication

Detecting signals of detrimental prescribing cascades from social media

Publisher: Elsevier BV

Date: 07-2016

DOI: 10.1016/J.ARTMED.2016.06.002

Abstract: Prescribing cascade (PC) occurs when an adverse drug reaction (ADR) is misinterpreted as a new medical condition, leading to further prescriptions for treatment. Additional prescriptions, however, may worsen the existing condition or introduce additional adverse effects (AEs). Timely detection and prevention of detrimental PCs is essential as drug AEs are among the leading causes of hospitalization and deaths. Identifying detrimental PCs would enable warnings and contraindications to be disseminated and assist the detection of unknown drug AEs. Nonetheless, the detection is difficult and has been limited to case reports or case assessment using administrative health claims data. Social media is a promising source for detecting signals of detrimental PCs due to the public availability of many discussions regarding treatments and drug AEs. In this paper, we investigate the feasibility of detecting detrimental PCs from social media. The detection, however, is challenging due to the data uncertainty and data rarity in social media. We propose a framework to mine sequences of drugs and AEs that signal detrimental PCs, taking into account the data uncertainty and data rarity. We conduct experiments on two real-world datasets collected from Twitter and Patient health forum. Our framework achieves encouraging results in the validation against known detrimental PCs (F1=78% for Twitter and 68% for Patient) and the detection of unknown potential detrimental PCs (Precision@50=72% and NDCG@50=95% for Twitter, Precision@50=86% and NDCG@50=98% for Patient). In addition, the framework is efficient and scalable to large datasets. Our study demonstrates the feasibility of generating hypotheses of detrimental PCs from social media to reduce pharmacists' guesswork.

Publication

CancerSubtypes: an R/Bioconductor package for molecular cancer subtype identification, validation and visualization

Publisher: Oxford University Press (OUP)

Date: 12-06-2017

DOI: 10.1093/BIOINFORMATICS/BTX378

Abstract: Identifying molecular cancer subtypes from multi-omics data is an important step in personalized medicine. We introduce CancerSubtypes, an R package for identifying cancer subtypes using multi-omics data, including gene expression, miRNA expression and DNA methylation data. CancerSubtypes integrates four main computational methods which are highly cited for cancer subtype identification and provides a standardized framework for data pre-processing, feature selection, and result follow-up analyses, including results computing, biology validation and visualization. The input and output of each step in the framework are packaged in the same data format, making it convenient to compare different methods. The package is useful for inferring cancer subtypes from an input genomic dataset, comparing the predictions from different well-known methods and testing new subtype discovery methods, as shown with different application scenarios in the Supplementary Material. The package is implemented in R and available under GPL-2 license from the Bioconductor website (ackages/CancerSubtypes/). Supplementary data are available at Bioinformatics online.

Publication

Injecting purpose and trust into data anonymisation

Publisher: Elsevier BV

Date: 07-2011

DOI: 10.1016/J.COSE.2011.05.005

Publication

Methods to Mitigate Risk of Composition Attack in Independent Data Publications

Publisher: Springer International Publishing

Date: 2015

DOI: 10.1007/978-3-319-23633-9_8

Publication

Multi-Label Feature Selection Via Adaptive Label Correlation Estimation

Publisher: Association for Computing Machinery (ACM)

Date: 10-08-2023

DOI: 10.1145/3604560

Abstract: In multi-label learning, each instance is associated with multiple labels simultaneously. Multi-label data often have noisy, irrelevant, and redundant features of high dimensionality. Multi-label feature selection has received considerable attention as an effective means for dealing with high-dimensional multi-label data. Many multi-label feature selection methods exploit label correlations to help select features. However, finding label correlations and selecting features in existing multi-label feature selection methods are often two separate processes, the existence of noises and outliers in training data makes the label correlations exploited from label space less reliable. Therefore, the learned label correlations may mislead the feature selection process and result in the selection of less informative features. This article proposes a novel algorithm named ROAD, i.e., multi-label featuRe selectiOn via ADaptive label correlation estimation. ROAD jointly performs adaptive label correlation exploration and feature selection with alternating optimization to obtain reliable estimation of label correlations, which can more effectively reveal the intrinsic manifold structure among labels and lead to the selection of a more proper feature subset. Comprehensive experiments on several frequently used datasets validate the superiority of ROAD against the state-of-the-art multi-label feature selection algorithms.

Publication

(α, k)-anonymity

Publisher: ACM

Date: 20-08-2006

DOI: 10.1145/1150402.1150499

Publication

Causal Decision Trees

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 02-2017

DOI: 10.1109/TKDE.2016.2619350

Publication

Mining heterogeneous causal effects for personalized cancer treatment

Publisher: Oxford University Press (OUP)

Date: 24-03-2017

DOI: 10.1093/BIOINFORMATICS/BTX174

Abstract: Cancer is not a single disease and involves different subtypes characterized by different sets of molecules. Patients with different subtypes of cancer often react heterogeneously towards the same treatment. Currently, clinical diagnoses rather than molecular profiles are used to determine the most suitable treatment. A molecular level approach will allow a more precise and informed way for making treatment decisions, leading to a better survival chance and less suffering of patients. Although many computational methods have been proposed to identify cancer subtypes at molecular level, to the best of our knowledge none of them are designed to discover subtypes with heterogeneous treatment responses. In this article we propose the Survival Causal Tree (SCT) method. SCT is designed to discover patient subgroups with heterogeneous treatment effects from censored observational data. Results on TCGA breast invasive carcinoma and glioma datasets have shown that for each subtype identified by SCT, the patients treated with radiotherapy exhibit significantly different relapse free survival pattern when compared to patients without the treatment. With the capability to identify cancer subtypes with heterogeneous treatment responses, SCT is useful in helping to choose the most suitable treatment for individual patients. Data and code are available at github.com/WeijiaZhang24/SurvivalCausalTree. Supplementary data are available at Bioinformatics online.

Publication

Assessing the Fairness of Course Success Prediction Models in the Face of (Un)equal Demographic Group Distribution

Publisher: ACM

Date: 20-07-2023

DOI: 10.1145/3573051.3593381

Publication

A hybrid approach to prevent composition attacks for independent data releases

Publisher: Elsevier BV

Date: 11-2016

DOI: 10.1016/J.INS.2016.05.009

Publication

Constraining and Summarizing Optimal Risk and Preventive Patterns in Medical Data

Publisher: IEEE

Date: 06-2010

DOI: 10.1109/ICBBE.2010.5514758

Publication

miRspongeR: an R/Bioconductor package for the identification and analysis of miRNA sponge interaction networks and modules

Publisher: Springer Science and Business Media LLC

Date: 10-05-2019

DOI: 10.1186/S12859-019-2861-Y

Publication

Identifying miRNA synergistic regulatory networks in heterogeneous human data via network motifs

Publisher: Royal Society of Chemistry (RSC)

Date: 2016

DOI: 10.1039/C5MB00562K

Abstract: We present a causality based framework called mirSRN to infer miRNA synergism in human molecular systems.

Publication

Manipulating Visibility of Political and Apolitical Threads on Reddit via Score Boosting

Publisher: IEEE

Date: 08-2018

DOI: 10.1109/TRUSTCOM/BIGDATASE.2018.00037

Publication

Solar energy harvesting clear glass for building-integrated photovoltaics

Publisher: IEEE

Date: 12-2014

DOI: 10.1109/HONET.2014.7029393

Publication

The KDD 2021 Workshop on Causal Discovery (CD2021)

Publisher: ACM

Date: 14-08-2021

DOI: 10.1145/3447548.3469462

Publication

Identifying miRNA-mRNA regulatory relationships in breast cancer with invariant causal prediction

Publisher: Cold Spring Harbor Laboratory

Date: 06-06-2018

DOI: 10.1101/340638

Abstract: microRNAs (miRNAs) regulate gene expression at the post-transcriptional level and they play an important role in various biological processes in the human body. Therefore, identifying their regulation mechanisms is essential for the diagnostics and therapeutics for a wide range of diseases. There have been a large number of researches which use gene expression profiles to resolve this problem. However, the current methods have their own limitations. Some of them only identify the correlation of miRNA and mRNA expression levels instead of the causal or regulatory relationships while others infer the causality but with a high computational complexity. To overcome these issues, in this study, we propose a method to identify miRNA-mRNA regulatory relationships in breast cancer using the invariant causal prediction. The key idea of invariant causal prediction is that the cause miRNAs of their target mRNAs are the ones which have persistent causal relationships with the target mRNAs across different environments. In this research, we aim to find miRNA targets which are consistent across different breast cancer subtypes. Thus, first of all, we apply the Pam50 method to categorise BRCA samples into different “environment” groups based on different cancer subtypes. Then we use the invariant causal prediction method to find miRNA-mRNA regulatory relationships across subtypes. We validate the results with the miRNA-transfected experimental data and the results show that our method outperforms the state-of-the-art methods. In addition, we also integrate this new method with the Pearson correlation analysis method and Lasso in an ensemble method to take the advantages of these methods. We then validate the results of the ensemble method with the experimentally confirmed data and the ensemble method shows the best performance, even comparing to the proposed causal method. 
Functional enrichment analyses show that miRNAs in the regulatory relationship predicted by the proposed causal method tend to synergistically regulate target genes, indicating the usefulness of these methods, and the identified miRNA targets could be used in the design of wet-lab experiments to discover the causes of breast cancer. Cancer is a disease of cells in human body and it causes a high rate of deaths worldwide. There has been evidence that non-coding RNAs are key players in the development and progression of cancer. Among the different types of non-coding RNAs, miRNAs, which are short non-coding RNAs, regulate gene expression and play an important role in different biological processes as well as various cancer types. To design better diagnostic and therapeutic plans for cancer patients, we need to know the roles of miRNAs in cancer initialisation and development, and their regulation mechanisms in the human body. In this study, we propose algorithms to identify miRNA-mRNA regulatory relationships in breast cancer. Comparing our methods with existing methods in predicting miRNA targets, our methods show a better performance. The estimated miRNA targets from our methods could be a potential source for further wet-lab experiments to discover the causes of breast cancer.

Publication

DATM: A Novel Data Agnostic Topic Modeling Technique With Improved Effectiveness for Both Short and Long Text

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 2023

DOI: 10.1109/ACCESS.2023.3262653

Publication

A Unified Survey of Treatment Effect Heterogeneity Modelling and Uplift Modelling

Publisher: Association for Computing Machinery (ACM)

Date: 04-10-2021

DOI: 10.1145/3466818

Abstract: A central question in many fields of scientific research is to determine how an outcome is affected by an action, i.e., to estimate the causal effect or treatment effect of an action. In recent years, in areas such as personalised healthcare, sociology, and online marketing, a need has emerged to estimate heterogeneous treatment effects with respect to individuals of different characteristics. To meet this need, two major approaches have been taken: treatment effect heterogeneity modelling and uplifting modelling. Researchers and practitioners in different communities have developed algorithms based on these approaches to estimate the heterogeneous treatment effects. In this article, we present a unified view of these two seemingly disconnected yet closely related approaches under the potential outcome framework. We provide a structured survey of existing methods following either of the two approaches, emphasising their inherent connections and using unified notation to facilitate comparisons. We also review the main applications of the surveyed methods in personalised marketing, personalised medicine, and sociology. Finally, we summarise and discuss the available software packages and source codes in terms of their coverage of different methods and applicability to different datasets, and we provide general guidelines for method selection.

Publication

(p⁺, α)-sensitive k-anonymity: A new enhanced privacy protection model

Publisher: IEEE

Date: 07-2008

DOI: 10.1109/CIT.2008.4594650

Publication

Computational methods for cancer driver discovery: A survey

Publisher: Ivyspring International Publisher

Date: 2021

DOI: 10.7150/THNO.52670

Publication

Preface to the First IEEE ICDM Workshop on Causal Discovery

Publisher: IEEE

Date: 12-2013

DOI: 10.1109/ICDMW.2013.7

Publication

Identifying key factors of student academic performance by subgroup discovery

Publisher: Springer Science and Business Media LLC

Date: 21-06-2018

DOI: 10.1007/S41060-018-0141-Y

Publication

A probabilistic approach to mitigate composition attacks on privacy in non-coordinated environments

Publisher: Elsevier BV

Date: 09-2014

DOI: 10.1016/J.KNOSYS.2014.04.019

Publication

Mining risk patterns in medical data

Publisher: ACM

Date: 21-08-2005

DOI: 10.1145/1081870.1081971

Publication

Predicting academic performance by considering student heterogeneity

Publisher: Elsevier BV

Date: 12-2018

DOI: 10.1016/J.KNOSYS.2018.07.042

Publication

Building fair predictive models

Publisher: Springer International Publishing

Date: 2020

DOI: 10.1007/978-3-030-64984-5_17

Publication

Assessing Classifier Fairness with Collider Bias

Publisher: Springer International Publishing

Date: 2022

DOI: 10.1007/978-3-031-05936-0_21

Publication

Preface to the ACM TIST Special Issue on Causal Discovery and Inference

Publisher: Association for Computing Machinery (ACM)

Date: 09-01-2016

DOI: 10.1145/2840720

Publication

Exploring Groups from Heterogeneous Data via Sparse Learning

Publisher: Springer Berlin Heidelberg

Date: 2013

DOI: 10.1007/978-3-642-37453-1_46

Publication

A Study on the Applications of Emerging Sequential Patterns

Publisher: Springer International Publishing

Date: 2014

DOI: 10.1007/978-3-319-08608-8_6

Jiuyong Li

Researcher

Publications

Privacy preserving serial publication of transactional data

A Study of the Single Point Mutation Loci in the Hepatitis B Virus Sequences via Optimal Risk and Preventive Sets with Weights

What is the Most Effective Intervention to Increase Job Retention for this Disabled Worker?

Identifying miRNA synergism using multiple-intervention causal inference

Ancestral Instrument Method for Causal Inference without Complete Knowledge

Efficient Outlier Detection for High-Dimensional Data

miRLAB: An R Based Dry Lab for Exploring miRNA-mRNA Regulatory Relationships

Building Diversified Multiple Trees for classification in high dimensional noisy biomedical data

Spin-down evolution and radio disappearance of the magnetar PSR J1622-4950

Guest Editorial: Special Issue on Causal Discovery 2017

Authenticity and credibility aware detection of adverse drug events from social media

Measurement Invariance of the Self-Description Questionnaire II in a Chinese Sample

Data-Driven Causal Effect Estimation Based on Graphical Causal Modelling: A Survey

Fairmod: making predictions fair in multiple protected attributes

Medical Applications of Artificial Intelligence

Personalized Interventions to Increase the Employment Success of People With Disability

Uncovering the roles of microRNAs/lncRNAs in characterising breast cancer subtypes and prognosis

A framework for reputation bootstrapping based on reputation utility and game theories

An approximate microaggregation approach for microdata protection

PSR J2322−2650 – a low-luminosity millisecond pulsar with a planetary-mass companion

Phytophthora Database 2.0: Update and future direction

Authenticity and credibility aware detection of adverse drug events from social media

Portable devices of security and privacy preservation for e-learning

Identifying miRNA-mRNA regulatory relationships in breast cancer with invariant causal prediction

STMM: Semantic and Temporal-Aware Markov Chain Model for Mobility Prediction

Adaptive Skeleton Construction for Accurate DAG Learning

CBNA: A control theory based method for identifying coding and non-coding cancer drivers

Computational methods for identifying miRNA sponge interactions

Achieving P-Sensitive K-Anonymity via Anatomy

pDriver: A novel method for unravelling personalised coding and miRNA cancer drivers

Discovering Functional microRNA-mRNA Regulatory Modules in Heterogeneous Data

Which Type of Classifier to Use for Networked Data, Connectivity Based or Feature Based?

A general framework for causal classification

Construct robust rule sets for classification

Learning Causal Representations for Robust Domain Adaptation

Effective Pruning for the Discovery of Conditional Functional Dependencies

Nonparametric Sparse Matrix Decomposition for Cross-View Dimensionality Reduction

FastOPM—A practical method for partial match of time series

Time to infer miRNA sponge modules

A sub-national economic complexity analysis of Australia’s states and territories

Motif Discovery and Phylogenetic Analysis of Hepatitis B Virus Sequences

miRBaseConverter: an R/Bioconductor package for converting and retrieving miRNA name, accession, sequence and family information in different versions of miRBase

Should Learning Analytics Models Include Sensitive Attributes? Explaining the Why

Anonymization by Local Recoding in Data with Attribute Hierarchical Taxonomies

Secure Outsourced Frequent Pattern Mining by Fully Homomorphic Encryption

Efficient discovery of risk patterns in medical data

Satisfying Privacy Requirements: One Step before Anonymization

Stabilising Job Survival Analysis for Disability Employment Services in Unseen Environments

Ensemble Methods for MiRNA Target Prediction from Expression Data

miRspongeR 2.0: an enhanced R package for exploring miRNA sponge regulation

Radio light curve of the galaxy possibly associated with FRB 150418

From Observational Studies to Causal Rule Mining

Privacy Protection for Genomic Data: Current Techniques and Challenges

Using multiple and negative target rules to make classifiers more understandable

Training Neural Networks with Random Noise Images for Adversarial Robustness

LoPAD: A Local Prediction Approach to Anomaly Detection

From spin noise to systematics: stochastic processes in the first International Pulsar Timing Array data release

Decision Support for Disability Employment using Counterfactual Survival Analysis

On discovery of functional dependencies from data

Data-driven discovery of causal interactions

Identifying direct miRNA–mRNA causal regulatory relationships in heterogeneous data

Mining Informative Rule Set for Prediction

From Association Analysis to Causal Discovery

Validating Privacy Requirements in Large Survey Rating Data

Randomize Adversarial Defense in a Light Way

The winning methods for predicting cellular position in the DREAM single-cell transcriptomics challenge

A novel framework for inferring condition-specific TF and miRNA co-regulation of protein–protein interactions

Integrating Global and Local Feature Selection for Multi-Label Learning

Trends and Applications in Knowledge Discovery and Data Mining

Injecting purpose and trust into data anonymisation

A pseudotemporal causality approach to identifying miRNA–mRNA interactions during biological processes

Discovery of functional miRNA–mRNA regulatory modules with computational methods