ORCID Profile
0000-0003-1807-430X
Current Organisation
Deakin University
Publisher: Elsevier BV
Date: 05-2023
Publisher: Springer Science and Business Media LLC
Date: 29-04-2019
Publisher: Elsevier BV
Date: 03-2016
Publisher: Elsevier BV
Date: 12-2020
Publisher: Springer Berlin Heidelberg
Date: 2013
Publisher: ACM
Date: 24-07-2011
Publisher: ACM
Date: 08-06-2015
Publisher: Springer Science and Business Media LLC
Date: 25-04-2022
DOI: 10.1007/S10669-022-09855-1
Abstract: For mission critical (MC) applications such as bushfire emergency management systems (EMS), understanding the current situation as a disaster unfolds is critical to saving lives, infrastructure and the environment. Incident control-room operators manage complex information and systems, especially with the emergence of Big Data. They are increasingly making decisions supported by artificial intelligence (AI) and machine learning (ML) tools for data analysis, prediction and decision-making. As the volume, speed and complexity of information increases due to more frequent fire events, greater availability of myriad IoT sensors, smart devices, satellite data and burgeoning use of social media, the advances in AI and ML that help to manage Big Data and support decision-making are increasingly perceived as “Black Box”. This paper aims to scope the requirements for bushfire EMS to improve Big Data management and governance of AI/ML. An analysis of ModelOps technology, used increasingly in the commercial sector, is undertaken to determine what components might be fit-for-purpose. The result is a novel set of ModelOps features, EMS requirements and an EMS-ModelOps framework that resolves more than 75% of issues whilst being sufficiently generic to apply to other types of mission-critical applications.
Publisher: Springer Berlin Heidelberg
Date: 2019
Publisher: Cold Spring Harbor Laboratory
Date: 23-01-2017
DOI: 10.1101/101873
Abstract: Bioinformatics sequence databases such as GenBank or UniProt contain hundreds of millions of records of genomic data. These records are derived from direct submissions from individual laboratories, as well as from bulk submissions from large-scale sequencing centres. Their diversity and scale mean that they suffer from a range of data quality issues including errors, discrepancies, redundancies, ambiguities, incompleteness, and inconsistencies with the published literature. In this work, we investigate and analyze the data quality of sequence databases from the perspective of a curator, who must detect anomalous and suspicious records. Specifically, we emphasize the detection of records that are inconsistent with the literature. Focusing on GenBank, we propose a set of 24 quality indicators, which are based on treating a record as a query into the published literature and then applying query quality predictors. Our analysis shows that the proposed quality indicators and the quality of the records are mutually related, each depending on the other. We propose to represent record-literature consistency as a vector of these quality indicators. By reducing the dimensionality of this representation for visualization purposes using Principal Component Analysis, we show that records which have been reported as inconsistent with the literature fall roughly in the same area, and therefore share similar characteristics. By manually analyzing records not previously known to be erroneous that fall in the same area as records known to be inconsistent, we find that 1 record out of 4 is inconsistent with respect to the literature. This high density of inconsistent records opens the way towards the development of automatic methods for the detection of faulty records. We conclude that literature inconsistency is a meaningful strategy for identifying suspicious records.
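The abstract above describes representing each database record as a vector of quality indicators and projecting those vectors into two dimensions with Principal Component Analysis for visualization. A minimal numpy-only sketch of that projection step, using synthetic indicator scores (the 100 records, 4 indicators, and their values are invented for illustration; the paper uses 24 indicators):

```python
import numpy as np

# Hypothetical data: each record is a vector of quality-indicator scores.
rng = np.random.default_rng(0)
indicators = rng.normal(size=(100, 4))  # 100 records x 4 indicator scores

# PCA via SVD: centre the data, then project onto the top 2 principal components.
centred = indicators - indicators.mean(axis=0)
_, _, vt = np.linalg.svd(centred, full_matrices=False)
projected = centred @ vt[:2].T  # 2-D coordinates, one point per record

print(projected.shape)  # (100, 2)
```

Records known to be inconsistent with the literature would then be plotted in this 2-D space to check whether they cluster together, as the abstract reports.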
Publisher: ACM
Date: 09-08-2015
Publisher: ACM
Date: 06-11-2017
Publisher: IEEE
Date: 06-2022
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 04-2020
Publisher: Elsevier BV
Date: 2023
Publisher: ACM
Date: 17-10-2022
Publisher: Elsevier BV
Date: 10-2024
Publisher: Elsevier BV
Date: 06-2020
Publisher: ACM
Date: 17-10-2022
Publisher: ACM
Date: 19-04-2021
Publisher: Oxford University Press (OUP)
Date: 2017
Publisher: ACM
Date: 06-07-2022
Publisher: International Information and Engineering Technology Association
Date: 31-12-2010
Publisher: Elsevier BV
Date: 11-2016
Publisher: Oxford University Press (OUP)
Date: 2017
Publisher: Springer Nature Switzerland
Date: 2023
Publisher: PeerJ
Date: 07-06-2022
DOI: 10.7717/PEERJ-CS.991
Abstract: Twitter represents a massively distributed information source over topics ranging from social and political events to entertainment and sports news. While recent work has suggested this content can be narrowed down to the personalized interests of individual users by training topic filters using standard classifiers, there remain many open questions about the efficacy of such classification-based filtering approaches. For example, over a year or more after training, how well do such classifiers generalize to future novel topical content, and are such results stable across a range of topics? In addition, how robust is a topic classifier over the time horizon, e.g., can a model trained in 1 year be used for making predictions in the subsequent year? Furthermore, what features, feature classes, and feature attributes are most critical for long-term classifier performance? To answer these questions, we collected a corpus of over 800 million English Tweets via the Twitter streaming API during 2013 and 2014 and learned topic classifiers for 10 diverse themes ranging from social issues to celebrity deaths to the “Iran nuclear deal”. The results of this long-term study of topic classifier performance provide a number of important insights, among them that: (i) such classifiers can indeed generalize to novel topical content with high precision over a year or more after training, though performance degrades with time, (ii) the classes of hashtags and simple terms contain the most informative feature instances, (iii) removing tweets containing training hashtags from the validation set allows better generalization, and (iv) the simple volume of tweets by a user correlates more with their informativeness than their follower or friend count.
In summary, this work provides a long-term study of topic classifiers on Twitter that further justifies classification-based topical filtering approaches while providing detailed insight into the feature properties most critical for topic classifier performance.
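The abstract above describes training topic classifiers whose most informative feature classes are hashtags and simple terms. A toy sketch of that idea, using an invented vocabulary, made-up tweets, and a simple perceptron (the study itself used standard classifiers on a far larger feature set; everything below is illustrative):

```python
import numpy as np

# Invented example: binary topic filter for the "Iran nuclear deal" theme,
# with hashtags and simple terms as features.
vocab = ["#irandeal", "nuclear", "#oscars", "movie"]
tweets = [
    ("#irandeal nuclear talks resume", 1),
    ("nuclear negotiations #irandeal", 1),
    ("#oscars best movie tonight", 0),
    ("that movie won #oscars", 0),
]

def featurize(text):
    # Binary presence of each vocabulary item among the tweet's tokens.
    toks = text.split()
    return np.array([1.0 if v in toks else 0.0 for v in vocab])

X = np.array([featurize(t) for t, _ in tweets])
y = np.array([label for _, label in tweets])

# Perceptron training: update weights on each misclassified example.
w = np.zeros(len(vocab))
b = 0.0
for _ in range(10):
    for xi, yi in zip(X, y):
        pred = 1 if xi @ w + b > 0 else 0
        w += (yi - pred) * xi
        b += (yi - pred)

# Classify an unseen tweet on the same topic.
print(1 if featurize("#irandeal sanctions nuclear") @ w + b > 0 else 0)  # 1
```

The generalization question the paper studies is whether such a filter, trained in one year, still fires correctly on novel content about the same topic a year later.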
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2023
Publisher: ACM
Date: 02-11-2010
Publisher: ACM
Date: 08-03-2019
Publisher: Association for Computing Machinery (ACM)
Date: 27-03-2023
DOI: 10.1145/3532856
Abstract: Social media platforms such as Twitter or StockTwits are widely used for sharing stock market opinions between investors, traders, and entrepreneurs. Empirically, previous work has shown that the content posted on these social media platforms can be leveraged to predict various aspects of stock market performance. Nonetheless, actors on these social media platforms may not always have altruistic motivations and may instead seek to influence stock trading behavior through the (potentially misleading) information they post. While a lot of previous work has sought to analyze how social media can be used to predict the stock market, there remain many questions regarding the quality of the predictions and the behavior of active users on these platforms. To this end, this article seeks to address a number of open research questions: Which social media platform is more predictive of stock performance? What posted content is actually predictive, and over what time horizon? How does stock market posting behavior vary among different users? Are all users trustworthy, or do some users’ predictions consistently mislead about the true stock movement? To answer these questions, we analyzed data from Twitter and StockTwits covering almost 5 years of posted messages spanning 2015 to 2019.
The results of this large-scale study provide a number of important insights, among which we present the following: (i) StockTwits is a more predictive source of information than Twitter, leading us to focus our analysis on StockTwits; (ii) on StockTwits, users’ self-labeled sentiments are correlated with the stock market but are only slightly predictive in aggregate over the short term; (iii) there are at least three clear types of temporal predictive behavior for users over a 144-day horizon: short, medium, and long term; and (iv) users who are consistently and reliably wrong tend to exhibit what we conjecture to be “botlike” post content, and their removal from the data tends to improve stock market predictions from self-labeled content.
Publisher: Cold Spring Harbor Laboratory
Date: 18-01-2017
DOI: 10.1101/101246
Abstract: We investigate and analyse the data quality of nucleotide sequence databases with the objective of automatic detection of data anomalies and suspicious records. Specifically, we demonstrate that the published literature associated with each data record can be used to automatically evaluate its quality, by cross-checking the consistency of the key content of the database record with the referenced publications. Focusing on GenBank, we describe a set of quality indicators based on the relevance paradigm of information retrieval (IR). Then, we use these quality indicators to train an anomaly detection algorithm to classify records as “confident” or “suspicious”. Our experiments on the PubMed Central collection show that assessing the coherence between the literature and database records, through our algorithms, is an effective mechanism for assisting curators to perform data cleansing. Although fewer than 0.25% of the records in our data set are known to be faulty, we would expect that there are many more in GenBank that have not yet been identified. By automated comparison with literature they can be identified with a precision of up to 10% and a recall of up to 30%, while strongly outperforming several baselines. While these results leave substantial room for improvement, they reflect both the very imbalanced nature of the data, and the limited explicitly labelled data that is available. Overall, the obtained results show promise for the development of a new kind of approach to detecting low-quality and suspicious sequence records based on literature analysis and consistency. From a practical point of view, this will greatly help curators in identifying inconsistent records in large-scale sequence databases by highlighting records that are likely to be inconsistent with the literature.
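The abstract above describes cross-checking a record's key content against its referenced publications and flagging low-consistency records as “suspicious”. A minimal illustrative sketch of one such consistency check, scoring a record against cited literature as cosine similarity over bag-of-words vectors (the vectors, vocabulary, and threshold are invented; the paper trains an anomaly detector over IR-style quality indicators rather than a single thresholded score):

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two term-count vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical term counts over a shared 4-word vocabulary.
record_vec = np.array([3.0, 1.0, 0.0, 2.0])      # terms in the database record
literature_vec = np.array([2.0, 1.0, 1.0, 2.0])  # terms in the cited paper

score = cosine(record_vec, literature_vec)
# Invented threshold: high record-literature agreement -> "confident".
label = "confident" if score >= 0.8 else "suspicious"
print(label)  # confident
```

In the actual pipeline, many such indicators would be computed per record and fed to an anomaly detection algorithm, which learns the "confident" vs. "suspicious" boundary instead of a hand-set threshold.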
Publisher: Springer International Publishing
Date: 2022
Publisher: ACM
Date: 28-07-2013
Publisher: Elsevier BV
Date: 09-2023
Publisher: ACM
Date: 28-07-2013
Publisher: ACM
Date: 18-07-2019
Publisher: ACM
Date: 11-08-2013
Publisher: Elsevier BV
Date: 07-2017
DOI: 10.1016/J.JBI.2017.06.015
Abstract: We investigate and analyse the data quality of nucleotide sequence databases with the objective of automatic detection of data anomalies and suspicious records. Specifically, we demonstrate that the published literature associated with each data record can be used to automatically evaluate its quality, by cross-checking the consistency of the key content of the database record with the referenced publications. Focusing on GenBank, we describe a set of quality indicators based on the relevance paradigm of information retrieval (IR). Then, we use these quality indicators to train an anomaly detection algorithm to classify records as "confident" or "suspicious". Our experiments on the PubMed Central collection show that assessing the coherence between the literature and database records, through our algorithms, is an effective mechanism for assisting curators to perform data cleansing. Although fewer than 0.25% of the records in our data set are known to be faulty, we would expect that there are many more in GenBank that have not yet been identified. By automated comparison with literature they can be identified with a precision of up to 10% and a recall of up to 30%, while strongly outperforming several baselines. While these results leave substantial room for improvement, they reflect both the very imbalanced nature of the data, and the limited explicitly labelled data that is available. Overall, the obtained results show promise for the development of a new kind of approach to detecting low-quality and suspicious sequence records based on literature analysis and consistency. From a practical point of view, this will greatly help curators in identifying inconsistent records in large-scale sequence databases by highlighting records that are likely to be inconsistent with the literature.
Publisher: Hindawi Limited
Date: 18-02-2022
DOI: 10.1002/INT.22856
Publisher: Society for Industrial and Applied Mathematics
Date: 06-05-2019
Location: France
No related grants have been discovered for Mohamed Reda Bouadjenek.