ORCID Profile
0000-0003-1807-430X
Current Organisation
Deakin University
Publisher: Elsevier BV
Date: 05-2023
Publisher: Springer Science and Business Media LLC
Date: 29-04-2019
Publisher: Elsevier BV
Date: 03-2016
Publisher: Elsevier BV
Date: 12-2020
Publisher: Springer Berlin Heidelberg
Date: 2013
Publisher: ACM
Date: 24-07-2011
Publisher: ACM
Date: 08-06-2015
Publisher: Springer Science and Business Media LLC
Date: 25-04-2022
DOI: 10.1007/S10669-022-09855-1
Abstract: For mission critical (MC) applications such as bushfire emergency management systems (EMS), understanding the current situation as a disaster unfolds is critical to saving lives, infrastructure and the environment. Incident control-room operators manage complex information and systems, especially with the emergence of Big Data. They are increasingly making decisions supported by artificial intelligence (AI) and machine learning (ML) tools for data analysis, prediction and decision-making. As the volume, speed and complexity of information increases due to more frequent fire events, greater availability of myriad IoT sensors, smart devices, satellite data and burgeoning use of social media, the advances in AI and ML that help to manage Big Data and support decision-making are increasingly perceived as “Black Box”. This paper aims to scope the requirements for bushfire EMS to improve Big Data management and governance of AI/ML. An analysis of ModelOps technology, used increasingly in the commercial sector, is undertaken to determine what components might be fit-for-purpose. The result is a novel set of ModelOps features, EMS requirements and an EMS-ModelOps framework that resolves more than 75% of issues whilst being sufficiently generic to apply to other types of mission-critical applications.
Publisher: Springer Berlin Heidelberg
Date: 2019
Publisher: Cold Spring Harbor Laboratory
Date: 23-01-2017
DOI: 10.1101/101873
Abstract: Bioinformatics sequence databases such as GenBank or UniProt contain hundreds of millions of records of genomic data. These records are derived from direct submissions from individual laboratories, as well as from bulk submissions from large-scale sequencing centres. Their diversity and scale mean that they suffer from a range of data quality issues including errors, discrepancies, redundancies, ambiguities, incompleteness, and inconsistencies with the published literature. In this work, we investigate and analyze the data quality of sequence databases from the perspective of a curator, who must detect anomalous and suspicious records. Specifically, we emphasize the detection of records that are inconsistent with the literature. Focusing on GenBank, we propose a set of 24 quality indicators, which are based on treating a record as a query into the published literature and then applying query quality predictors. Our analysis shows that the proposed quality indicators and the quality of the records are mutually related, each depending on the other. We propose to represent record-literature consistency as a vector of these quality indicators. By reducing the dimensionality of this representation for visualization purposes using Principal Component Analysis, we show that records which have been reported as inconsistent with the literature fall roughly in the same area, and therefore share similar characteristics. By manually analyzing records not previously known to be erroneous that fall in the same area as records known to be inconsistent, we find that 1 record out of 4 is inconsistent with respect to the literature. This high density of inconsistent records opens the way towards the development of automatic methods for the detection of faulty records. We conclude that literature inconsistency is a meaningful strategy for identifying suspicious records.
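The abstract above describes representing each database record as a vector of quality indicators and projecting those vectors into two dimensions with Principal Component Analysis for visualization. A minimal numpy-only sketch of that projection step, using synthetic indicator scores (the 100 records, 4 indicators, and their values are invented for illustration; the paper uses 24 indicators):

```python
import numpy as np

# Hypothetical data: each record is a vector of quality-indicator scores.
rng = np.random.default_rng(0)
indicators = rng.normal(size=(100, 4))  # 100 records x 4 indicator scores

# PCA via SVD: centre the data, then project onto the top 2 principal components.
centred = indicators - indicators.mean(axis=0)
_, _, vt = np.linalg.svd(centred, full_matrices=False)
projected = centred @ vt[:2].T  # 2-D coordinates, one point per record

print(projected.shape)  # (100, 2)
```

Records known to be inconsistent with the literature would then be plotted in this 2-D space to check whether they cluster together, as the abstract reports.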
Publisher: ACM
Date: 09-08-2015
Publisher: ACM
Date: 06-11-2017
Publisher: IEEE
Date: 06-2022
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 04-2020
Publisher: Elsevier BV
Date: 2023
Publisher: ACM
Date: 17-10-2022
Publisher: Elsevier BV
Date: 10-2024
Publisher: Elsevier BV
Date: 06-2020
Publisher: ACM
Date: 17-10-2022
Publisher: ACM
Date: 19-04-2021
Publisher: Oxford University Press (OUP)
Date: 2017
Publisher: ACM
Date: 06-07-2022
Publisher: International Information and Engineering Technology Association
Date: 31-12-2010
Publisher: Elsevier BV
Date: 11-2016
Publisher: Oxford University Press (OUP)
Date: 2017
Publisher: Springer Nature Switzerland
Date: 2023
Publisher: PeerJ
Date: 07-06-2022
DOI: 10.7717/PEERJ-CS.991
Abstract: Twitter represents a massively distributed information source over topics ranging from social and political events to entertainment and sports news. While recent work has suggested this content can be narrowed down to the personalized interests of individual users by training topic filters using standard classifiers, there remain many open questions about the efficacy of such classification-based filtering approaches. For example, over a year or more after training, how well do such classifiers generalize to future novel topical content, and are such results stable across a range of topics? In addition, how robust is a topic classifier over the time horizon, e.g., can a model trained in 1 year be used for making predictions in the subsequent year? Furthermore, what features, feature classes, and feature attributes are most critical for long-term classifier performance? To answer these questions, we collected a corpus of over 800 million English Tweets via the Twitter streaming API during 2013 and 2014 and learned topic classifiers for 10 diverse themes ranging from social issues to celebrity deaths to the “Iran nuclear deal”. The results of this long-term study of topic classifier performance provide a number of important insights, among them that: (i) such classifiers can indeed generalize to novel topical content with high precision over a year or more after training, though performance degrades with time, (ii) the classes of hashtags and simple terms contain the most informative feature instances, (iii) removing tweets containing training hashtags from the validation set allows better generalization, and (iv) the simple volume of tweets by a user correlates more with their informativeness than their follower or friend count.
In summary, this work provides a long-term study of topic classifiers on Twitter that further justifies classification-based topical filtering approaches while providing detailed insight into the feature properties most critical for topic classifier performance.
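The abstract above describes training topic classifiers whose most informative feature classes are hashtags and simple terms. A toy sketch of that idea, using an invented vocabulary, made-up tweets, and a simple perceptron (the study itself used standard classifiers on a far larger feature set; everything below is illustrative):

```python
import numpy as np

# Invented example: binary topic filter for the "Iran nuclear deal" theme,
# with hashtags and simple terms as features.
vocab = ["#irandeal", "nuclear", "#oscars", "movie"]
tweets = [
    ("#irandeal nuclear talks resume", 1),
    ("nuclear negotiations #irandeal", 1),
    ("#oscars best movie tonight", 0),
    ("that movie won #oscars", 0),
]

def featurize(text):
    # Binary presence of each vocabulary item among the tweet's tokens.
    toks = text.split()
    return np.array([1.0 if v in toks else 0.0 for v in vocab])

X = np.array([featurize(t) for t, _ in tweets])
y = np.array([label for _, label in tweets])

# Perceptron training: update weights on each misclassified example.
w = np.zeros(len(vocab))
b = 0.0
for _ in range(10):
    for xi, yi in zip(X, y):
        pred = 1 if xi @ w + b > 0 else 0
        w += (yi - pred) * xi
        b += (yi - pred)

# Classify an unseen tweet on the same topic.
print(1 if featurize("#irandeal sanctions nuclear") @ w + b > 0 else 0)  # 1
```

The generalization question the paper studies is whether such a filter, trained in one year, still fires correctly on novel content about the same topic a year later.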
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2023
Publisher: ACM
Date: 02-11-2010
Publisher: ACM
Date: 08-03-2019
Publisher: Association for Computing Machinery (ACM)
Date: 27-03-2023
DOI: 10.1145/3532856
Abstract: Social media platforms such as Twitter or StockTwits are widely used for sharing stock market opinions between investors, traders, and entrepreneurs. Empirically, previous work has shown that the content posted on these social media platforms can be leveraged to predict various aspects of stock market performance. Nonetheless, actors on these social media platforms may not always have altruistic motivations and may instead seek to influence stock trading behavior through the (potentially misleading) information they post. While a lot of previous work has sought to analyze how social media can be used to predict the stock market, there remain many questions regarding the quality of the predictions and the behavior of active users on these platforms. To this end, this article seeks to address a number of open research questions: Which social media platform is more predictive of stock performance? What posted content is actually predictive, and over what time horizon? How does stock market posting behavior vary among different users? Are all users trustworthy, or do some users’ predictions consistently mislead about the true stock movement? To answer these questions, we analyzed data from Twitter and StockTwits covering almost 5 years of posted messages spanning 2015 to 2019.
The results of this large-scale study provide a number of important insights, among which we present the following: (i) StockTwits is a more predictive source of information than Twitter, leading us to focus our analysis on StockTwits; (ii) on StockTwits, users’ self-labeled sentiments are correlated with the stock market but are only slightly predictive in aggregate over the short term; (iii) there are at least three clear types of temporal predictive behavior for users over a 144-day horizon: short, medium, and long term; and (iv) users who are consistently and reliably wrong tend to exhibit what we conjecture to be “botlike” post content, and their removal from the data tends to improve stock market predictions from self-labeled content.
Publisher: Cold Spring Harbor Laboratory
Date: 18-01-2017
DOI: 10.1101/101246
Abstract: We investigate and analyse the data quality of nucleotide sequence databases with the objective of automatic detection of data anomalies and suspicious records. Specifically, we demonstrate that the published literature associated with each data record can be used to automatically evaluate its quality, by cross-checking the consistency of the key content of the database record with the referenced publications. Focusing on GenBank, we describe a set of quality indicators based on the relevance paradigm of information retrieval (IR). Then, we use these quality indicators to train an anomaly detection algorithm to classify records as “confident” or “suspicious”. Our experiments on the PubMed Central collection show that assessing the coherence between the literature and database records, through our algorithms, is an effective mechanism for assisting curators to perform data cleansing. Although fewer than 0.25% of the records in our data set are known to be faulty, we would expect that there are many more in GenBank that have not yet been identified. By automated comparison with literature they can be identified with a precision of up to 10% and a recall of up to 30%, while strongly outperforming several baselines. While these results leave substantial room for improvement, they reflect both the very imbalanced nature of the data, and the limited explicitly labelled data that is available. Overall, the obtained results show promise for the development of a new kind of approach to detecting low-quality and suspicious sequence records based on literature analysis and consistency. From a practical point of view, this will greatly help curators in identifying inconsistent records in large-scale sequence databases by highlighting records that are likely to be inconsistent with the literature.
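The abstract above describes cross-checking a record's key content against its referenced publications and flagging low-consistency records as “suspicious”. A minimal illustrative sketch of one such consistency check, scoring a record against cited literature as cosine similarity over bag-of-words vectors (the vectors, vocabulary, and threshold are invented; the paper trains an anomaly detector over IR-style quality indicators rather than a single thresholded score):

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two term-count vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical term counts over a shared 4-word vocabulary.
record_vec = np.array([3.0, 1.0, 0.0, 2.0])      # terms in the database record
literature_vec = np.array([2.0, 1.0, 1.0, 2.0])  # terms in the cited paper

score = cosine(record_vec, literature_vec)
# Invented threshold: high record-literature agreement -> "confident".
label = "confident" if score >= 0.8 else "suspicious"
print(label)  # confident
```

In the actual pipeline, many such indicators would be computed per record and fed to an anomaly detection algorithm, which learns the "confident" vs. "suspicious" boundary instead of a hand-set threshold.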
Publisher: Springer International Publishing
Date: 2022
Publisher: ACM
Date: 28-07-2013
Publisher: Elsevier BV
Date: 09-2023
Publisher: ACM
Date: 28-07-2013
Publisher: ACM
Date: 18-07-2019
Publisher: ACM
Date: 11-08-2013
Publisher: Elsevier BV
Date: 07-2017
DOI: 10.1016/J.JBI.2017.06.015
Abstract: We investigate and analyse the data quality of nucleotide sequence databases with the objective of automatic detection of data anomalies and suspicious records. Specifically, we demonstrate that the published literature associated with each data record can be used to automatically evaluate its quality, by cross-checking the consistency of the key content of the database record with the referenced publications. Focusing on GenBank, we describe a set of quality indicators based on the relevance paradigm of information retrieval (IR). Then, we use these quality indicators to train an anomaly detection algorithm to classify records as "confident" or "suspicious". Our experiments on the PubMed Central collection show that assessing the coherence between the literature and database records, through our algorithms, is an effective mechanism for assisting curators to perform data cleansing. Although fewer than 0.25% of the records in our data set are known to be faulty, we would expect that there are many more in GenBank that have not yet been identified. By automated comparison with literature they can be identified with a precision of up to 10% and a recall of up to 30%, while strongly outperforming several baselines. While these results leave substantial room for improvement, they reflect both the very imbalanced nature of the data, and the limited explicitly labelled data that is available. Overall, the obtained results show promise for the development of a new kind of approach to detecting low-quality and suspicious sequence records based on literature analysis and consistency. From a practical point of view, this will greatly help curators in identifying inconsistent records in large-scale sequence databases by highlighting records that are likely to be inconsistent with the literature.
Publisher: Hindawi Limited
Date: 18-02-2022
DOI: 10.1002/INT.22856
Publisher: Society for Industrial and Applied Mathematics
Date: 06-05-2019
Location: France
No related grants have been discovered for Mohamed Reda Bouadjenek.