ORCID Profile
0000-0002-9954-0159
Current Organisation
Queensland University of Technology
In Research Link Australia (RLA), "Research Topics" refer to ANZSRC FOR and SEO codes. These topics are either sourced from ANZSRC FOR and SEO codes listed in researchers' related grants or generated by a large language model (LLM) based on their publications.
Adaptive Agents and Intelligent Robotics | Aboriginal and Torres Strait Islander Cultural Studies | Historical Studies | Systems biology | Public Health and Health Services | Artificial Intelligence and Image Processing | Microbiology not elsewhere classified | Aboriginal and Torres Strait Islander History | Pattern Recognition and Data Mining | Museum Studies | Computer Vision | Biological physics | Human Geography Not Elsewhere Classified | Health And Community Services | Biological mathematics | Applied mathematics | Social Policy
The aged | Aboriginal and Torres Strait Islander Development and Welfare | Health related to ageing | Conserving Aboriginal and Torres Strait Islander Heritage | Expanding Knowledge in Engineering | Expanding Knowledge in the Information and Computing Sciences | Other social development and community services | Expanding Knowledge in History and Archaeology
Publisher: Springer International Publishing
Date: 2023
Publisher: Springer International Publishing
Date: 2023
Publisher: Springer International Publishing
Date: 2023
Publisher: Springer International Publishing
Date: 2023
Publisher: Springer International Publishing
Date: 2023
Publisher: Elsevier BV
Date: 10-2023
Publisher: Springer International Publishing
Date: 2022
Publisher: IEEE
Date: 12-2010
Publisher: Springer International Publishing
Date: 29-08-2015
Publisher: Springer London
Date: 2010
Publisher: Springer Berlin Heidelberg
Date: 2009
Publisher: ACM
Date: 18-03-2013
Publisher: IEEE
Date: 08-2010
Publisher: Springer International Publishing
Date: 2019
Publisher: ACM
Date: 29-06-2011
Publisher: Springer Berlin Heidelberg
Date: 2009
Publisher: International World Wide Web Conferences Steering Committee
Date: 18-05-2015
Publisher: Springer International Publishing
Date: 2022
Publisher: Springer International Publishing
Date: 2014
Publisher: Springer International Publishing
Date: 2019
Publisher: IEEE
Date: 11-2007
Publisher: IGI Global
Date: 2008
Abstract: The business needs, the availability of huge volumes of data and the continuous evolution in Web services functions derive the need of application of data mining in the Web service domain. This article recommends several data mining applications that can leverage problems concerned with the discovery and monitoring of Web services. This article then presents a case study on applying the clustering data mining technique to the Web service usage data to improve the Web service discovery process. This article also discusses the challenges that arise when applying data mining to Web services usage data and abstract informat
Publisher: Springer International Publishing
Date: 2015
Publisher: Springer International Publishing
Date: 2020
Publisher: Springer International Publishing
Date: 2019
Publisher: Springer Science and Business Media LLC
Date: 18-06-2020
Publisher: Elsevier BV
Date: 2021
Publisher: Springer International Publishing
Date: 2023
Publisher: Springer International Publishing
Date: 2023
Publisher: Springer International Publishing
Date: 2015
Publisher: IGI Global
Date: 2008
DOI: 10.4018/978-1-59904-990-8.CH015
Abstract: XML has gained popularity for information representation, exchange and retrieval. As XML material becomes more abundant, its heterogeneity and structural irregularity limit the knowledge that can be gained. The utilisation of data mining techniques becomes essential for improvement in XML document handling. This chapter presents the capabilities and benefits of data mining techniques in the XML domain, as well as, a conceptualization of the XML mining process. It also discusses the techniques that can be applied to XML document structure and/or content for knowledge discovery.
Publisher: IGI Global
Date: 2006
DOI: 10.4018/978-1-59904-271-8.CH012
Abstract: Web services have recently received much attention in businesses. However, a number of challenges such as lack of experience in estimating the costs, lack of service innovation and monitoring, and lack of methods for locating appropriate services are to be resolved. One possible approach is by learning from the experiences in Web services and from other similar situations. Such a task requires the use of data mining to represent generalizations on common situations. This chapter examines how some of the issues of Web services can be addressed through data mining.
Publisher: Springer International Publishing
Date: 2015
Publisher: Springer International Publishing
Date: 2019
Publisher: IEEE
Date: 11-2013
Publisher: Elsevier BV
Date: 11-2022
Publisher: Oxford University Press (OUP)
Date: 28-01-2019
DOI: 10.1093/LLC/FQY084
Publisher: ACM
Date: 26-10-2010
Publisher: IEEE
Date: 12-2006
Publisher: World Scientific Pub Co Pte Ltd
Date: 08-2005
DOI: 10.1142/S0218194005002476
Abstract: Data mining techniques provide people with new power to research and manipulate the existing large volume of data. A data mining process discovers interesting information from the hidden data that can either be used for future prediction and/or intelligently summarising the details of the data. There are many achievements of applying data mining techniques to various areas such as marketing, medical, and financial, although few of them can be currently seen in software engineering domain. In this paper, a proposed data mining application in software engineering domain is explained and experimented. The empirical results demonstrate the capability of data mining techniques in software engineering domain and the potential benefits in applying data mining to this area.
Publisher: Informa UK Limited
Date: 13-07-2022
Publisher: Inderscience Publishers
Date: 2013
Publisher: IEEE
Date: 12-2008
Publisher: Springer Berlin Heidelberg
Date: 2008
Publisher: Springer Science and Business Media LLC
Date: 25-07-2022
DOI: 10.1007/S13278-022-00917-5
Abstract: Social media platforms have become a common place for information exchange among their users. People leave traces of their emotions via text expressions. A systematic collection, analysis, and interpretation of social media data across time and space can give insights into local outbreaks, mental health, and social issues. Such timely insights can help in developing strategies and resources with an appropriate and efficient response. This study analysed a large Spatio-temporal tweet dataset of the Australian sphere related to COVID19. The methodology included a volume analysis, topic modelling, sentiment detection, and semantic brand score to obtain an insight into the COVID19 pandemic outbreak and public discussion in different states and cities of Australia over time. The obtained insights are compared with independently observed phenomena such as government-reported instances.
Publisher: IEEE
Date: 2009
Publisher: Inderscience Publishers
Date: 2011
Publisher: Springer Berlin Heidelberg
Date: 2009
Publisher: Springer Berlin Heidelberg
Date: 2005
DOI: 10.1007/11508069_31
Publisher: Springer Science and Business Media LLC
Date: 17-05-2023
DOI: 10.1007/S42979-023-01819-9
Abstract: In this paper, discriminative associative classification is proposed as a new classification technique based on class discriminative association rules (CDARs). These rules are defined based on discriminative itemsets. The discriminative itemset is frequent in one data class and has much higher frequencies compared with the same itemset in other data classes. The CDAR is a class associative rule (CAR) in one data class that has higher support compared with the same rule in other data classes. Compared to associative classification, there are additional challenges as the Apriori property of the subset is not applicable. The proposed algorithm is designed particularly based on well-defined distinguishing characteristics of the rules, to improve the accuracy and efficiency of the classification in data classes. A novel compact prefix-tree structure is defined for holding the rules in data classes. The empirical analysis shows the effectiveness and efficiency of the proposed method on small and large real datasets.
Publisher: Springer London
Date: 2011
Publisher: Springer US
Date: 2009
Publisher: IEEE
Date: 03-2017
Publisher: ACM
Date: 05-12-2012
Publisher: ACM
Date: 07-11-2014
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 02-2019
Publisher: Springer Science and Business Media LLC
Date: 09-04-2021
Publisher: Springer Science and Business Media LLC
Date: 15-02-2021
Publisher: IEEE
Date: 07-2011
Publisher: IEEE
Date: 02-2018
Publisher: IGI Global
Date: 2005
DOI: 10.4018/978-1-59140-557-3.CH071
Abstract: XML is the new standard for information exchange and retrieval. An XML document has a schema that defines the data definition and structure of the XML document (Abiteboul et al., 2000). Due to the wide acceptance of XML, a number of techniques are required to retrieve and analyze the vast number of XML documents. Automatic deduction of the structure of XML documents for storing semi-structured data has been an active subject among researchers (Abiteboul et al., 2000; Green et al., 2002). A number of query languages for retrieving data from various XML data sources have also been developed (Abiteboul et al., 2000; W3C, 2004). The use of these query languages is limited (e.g., limited types of inputs and outputs, and users of these languages should know exactly what kinds of information are to be accessed). Data mining, on the other hand, allows the user to search out unknown facts, the information hidden behind the data. It also enables users to pose more complex queries (Dunham, 2003).
Publisher: Springer London
Date: 31-07-2014
Publisher: Springer International Publishing
Date: 2015
Publisher: IEEE
Date: 11-2018
Publisher: ACM
Date: 11-08-2013
Publisher: Springer International Publishing
Date: 2020
Publisher: ACM
Date: 13-05-2019
Publisher: Wiley
Date: 04-02-2022
Abstract: Food processing is a complex, multifaceted problem that requires substantial human interaction to optimize the various process parameters to minimize energy consumption and ensure better‐quality products. The development of a machine learning (ML)‐based approach to food processing applications is an exciting and innovative idea for optimizing process parameters and process kinetics to reduce energy consumption, processing time, and ensure better‐quality products; however, developing such a novel approach requires significant scientific effort. This paper presents and evaluates ML‐based approaches to various food processing operations such as drying, frying, baking, canning, extrusion, encapsulation, and fermentation to predict process kinetics. A step‐by‐step procedure to develop an ML‐based model and its practical implementation is presented. The key challenges of neural network training and testing algorithms and their limitations are discussed to assist readers in selecting algorithms for solving problems specific to food processing. In addition, this paper presents the potential and challenges of applying ML‐based techniques to hybrid food processing operations. The potential of physics‐informed ML modeling techniques for food processing applications and their strategies is also discussed. It is expected that the potential information of this paper will be valuable in advancing the ML‐based technology for food processing applications.
Publisher: IEEE
Date: 12-2007
DOI: 10.1109/WI.2006.106
Publisher: Springer Berlin Heidelberg
Date: 2010
Publisher: Insight Society
Date: 04-12-2018
Publisher: Springer International Publishing
Date: 27-11-2019
Publisher: IEEE
Date: 12-2006
DOI: 10.1109/WI.2006.49
Publisher: Association for Computing Machinery (ACM)
Date: 30-05-2020
DOI: 10.1145/3385654
Abstract: With the advancements in computing technology and web-based applications, data are increasingly generated in multi-dimensional form. These data are usually sparse due to the presence of a large number of users and fewer user interactions. To deal with this, the Nonnegative Tensor Factorization (NTF) based methods have been widely used. However existing factorization algorithms are not suitable to process in all three conditions of size, density, and rank of the tensor. Consequently, their applicability becomes limited. In this article, we propose a novel fast and efficient NTF algorithm using the element selection approach. We calculate the element importance using Lipschitz continuity and propose a saturation point-based element selection method that chooses a set of elements column-wise for updating to solve the optimization problem. Empirical analysis reveals that the proposed algorithm is scalable in terms of tensor size, density, and rank in comparison to the relevant state-of-the-art algorithms.
Publisher: Public Library of Science (PLoS)
Date: 09-03-2022
DOI: 10.1371/JOURNAL.PONE.0264360
Abstract: Appropriate descriptions of statistical methods are essential for evaluating research quality and reproducibility. Despite continued efforts to improve reporting in publications, inadequate descriptions of statistical methods persist. At times, reading statistical methods sections can conjure feelings of déjà vu, with content resembling cut-and-pasted or “boilerplate text” from already published work. Instances of boilerplate text suggest a mechanistic approach to statistical analysis, where the same default methods are being used and described using standardized text. To investigate the extent of this practice, we analyzed text extracted from published statistical methods sections from PLOS ONE and the Australian and New Zealand Clinical Trials Registry (ANZCTR). Topic modeling was applied to analyze data from 111,731 papers published in PLOS ONE and 9,523 studies registered with the ANZCTR. PLOS ONE topics emphasized definitions of statistical significance, software and descriptive statistics. One in three PLOS ONE papers contained at least 1 sentence that was a direct copy from another paper. 12,675 papers (11%) closely matched to the sentence “a p-value < 0.05 was considered statistically significant”. Common topics across ANZCTR studies differentiated between study designs and analysis methods, with matching text found in approximately 3% of sections. Our findings quantify a serious problem affecting the reporting of statistical methods and shed light on perceptions about the communication of statistics as part of the scientific process. Results further emphasize the importance of rigorous statistical review to ensure that adequate descriptions of methods are prioritized over relatively minor details such as p-values and software when reporting research outcomes.
Publisher: Springer International Publishing
Date: 2014
Publisher: Elsevier BV
Date: 04-2022
Publisher: Springer International Publishing
Date: 14-11-2015
Publisher: IEEE
Date: 12-2020
Publisher: Springer International Publishing
Date: 2019
Publisher: IGI Global
Date: 2009
DOI: 10.4018/978-1-60566-026-4.CH662
Abstract: Research and practices in electronic business (e-business) have witnessed an exponential growth in the last few years (Liautand & Hammond, 2001). Wireless technology has also evolved from simple analog products designed for business use to emerging radioactive, signal-based wireless communications (Shafi, 2001). The tremendous potential of mobile computing and e-business has created a new concept of mobile e-business or e-business over wireless devices (m-business).
Publisher: Springer Science and Business Media LLC
Date: 17-10-2023
Publisher: Springer-Verlag
Date: 2006
DOI: 10.1007/11766278_33
Publisher: Springer Berlin Heidelberg
Date: 2013
Publisher: Association for Computing Machinery (ACM)
Date: 18-08-2010
Abstract: INEX investigates focused retrieval from structured documents by providing large test collections of structured documents, uniform evaluation measures, and a forum for organizations to compare their results. This paper reports on the INEX 2009 evaluation campaign, which consisted of a wide range of tracks: Ad hoc, Book, Efficiency, Entity Ranking, Interactive, QA, Link the Wiki, and XML Mining. INEX is running entirely on volunteer effort by the IR research community: anyone with an idea and some time to spend can have a major impact.
Publisher: ACM
Date: 16-09-2009
Publisher: Springer International Publishing
Date: 2017
Publisher: Springer International Publishing
Date: 2023
Publisher: World Scientific Pub Co Pte Ltd
Date: 06-2007
DOI: 10.1142/S0218001407005648
Abstract: Since the emergence in the popularity of XML for data representation and exchange over the Web, the distribution of XML documents has rapidly increased. It has become a challenge for researchers to turn these documents into a more useful information utility. In this paper, we introduce a novel clustering algorithm PCXSS that keeps the heterogeneous XML documents into various groups according to their similar structural and semantic representations. We develop a global criterion function CPSim that progressively measures the similarity between an XML document and existing clusters, ignoring the need to compute the similarity between two individual documents. The experimental analysis shows the method to be fast and accurate.
Publisher: Informa UK Limited
Date: 26-03-2019
Publisher: Springer Nature Singapore
Date: 2022
Publisher: ACM
Date: 25-05-2011
Publisher: Springer Singapore
Date: 2019
Publisher: Springer Science and Business Media LLC
Date: 15-06-2021
Publisher: IEEE
Date: 12-2007
DOI: 10.1109/WI.2006.62
Publisher: Springer Berlin Heidelberg
Date: 2008
Publisher: ACM
Date: 03-12-2012
Publisher: Springer Berlin Heidelberg
Date: 2011
Publisher: Springer Science and Business Media LLC
Date: 07-05-2019
Publisher: Oxford University Press (OUP)
Date: 28-08-2010
Publisher: Springer Science and Business Media LLC
Date: 27-06-2023
DOI: 10.1007/S42979-023-01887-X
Abstract: In this paper, we present an efficient novel method for mining discriminative itemsets over data streams using the sliding window model. Discriminative itemsets are the itemsets that are frequent in the target data stream, and their frequency in the target stream is much higher in comparison to their frequency in the rest of the streams. The problem of mining discriminative itemsets has more challenges than mining frequent itemsets, especially in the sliding window model, as during the window frame sliding, the algorithms have to deal with the combinatorial explosion of itemsets in more than one data stream, for the transactions coming in and going out of the sliding window. We propose a single scan algorithm using two novel in-memory data structures for mining discriminative itemsets in a combination of offline and online sliding windows. Offline processing is used for controlling the generation of many unpromising itemsets. Online processing is used for getting more up-to-date and accurate online answers between two offline slidings. The discovered discriminative itemsets are accurately updated in the offline sliding window periodically, and the mining process is continued in the online sliding between two periodic offline slidings. The extensive empirical analysis shows that the proposed algorithm provides efficient time and space complexities with full accuracy. The algorithm can handle large, fast-speed, and complex data streams.
Publisher: Springer Singapore
Date: 2021
Publisher: Springer Science and Business Media LLC
Date: 13-02-2014
Publisher: Springer Singapore
Date: 2021
Publisher: IEEE
Date: 04-2020
Publisher: Springer Singapore
Date: 2021
Publisher: Springer Science and Business Media LLC
Date: 20-05-2015
Publisher: ACM
Date: 08-02-2016
Publisher: Springer Science and Business Media LLC
Date: 30-04-2022
DOI: 10.1007/S41060-022-00324-1
Abstract: The world is witnessing the devastating effects of the COVID-19 pandemic. Each country responded to contain the spread of the virus in the early stages through diverse response measures. Interpreting these responses and their patterns globally is essential to inform future responses to COVID-19 variants and future pandemics. A stochastic epidemiological model (SEM) is a well-established mathematical tool that helps to analyse the spread of infectious diseases through communities and the effects of various response measures. However, interpreting the outcome of these models is complex and often requires manual effort. In this paper, we propose a novel method to provide the explainability of an epidemiological model. We represent the output of SEM as a tensor model. We then apply nonnegative tensor factorization (NTF) to identify patterns of global response behaviours of countries and cluster the countries based on these patterns. We interpret the patterns and clusters to understand the global response behaviour of countries in the early stages of the pandemic. Our experimental results demonstrate the advantage of clustering using NTF and provide useful insights into the characteristics of country clusters.
Publisher: Springer Nature Switzerland
Date: 2023
Publisher: MDPI AG
Date: 07-2021
DOI: 10.3390/JRFM14070298
Abstract: This paper proposes a conceptual modeling framework based on category theory that serves as a tool to study common structures underlying diverse approaches to modeling credit default that at first sight may appear to have nothing in common. The framework forms the basis for an entropy-based stacking model to address issues of inconsistency and bias in classification performance. Based on the Lending Club’s peer-to-peer loans dataset and Taiwanese credit card clients dataset, relative to individual base models, the proposed entropy-based stacking model provides more consistent performance across multiple data environments and less biased performance in terms of default classification. The process itself is agnostic to the base models selected and its performance superior, regardless of the models selected.
Publisher: ACM
Date: 26-03-2012
Publisher: Routledge
Date: 12-04-2023
Publisher: ACM
Date: 26-03-2012
Publisher: IEEE
Date: 04-2018
Publisher: Springer Science and Business Media LLC
Date: 24-04-2008
Publisher: Springer Science and Business Media LLC
Date: 14-12-2018
Publisher: Springer Berlin Heidelberg
Date: 2012
Publisher: Springer Singapore
Date: 2021
Publisher: Springer Berlin Heidelberg
Date: 2007
Publisher: Springer Singapore
Date: 2021
Publisher: Elsevier BV
Date: 08-2000
Publisher: IEEE
Date: 12-2010
Publisher: IEEE
Date: 07-2011
Publisher: Springer Singapore
Date: 2021
Publisher: IEEE
Date: 12-2010
Publisher: Springer Berlin Heidelberg
Date: 2011
Publisher: Springer International Publishing
Date: 2018
Publisher: Springer International Publishing
Date: 2018
Publisher: IEEE
Date: 11-2018
Publisher: Springer Singapore
Date: 2019
Publisher: ACM
Date: 16-09-2010
Publisher: IEEE
Date: 12-2009
Publisher: Springer Berlin Heidelberg
Date: 2010
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 09-2021
Publisher: Springer Singapore
Date: 2019
Publisher: Springer Berlin Heidelberg
Date: 2011
Publisher: Springer Berlin Heidelberg
Date: 2011
Publisher: Springer Berlin Heidelberg
Date: 2010
Publisher: Elsevier BV
Date: 03-2020
Publisher: Springer International Publishing
Date: 2018
Publisher: Springer Berlin Heidelberg
Date: 2008
Publisher: Springer Berlin Heidelberg
Date: 2010
Publisher: The Intelligent Networks and Systems Society
Date: 28-02-2021
DOI: 10.22266/IJIES2021.0228.36
Abstract: The tag-based recommendation systems that are built based on tensor models commonly suffer from the data sparsity problem. In recent years, various weighted-learning approaches have been proposed to tackle such a problem. The approaches can be categorized by how a weighting scheme is used for exploiting the data sparsity – like employing it to construct a weighted tensor used for weighing the tensor model during the learning process. In this paper, we propose a new weighted-learning approach for exploiting data sparsity in tag-based item recommendation system. We introduce a technique to represent the users’ tag preferences for leveraging the weighted-learning approach. The key idea of the proposed technique comes from the fact that users use different choices of tags to annotate the same item while the same tag may be used to annotate various items in tag-based systems. This points out that users’ tag usage likeliness is different and therefore their tag preferences are also different. We then present three novel weighting schemes that are varied in manners by how the ordinal weighting values are used for labelling the users’ tag preferences. As a result, three weighted tensors are generated based on each scheme. To implement the proposed schemes for generating item recommendations, we develop a novel weighted-learning method called WRank (Weighted Rank). Our experiments show that considering the users' tag preferences in the tensor-based weighted-learning approach can solve the data sparsity problem as well as improve the quality of recommendation.
Publisher: ACM
Date: 02-11-2009
Publisher: IEEE
Date: 12-2008
Publisher: Springer International Publishing
Date: 2014
Publisher: IEEE
Date: 11-2007
DOI: 10.1109/WI.2007.82
Publisher: IEEE
Date: 12-2020
Publisher: Elsevier BV
Date: 05-2009
DOI: 10.1016/J.NEUNET.2009.02.001
Abstract: Artificial neural networks (ANN) have demonstrated good predictive performance in a wide range of applications. They are, however, not considered sufficient for knowledge representation because of their inability to represent the reasoning process succinctly. This paper proposes a novel methodology Gyan that represents the knowledge of a trained network in the form of restricted first-order predicate rules. The empirical results demonstrate that an equivalent symbolic interpretation in the form of rules with predicates, terms and variables can be derived describing the overall behaviour of the trained ANN with improved comprehensibility while maintaining the accuracy and fidelity of the propositional rules.
Publisher: ACM
Date: 23-08-2017
Publisher: Springer Nature Singapore
Date: 2023
Publisher: Springer International Publishing
Date: 2014
Publisher: Association for Computing Machinery (ACM)
Date: 03-10-2020
DOI: 10.1145/3399712
Abstract: Outlier detection in text data collections has become significant due to the need of finding anomalies in the myriad of text data sources. High feature dimensionality, together with the larger size of these document collections, presents a need for developing accurate outlier detection methods with high efficiency. Traditional outlier detection methods face several challenges including data sparseness, distance concentration, and the presence of a larger number of sub-groups when dealing with text data. In this article, we propose to address these issues by developing novel concepts such as presenting documents with the rare document frequency, finding ranking-based neighborhood for similarity computation, and identifying sub-dense local neighborhoods in high dimensions. To improve the proposed primary method based on rare document frequency, we present several novel ensemble approaches using the ranking concept to reduce the false identifications while finding the higher number of true outliers. Extensive empirical analysis shows that the proposed method and its ensemble variations improve the quality of outlier detection in document repositories as well as they are found scalable compared to the relevant benchmarking methods.
Publisher: Springer Science and Business Media LLC
Date: 07-04-2018
Publisher: IEEE
Date: 04-2015
Publisher: IEEE
Date: 12-2020
Publisher: SAGE Publications
Date: 03-2023
DOI: 10.1177/08944393231158788
Abstract: Among the pressing issues facing Australian and other First Nations peoples is the repatriation of the bodily remains of their ancestors, which are currently held in Western scientific institutions. The success of securing the return of these remains to their communities for reburial depends largely on locating information within scientific and other literature published between 1790 and 1970 documenting their theft, donation, sale, or exchange between institutions. This article reports on collaborative research by data scientists and social science researchers in the Research, Reconcile, Renew Network (RRR) to develop and apply text mining techniques to identify this vital information. We describe our work to date on developing a machine learning-based solution to automate the process of finding and semantically analysing relevant texts. Classification models, particularly deep learning-based models, are known to have low accuracy when trained with small amounts of labelled (i.e. relevant/non-relevant) documents. To improve the accuracy of our detection model, we explore the use of an Informed Neural Network (INN) model that describes documentary content using expert-informed contextual knowledge. Only a few labelled documents are used to provide specificity to the model, using conceptually related keywords identified by RRR experts in provenance research. The results confirm the value of using an INN network model for identifying relevant documents related to the investigation of the global commercial trade in Indigenous human remains. Empirical analysis suggests that this INN model can be generalized for use by other researchers in the social sciences and humanities who want to extract relevant information from large textual corpora.
Publisher: Informa UK Limited
Date: 28-01-2019
Publisher: Association for Computing Machinery (ACM)
Date: 10-2011
Abstract: In the last few years we have observed a proliferation of approaches for clustering XML documents and schemas based on their structure and content. The presence of such a huge amount of approaches is due to the different applications requiring the clustering of XML data. These applications need data in the form of similar contents, tags, paths, structures, and semantics. In this article, we first outline the application contexts in which clustering is useful, then we survey approaches so far proposed relying on the abstract representation of data (instances or schema), on the identified similarity measure, and on the clustering algorithm. In this presentation, we aim to draw a taxonomy in which the current approaches can be classified and compared. We aim at introducing an integrated view that is useful when comparing XML data clustering approaches, when developing a new clustering algorithm, and when implementing an XML clustering component. Finally, the article moves into the description of future trends and research issues that still need to be faced.
Publisher: Springer International Publishing
Date: 2015
Publisher: ACM
Date: 18-03-2013
Publisher: Elsevier BV
Date: 11-2023
Publisher: Springer Berlin Heidelberg
Date: 2009
Publisher: IEEE
Date: 06-2012
Publisher: Springer Berlin Heidelberg
Date: 2009
Publisher: Springer Berlin Heidelberg
Date: 2008
Publisher: ACM
Date: 21-03-2011
Publisher: Springer Berlin Heidelberg
Date: 2008
Publisher: Springer Berlin Heidelberg
Date: 2006
DOI: 10.1007/11610113_74
Publisher: Springer Berlin Heidelberg
Date: 2007
Publisher: Association for Computing Machinery (ACM)
Date: 11-02-2021
DOI: 10.1145/3446343
Abstract: Language model (LM) has become a common method of transfer learning in Natural Language Processing (NLP) tasks when working with small labeled datasets. An LM is pretrained using an easily available large unlabelled text corpus and is fine-tuned with the labelled data to apply to the target (i.e., downstream) task. As an LM is designed to capture the linguistic aspects of semantics, it can be biased to linguistic features. We argue that exposing an LM model during fine-tuning to instances that capture diverse semantic aspects (e.g., topical, linguistic, semantic relations) present in the dataset will improve its performance on the underlying task. We propose a Mixed Aspect Sampling (MAS) framework to sample instances that capture different semantic aspects of the dataset and use the ensemble classifier to improve the classification performance. Experimental results show that MAS performs better than random sampling as well as the state-of-the-art active learning models to abuse detection tasks where it is hard to collect the labelled data for building an accurate classifier.
Publisher: IEEE
Date: 11-2018
Publisher: Springer Berlin Heidelberg
Date: 2009
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2020
Publisher: Springer Berlin Heidelberg
Date: 2008
Publisher: Springer Berlin Heidelberg
Date: 2011
Publisher: Springer Singapore
Date: 2020
Publisher: Springer Science and Business Media LLC
Date: 29-07-2021
Publisher: ACM
Date: 13-06-2010
Publisher: World Scientific Pub Co Pte Ltd
Date: 03-12-2022
DOI: 10.1142/S0219649222500095
Abstract: We tackle the problem of discriminative itemset mining. Given a set of datasets, we want to find the itemsets that are frequent in the target dataset and have much higher frequencies compared with the same itemsets in other datasets. Such itemsets are very useful for dataset discrimination. We demonstrate that this problem has important applications and, at the same time, is very challenging. We present the DISSparse algorithm, a mining method that uses two determinative heuristics based on the sparsity characteristics of the discriminative itemsets as a small subset of the frequent itemsets. We prove that the DISSparse algorithm is sound and complete. We experimentally investigate the performance of the proposed DISSparse on a range of datasets, evaluating its efficiency and stability and demonstrating it is substantially faster than the baseline method.
Publisher: IEEE
Date: 11-2018
Publisher: Springer Singapore
Date: 2019
Publisher: Elsevier BV
Date: 12-2010
Publisher: Springer International Publishing
Date: 2018
Publisher: Springer Berlin Heidelberg
Date: 2011
Publisher: ACM
Date: 03-11-2014
Publisher: Elsevier BV
Date: 05-2007
Publisher: Elsevier BV
Date: 02-2023
Publisher: Springer Berlin Heidelberg
Date: 2011
Publisher: Springer Science and Business Media LLC
Date: 11-01-2012
Publisher: Springer Berlin Heidelberg
Date: 2011
Publisher: Springer International Publishing
Date: 2020
Publisher: IEEE
Date: 11-2010
Publisher: Springer Berlin Heidelberg
Date: 2013
Publisher: IEEE
Date: 08-2018
Publisher: Springer Berlin Heidelberg
Date: 2008
Publisher: Springer Berlin Heidelberg
Date: 2008
Publisher: IEEE
Date: 02-2018
Publisher: IEEE
Date: 12-2020
Publisher: Springer International Publishing
Date: 2014
Publisher: IEEE
Date: 11-2013
Publisher: Springer Berlin Heidelberg
Date: 2009
Publisher: ACM
Date: 07-04-2014
Publisher: Springer Berlin Heidelberg
Date: 2006
DOI: 10.1007/11731139_35
Publisher: Springer International Publishing
Date: 2021
Publisher: Springer Berlin Heidelberg
Date: 2009
Publisher: SCITEPRESS - Science and Technology Publications
Date: 2017
Publisher: IEEE
Date: 10-2010
Publisher: Springer Berlin Heidelberg
Date: 2009
Publisher: IEEE
Date: 11-2013
Publisher: ACM
Date: 21-03-2011
Publisher: IEEE
Date: 12-2008
Publisher: ACM
Date: 03-11-2014
Publisher: Association for Computing Machinery (ACM)
Date: 30-04-2013
Abstract: The Web is a steadily evolving resource comprising much more than mere HTML pages. With its ever-growing data sources in a variety of formats, it provides great potential for knowledge discovery. In this article, we shed light on some interesting phenomena of the Web: the deep Web, which surfaces database records as Web pages; the Semantic Web, which defines meaningful data exchange formats; XML, which has established itself as a lingua franca for Web data exchange; and domain-specific markup languages, which are designed based on XML syntax with the goal of preserving semantics in targeted domains. We detail these four developments in Web technology, and explain how they can be used for data mining. Our goal is to show that all these areas can be as useful for knowledge discovery as the HTML-based part of the Web.
Publisher: IGI Global
Date: 2011
Publisher: IEEE
Date: 11-2008
Start Date: 2008
End Date: 2011
Funder: Australian Research Council
Start Date: 2014
End Date: 2016
Funder: Australian Research Council
Start Date: 2014
End Date: 02-2018
Amount: $336,000.00
Funder: Australian Research Council
Start Date: 2023
End Date: 12-2029
Amount: $35,000,000.00
Funder: Australian Research Council
Start Date: 07-2008
End Date: 06-2012
Amount: $296,400.00
Funder: Australian Research Council
Start Date: 06-2020
End Date: 12-2024
Amount: $748,829.00
Funder: Australian Research Council