ARDC Research Link Australia

Publication

Towards query log based personalization using topic models

Publisher: ACM

Date: 26-10-2010

DOI: 10.1145/1871437.1871745

Publication

Error Detection and Corrections in Indic OCR Using LSTMs

Publisher: IEEE

Date: 11-2017

DOI: 10.1109/ICDAR.2017.13

Publication

Tag navigation

Publisher: ACM

Date: 24-08-2009

DOI: 10.1145/1595836.1595843

Publication

A multi-collection latent topic model for federated search

Publisher: Springer Science and Business Media LLC

Date: 07-10-2011

DOI: 10.1007/S10791-010-9147-3

Publication

Comparing Tweets and Tags for URLs

Publisher: Springer Berlin Heidelberg

Date: 2012

DOI: 10.1007/978-3-642-28997-2_7

Publication

ALRn

Publisher: Springer Science and Business Media LLC

Date: 22-07-2016

DOI: 10.1007/S10994-016-5574-8

Publication

Self-labeling methods for unsupervised transfer ranking

Publisher: Elsevier BV

Date: 04-2020

DOI: 10.1016/J.INS.2019.12.067

Publication

Harnessing Sequence Labeling for Sarcasm Detection in Dialogue from TV Series 'Friends'

Publisher: Association for Computational Linguistics

Date: 2016

DOI: 10.18653/V1/K16-1015

Publication

Tag data and personalized information retrieval

Publisher: ACM

Date: 30-10-2008

DOI: 10.1145/1458583.1458591

Publication

A statistical comparison of tag and query logs

Publisher: ACM

Date: 19-07-2009

DOI: 10.1145/1571941.1571965

Publication

Density-ratio based clustering for discovering clusters with varying densities

Publisher: Elsevier BV

Date: 12-2016

DOI: 10.1016/J.PATCOG.2016.07.007

Publication

Efficient parameter learning of Bayesian network classifiers

Publisher: Springer Science and Business Media LLC

Date: 26-01-2017

DOI: 10.1007/S10994-016-5619-Z

Publication

Estimating relative user expertise for content quality prediction on Reddit

Publisher: ACM

Date: 04-07-2017

DOI: 10.1145/3078714.3078720

Publication

A Citizen Science Approach for Analyzing Social Media With Crowdsourcing

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 2023

DOI: 10.1109/ACCESS.2023.3243791

Publication

Ranking social bookmarks using topic models

Publisher: ACM

Date: 26-10-2010

DOI: 10.1145/1871437.1871632

Publication

Towards risk-aware resource selection

Publisher: Springer International Publishing

Date: 2014

DOI: 10.1007/978-3-319-12844-3_13

Publication

Sub-word embeddings for OCR corrections in highly fusional Indic languages

Publisher: IEEE

Date: 09-2019

DOI: 10.1109/ICDAR.2019.00034

Publication

OCR on-the-go: Robust end-to-end systems for reading license plates & street signs

Publisher: IEEE

Date: 09-2019

DOI: 10.1109/ICDAR.2019.00033

Publication

Planning for web services the hard way

Publisher: IEEE Comput. Soc

Date: 2003

DOI: 10.1109/SAINTW.2003.1210130

Publication

Proximity-based opinion retrieval

Publisher: ACM

Date: 19-07-2010

DOI: 10.1145/1835449.1835517

Publication

Are Word Embedding-based Features Useful for Sarcasm Detection?

Publisher: Association for Computational Linguistics

Date: 2016

DOI: 10.18653/V1/D16-1104

Publication

A Computational Approach to Automatic Prediction of Drunk-Texting

Publisher: Association for Computational Linguistics

Date: 2015

DOI: 10.3115/V1/P15-2100

Publication

Improving social bookmark search using personalised latent variable language models

Publisher: ACM

Date: 09-02-2011

DOI: 10.1145/1935826.1935898

Publication

Estimating domain-specific user expertise for answer retrieval in community question-answering platforms

Publisher: ACM

Date: 05-12-2016

DOI: 10.1145/3015022.3015032

Publication

Investigating the statistical properties of user-generated documents

Publisher: Springer Berlin Heidelberg

Date: 2011

DOI: 10.1007/978-3-642-24764-4_18

Publication

Using Knowledge Graphs to Explain Entity Co-occurrence in Twitter

Publisher: ACM

Date: 06-11-2017

DOI: 10.1145/3132847.3133161

Publication

Towards an economy-based optimisation of file access and replication on a data grid

Publisher: IEEE

Date: 2002

DOI: 10.1109/CCGRID.2002.1017156

Publication

Monte-Carlo filesystem search - A crawl strategy for digital forensics

Publisher: Elsevier BV

Date: 06-2015

DOI: 10.1016/J.DIIN.2015.04.002

Publication

Grouping points by shared subspaces for effective subspace clustering

Publisher: Elsevier BV

Date: 11-2018

DOI: 10.1016/J.PATCOG.2018.05.027

Publication

A Framework for Document Specific Error Detection and Corrections in Indic OCR

Publisher: IEEE

Date: 11-2018

DOI: 10.1109/ICDAR.2017.308

Publication

Lowest probability mass neighbour algorithms: relaxing the metric constraint in distance-based neighbourhood algorithms

Publisher: Springer Science and Business Media LLC

Date: 02-07-2019

DOI: 10.1007/S10994-018-5737-X

Publication

Efficient Benchmarking of NLP APIs using Multi-armed Bandits

Publisher: Association for Computational Linguistics

Date: 2017

DOI: 10.18653/V1/E17-1039

Publication

Building user profiles from topic models for personalised search

Publisher: ACM Press

Date: 2013

DOI: 10.1145/2505515.2505642

Publication

Sarcasm generation

Publisher: Springer Singapore

Date: 2018

DOI: 10.1007/978-981-10-8396-9_5

Publication

Conclusion and future work

Publisher: Springer Singapore

Date: 2018

DOI: 10.1007/978-981-10-8396-9_6

Publication

Undersampling techniques to re-balance training data for large scale learning-to-rank

Publisher: Springer International Publishing

Date: 2014

DOI: 10.1007/978-3-319-12844-3_38

Publication

A Request Language for Web-Services Based on Planning and Constraint Satisfaction

Publisher: Springer Berlin Heidelberg

Date: 2002

DOI: 10.1007/3-540-46121-3_10

Publication

GeMI: interactive interface for transformer-based Genomic Metadata Integration

Publisher: Oxford University Press (OUP)

Date: 2022

DOI: 10.1093/DATABASE/BAAC036

Abstract: The Gene Expression Omnibus (GEO) is a public archive containing millions of digital samples from functional genomics experiments collected over almost two decades. The accompanying metadata describing the experiments suffer from redundancy, inconsistency and incompleteness due to the prevalence of free text and the lack of well-defined data formats and their validation. To remedy this situation, we created Genomic Metadata Integration (GeMI; gmql.eu/gemi/), a web application that learns to automatically extract structured metadata (in the form of key-value pairs) from the plain text descriptions of GEO experiments. The extracted information can then be indexed for structured search and used for various downstream data mining activities. GeMI works in continuous interaction with its users. The natural language processing transformer-based model at the core of our system is a fine-tuned version of the Generative Pre-trained Transformer 2 (GPT2) model that is able to learn continuously from the feedback of the users thanks to an active learning framework designed for the purpose. As a part of such a framework, a machine learning interpretation mechanism (that exploits saliency maps) allows the users to understand easily and quickly whether the predictions of the model are correct and improves the overall usability. GeMI’s ability to extract attributes not explicitly mentioned (such as sex, tissue type, cell type, ethnicity and disease) allows researchers to perform specific queries and classification of experiments, which was previously possible only after spending time and resources with tedious manual annotation. The usefulness of GeMI is demonstrated on practical research use cases. Database URL: gmql.eu/gemi/

Publication

A technical survey on statistical modelling and design methods for crowdsourcing quality control

Publisher: Elsevier BV

Date: 10-2020

DOI: 10.1016/J.ARTINT.2020.103351

Publication

Opinion Retrieval

Publisher: Springer Berlin Heidelberg

Date: 2009

DOI: 10.1007/978-3-642-00958-7_31

Publication

Multi-domain evaluation framework for named entity recognition tools

Publisher: Elsevier BV

Date: 05-2017

DOI: 10.1016/J.CSL.2016.10.003

Publication

Personal blog retrieval using opinion features

Publisher: Springer Berlin Heidelberg

Date: 2011

DOI: 10.1007/978-3-642-20161-5_85

Publication

Leveraging label category relationships in multi-class crowdsourcing

Publisher: Springer International Publishing

Date: 2018

DOI: 10.1007/978-3-319-93037-4_11

Publication

Employing document dependency in blog search

Publisher: Wiley

Date: 31-10-2011

DOI: 10.1002/ASI.21687

Publication

Exploiting data semantics to discover, extract, and model web sources

Publisher: IEEE

Date: 12-2008

DOI: 10.1109/ICDMW.2008.134

Publication

Aggregation methods for proximity-based opinion retrieval

Publisher: Association for Computing Machinery (ACM)

Date: 11-2012

DOI: 10.1145/2382438.2382445

Abstract: The enormous amount of user-generated data available on the Web provides a great opportunity to understand, analyze, and exploit people’s opinions on different topics. Traditional Information Retrieval methods consider the relevance of documents to a topic but are unable to differentiate between subjective and objective documents. Opinion retrieval is a retrieval task in which not only the relevance of a document to the topic is important but also the amount of opinion expressed in the document about the topic. In this article, we address the blog post opinion retrieval task and propose methods that rank blog posts according to their relevance and opinionatedness toward a topic. We propose estimating the opinion density at each position in a document using a general opinion lexicon and kernel density functions. We propose and investigate different models for aggregating the opinion density at query term positions to estimate the opinion score of every document. We then combine the opinion score with the relevance score based on a probabilistic justification. Experimental results on the BLOG06 dataset show that the proposed method provides significant improvement over the standard TREC baselines. The proposed models also achieve much higher performance compared to all state-of-the-art methods.

Publication

Expect the Unexpected: Harnessing Sentence Completion for Sarcasm Detection

Publisher: Springer Singapore

Date: 2018

DOI: 10.1007/978-981-10-8438-6_22

Publication

Comparing pointwise and listwise objective functions for random-forest-based learning-to-rank

Publisher: Association for Computing Machinery (ACM)

Date: 17-08-2016

DOI: 10.1145/2866571

Abstract: Current random-forest (RF)-based learning-to-rank (LtR) algorithms use a classification or regression framework to solve the ranking problem in a pointwise manner. The success of this simple yet effective approach coupled with the inherent parallelizability of the learning algorithm makes it a strong candidate for widespread adoption. In this article, we aim to better understand the effectiveness of RF-based rank-learning algorithms with a focus on the comparison between pointwise and listwise approaches. We introduce what we believe to be the first listwise version of an RF-based LtR algorithm. The algorithm directly optimizes an information retrieval metric of choice (in our case, NDCG) in a greedy manner. Direct optimization of the listwise objective functions is computationally prohibitive for most learning algorithms, but possible in RF since each tree maximizes the objective in a coordinate-wise fashion. Computational complexity of the listwise approach is higher than the pointwise counterpart; hence, for larger datasets, we design a hybrid algorithm that combines a listwise objective in the early stages of tree construction and a pointwise objective in the latter stages. We also study the effect of the discount function of NDCG on the listwise algorithm. Experimental results on several publicly available LtR datasets reveal that the listwise/hybrid algorithm outperforms the pointwise approach on the majority (but not all) of the datasets. We then investigate several aspects of the two algorithms to better understand the inevitable performance tradeoffs. The aspects include examining an RF-based unsupervised LtR algorithm and comparing individual tree strength. Finally, we compare the investigated RF-based algorithms with several other LtR algorithms.

Publication

Sarcasm detection using incongruity within target text

Publisher: Springer Singapore

Date: 2018

DOI: 10.1007/978-981-10-8396-9_3

Publication

Towards personalized distributed information retrieval

Publisher: ACM

Date: 20-07-2008

DOI: 10.1145/1390334.1390468

Publication

Sarcasm detection using contextual incongruity

Publisher: Springer Singapore

Date: 2018

DOI: 10.1007/978-981-10-8396-9_4

Publication

Criminal motivation on the dark web

Publisher: Elsevier BV

Date: 03-2018

DOI: 10.1016/J.DIIN.2017.12.003

Publication

SIR-Hawkes

Publisher: ACM Press

Date: 2018

DOI: 10.1145/3178876.3186108

Publication

Introduction

Publisher: Springer Singapore

Date: 2018

DOI: 10.1007/978-981-10-8396-9_1

Publication

Blog distillation using random walks

Publisher: ACM

Date: 19-07-2009

DOI: 10.1145/1571941.1572054

Publication

Understanding the phenomenon of sarcasm

Publisher: Springer Singapore

Date: 2018

DOI: 10.1007/978-981-10-8396-9_2

Publication

Laying foundations for effective machine learning in law enforcement. Majura – A labelling schema for child exploitation materials

Publisher: Elsevier BV

Date: 09-2018

DOI: 10.1016/J.DIIN.2018.05.004

Publication

CDF Transform-and-Shift: An effective way to deal with datasets of inhomogeneous cluster densities

Publisher: Elsevier BV

Date: 09-2021

DOI: 10.1016/J.PATCOG.2021.107977

Publication

Learning Semantic Definitions of Online Information Sources

Publisher: AI Access Foundation

Date: 11-09-2007

DOI: 10.1613/JAIR.2205

Abstract: The Internet contains a very large number of information sources providing many types of data from weather forecasts to travel deals and financial information. These sources can be accessed via Web-forms, Web Services, RSS feeds and so on. In order to make automated use of these sources, we need to model them semantically, but writing semantic descriptions for Web Services is both tedious and error prone. In this paper we investigate the problem of automatically generating such models. We introduce a framework for learning Datalog definitions of Web sources. In order to learn these definitions, our system actively invokes the sources and compares the data they produce with that of known sources of information. It then performs an inductive logic search through the space of plausible source definitions in order to learn the best possible semantic model for each new source. In this paper we perform an empirical evaluation of the system using real-world Web sources. The evaluation demonstrates the effectiveness of the approach, showing that we can automatically learn complex models for real sources in reasonable time. We also compare our system with a complex schema matching system, showing that our approach can handle the kinds of problems tackled by the latter.

Publication

Beyond Clustering

Publisher: ACM

Date: 24-10-2016

DOI: 10.1145/2983323.2983810

Publication

On the Effectiveness of Query Weighting for Adapting Rank Learners to New Unlabelled Collections

Publisher: ACM

Date: 24-10-2016

DOI: 10.1145/2983323.2983852

Publication

A topic-based measure of resource description quality for distributed information retrieval

Publisher: Springer Berlin Heidelberg

Date: 2009

DOI: 10.1007/978-3-642-00958-7_45

Publication

Naive-Bayes Inspired Effective Pre-Conditioner for Speeding-Up Logistic Regression

Publisher: IEEE

Date: 12-2014

DOI: 10.1109/ICDM.2014.53

Publication

Leveraging machine learning to predict rail corrugation level from axle-box acceleration measurements on commercial vehicles

Publisher: Informa UK Limited

Date: 07-06-2023

DOI: 10.1080/23248378.2023.2220112

Publication

Annotator Expertise and Information Quality in Annotation-based Retrieval

Publisher: ACM

Date: 07-12-2017

DOI: 10.1145/3166072.3166075

Publication

Overcoming Key Weaknesses of Distance-based Neighbourhood Methods using a Data Dependent Dissimilarity Measure

Publisher: ACM

Date: 13-08-2016

DOI: 10.1145/2939672.2939779

Publication

Bayesian latent variable models for collaborative item rating prediction

Publisher: ACM

Date: 24-10-2011

DOI: 10.1145/2063576.2063680

Publication

Automatic Sarcasm Detection

Publisher: Association for Computing Machinery (ACM)

Date: 26-09-2017

DOI: 10.1145/3124420

Abstract: Automatic sarcasm detection is the task of predicting sarcasm in text. This is a crucial step to sentiment analysis, considering prevalence and challenges of sarcasm in sentiment-bearing text. Beginning with an approach that used speech-based features, automatic sarcasm detection has witnessed great interest from the sentiment analysis community. This article is a compilation of past work in automatic sarcasm detection. We observe three milestones in the research so far: semi-supervised pattern extraction to identify implicit sentiment, use of hashtag-based supervision, and incorporation of context beyond target text. In this article, we describe datasets, approaches, trends, and issues in sarcasm detection. We also discuss representative performance values, describe shared tasks, and provide pointers to future work, as given in prior works. In terms of resources to understand the state-of-the-art, the survey presents several useful illustrations—most prominently, a table that summarizes past papers along different dimensions such as the types of features, annotation techniques, and datasets used.

Publication

Investigating Deep Learning Based Breast Cancer Subtyping Using Pan-Cancer and Multi-Omic Data

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 2022

DOI: 10.1109/TCBB.2020.3042309

Mark Carman

Researcher

Related Links

Related Organisations

Politecnico Di Milano

Monash University - Caulfield Campus

Università Degli Studi Di Trento

University Of Adelaide
