ORCID Profile
0000-0002-7311-3693
Current Organisation
University of Queensland
In Research Link Australia (RLA), "Research Topics" refer to ANZSRC FOR and SEO codes. These topics are either sourced from ANZSRC FOR and SEO codes listed in researchers' related grants or generated by a large language model (LLM) based on their publications.
Information Systems | Database Management | Conceptual Modelling | Research, Science and Technology Policy | Business Information Systems | Computer-Human Interaction
Information Services not elsewhere classified | Information Processing Services (incl. Data Entry and Capture) | Application Tools and System Utilities | Electronic Information Storage and Retrieval Services | Expanding Knowledge in Technology | Technological and Organisational Innovation
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2019
Publisher: ACM
Date: 18-07-2018
Publisher: Elsevier BV
Date: 12-2022
Publisher: Association for Computing Machinery (ACM)
Date: 25-06-2009
Abstract: INEX investigates focused retrieval from structured documents by providing large test collections of structured documents, uniform evaluation measures, and a forum for organizations to compare their results. This paper reports on the INEX 2008 evaluation campaign, which consisted of a wide range of tracks: Ad hoc, Book, Efficiency, Entity Ranking, Interactive, QA, Link the Wiki, and XML Mining.
Publisher: ACM
Date: 09-08-2023
Publisher: ACM
Date: 07-04-2014
Publisher: ACM
Date: 25-07-2020
Publisher: Springer International Publishing
Date: 2022
Publisher: Springer Berlin Heidelberg
Date: 2013
Publisher: Association for Computing Machinery (ACM)
Date: 24-05-2011
Abstract: The exponential growth of digital information available in enterprises and on the Web creates the need for search tools that can respond to the most sophisticated information needs. Retrieving relevant documents is not enough anymore, and finding entities rather than textual resources provides great support to the user both on the Web and in enterprises. Many user tasks would be simplified if search engines supported typed search and returned entities instead of just Web pages. For example, an executive who tries to solve a problem needs to find people in the company who are knowledgeable about a certain topic. Aggregation of information spread over different documents is a key aspect in this process. Finding experts is a problem mostly considered in the enterprise setting, where teams for new projects need to be built and problems need to be solved by the right persons. In the first part of the thesis, we propose a model for expert finding based on the well-consolidated vector space model for Information Retrieval and investigate its effectiveness. We can define Entity Retrieval by generalizing the expert finding problem to any entity. In Entity Retrieval the goal is to rank entities according to their relevance to a query (e.g., "Countries where I can pay in Euro"); the set of entities to be ranked is assumed to be loosely defined by a generic category, given in the query itself (e.g., countries), or by some example entities (e.g., Italy, Germany, France). In the second part of the thesis, we investigate different methods based on Semantic Web and Natural Language Processing techniques for solving these tasks both in Wikipedia and, more generally, on the Web. Evaluation is a critical aspect of Information Retrieval. We contributed to the field of Information Retrieval evaluation by organizing an evaluation initiative for Entity Retrieval. Opinions and other relevant information about entities can be provided by different sources in different contexts. News articles report about events where entities are involved. In such a setting the temporal dimension is critical, as news stories develop over time: new entities appear in the story and others are no longer relevant. In the third part of this thesis, we study the problem of Entity Retrieval for news applications and the importance of the news trail history (i.e., past related articles) in determining the relevant entities in current articles. We also study opinion evolution about entities. In recent years, the blogosphere has become a vital part of the Web, covering a variety of different points of view and opinions on political and event-related topics such as immigration, election campaigns, or economic developments. We propose a method for automatically extracting public opinion about specific entities from the blogosphere. Available online.
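To make the vector-space approach to expert finding mentioned above concrete, here is a minimal Python sketch that builds a textual profile per candidate from their associated documents and ranks candidates by the cosine similarity between profile and query vectors. This is a caricature of the general idea, not the thesis's exact model; all data and names are hypothetical.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def rank_experts(docs, authors, query):
        # Build a textual profile per candidate by concatenating the
        # documents they are associated with.
        profiles = {}
        for doc, names in zip(docs, authors):
            for name in names:
                profiles[name] = profiles.get(name, "") + " " + doc
        candidates = list(profiles)
        matrix = TfidfVectorizer().fit_transform(
            [profiles[c] for c in candidates] + [query])
        # Rank candidates by cosine similarity between profile and query.
        scores = cosine_similarity(matrix[:-1], matrix[-1:]).ravel()
        return sorted(zip(candidates, scores), key=lambda p: -p[1])

    ranking = rank_experts(
        docs=["indexing and retrieval of xml documents",
              "neural networks for image classification"],
        authors=[["alice"], ["bob"]],
        query="xml retrieval expert")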
Publisher: ACM
Date: 18-07-2019
Publisher: Association for the Advancement of Artificial Intelligence (AAAI)
Date: 02-06-2023
DOI: 10.1609/ICWSM.V17I1.22221
Abstract: Social media is an important source of real-time imagery concerning world events. One subset of social media posts which may be of particular interest are those featuring firearms. These posts can give insight into weapon movements, troop activity and civilian safety. Object detection tools offer important opportunities for insight into these images. Unfortunately, these images can be visually complex, poorly lit and generally challenging for object detection models. We present an analysis of existing gun detection datasets, and find that these datasets do not effectively address the challenge of gun detection on real-life images. Following this, we present a novel object detection pipeline. We train our pipeline on a number of datasets, including one created for this investigation made up of Twitter images of the Russo-Ukrainian War. We compare the performance of our model as trained on the different datasets to baseline numbers provided by the original authors as well as a YOLO v5 benchmark. We find that our model outperforms the state-of-the-art benchmarks on contextually rich, real-life-derived imagery of firearms.
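For context, the YOLO v5 baseline referenced above can be run in a few lines via the public ultralytics hub; this sketch shows only the stock pretrained model (whose COCO classes contain no firearm category, which is why the paper fine-tunes on gun-specific data), and the image path is hypothetical.

    import torch

    # Load the stock pretrained YOLOv5 small model from the public hub.
    model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

    # Run detection on a (hypothetical) social media image and print the
    # detected classes with their confidences.
    results = model('tweet_image.jpg')
    results.print()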
Publisher: ACM
Date: 06-11-2017
Publisher: Association for the Advancement of Artificial Intelligence (AAAI)
Date: 31-05-2022
DOI: 10.1609/ICWSM.V16I1.19395
Abstract: During global health crises, the use of data becomes critical to control the spread of infections, to inform the general public and to foster safe behaviors. The ability of people to read and understand data (i.e., data literacy) has the potential to affect human behaviors. In this paper, we study non-expert human subjects' ability to make accurate interpretations of complex pandemic data visualizations designed for general public consumption. We present them with popular plots and graphs that have been shown in traditional and social media, and ask them to answer questions to assess their data literacy at three levels: extracting information, finding relationships among data, and expanding or predicting information. Our results show the presence of variance in interpretations and reveal insights into how messages communicated through data may be perceived differently by different people. We also highlight the importance of designing communication strategies that ensure the spread of the right message through data.
Publisher: Linnaeus University Press
Date: 03-05-2022
Abstract: Infectious disease outbreaks are a serious public health threat which can disrupt world economies. This paper presents an in-depth qualitative analysis of n=15,415 tweets that relate to the peaks of three major infectious disease outbreaks: the swine flu outbreak of 2009, the Ebola outbreak of 2014, and the Zika outbreak of 2016. Tweets were analysed using thematic analysis and a number of themes and sub-themes were identified. The results were brought together in an abstraction phase and the commonalities between the cases were studied. A notable similarity which emerged was the rate at which Twitter users expressed intense fear and panic, akin to the phenomena of "moral panic" and the "outbreak narrative". Our study also discusses the utility of using Twitter data for in-depth qualitative research as compared to traditional interview methods. Our study is the largest in-depth analysis of tweets on infectious diseases and could inform public health strategies for future outbreaks such as the coronavirus outbreak.
Publisher: Springer Berlin Heidelberg
Date: 2013
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 07-2015
DOI: 10.1109/MIS.2015.66
Publisher: Springer Berlin Heidelberg
Date: 2012
Publisher: ACM
Date: 09-12-2021
Publisher: Springer International Publishing
Date: 2022
Publisher: Springer International Publishing
Date: 2020
Publisher: ACM
Date: 25-04-2022
Publisher: Association for Computing Machinery (ACM)
Date: 18-08-2023
DOI: 10.1145/3597201
Abstract: To scale the size of Information Retrieval collections, crowdsourcing has become a common way to collect relevance judgments at scale. Crowdsourcing experiments usually employ 100–10,000 workers, but such a number is often decided in a heuristic way. The downside is that the resulting dataset has no guarantee of meeting predefined statistical requirements, such as, for example, having enough statistical power to distinguish, in a statistically significant way, between the relevance of two documents. We propose a methodology adapted from the literature on sound topic set size design, based on the t-test and ANOVA, which aims at guaranteeing that the resulting dataset meets a predefined set of statistical requirements. We validate our approach on several public datasets. Our results show that we can reliably estimate the recommended number of workers needed to achieve statistical power, and that such an estimation is dependent on the topic, while the effect of the relevance scale is limited. Furthermore, we found that such an estimation is dependent on worker features such as agreement. Finally, we describe a set of practical estimation strategies that can be used to estimate the worker set size, and we also provide results on the estimation of document set sizes.
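To give a flavour of the kind of power analysis this methodology builds on, here is a minimal Python sketch assuming a two-sided independent-samples t-test; the effect size, alpha, and power values are illustrative assumptions, not the paper's recommendations.

    from statsmodels.stats.power import TTestIndPower

    # How many workers per condition are needed to detect a medium effect
    # (Cohen's d = 0.5) at alpha = 0.05 with 80% power?
    n_workers = TTestIndPower().solve_power(
        effect_size=0.5, alpha=0.05, power=0.8)
    print(f"workers per condition: {n_workers:.0f}")  # about 64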
Publisher: Springer Berlin Heidelberg
Date: 2010
Publisher: Now Publishers
Date: 2017
DOI: 10.1561/1800000025
Publisher: ACM
Date: 09-12-2021
Publisher: Wiley
Date: 31-08-2022
DOI: 10.1111/JCAL.12729
Abstract: The use of crowdsourcing in a pedagogically supported form to partner with learners in developing novel content is emerging as a viable approach for engaging students in higher-order learning at scale. However, how students behave in this form of crowdsourcing, referred to as learnersourcing, is still insufficiently explored. To contribute to filling this gap, this study explores how students engage with learnersourcing tasks across a range of course and assessment designs. We conducted an exploratory study on trace data of 1279 students across three courses, originating from the use of a learnersourcing environment under different assessment designs. We employed a new methodology from the learning analytics (LA) field that aims to represent students' behaviour through two theoretically-derived latent constructs: learning tactics and the learning strategies built upon them. The study's results demonstrate students use different tactics and strategies, highlight the association of learnersourcing contexts with the identified learning tactics and strategies, indicate a significant association between the strategies and performance and contribute to the employed method's generalisability by applying it to a new context. This study provides an example of how learning analytics methods can be employed towards the development of effective learnersourcing systems and, more broadly, technological educational solutions that support learner-centred and data-driven learning at scale. Findings should inform best practices for integrating learnersourcing activities into course design and shed light on the relevance of tactics and strategies to support teachers in making informed pedagogical decisions.
Publisher: Springer Berlin Heidelberg
Date: 2013
Publisher: Springer Science and Business Media LLC
Date: 18-05-2010
Publisher: ACM
Date: 26-06-2022
Publisher: Elsevier BV
Date: 03-2016
Publisher: ACM
Date: 12-08-2012
Publisher: ACM
Date: 27-06-2018
Publisher: Springer International Publishing
Date: 2020
Publisher: ACM
Date: 25-04-2022
Publisher: Springer Berlin Heidelberg
Date: 2008
Publisher: Zenodo
Date: 2022
Publisher: International World Wide Web Conferences Steering Committee
Date: 11-04-2016
Publisher: ACM
Date: 24-11-2008
Publisher: Association for Computing Machinery (ACM)
Date: 07-02-2023
DOI: 10.1145/3567419
Abstract: Understanding how data workers interact with data, and various pieces of information related to data preparation, is key to designing systems that can better support them in exploring datasets. To date, however, there is a paucity of research studying the strategies adopted by data workers as they carry out data preparation activities. In this work, we investigate a specific data preparation activity, namely data quality discovery, and aim to (i) understand the behaviors of data workers in discovering data quality issues, (ii) explore what factors (e.g., prior experience) can affect their behaviors, as well as (iii) understand how these behavioral observations relate to their performance. To this end, we collect a multi-modal dataset through a data-driven experiment that relies on the use of eye-tracking technology with a purpose-designed platform built on top of IPython Notebook. The experiment results reveal that: (i) 'copy–paste–modify' is a typical strategy for writing code to complete tasks; (ii) proficiency in writing code has a significant impact on the quality of task performance, while perceived difficulty and efficacy can influence task completion patterns; and (iii) searching in external resources is a prevalent action that can be leveraged to achieve better performance. Furthermore, our experiment indicates that providing sample code within the system can help data workers get started with their task, and surfacing underlying data is an effective way to support exploration. By investigating data worker behaviors prior to each search action, we also find that the most common reasons that trigger external search actions are the need to seek assistance in writing or debugging code and to search for relevant code to reuse. Based on our experiment results, we showcase a systematic approach to select the best code snippets created by data workers and assemble them to achieve better performance than the best individual performer in the dataset. By doing so, our findings not only provide insights into patterns of interactions with various system components and information resources when performing data curation tasks, but also help build effective and efficient data curation processes through data workers' collective intelligence.
Publisher: ACM
Date: 12-04-2021
Publisher: Springer Berlin Heidelberg
Date: 2010
Publisher: Springer International Publishing
Date: 2019
Publisher: Springer International Publishing
Date: 2020
Publisher: Association for Computing Machinery (ACM)
Date: 13-10-2021
DOI: 10.1145/3479531
Abstract: Crowdsourcing is being increasingly adopted as a platform to run studies with human subjects. Running a crowdsourcing experiment involves several choices and strategies to successfully port an experimental design into an otherwise uncontrolled research environment, e.g., sampling crowd workers, mapping experimental conditions to micro-tasks, or ensuring quality contributions. While several guidelines inform researchers in these choices, guidance on how and what to report from crowdsourcing experiments has been largely overlooked. If under-reported, implementation choices constitute sources of variability that can affect the experiment's reproducibility and prevent a fair assessment of research outcomes. In this paper, we examine the current state of reporting of crowdsourcing experiments and offer guidance to address associated reporting issues. We start by identifying sensible implementation choices, relying on existing literature and interviews with experts, and then extensively analyze the reporting of 171 crowdsourcing experiments. Informed by this process, we propose a checklist for reporting crowdsourcing experiments.
Publisher: International World Wide Web Conferences Steering Committee
Date: 18-05-2015
Publisher: IEEE
Date: 10-2008
Publisher: Springer International Publishing
Date: 2018
Publisher: Springer Science and Business Media LLC
Date: 08-09-2015
Publisher: Springer Science and Business Media LLC
Date: 26-06-2019
Publisher: Springer Berlin Heidelberg
Date: 2012
Publisher: International Joint Conferences on Artificial Intelligence Organization
Date: 08-2019
Abstract: An important precondition to building effective AI models is the collection of training data at scale. Crowdsourcing is a popular methodology to achieve this goal. Its adoption introduces novel challenges in data quality control, to deal with under-performing and malicious annotators. One of the most popular quality assurance mechanisms, especially in paid micro-task crowdsourcing, is the use of a small set of pre-annotated tasks as a gold standard, to assess the annotators' quality in real time. In this paper, we highlight a set of vulnerabilities this scheme suffers from: a group of colluding crowd workers can easily implement and deploy a decentralised machine learning inferential system to detect and signal which parts of the task are more likely to be gold questions, making them ineffective as a quality control tool. Moreover, we demonstrate how the most common countermeasures against this attack are ineffective in practical scenarios. The basic architecture of the inferential system is composed of a browser plug-in and an external server where the colluding workers can share information. We implement and validate the attack scheme by means of experiments on real-world data from a popular crowdsourcing platform.
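A toy, centralised caricature of such an inferential system is sketched below: colluding workers report a fingerprint of every task they see to shared state, and a task seen by unusually many distinct workers is flagged as likely gold, since the gold set is small and reused. The data structures and threshold are illustrative; the paper's system is a browser plug-in backed by an external server.

    import hashlib
    from collections import defaultdict

    # Shared state: which distinct workers have seen each task fingerprint.
    seen_by = defaultdict(set)

    def report_task(worker_id, task_text):
        # Fingerprint the task content so identical tasks collide.
        fingerprint = hashlib.sha256(
            task_text.strip().lower().encode()).hexdigest()
        seen_by[fingerprint].add(worker_id)
        return fingerprint

    def likely_gold(fingerprint, min_workers=3):
        # A small gold set is reused across many workers, so a task seen
        # by several distinct workers is suspicious (threshold illustrative).
        return len(seen_by[fingerprint]) >= min_workers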
Publisher: ACM
Date: 27-06-2018
Publisher: Elsevier BV
Date: 08-2014
Publisher: ACM
Date: 21-03-2022
Publisher: Springer Berlin Heidelberg
Date: 2008
Publisher: IEEE
Date: 18-07-2021
Publisher: Association for the Advancement of Artificial Intelligence (AAAI)
Date: 04-10-2021
Abstract: Automatic predictions (e.g., recognizing objects in images) may result in systematic errors if certain classes are not well represented by training instances (these errors are called unknowns). When a model assigns high confidence scores to these wrong predictions (this type of error is called unknown unknowns), it becomes challenging to identify them automatically. In this paper, we present the first work on leveraging human intelligence to discover unknown unknowns (UUs) in an iterative way. The proposed methodology first differentiates the feature space generated by crowd workers labelling instances (e.g., images) in an active learning fashion from the space learned by the prediction model over a batch training phase, and thus identifies the predictions most likely to be UUs. Next, we add crowd labels collected for these discovered UUs to the training set and re-train the model with this extended dataset. This process is then repeated iteratively to discover more instances of both unknown and under-represented classes. Our experimental results show that the proposed methodology is able to (1) efficiently discover UUs, (2) significantly improve the quality of model predictions, and (3) push UUs into known unknowns (i.e., the model makes mistakes, but at least its classification confidence on those instances is low, so those predictions can be discarded or post-processed) for further investigation. We additionally discuss the trade-off between prediction quality improvements and the human effort required to achieve those improvements. Our results bear implications for building cost-effective systems to discover UUs with humans in the loop.
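The sketch below simulates a drastically simplified version of this loop on synthetic data. The selection step is deliberately naive (it queries the crowd on the model's most confident pool predictions rather than contrasting crowd-labelled and model-learned feature spaces as the paper does), and the crowd is simulated by an oracle; everything here is illustrative.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)

    # Synthetic pool: class 1 has an under-represented cluster that the
    # initial training set misses entirely.
    X = np.vstack([rng.normal(0, 1, (500, 2)),         # class 0
                   rng.normal(4, 1, (450, 2)),         # class 1, common
                   rng.normal([-4, 4], 1, (50, 2))])   # class 1, rare
    y = np.array([0] * 500 + [1] * 500)

    train = list(range(50)) + list(range(500, 550))
    pool = [i for i in range(len(y)) if i not in set(train)]

    model = LogisticRegression()
    for round_ in range(5):
        model.fit(X[train], y[train])
        confidence = model.predict_proba(X[pool]).max(axis=1)
        # Query the (simulated) crowd on the most confident predictions.
        queried = [pool[i] for i in np.argsort(-confidence)[:40]]
        crowd_labels = y[queried]                      # crowd as oracle
        uus = int((crowd_labels != model.predict(X[queried])).sum())
        print(f"round {round_}: {uus} unknown unknowns discovered")
        train += queried                               # extend training set
        pool = [i for i in pool if i not in set(queried)]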
Publisher: Elsevier BV
Date: 09-2023
Publisher: Springer Science and Business Media LLC
Date: 16-09-2021
DOI: 10.1007/S00779-021-01604-6
Abstract: Recently, the misinformation problem has been addressed with a crowdsourcing-based approach: to assess the truthfulness of a statement, instead of relying on a few experts, a crowd of non-experts is exploited. We study whether crowdsourcing is an effective and reliable method to assess truthfulness during a pandemic, targeting statements related to COVID-19, thus addressing (mis)information that is both related to a sensitive and personal issue and very recent as compared to when the judgment is done. In our experiments, crowd workers are asked to assess the truthfulness of statements and to provide evidence for their assessments. Besides showing that the crowd is able to accurately judge the truthfulness of the statements, we report results on workers' behavior, agreement among workers, and the effects of aggregation functions, scale transformations, and workers' background and bias. We perform a longitudinal study by re-launching the task multiple times with both novice and experienced workers, deriving important insights on how behavior and quality change over time. Our results show that workers are able to detect and objectively categorize online (mis)information related to COVID-19; both crowdsourced and expert judgments can be transformed and aggregated to improve quality; and worker background and other signals (e.g., source of information, behavior) impact the quality of the data. The longitudinal study demonstrates that the time span has a major effect on the quality of the judgments, for both novice and experienced workers. Finally, we provide an extensive failure analysis of the statements misjudged by the crowd workers.
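As a tiny illustration of the scale transformation and aggregation steps studied here, the sketch below maps an ordinal truthfulness scale onto [0, 1] and combines judgments with two common aggregation functions; the actual scales and functions compared in the paper vary, so treat this purely as an example.

    from statistics import mean, median

    def aggregate_judgments(judgments, scale_max):
        # Transform ordinal judgments (1..scale_max) to [0, 1], then
        # aggregate; mean and median are two common choices.
        scores = [(j - 1) / (scale_max - 1) for j in judgments]
        return {"mean": mean(scores), "median": median(scores)}

    print(aggregate_judgments([1, 2, 2, 5, 6], scale_max=6))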
Publisher: ACM
Date: 18-07-2023
Publisher: ACM
Date: 13-05-2019
Publisher: ACM
Date: 13-06-2011
Publisher: Association for Computing Machinery (ACM)
Date: 28-12-2022
DOI: 10.1145/3546916
Abstract: Automatically detecting online misinformation at scale is a challenging and interdisciplinary problem. Deciding what is to be considered truthful information is sometimes controversial and difficult even for educated experts. As the scale of the problem increases, human-in-the-loop approaches to truthfulness that combine both the scalability of machine learning (ML) and the accuracy of human contributions have been considered. In this work, we look at the potential to automatically combine machine-based systems with human-based systems. The former exploit supervised ML approaches; the latter involve either crowd workers (i.e., human non-experts) or human experts. Since both ML and crowdsourcing approaches can produce a score indicating the level of confidence in their truthfulness judgments (either algorithmic or self-reported, respectively), we address the question of whether it is feasible to make use of such confidence scores to effectively and efficiently combine three approaches: (i) machine-based methods, (ii) crowd workers, and (iii) human experts. The three approaches differ significantly, as they range from available, cheap, fast, and scalable but less accurate, to scarce, expensive, slow, and not scalable but highly accurate.
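A minimal sketch of how such confidence scores could drive the combination, assuming each stage is a callable returning a (label, confidence) pair; the routing thresholds and function names are hypothetical, not taken from the paper.

    def judge(statement, machine, crowd, expert,
              machine_threshold=0.9, crowd_threshold=0.7):
        # Cascade from cheap-and-scalable to accurate-but-scarce judges,
        # escalating only when the current stage is not confident enough.
        label, confidence = machine(statement)     # algorithmic confidence
        if confidence >= machine_threshold:
            return label, "machine"
        label, confidence = crowd(statement)       # self-reported confidence
        if confidence >= crowd_threshold:
            return label, "crowd"
        label, _ = expert(statement)               # expert always accepted
        return label, "expert"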
Publisher: Association for Computing Machinery (ACM)
Date: 22-06-2023
DOI: 10.1145/3588431
Publisher: ACM
Date: 17-10-2022
Publisher: Springer Science and Business Media LLC
Date: 15-12-2018
Publisher: Association for Computing Machinery (ACM)
Date: 28-12-2023
DOI: 10.1145/3546917
Abstract: Automated fact-checking (AFC) systems exist to combat disinformation; however, their complexity usually makes them opaque to the end user, making it difficult to foster trust in the system. In this article, we introduce the E-BART model with the hope of making progress on this front. E-BART is able to provide a veracity prediction for a claim and jointly generate a human-readable explanation for this decision. We show that E-BART is competitive with the state of the art on the e-FEVER and e-SNLI tasks. In addition, we validate the joint-prediction architecture by showing (1) that generating explanations does not significantly impede the model from performing well in its main task of veracity prediction, and (2) that predicted veracity and explanations are more internally coherent when generated jointly than separately. We also calibrate the E-BART model, allowing the output of the final model to be correctly interpreted as the confidence of correctness. Finally, we conduct an extensive human evaluation of the impact of generated explanations and observe that explanations increase human ability to spot misinformation and make people more skeptical about claims, and that explanations generated by E-BART are competitive with ground truth explanations.
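The abstract does not name the calibration method used; as a generic example of post-hoc calibration, the sketch below applies temperature scaling, which rescales logits by a temperature fitted on a validation set so that softmax confidences better match empirical accuracy. Purely illustrative.

    import numpy as np

    def temperature_scale(logits, temperature):
        # T > 1 softens over-confident predictions; T is typically fitted
        # by minimising negative log-likelihood on held-out data.
        z = logits / temperature
        z = z - z.max(axis=-1, keepdims=True)  # numerical stability
        exp = np.exp(z)
        return exp / exp.sum(axis=-1, keepdims=True)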
Publisher: Elsevier BV
Date: 10-2015
Publisher: Springer Berlin Heidelberg
Date: 2009
Publisher: Springer Berlin Heidelberg
Date: 2009
Publisher: ACM
Date: 30-04-2023
Publisher: Springer International Publishing
Date: 2021
Publisher: Elsevier BV
Date: 03-2010
Publisher: ACM
Date: 17-10-2018
Publisher: ACM
Date: 30-04-2023
Publisher: Association for Computing Machinery (ACM)
Date: 11-09-2017
DOI: 10.1145/3130914
Abstract: The ubiquity of the Internet and the widespread proliferation of electronic devices have resulted in flourishing microtask crowdsourcing marketplaces, such as Amazon MTurk. An aspect that has remained largely invisible in microtask crowdsourcing is that of work environments, defined as the hardware and software affordances at the disposal of crowd workers that are used to complete microtasks on crowdsourcing platforms. In this paper, we reveal the significant role of work environments in the shaping of crowd work. First, through a pilot study surveying the good and bad experiences workers had with UI elements in crowd work, we revealed the typical issues workers face. Based on these findings, we then deployed over 100 distinct microtasks on CrowdFlower, addressing workers in India and the USA in two identical batches. These tasks emulate the good and bad UI element designs that characterize crowdsourcing microtasks. We recorded hardware specifics such as CPU speed and device type, apart from software specifics including the browsers used to complete tasks, the operating systems on the device, and other properties that define the work environments of crowd workers. Our findings indicate that crowd workers are embedded in a variety of work environments which influence the quality of work produced. To confirm and validate our data-driven findings, we then carried out semi-structured interviews with a sample of Indian and American crowd workers from this platform. Depending on the design of UI elements in microtasks, we found that some work environments support crowd workers more than others. Based on our overall findings resulting from all three studies, we introduce ModOp, a tool that helps design crowdsourcing microtasks that are suitable for diverse crowd work environments. We empirically show that the use of ModOp results in reduced cognitive load for workers, thereby improving their user experience without affecting the accuracy or task completion time.
Publisher: ACM
Date: 03-11-2019
Publisher: Elsevier BV
Date: 11-2021
Publisher: ACM
Date: 20-01-2020
Publisher: Emerald Publishing Limited
Date: 06-12-2017
Publisher: Springer International Publishing
Date: 2016
Publisher: ACM
Date: 03-11-2019
Publisher: Springer International Publishing
Date: 2019
Publisher: Wiley
Date: 20-01-2019
DOI: 10.1111/HIR.12247
Abstract: Infectious disease outbreaks have the potential to cause a high number of fatalities and are a very serious public health risk. Our aim was to utilise an in-depth method to study a period of time when the H1N1 pandemic of 2009 was at its peak. A data set of n = 214,784 tweets was retrieved and filtered, and the method of thematic analysis was used to analyse the data. Eight key themes emerged from the analysis: emotion and feeling, health-related information, general commentary and resources, media and health organisations, politics, country of origin, food, and humour and/or sarcasm. A major novel finding was that, due to the name 'swine flu', Twitter users believed that pigs and pork could host and/or transmit the virus. Our paper also considers the methodological implications for the wider field of library and information science as well as specific implications for health information and library workers. Novel insights were derived on how users communicate about disease outbreaks on social media platforms. Our study also provides an innovative methodological contribution, because utilising an in-depth method made it possible to extract greater insight into user communication.
Publisher: Association for Computing Machinery (ACM)
Date: 21-02-2019
DOI: 10.1145/3301003
Abstract: Crowdsourcing has become an integral part of many systems and services that deliver high-quality results for complex tasks such as data linkage, schema matching, and content annotation. A standard function of such crowd-powered systems is to publish a batch of tasks on a crowdsourcing platform automatically and to collect the results once the workers complete them. Currently, these systems provide limited guarantees over the execution time, which is problematic for many applications. Timely completion may even be impossible to guarantee due to factors specific to the crowdsourcing platform, such as the availability of workers and concurrent tasks. In our previous work, we presented the architecture of a crowd-powered system that reshapes the interaction mechanism with the crowd. Specifically, we studied a push-crowdsourcing model whereby the workers receive tasks instead of selecting them from a portal. Based on this interaction model, we employed scheduling techniques similar to those found in distributed computing infrastructures to automate the task assignment process. In this work, we first devise a generic scheduling strategy that supports both fairness and deadline-awareness. Second, to complement the proof-of-concept experiments previously performed with the crowd, we present an extensive set of simulations meant to analyze the properties of the proposed scheduling algorithms in an environment with thousands of workers and tasks. Our experimental results show that, by accounting for human factors, micro-task scheduling can achieve fairness for best-effort batches and boost production batches.
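A compact sketch of one way deadline-awareness and fairness can coexist in a micro-task scheduler: production tasks are served earliest-deadline-first once they become urgent, while best-effort tasks are otherwise served in arrival order. The slack threshold and data structures are illustrative, not the paper's algorithm.

    import heapq

    def next_task(production, best_effort, now, slack=60):
        # production: heap of (deadline, task) pairs; best_effort: FIFO list.
        if production and production[0][0] - now <= slack:
            return heapq.heappop(production)[1]  # urgent: earliest deadline
        if best_effort:
            return best_effort.pop(0)            # fairness: serve best-effort
        if production:
            return heapq.heappop(production)[1]  # no best-effort work left
        return None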
Publisher: ACM
Date: 12-09-2019
Publisher: ACM
Date: 19-07-2010
Publisher: ACM
Date: 29-03-2020
Publisher: Springer Science and Business Media LLC
Date: 16-01-2022
DOI: 10.1007/S00778-021-00720-2
Abstract: The appetite for effective use of information assets has been steadily rising in both public and private sector organisations. However, whether the information is used for social good or commercial gain, there is a growing recognition of the complex socio-technical challenges associated with balancing the diverse demands of regulatory compliance and data privacy, social expectations and ethical use, business process agility and value creation, and scarcity of data science talent. In this vision paper, we present a series of case studies that highlight these interconnected challenges, across a range of application areas. We use the insights from the case studies to introduce Information Resilience, as a scaffold within which the competing requirements of responsible and agile approaches to information use can be positioned. The aim of this paper is to develop and present a manifesto for Information Resilience that can serve as a reference for future research and development in relevant areas of responsible data management.
Publisher: ACM
Date: 13-05-2019
Publisher: Springer Berlin Heidelberg
Date: 2009
Publisher: Springer Berlin Heidelberg
Date: 2009
Publisher: ACM
Date: 19-07-2010
Publisher: Elsevier BV
Date: 03-2023
Publisher: ACM
Date: 25-07-2020
Publisher: Springer Berlin Heidelberg
Date: 2006
DOI: 10.1007/11735106_48
Publisher: Springer International Publishing
Date: 2020
Publisher: ACM
Date: 17-10-2022
Publisher: AI Access Foundation
Date: 03-03-2020
DOI: 10.1613/JAIR.1.11332
Abstract: Crowdsourcing is a popular methodology to collect manual labels at scale. Such labels are often used to train AI models and, thus, quality control is a key aspect of the process. One of the most popular quality assurance mechanisms in paid micro-task crowdsourcing is based on gold questions: the use of a small set of tasks for which the requester knows the correct answer and, thus, is able to directly assess crowd work quality. In this paper, we show that such a mechanism is prone to an attack, carried out by a group of colluding crowd workers, that is easy to implement and deploy: the inherent size limit of the gold set can be exploited by building an inferential system to detect which parts of the job are more likely to be gold questions. The described attack is robust to various forms of randomisation and programmatic generation of gold questions. We present the architecture of the proposed system, composed of a browser plug-in and an external server used to share information, and briefly introduce its potential evolution to a decentralised implementation. We implement and experimentally validate the gold detection system, using real-world data from a popular crowdsourcing platform. Our experimental results show that crowd workers using the proposed system spend more time on signalled gold questions but do not neglect the others, thus achieving increased overall work quality. Finally, we discuss the economic and sociological implications of this kind of attack.
Publisher: ACM
Date: 10-2017
Publisher: Association for Computing Machinery (ACM)
Date: 18-08-2010
Abstract: INEX investigates focused retrieval from structured documents by providing large test collections of structured documents, uniform evaluation measures, and a forum for organizations to compare their results. This paper reports on the INEX 2009 evaluation campaign, which consisted of a wide range of tracks: Ad hoc, Book, Efficiency, Entity Ranking, Interactive, QA, Link the Wiki, and XML Mining. INEX is run entirely on volunteer effort by the IR research community: anyone with an idea and some time to spend can have a major impact.
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2022
Publisher: IEEE
Date: 08-2013
Publisher: Association for Computing Machinery (ACM)
Date: 14-10-2020
DOI: 10.1145/3415203
Abstract: Paid micro-task crowdsourcing has gained in popularity partly due to the increasing need for large-scale manually labelled datasets, which are often used to train and evaluate Artificial Intelligence systems. Modern paid crowdsourcing platforms use a piecework approach to rewards, meaning that workers are paid for each task they complete, provided that their work quality is considered sufficient by the requester or the platform. Such an approach creates risks for workers: their work may be rejected without being rewarded, and they may be working on poorly rewarded tasks, in light of the disproportionate time required to complete them. As a result, recent research has shown that crowd workers may tend to choose specific, simple, and familiar tasks and avoid new requesters to manage these risks. In this paper, we propose a novel crowdsourcing reward mechanism that allows workers to share these risks and achieve a standardized hourly wage, equal for all participating workers. Reward-focused workers can thereby take up challenging and complex HITs without bearing the financial risk of not being rewarded for completed work. We experimentally compare different crowd reward schemes and observe their impact on worker performance and satisfaction. Our results show that 1) workers clearly perceive the benefits of the proposed reward scheme, 2) work effectiveness and efficiency are not impacted compared to those of the piecework scheme, and 3) the presence of slow workers is limited and does not disrupt the proposed cooperation-based approaches.
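The core of the proposed mechanism can be stated as simple arithmetic: piecework rewards are pooled and redistributed as a uniform hourly wage, so the risk of individual rejections is shared. A minimal sketch, with a made-up contribution format:

    def hourly_payouts(contributions):
        # contributions: list of (worker_id, reward_earned, hours_worked).
        pool = sum(reward for _, reward, _ in contributions)
        total_hours = sum(hours for _, _, hours in contributions)
        wage = pool / total_hours if total_hours else 0.0
        # Every worker is paid the same hourly wage for the hours worked.
        return {worker: wage * hours for worker, _, hours in contributions}

    payouts = hourly_payouts([("w1", 12.0, 2.0),   # fast, well-rewarded work
                              ("w2", 3.0, 2.0)])   # rejected/underpaid work
    # both workers receive 7.5 (15.0 pooled over 4 hours)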
Publisher: ACM
Date: 19-10-2020
Publisher: ACM
Date: 18-04-2015
Publisher: ACM
Date: 26-04-2010
Publisher: ACM
Date: 27-06-2018
Publisher: Springer International Publishing
Date: 2016
Publisher: ACM
Date: 09-11-2007
Publisher: Springer Science and Business Media LLC
Date: 18-07-2013
Publisher: Springer Berlin Heidelberg
Date: 2008
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 02-2021
Publisher: IEEE
Date: 11-2009
Location: United Kingdom of Great Britain and Northern Ireland
Start Date: 2019
End Date: 2021
Funder: Australian Research Council
Start Date: 2017
End Date: 2019
Funder: European Commission
Start Date: 2016
End Date: 2017
Funder: Engineering and Physical Sciences Research Council
Start Date: 01-2019
End Date: 12-2023
Amount: $440,000.00
Funder: Australian Research Council
Start Date: 07-2021
End Date: 07-2026
Amount: $4,883,406.00
Funder: Australian Research Council