ORCID Profile
0000-0001-6713-7667
Current Organisations
CSIRO, Macquarie University, Australian National University, Data61
Publisher: Springer International Publishing
Date: 2019
Publisher: Springer International Publishing
Date: 2020
Publisher: Wiley
Date: 25-04-2022
DOI: 10.1111/BJET.13223
Abstract: With the Big Data revolution, the education sector is being reshaped. The current data‐driven education system provides many opportunities to use the enormous amount of data collected about students' activities and performance for personalized education, adapting teaching methods, and decision making. However, such benefits come at a cost to privacy, for example, the identification of a student's poor performance across multiple courses. While several works have quantified the re‐identification risks of individuals in released datasets, they make assumptions about an adversary's prior knowledge of target individuals, and most of them do not use all the available information in the datasets, for example, event‐level information that associates multiple records with the same individual, and correlation between attributes. In this work, we propose a method using a Markov Model (MM) to quantify re‐identification risks using all available information in the data under a more realistic threat model that assumes different levels of an adversary's knowledge about the target individual, ranging from any one of the attributes to all given attributes. Moreover, we propose a workflow for efficiently calculating MM risk that is highly scalable to a large number of attributes. Experimental results from real education datasets show the efficacy of our model for re‐identification risk. What is already known about this topic? A number of works have been conducted on privacy risk quantification in datasets and on the Web. The majority of them make strong assumptions about an adversary's prior knowledge of the target individual(s). Most of them do not use all the available information in the datasets, eg, event‐level or duplicate records and correlation between attributes. What this paper adds? This paper proposes a new re‐identification risk quantification model using Markov models.
Our model addresses the shortcomings of existing works, eg, strong assumptions about the adversary's knowledge, unexplainable models, and failure to use the available information in the datasets. Specifically, our proposed model focuses not only on the uniqueness of data points in the datasets (as most other existing methods do), but also takes into account the uniformity and correlation characteristics of these data points. Re‐identification risk quantification is computationally expensive and does not scale to large datasets with an increasing number of attributes. This paper introduces a workflow that data custodians can use to efficiently evaluate the worst‐case re‐identification risk in their datasets before release. It presents an extensive experimental evaluation of the proposed model for quantifying re‐identification risks on several real education datasets. Implications for practice and/or policy? Empirical results on real education datasets validate the significance and efficacy of the proposed model for re‐identification risk quantification compared to existing approaches. Data custodians can use our model as a tool to evaluate the worst‐case risk of a dataset. It empowers them to make informed decisions on appropriate actions to mitigate these risks (eg, data perturbation) before sharing or releasing their datasets to third parties. A typical use case is one where the data custodian is an online course program provider that collects data about students' engagement with its courses and would like to share it with third parties, who run learning analytics that provide value‐added benefits back to the data custodian. We specifically study privacy risk quantification for education data; however, our model is applicable to any tabular data release.
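The abstract contrasts the proposed Markov Model with existing methods that focus only on the uniqueness of data points. As a point of reference, the Python sketch below computes such a uniqueness-only baseline; it is not the paper's Markov Model and ignores the uniformity and correlation characteristics that the model adds. It assumes a hypothetical record layout (a list of dicts carrying an individual identifier field), treats an adversary who knows k attribute values as able to narrow the target down to the distinct individuals sharing those values, and reports the worst case for each knowledge level, from one attribute to all attributes. The names `individual_risks` and `worst_case_by_knowledge` are illustrative only.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Sequence, Set, Tuple

# Hypothetical record layout: attribute name -> value (all strings).
Record = Dict[str, str]


def individual_risks(records: Sequence[Record], id_field: str,
                     known: Sequence[str]) -> Dict[str, float]:
    """Uniqueness-only re-identification probability per individual, assuming
    the adversary knows the values of `known` for one of the target's records."""
    # Event-level data: one individual may own several records, so count
    # distinct individuals (not rows) that share each combination of known values.
    holders: Dict[Tuple[str, ...], Set[str]] = defaultdict(set)
    for r in records:
        holders[tuple(r[a] for a in known)].add(r[id_field])

    risks: Dict[str, float] = {}
    for r in records:
        p = 1.0 / len(holders[tuple(r[a] for a in known)])
        # Keep the most identifying of the individual's records (worst case).
        risks[r[id_field]] = max(risks.get(r[id_field], 0.0), p)
    return risks


def worst_case_by_knowledge(records: Sequence[Record], id_field: str,
                            attributes: Sequence[str]
                            ) -> Dict[int, Tuple[float, Tuple[str, ...]]]:
    """For each knowledge level k (number of attributes the adversary knows),
    return the highest individual risk over all k-attribute subsets and the
    first subset that attains it."""
    result: Dict[int, Tuple[float, Tuple[str, ...]]] = {}
    for k in range(1, len(attributes) + 1):
        best: Tuple[float, Tuple[str, ...]] = (0.0, ())
        # Exhaustive search over subsets: 2^n - 1 in total across all k.
        for subset in combinations(attributes, k):
            risk = max(individual_risks(records, id_field, subset).values())
            if risk > best[0]:
                best = (risk, subset)
        result[k] = best
    return result


if __name__ == "__main__":
    # Toy data invented for illustration: student s1 has two course records.
    rows = [
        {"sid": "s1", "course": "C1", "grade": "F", "year": "2019"},
        {"sid": "s1", "course": "C2", "grade": "F", "year": "2019"},
        {"sid": "s2", "course": "C1", "grade": "P", "year": "2019"},
        {"sid": "s3", "course": "C2", "grade": "P", "year": "2020"},
        {"sid": "s4", "course": "C1", "grade": "P", "year": "2020"},
    ]
    print(worst_case_by_knowledge(rows, "sid", ["course", "grade", "year"]))
```

On these toy rows, knowing only the grade already singles out s1 (the only student with grade F), so the worst-case risk is 1.0 at every knowledge level. The exhaustive loop over attribute subsets grows as 2^n - 1, which illustrates why the abstract emphasizes a scalable workflow for large numbers of attributes.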
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 11-2019
Publisher: Springer International Publishing
Date: 2020
Publisher: ACM
Date: 10-07-2023
Publisher: IEEE
Date: 02-2019
Publisher: Springer International Publishing
Date: 2018
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 08-2019
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 07-2020
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2022
Publisher: CSIRO
Date: 2019
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2021
Publisher: IEEE
Date: 25-11-2020
Publisher: Elsevier BV
Date: 07-2020
Publisher: Journal of Privacy and Confidentiality
Date: 06-2014
Abstract: Privacy-preserving record linkage (PPRL) addresses the problem of identifying matching records from different databases that correspond to the same real-world entities using quasi-identifying attributes (in the absence of unique entity identifiers), while preserving the privacy of these entities. Privacy is preserved by not revealing any information that could be used to infer the actual values of records that are not reconciled to the same entity (non-matches), or any confidential or sensitive information (not agreed upon by the data custodians) about records that were reconciled to the same entity (matches), during or after the linkage process. The PPRL process often involves three main challenges: scalability to large databases, high linkage quality in the presence of data quality errors, and sufficient privacy guarantees. While many solutions have been developed for the PPRL problem over the past two decades, an evaluation and comparison framework for PPRL solutions with standard numerical measures defined for all three properties (scalability, linkage quality, and privacy) has so far not been presented in the literature. We propose a general framework with normalized measures to practically evaluate and compare PPRL solutions in the face of linkage attack methods that are based on an external global dataset. We evaluated several existing PPRL solutions on real-world databases using our proposed framework, and the results show that it provides an extensive and comparative evaluation of PPRL solutions in terms of the three properties.
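The abstract calls for normalized measures of scalability, linkage quality, and privacy. The Python sketch below shows what such [0, 1] measures can look like, using quantities that are standard in the record linkage literature: reduction ratio for scalability and precision, recall, and F-measure for linkage quality. These are not necessarily the exact definitions used in the paper's framework, and `privacy_score` in particular is a hypothetical proxy (one minus the fraction of encoded records an attack re-identifies), included only to show how a privacy measure can be normalized onto the same scale.

```python
from typing import Set, Tuple

# A candidate or true match: (record id in database A, record id in database B).
Pair = Tuple[str, str]


def reduction_ratio(n_compared: int, size_a: int, size_b: int) -> float:
    """Scalability: share of the full |A| x |B| comparison space that was
    skipped (e.g. by blocking). 1.0 means no pair compared, 0.0 means all pairs."""
    total = size_a * size_b
    if total == 0:
        raise ValueError("both databases must be non-empty")
    return 1.0 - n_compared / total


def linkage_quality(predicted: Set[Pair],
                    truth: Set[Pair]) -> Tuple[float, float, float]:
    """Linkage quality as precision, recall and F-measure, each in [0, 1]."""
    tp = len(predicted & truth)  # correctly linked pairs
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


def privacy_score(n_reidentified: int, n_encoded: int) -> float:
    """Hypothetical privacy proxy: 1 minus the fraction of encoded records that a
    linkage attack using an external global dataset re-identified correctly."""
    if n_encoded == 0:
        raise ValueError("no encoded records")
    return 1.0 - n_reidentified / n_encoded
```

Putting all three measures on a common [0, 1] scale, where higher is better, is what makes solutions with different blocking, encoding, and matching choices directly comparable side by side.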
Publisher: Swansea University
Date: 20-02-2018
Abstract: Data linkage, the process of identifying records that refer to the same entities across databases, is a crucial component of Population Data Science. Data linkage has a history going back over fifty years, with many different methods and techniques developed in various disciplines, including computer science, statistics, and health informatics. Data linkage researchers and practitioners are commonly familiar only with the methods and techniques developed or used in their own discipline, and they often follow only research published at venues in their own discipline. There is currently no single online resource that allows data linkage researchers and practitioners across different disciplines to exchange ideas, post questions, or advertise new publications, software, open positions, or upcoming conferences and workshops. This leads to a communication gap in the multi-disciplinary field of data linkage. We aim to address this gap with the DLforum, a public online discussion forum for data linkage. DLforum contains several discussion areas, including publication announcements, resources (software and data sets), information about upcoming conferences and workshops, job opportunities, and general questions related to data linkage. The forum includes a moderation process, and all registered users can post content and reply to posts by other users. We anticipate that the number of registered users and the amount of content posted in the forum will show that such an online forum is of value to data linkage researchers and practitioners from different disciplines for communicating effectively and exchanging their knowledge, and thus for forming an online community of practice. In this paper we describe the methods used to develop the DLforum, its structure and content, and our plan for evaluating the forum. The DLforum is freely available at: dmm.anu.edu.au/DLforum/
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2020
Publisher: Springer International Publishing
Date: 2020
Publisher: IEEE
Date: 15-12-2021
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2021
Publisher: ACM Press
Date: 2018
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2022
Publisher: Springer International Publishing
Date: 2018
Publisher: IEEE
Date: 12-2021
Location: Australia
Start Date: 2015
End Date: 2014
Funder: German Academic Exchange Service London
Start Date: 2011
End Date: 2014
Funder: Department of Education, Employment and Workplace Relations, Australian Government
Start Date: 2016
End Date: 2016
Funder: Google