ORCID Profile
0000-0002-6638-0232
Current Organisation
University of Melbourne
Does something not look right? The information on this page has been harvested from data sources that may not be up to date. We continue to work with information providers to improve coverage and quality. To report an issue, use the Feedback Form.
In Research Link Australia (RLA), "Research Topics" refer to ANZSRC FOR and SEO codes. These topics are either sourced from ANZSRC FOR and SEO codes listed in researchers' related grants or generated by a large language model (LLM) based on their publications.
Library and Information Studies | Information Retrieval and Web Search | Information Storage, Retrieval And Management | Analysis of Algorithms and Complexity | Data Structures | Computational Linguistics | Data Format | Coding and Information Theory | Text Processing | Information Systems Not Elsewhere Classified | Sensory Processes, Perception And Performance | Linguistic Processes (Incl. Speech Production And Comprehension) | Artificial Intelligence and Image Processing | Information and Computing Sciences not elsewhere classified | Data Structures | Data Storage Representations | Computation Theory and Mathematics not elsewhere classified | Data Format not elsewhere classified | Other Information and Computing Sciences | Cognitive Science | Psychology | Decision Support And Group Support Systems | Intelligent Robotics | Speech Recognition | Pattern Recognition
Electronic Information Storage and Retrieval Services | Library and related information services | Computer software and services not elsewhere classified | Information processing services | Application tools and system utilities | Behavioural and cognitive sciences | Computer Software and Services not elsewhere classified | Information Services not elsewhere classified | Information services not elsewhere classified | Application Software Packages (excl. Computer Games) | Information Processing Services (incl. Data Entry and Capture) | Information and Communication Services not elsewhere classified | Application Tools and System Utilities | Hearing, vision, speech and their disorders | Expanding Knowledge in the Information and Computing Sciences | Computer hardware and electronic equipment not elsewhere classified | Instrumentation not elsewhere classified |
Publisher: ACM
Date: 02-11-2009
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 06-1994
DOI: 10.1109/5.286192
Publisher: Springer International Publishing
Date: 2020
Publisher: Springer Science and Business Media LLC
Date: 18-03-2020
Publisher: Springer Berlin Heidelberg
Date: 2013
Publisher: Association for Computing Machinery (ACM)
Date: 08-2008
Abstract: Inverted index structures are a core element of current text retrieval systems. They can be constructed quickly using offline approaches, in which one or more passes are made over a static set of input data, and, at the completion of the process, an index is available for querying. However, there are search environments in which even a small delay in timeliness cannot be tolerated, and the index must always be queryable and up to date. Here we describe and analyze a geometric partitioning mechanism for online index construction that provides a range of tradeoffs between costs, and can be adapted to different balances of insertion and querying operations. Detailed experimental results are provided that show the extent of these tradeoffs, and that these new methods can yield substantial savings in online indexing costs.
Publisher: ACM
Date: 31-10-2005
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 03-2007
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 06-2006
DOI: 10.1109/TKDE.2006.99
Publisher: ACM Press
Date: 2006
Publisher: Springer Berlin Heidelberg
Date: 2005
DOI: 10.1007/11575832_1
Publisher: Springer Berlin Heidelberg
Date: 2005
DOI: 10.1007/11581062_37
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 04-2000
DOI: 10.1109/26.843129
Publisher: ACM
Date: 26-10-2008
Publisher: Association for Computing Machinery (ACM)
Date: 04-10-2019
DOI: 10.1145/3345001
Abstract: Rank fusion is a powerful technique that allows multiple sources of information to be combined into a single result set. Query variations covering the same information need represent one way in which different sources of information might arise. However, when implemented in the obvious manner, fusion over query variations is not cost-effective, at odds with the usual web-search requirement for strict per-query efficiency guarantees. In this work, we propose a novel solution to query fusion by splitting the computation into two parts: one phase that is carried out offline, to generate pre-computed centroid answers for queries addressing broadly similar information needs, and then a second online phase that uses the corresponding topic centroid to compute a result page for each query. To achieve this, we make use of score-based fusion algorithms whose costs can be amortized via the pre-processing step and that can then be efficiently combined during subsequent per-query re-ranking operations. Experimental results using the ClueWeb12B collection and the UQV100 query variations demonstrate that centroid-based approaches allow improved retrieval effectiveness at little or no loss in query throughput or latency and within reasonable pre-processing requirements. We additionally show that queries that do not match any of the pre-computed clusters can be accurately identified and efficiently processed in our proposed ranking pipeline.
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 1997
DOI: 10.1109/18.623170
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2022
Publisher: Association for Computing Machinery (ACM)
Date: 04-1995
Abstract: Recent advances in compression and indexing techniques have yielded a qualitative change in the feasibility of large-scale full-text retrieval.
Publisher: ACM
Date: 09-02-2011
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2001
DOI: 10.1109/18.904514
Publisher: Informa UK Limited
Date: 03-2008
Publisher: Association for Computing Machinery (ACM)
Date: 12-1998
Abstract: Two well-known indexing methods are inverted files and signature files. We have undertaken a detailed comparison of these two approaches in the context of text indexing, paying particular attention to query evaluation speed and space requirements. We have examined their relative performance using both experimentation and a refined approach to modeling of signature files, and demonstrate that inverted files are distinctly superior to signature files. Not only can inverted files be used to evaluate typical queries in less time than can signature files, but inverted files require less space and provide greater functionality. Our results also show that a synthetic text database can provide a realistic indication of the behavior of an actual text database. The tools used to generate the synthetic database have been made publicly available
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 06-2012
DOI: 10.1109/TKDE.2011.63
Publisher: No publisher found
Date: 2008
Publisher: Wiley
Date: 06-1999
DOI: 10.1002/(SICI)1097-024X(199906)29:7<647::AID-SPE252>3.0.CO;2-5
Publisher: Association for Computing Machinery (ACM)
Date: 08-09-2021
DOI: 10.1145/3467890
Abstract: Inverted indexes continue to be a mainstay of text search engines, allowing efficient querying of large document collections. While there are a number of possible organizations, document-ordered indexes are the most common, since they are amenable to various query types, support index updates, and allow for efficient dynamic pruning operations. One disadvantage with document-ordered indexes is that high-scoring documents can be distributed across the document identifier space, meaning that index traversal algorithms that terminate early might put search effectiveness at risk. The alternative is impact-ordered indexes, which primarily support top- disjunctions but also allow for anytime query processing, where the search can be terminated at any time, with search quality improving as processing latency increases. Anytime query processing can be used to effectively reduce high-percentile tail latency that is essential for operational scenarios in which a service level agreement (SLA) imposes response time requirements. In this work, we show how document-ordered indexes can be organized such that they can be queried in an anytime fashion, enabling strict latency control with effective early termination. Our experiments show that processing document-ordered topical segments selected by a simple score estimator outperforms existing anytime algorithms, and allows query runtimes to be accurately limited to comply with SLA requirements.
Publisher: Elsevier BV
Date: 09-2005
Publisher: Springer Berlin Heidelberg
Date: 2009
Publisher: Springer Berlin Heidelberg
Date: 2010
Publisher: Springer Nature Switzerland
Date: 2023
Publisher: Springer International Publishing
Date: 2021
Publisher: Elsevier BV
Date: 07-2022
Publisher: Springer Science and Business Media LLC
Date: 30-06-2009
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 1997
DOI: 10.1109/26.634683
Publisher: Springer Science and Business Media LLC
Date: 2005
Publisher: Springer Science and Business Media LLC
Date: 05-10-2006
Publisher: Springer Science and Business Media LLC
Date: 21-06-2016
Publisher: Springer Science and Business Media LLC
Date: 06-06-2019
Publisher: Wiley
Date: 17-01-2018
DOI: 10.1002/SPE.2556
Publisher: Springer Berlin Heidelberg
Date: 2009
Publisher: Association for Computing Machinery (ACM)
Date: 29-09-2018
DOI: 10.1145/3239572
Abstract: One typical way of building test collections for offline measurement of information retrieval systems is to pool the ranked outputs of different systems down to some chosen depth d and then form relevance judgments for those documents only. Non-pooled documents—ones that did not appear in the top- d sets of any of the contributing systems—are then deemed to be non-relevant for the purposes of evaluating the relative behavior of the systems. In this article, we use RBP-derived residuals to re-examine the reliability of that process. By fitting the RBP parameter ϕ to maximize similarity between AP- and NDCG-induced system rankings, on the one hand, and RBP-induced rankings, on the other, an estimate can be made as to the potential score uncertainty associated with those two recall-based metrics. We then consider the effect that residual size—as an indicator of possible measurement uncertainty in utility-based metrics—has in connection with recall-based metrics by computing the effect of increasing pool sizes and examining the trends that arise in terms of both metric score and system separability using standard statistical tests. The experimental results show that the confidence levels expressed via the p -values generated by statistical tests are only weakly connected to the size of the residual and to the degree of measurement uncertainty caused by the presence of unjudged documents. Statistical confidence estimates are, however, largely consistent as pooling depths are altered. We therefore recommend that all such experimental results should report, in addition to the outcomes of statistical significance tests, the residual measurements generated by a suitably matched weighted-precision metric, to give a clear indication of measurement uncertainty that arises due to the presence of unjudged documents in test collections with finite pooled judgments.
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 07-1998
DOI: 10.1109/18.681345
Publisher: ACM
Date: 20-07-2008
Publisher: ACM
Date: 19-07-2009
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2018
Publisher: ACM
Date: 28-10-2011
Publisher: Elsevier BV
Date: 09-2006
Publisher: ACM
Date: 20-07-2008
Publisher: ACM Press
Date: 2013
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 03-1990
DOI: 10.1109/18.52489
Publisher: Elsevier BV
Date: 08-2007
DOI: 10.1111/J.1753-6405.2007.00085.X
Abstract: In Australia, brand substitution by pharmacists has been possible since 1994. There is no limit to the number of substitutions per prescription. Doctors have expressed concern that patients may receive a different product each time their prescription repeats are dispensed, which has the potential to confuse patients. It is unknown how often multiple substitutions per prescription occur. We aimed to identify the number of switches per prescription for a range of medicines and to determine the number of different brand and generic products supplied on each prescription. Repatriation Pharmaceutical Benefits Scheme prescription claims between 1 January 2001 and 28 February 2006 were identified for atenolol, citalopram, enalapril, metformin, omeprazole, ramipril, and simvastatin. Original prescriptions with five repeats and all supplies dispensed were included. Switches were identified if a different product was supplied on consecutive repeat dispensings. 533,279 original prescriptions were included. 488,735 (92%) had no switches on repeats and 37,513 (7%) had only one switch. Only 1% of all prescriptions had more than one switch identified on repeats, and in most cases only two different products were supplied. None of the prescriptions had a different product supplied on each dispensing. Multiple switches per prescription are uncommon and multiple different products are rarely supplied on repeats of the same prescription. The rules of the brand substitution policy appear to be adequate in allowing brand choice for patients, without leading to multiple switches per prescription.
Publisher: Elsevier BV
Date: 11-1994
Publisher: ACM
Date: 15-08-2005
Publisher: IEEE
Date: 2010
DOI: 10.1109/DCC.2010.59
Publisher: Association for Computing Machinery (ACM)
Date: 30-08-2019
DOI: 10.1145/3342555
Abstract: Huffman’s algorithm for computing minimum-redundancy prefix-free codes has almost legendary status in the computing disciplines. Its elegant blend of simplicity and applicability has made it a favorite ex le in algorithms courses, and as a result it is perhaps one of the most commonly implemented algorithmic techniques. This article presents a tutorial on Huffman coding and surveys some of the developments that have flowed as a consequence of Huffman’s original discovery, including details of code calculation and of encoding and decoding operations. We also survey related mechanisms, covering both arithmetic coding and the recently developed asymmetric numeral systems approach and briefly discuss other Huffman-coding variants, including length-limited codes.
Publisher: ACM
Date: 15-12-2022
Publisher: Elsevier BV
Date: 05-1993
Publisher: ACM
Date: 05-12-2012
Publisher: ACM
Date: 15-08-2005
Publisher: Elsevier BV
Date: 07-2021
Publisher: Springer International Publishing
Date: 2013
Publisher: Association for Computing Machinery (ACM)
Date: 05-06-2017
DOI: 10.1145/3052768
Abstract: Information retrieval systems aim to help users satisfy information needs. We argue that the goal of the person using the system, and the pattern of behavior that they exhibit as they proceed to attain that goal, should be incorporated into the methods and techniques used to evaluate the effectiveness of IR systems, so that the resulting effectiveness scores have a useful interpretation that corresponds to the users’ search experience. In particular, we investigate the role of search task complexity, and show that it has a direct bearing on the number of relevant answer documents sought by users in response to an information need, suggesting that useful effectiveness metrics must be goal sensitive . We further suggest that user behavior while scanning results listings is affected by the rate at which their goal is being realized, and hence that appropriate effectiveness metrics must be adaptive to the presence (or not) of relevant documents in the ranking. In response to these two observations, we present a new effectiveness metric, INST, that has both of the desired properties: INST employs a parameter T , a direct measure of the user’s search goal that adjusts the top-weightedness of the evaluation score moreover, as progress towards the target T is made, the modeled user behavior is adapted, to reflect the remaining expectations. INST is experimentally compared to previous effectiveness metrics, including Average Precision (AP), Normalized Discounted Cumulative Gain (NDCG), and Rank-Biased Precision (RBP), demonstrating our claims as to INST’s usefulness. Like RBP, INST is a weighted-precision metric, meaning that each score can be accompanied by a residual that quantifies the extent of the score uncertainty caused by unjudged documents. As part of our experimentation, we use crowd-sourced data and score residuals to demonstrate that a wide range of queries arise for even quite specific information needs, and that these variant queries introduce significant levels of residual uncertainty into typical experimental evaluations. These causes of variability have wide-reaching implications for experiment design, and for the construction of test collections.
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 1997
DOI: 10.1109/69.591454
Publisher: Association for Computing Machinery (ACM)
Date: 12-2010
Abstract: Conjunctive Boolean queries are a key component of modern information retrieval systems, especially when Web-scale repositories are being searched. A conjunctive query q is equivalent to a | q |-way intersection over ordered sets of integers, where each set represents the documents containing one of the terms, and each integer in each set is an ordinal document identifier. As is the case with many computing applications, there is tension between the way in which the data is represented, and the ways in which it is to be manipulated. In particular, the sets representing index data for typical document collections are highly compressible, but are processed using random access techniques, meaning that methods for carrying out set intersections must be alert to issues to do with access patterns and data representation. Our purpose in this article is to explore these trade-offs, by investigating intersection techniques that make use of both uncompressed “integer” representations, as well as compressed arrangements. We also propose a simple hybrid method that provides both compact storage, and also faster intersection computations for conjunctive querying than is possible even with uncompressed representations.
Publisher: Association for Computing Machinery (ACM)
Date: 21-07-2020
DOI: 10.1145/3397175
Abstract: An entropy coder takes as input a sequence of symbol identifiers over some specified alphabet and represents that sequence as a bitstring using as few bits as possible, typically assuming that the elements of the sequence are independent of each other. Previous entropy coding methods include the well-known Huffman and arithmetic approaches. Here we examine the newer asymmetric numeral systems (ANS) technique for entropy coding and develop mechanisms that allow it to be efficiently used when the size of the source alphabet is large—thousands or millions of symbols. In particular, we examine different ways in which probability distributions over large alphabets can be approximated and in doing so infer techniques that allow the ANS mechanism to be extended to support large-alphabet entropy coding. As well as providing a full description of ANS, we also present detailed experiments using several different types of input, including data streams arising as typical output from the modeling stages of text compression software, and compare theproposed ANS variants with Huffman and arithmetic coding baselines, measuring both compression effectiveness and also encoding and decoding throughput. We demonstrate that in applications in which semi-static compression is appropriate, ANS-based coders can provide an excellent balance between compression effectiveness and speed, even when the alphabet is large.
Publisher: Association for Computing Machinery (ACM)
Date: 11-2010
Abstract: Ranked lists are encountered in research and daily life and it is often of interest to compare these lists even when they are incomplete or have only some members in common. An ex le is document rankings returned for the same query by different search engines. A measure of the similarity between incomplete rankings should handle nonconjointness, weight high ranks more heavily than low, and be monotonic with increasing depth of evaluation but no measure satisfying all these criteria currently exists. In this article, we propose a new measure having these qualities, namely rank-biased overlap (RBO). The RBO measure is based on a simple probabilistic user model. It provides monotonicity by calculating, at a given depth of evaluation, a base score that is non-decreasing with additional evaluation, and a maximum score that is nonincreasing. An extrapolated score can be calculated between these bounds if a point estimate is required. RBO has a parameter which determines the strength of the weighting to top ranks. We extend RBO to handle tied ranks and rankings of different lengths. Finally, we give ex les of the use of the measure in comparing the results produced by public search engines and in assessing retrieval systems in the laboratory.
Publisher: Wiley
Date: 03-11-2006
DOI: 10.1002/ASI.20515
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2022
Publisher: Springer Berlin Heidelberg
Date: 2013
Publisher: Association for Computing Machinery (ACM)
Date: 18-08-2023
DOI: 10.1145/3596511
Abstract: Measurement of the effectiveness of search engines is often based on use of relevance judgments. It is well known that judgments can be inconsistent between judges, leading to discrepancies that potentially affect not only scores but also system relativities and confidence in the experimental outcomes. We take the perspective that the relevance judgments are an amalgam of perfect relevance assessments plus errors making use of a model of systematic errors in binary relevance judgments that can be tuned to reflect the kind of judge that is being used, we explore the behavior of measures of effectiveness as error is introduced. Using a novel methodology in which we examine the distribution of “true” effectiveness measurements that could be underlying measurements based on sets of judgments that include error, we find that even moderate amounts of error can lead to conclusions such as orderings of systems that statistical tests report as significant but are nonetheless incorrect. Further, in these results the widely used recall-based measures AP and NDCG are notably more fragile in the presence of judgment error than is the utility-based measure RBP, but all the measures failed under even moderate error rates. We conclude that knowledge of likely error rates in judgments is critical to interpretation of experimental outcomes.
Publisher: ACM
Date: 20-07-2008
Publisher: Springer International Publishing
Date: 2020
Publisher: Elsevier BV
Date: 11-1994
Publisher: ACM
Date: 31-10-2005
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 11-2000
DOI: 10.1109/5.892708
Publisher: ACM
Date: 19-07-2009
Publisher: Wiley
Date: 06-2022
DOI: 10.1002/AAAI.12051
Abstract: Offline evaluation is an essential complement to online experiments in the selection, improvement, tuning, and deployment of recommender systems. Offline methodologies for recommender system evaluation evolved from experimental practice in Machine Learning (ML) and Information Retrieval (IR). However, evaluating recommendations involves particularities that pose challenges to the assumptions upon which the ML and IR methodologies were developed. We recap and reflect on the development and current status of recommender system evaluation, providing an updated perspective. With a focus on offline evaluation, we review the adaptation of IR principles, procedures and metrics, and the implications of those techniques when applied to recommender systems. At the same time, we identify the singularities of recommendation that require different responses, or involve specific new needs. In addition, we provide an overview of important choices in the configuration of experiments that require particular care and understanding discuss broader perspectives of evaluation such as recommendation value beyond accuracy and survey open challenges such as experimental biases, and the cyclic dimension of recommendation.
Publisher: ACM
Date: 23-07-2007
Publisher: Elsevier BV
Date: 05-2023
Publisher: Association for Computing Machinery (ACM)
Date: 12-2008
Abstract: A range of methods for measuring the effectiveness of information retrieval systems has been proposed. These are typically intended to provide a quantitative single-value summary of a document ranking relative to a query. However, many of these measures have failings. For ex le, recall is not well founded as a measure of satisfaction, since the user of an actual system cannot judge recall. Average precision is derived from recall, and suffers from the same problem. In addition, average precision lacks key stability properties that are needed for robust experiments. In this article, we introduce a new effectiveness metric, rank-biased precision , that avoids these problems. Rank-biased pre-cision is derived from a simple model of user behavior, is robust if answer rankings are extended to greater depths, and allows accurate quantification of experimental uncertainty, even when only partial relevance judgments are available.
Publisher: Springer International Publishing
Date: 2019
Publisher: Springer International Publishing
Date: 2022
Start Date: 03-2004
End Date: 03-2007
Amount: $330,000.00
Funder: Australian Research Council
View Funded ActivityStart Date: 2014
End Date: 12-2017
Amount: $310,000.00
Funder: Australian Research Council
View Funded ActivityStart Date: 2003
End Date: 12-2007
Amount: $188,000.00
Funder: Australian Research Council
View Funded ActivityStart Date: 2011
End Date: 12-2014
Amount: $330,000.00
Funder: Australian Research Council
View Funded ActivityStart Date: 02-2011
End Date: 12-2014
Amount: $360,000.00
Funder: Australian Research Council
View Funded ActivityStart Date: 06-2019
End Date: 12-2024
Amount: $380,000.00
Funder: Australian Research Council
View Funded ActivityStart Date: 06-2016
End Date: 06-2019
Amount: $394,000.00
Funder: Australian Research Council
View Funded ActivityStart Date: 04-2008
End Date: 04-2011
Amount: $242,918.00
Funder: Australian Research Council
View Funded ActivityStart Date: 2008
End Date: 12-2011
Amount: $556,000.00
Funder: Australian Research Council
View Funded ActivityStart Date: 12-2003
End Date: 12-2004
Amount: $20,000.00
Funder: Australian Research Council
View Funded ActivityStart Date: 2003
End Date: 12-2007
Amount: $5,625,165.00
Funder: Australian Research Council
View Funded ActivityStart Date: 2020
End Date: 12-2024
Amount: $360,000.00
Funder: Australian Research Council
View Funded ActivityStart Date: 12-2004
End Date: 12-2010
Amount: $2,000,000.00
Funder: Australian Research Council
View Funded Activity