Investigation and Development of Parallel Large Scale Record Linkage Techniques. Record linkage aims at matching records of the same entity (like customer or patient) in large (administrative) databases. The outcomes of the proposed research will improve current techniques in terms of efficiency, accuracy and the need for human intervention. Through experimental studies and stochastic modelling the performance of traditional and new methods for data cleaning, standardisation and linkage will be assessed. The effect of the statistical dependency of attribute values will be studied. New methods using clustering for blocking large datasets, and predictive models including interaction terms will be implemented, analysed and evaluated on high-performance computers and office-based PC clusters.
Efficient Processing of Complex Spatial Queries. Similarity search and join are two of the most popular yet complex queries in spatial databases. They are also two of the major spatial data analysis paradigms. To complement the existing techniques, this project aims to investigate a more complex and important form of these two problems, and to develop a novel framework to approach them. The successful achievements of the project will not only bring new spatial data analysis techniques but also deliver effective solutions to a number of real-life applications.
Fast, practical and effective algorithms for clustering with advice. To maintain a safe and healthy society, government and industry need high quality immunization and national security databases. Since we cannot afford to have duplicate, incomplete and conflicting records that refer to the same person, we unify them by identifying clusters of related records.
In the emerging field of functional genomics, diagnosis of certain diseases is enhanced by determining which genes act together. Different experimental runs might result in different clusterings of genes: we need one consensus clustering that summarizes the experimental outcomes.
Cleaning databases and combining clusterings by hand would require vast amounts of time. This project will result in faster and more accurate computational procedures.
Computing Order Statistics over Data Streams. While data stream computation is currently one of the most challenging areas in the IT research community, order statistics computation is a very important topic in data stream computation.
This project aims to deliver advanced techniques that promise a great impact on data stream technology. The success of this project will give another competitive edge for Australia to continue her leading role in the development of core IT technology. Moreover, the research outcome of the project will provide generic solutions to many Australia-based industries, including e-finance, telecommunication, network management, sensor network technology development, and environment monitoring.
Effectively Computing and Maintaining Graph-based Statistics in Large Scale Applications. The expected research outcome includes significant technical contributions to the development of graph-based query processing technology by supporting on-line data analysis. The proposed systematic, algorithm- and database-centric approach to investigating these novel, ubiquitous problems will lead to greater support, from the database community, for advanced real applications, and will create new opportunities for the IT industry. The success of this project will not only further enhance our standing as an internationally leading research group in data statistics computation and provide training for high-quality personnel in this important and growing area, but also bring considerable economic and social benefits to Australia.
Analyzing Uncertain Data: Probabilistic Approaches. The expected research outcome includes significant technical contributions to the development of uncertain data analysis technology by supporting probabilistic query processing. The proposed systematic, algorithm- and database-centric approach to investigating these novel, ubiquitous problems will lead to greater support, from the database community, for advanced real applications, and will create new opportunities for the IT industry. The success of this project will not only further enhance our standing as an internationally leading research group in uncertain data analysis and provide training for high-quality personnel in this important and growing area, but also bring considerable economic and social benefits to Australia.