Development and Application of Techniques for Detecting Equivalent Documents. The web is a vast collection of data, such as text and images, but contains large numbers of duplicates - the same document or picture may be present many times. Even personal collections of information, such as the documents and digital photos people keep on their home computers, often have many versions of the same item. However, detecting such duplicates is not straightforward, as they may have been edited, or may be shown in different forms; for example, the quality of a photo may be reduced for display on a mobile phone. In this project we plan to detect such duplicates, and use the results to improve search and management of data.
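The abstract does not specify a detection method; a standard starting point for near-duplicate detection of edited documents is word shingling with Jaccard similarity. A minimal sketch, with invented example documents:

```python
def shingles(text, k=3):
    """Split text into overlapping word k-grams ("shingles")."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of two shingle sets: |A & B| / |A | B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

doc1 = "the quick brown fox jumps over the lazy dog"
doc2 = "the quick brown fox leaps over the lazy dog"  # a lightly edited copy
doc3 = "an entirely different piece of text about indexing"

s1, s2, s3 = shingles(doc1), shingles(doc2), shingles(doc3)
print(jaccard(s1, s2))  # noticeably higher than...
print(jaccard(s1, s3))  # ...the unrelated pair, which shares no shingles
```

Exact duplicates score 1.0; a single edited word still leaves most shingles intact, so near-duplicates score well above unrelated documents. At web scale, the shingle sets are typically compressed into MinHash sketches rather than compared directly.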
Dynamic Index Maintenance for Text Search Engines. Text retrieval systems such as internet search engines use high-performance indexes to rapidly locate documents that match user queries. In recent years there have been major improvements in query evaluation and index construction techniques. As the data changes, it is necessary to keep the index up to date, but current methods for maintaining indexes are slow and costly. The aim of this project is to develop methods that provide on-the-fly update at much lower cost, thereby improving the performance of text retrieval systems. This work involves both practical development and innovation in fundamental algorithms.
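As context for what keeping an index up to date involves, the toy in-memory inverted index below supports on-the-fly insertion and deletion. Real engines store compressed postings lists on disk, which is where the update cost the project targets arises; the example data is invented:

```python
from collections import defaultdict

class DynamicIndex:
    """Toy inverted index supporting document insertion and deletion."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of doc ids
        self.docs = {}                    # doc id -> original text

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        for term in set(text.lower().split()):
            self.postings[term].add(doc_id)

    def remove(self, doc_id):
        text = self.docs.pop(doc_id)
        for term in set(text.lower().split()):
            self.postings[term].discard(doc_id)

    def search(self, query):
        """AND semantics: ids of docs containing every query term."""
        terms = query.lower().split()
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for t in terms[1:]:
            result &= self.postings.get(t, set())
        return result

idx = DynamicIndex()
idx.add(1, "fast index update")
idx.add(2, "index construction techniques")
print(sorted(idx.search("index")))  # [1, 2]
idx.remove(1)
print(sorted(idx.search("index")))  # [2]
```

With sets in memory the update is trivial; with compressed, block-laid-out postings on disk, every insertion potentially rewrites postings lists, which is why low-cost update strategies matter.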
New approaches to interactive sessional search for complex tasks. This project aims to develop new tools and techniques to improve the accuracy and speed of search and data analytics for complex information tasks. There are currently no publicly available search engines which support users engaged in complex interactive search, or that allow searchers to fully control their own data and privacy. Fundamental research advances, based on understanding real user behaviour and search needs, will have an impact on important academic, industrial, and government domains, including virtual assistants, health care (clinical decision support), precision medicine, eDiscovery, crime prevention, and detailed socio-economic evaluations.
Discovery Early Career Researcher Award - Grant ID: DE140100275
Funder: Australian Research Council
Funding Amount: $392,979.00
Summary:
Beyond keyword search for ranked document retrieval. This project will develop novel approaches to efficient and effective ranked text retrieval using a new class of rank-aware algorithms derived from self-indexes. These algorithms can support complex statistical calculations on the fly. Efficient algorithm design for big data is an increasingly important problem as energy costs continue to soar and can now exceed hardware costs for big data consumers such as Google. In this project, two important problems in web search are explored: real-time indexing and long-form query answering. Using self-index algorithms, this project presents a road map to move beyond simple keyword-based ranked document retrieval, thus allowing us to efficiently meet more demanding information needs of users in the next decade.
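The self-index machinery itself is beyond a short sketch, but the ranked retrieval task it accelerates can be illustrated with plain TF-IDF scoring. This is a textbook stand-in, not the project's method, and the documents are invented:

```python
import math
from collections import Counter

def rank(query, docs):
    """Score each document against the query with TF-IDF and return
    (score, doc index) pairs, best first."""
    n = len(docs)
    tokenised = [d.lower().split() for d in docs]
    df = Counter()                     # document frequency of each term
    for toks in tokenised:
        df.update(set(toks))
    results = []
    for i, toks in enumerate(tokenised):
        tf = Counter(toks)             # term frequency within this doc
        score = sum(tf[t] * math.log(n / df[t])
                    for t in query.lower().split() if t in df)
        results.append((score, i))
    return sorted(results, reverse=True)

docs = ["keyword search and faceted search",
        "graph algorithms and network analysis",
        "search engines answer keyword queries"]
ranked = rank("keyword search", docs)
print(ranked[0][1])  # doc 0 ranks first: both terms, with "search" twice
```

A rank-aware self-index aims to answer this kind of query directly from a compressed representation of the text, rather than from separately stored term statistics.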
Using Past Queries for Fast and Accurate Web Searching. Searching the entire Internet, or a company web site, has become a vital task for modern organisations. While there has been significant research into improving search engines through using web pages themselves, very little attention has been paid to improving web search by exploiting the vast numbers of queries that users submit to search engines each day. This project will use state-of-the-art compression and algorithmic techniques to improve the speed and accuracy of web search using data gleaned from millions of Internet queries (provided under agreement by Microsoft). Improving search engines will have a direct benefit to many Australian industries, and support the government's priority area of "smart information use".
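One simple way past queries improve search is frequency-based completion of partial queries, sketched below. This is an illustration only, not the project's technique, and the query log is invented:

```python
from collections import Counter

def suggest(prefix, query_log, k=3):
    """Suggest up to k completions for a partial query, ranked by how
    often matching queries appear in the log."""
    counts = Counter(q for q in query_log if q.startswith(prefix))
    return [q for q, _ in counts.most_common(k)]

log = ["web search", "web search engine", "web server",
       "web search", "weather"]
print(suggest("web se", log))  # "web search" first (logged twice)
```

At realistic scale the log holds millions of queries, so the interesting work, and the role of the compression techniques the abstract mentions, is storing and matching prefixes compactly rather than scanning a list.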
On effectively modelling and efficiently discovering communities from large networks. Finding and maintaining close communities in very large-scale, dynamically changing networks is an interesting and challenging problem. This project aims to develop new techniques to identify such communities as fast as possible by exploiting the rich semantics and individual relationships within the communities.
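The project's community models are not specified here, but the classic k-core (the maximal subgraph in which every vertex keeps at least k neighbours inside the subgraph) is a standard building block for cohesive-community discovery. A minimal sketch on an invented graph:

```python
def k_core(adj, k):
    """Return the vertex set of the k-core of an undirected graph,
    given as a dict mapping each vertex to its set of neighbours.
    Repeatedly peel vertices whose remaining degree drops below k."""
    nodes = set(adj)
    changed = True
    while changed:
        changed = False
        for v in list(nodes):
            if sum(1 for u in adj[v] if u in nodes) < k:
                nodes.remove(v)
                changed = True
    return nodes

# A triangle {a, b, c} plus a pendant vertex d attached to c.
adj = {"a": {"b", "c"}, "b": {"a", "c"},
       "c": {"a", "b", "d"}, "d": {"c"}}
print(k_core(adj, 2))  # {'a', 'b', 'c'}: d has only one neighbour
```

Peeling-based decompositions like this run in near-linear time, which is why k-core and its relatives are popular starting points for community search on very large, changing networks.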
Efficient Algorithms for In-memory Sorting, Searching and Indexing on Modern Multi-core Cache-based and Graphics Processor Architectures. This project clearly belongs to one of the national research priority goals, Smart Information Use. The copy-based techniques and work on sorting and searching will considerably impact the development of in-memory algorithms in cutting-edge computer architectures. Efficient suffix trees and suffix sorting have myriad applications in string-processing and will be of high interest to bioinformatics companies. The sortdex project will develop novel algorithms that will be used by enterprise search engine companies to develop applications for libraries and organisations dealing with large databases. Algorithms using the graphics processor as a co-processor have important applications in the high-growth field of computer graphics and games.
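A suffix array illustrates what suffix sorting produces: the starting positions of all suffixes of a string, in lexicographic order. The naive construction below is O(n² log n) and is only for illustration; the algorithms this line of work concerns (and linear-time constructions such as SA-IS) do far better:

```python
def suffix_array(s):
    """Return the indices of all suffixes of s in sorted order."""
    return sorted(range(len(s)), key=lambda i: s[i:])

sa = suffix_array("banana")
print(sa)  # [5, 3, 1, 0, 4, 2]: "a" < "ana" < "anana" < "banana" < ...
```

Once built, the sorted order lets any substring be located by binary search, which is what makes suffix arrays (and the suffix trees they compactly replace) so useful in string processing and bioinformatics.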
Dynamic Load Balancing for Systems under Heavy Traffic Demand and High Task Size Variation. Current computer systems cannot cope with extremely heavy traffic demands. A solution to such a difficult problem is to dynamically balance the load across the system's servers. Several solutions have been proposed and demonstrate advances in certain limited conditions (e.g. uniform distribution). However, fundamental research work must be undertaken beyond the current way of dealing with the core issues of load balancing. Accounting for realistic conditions is a theoretical and practical challenge. This project aims at developing theoretical and computational models for dynamic task distribution for such systems. The benefits include substantial improvement of the system response time.
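As a toy illustration of why dynamic (state-aware) balancing helps under high task-size variation, the sketch below assigns each arriving task to the currently least-loaded server. The server count and task sizes are invented:

```python
import heapq

def assign(tasks, n_servers):
    """Assign each task to the least-loaded server; return the
    per-task server placements and the final maximum server load."""
    heap = [(0.0, s) for s in range(n_servers)]  # (current load, server id)
    heapq.heapify(heap)
    placement = []
    for size in tasks:
        load, s = heapq.heappop(heap)   # least-loaded server
        placement.append(s)
        heapq.heappush(heap, (load + size, s))
    return placement, max(l for l, _ in heap)

tasks = [5.0, 1.0, 1.0, 1.0, 1.0, 1.0]  # one huge task, several small ones
placement, makespan = assign(tasks, 2)
print(placement, makespan)  # [0, 1, 1, 1, 1, 1] 5.0
```

Static round-robin would put three of these tasks on the server already holding the 5.0-unit task (final load 7.0), while the dynamic policy keeps both servers at 5.0; the gap widens as task-size variance grows.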
Searching Cohesive Subgraphs in Big Attributed Graph Data. The availability of big attributed graph data brings great opportunities for realising the value of big data. Making sense of such data has many applications in health, science, engineering, business, and the environment. A cohesive subgraph, one of the key components that captures the latent properties of a graph, is essential to graph analysis. This project aims to invent effective models of cohesive subgraphs and efficient algorithms for searching and monitoring cohesive subgraphs in big and dynamic attributed graphs from both structure and attribute perspectives. The methods, techniques, and prototype systems developed in this project can be deployed to facilitate the smart use of big graph data across the nation.
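One widely used cohesiveness model (not necessarily the one this project will invent) is the k-truss, in which every retained edge must close at least k-2 triangles. A minimal peeling sketch on an invented graph:

```python
from collections import defaultdict

def k_truss(edge_list, k):
    """Return the edge set of the k-truss: repeatedly drop edges whose
    endpoints share fewer than k-2 common neighbours."""
    edges = {frozenset(e) for e in edge_list}
    changed = True
    while changed:
        changed = False
        adj = defaultdict(set)          # rebuild adjacency each pass
        for e in edges:
            u, v = tuple(e)
            adj[u].add(v)
            adj[v].add(u)
        for e in list(edges):
            u, v = tuple(e)
            if len(adj[u] & adj[v]) < k - 2:  # triangles through this edge
                edges.discard(e)
                changed = True
    return edges

# A triangle a-b-c with a dangling edge c-d.
truss = k_truss([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")], 3)
print(truss)  # only the triangle's edges survive
```

Because the truss condition is on edges rather than vertices, it is stricter than the k-core; attributed variants additionally require the surviving vertices to share query attributes, which is the structure-plus-attribute perspective the abstract describes.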
Modelling and Searching Cohesive Groups over Heterogeneous Graphs. Heterogeneous information networks (HINs) contain richer structural and semantic information represented as different types of objects and links. Searching cohesive groups from HINs finds many applications and also brings challenges at both conceptual and technical levels. This project aims to investigate the effective modelling of cohesive groups that take both homogeneous and heterogeneous information into account for different applications and devise efficient algorithms for searching and monitoring those cohesive groups based on different models. The methods, techniques, and evaluation systems developed in this project can be deployed to facilitate the smart use of heterogeneous information networks across the nation.