Fast and Scalable Search Techniques for Genomic Databases. Tens of thousands of users each day search the genomic databases that are so far the most significant product of the Human Genome Project. In this project, we will investigate fundamental new bioinformatics techniques for information retrieval from genomic databases. The outcomes will allow molecular biologists to accurately and efficiently discover relationships between DNA and protein sequences. In contrast to existing approaches, our techniques will remain fast despite the enormous growth in genomic database sizes. This research will contribute significantly to the "key areas of study includ[ing] genomics and bioinformatics" in the new ARC genome/phenome link priority area.
Efficient and Effective Text Information Retrieval with Phrases. Current Internet search engines find documents by matching queries to documents, then present the closest matches to the user. Such searching is often ineffective. Another technique for searching, which with current algorithms is not feasible for large text collections such as the Web, is to browse vocabularies and view phrases in the contexts in which they are used. The aim of this project is to make wider use of phrases in retrieval, by developing new phrase-based querying algorithms and investigating how users can use phrase indexes to find documents. The outcome will be new, efficient methods for exploring the Internet.
Development and Application of Techniques for Detecting Equivalent Documents. The web is a vast collection of data, such as text and images, but contains large numbers of duplicates: the same document or picture may be present many times. Even personal collections of information, such as the documents and digital photos people keep on their home computers, often have many versions of the same item. However, detecting such duplicates is not straightforward, as they may have been edited or may be shown in different forms; for example, the quality of a photo may be reduced for display on a mobile phone. In this project we plan to detect such duplicates, and use the results to improve search and management of data.
Dynamic Index Maintenance for Text Search Engines. Text retrieval systems such as internet search engines use high-performance indexes to rapidly locate documents that match user queries. In recent years there have been major improvements in query evaluation and index construction techniques. As the data changes, it is necessary to keep the index up to date, but current methods for maintaining indexes are slow and costly. The aim of this project is to develop methods that provide on-the-fly update at much lower cost, thereby improving the performance of text retrieval systems. This work involves both practical development and innovation in fundamental algorithms.
Conceptual Knowledge Processing. The aim of this collaboration between the Computer Science and Mathematics disciplines is to develop a theoretical, methodological and practical understanding of how to support a range of tasks concerning conceptual knowledge processing. The view of the project is that knowledge processing takes place primarily in the human mind and that human communication can only be effectively supported by appropriate design means and devices. Developing prototype software that demonstrates these devices in practical domains continues to be a key feature of the collaboration, which benefits from existing DFG (Deutsche Forschungsgemeinschaft) support and has been ongoing since 1999.