Managing and Mining Evolving Ontologies through Combining Pattern Languages and Data Mining. Data mining offers insight into the substantial and growing repositories of medical data, while pattern languages offer the ability to provide high-level conceptual designs of an artifact or process. This project thus promises to facilitate, on the one hand, the discovery of medical knowledge from large quantities of clinical or epidemiological data, while also providing a better way of constructing and validating medically oriented design patterns which can then be used in medical data collection systems.
A new erasure-resilient technique for encoding internet packets. Efficient internet communication tolerates losing some packets sent across the web by sending a bit more information than is strictly required: any holes in the transmission can be repaired using the redundant data. We propose a new transmission protocol that is much simpler to encode and repairs broken messages faster. This new approach, based on sending data plus summed versions of itself, has generic applicability across all packet-switched information networks.
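The "data plus summed versions of itself" idea can be illustrated with a minimal single-parity sketch (a toy stand-in for the proposed protocol, not its actual encoding): the sender appends the bytewise XOR of the data packets, and the receiver can rebuild any one lost packet from the survivors.

```python
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))

def encode(packets):
    """Append one XOR parity packet: any single loss becomes recoverable."""
    return packets + [reduce(xor_bytes, packets)]

def repair(received):
    """received holds the transmitted packets, with the lost one as None."""
    lost = received.index(None)
    survivors = [p for p in received if p is not None]
    received[lost] = reduce(xor_bytes, survivors)
    return received[:-1]  # drop the parity packet, keep the original data

data = [b"AAAA", b"BBBB", b"CCCC"]
sent = encode(data)
sent[1] = None          # simulate one packet lost in transit
assert repair(sent) == data
```

A single parity packet is the simplest member of the erasure-code family; codes that tolerate multiple losses (e.g. Reed-Solomon or fountain codes) generalise the same redundancy idea.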
Handling unreliable, uncertain and inadequate data for Intelligence-led Investigation. Intelligence-led investigation has recently been successful against drug and people smuggling and the preparation or instigation of acts of terrorism, and can benefit profoundly from the techniques we will develop for the timely management of, and inference from, many sources and kinds of uncertain information. This work will assist in making Australia a safer and more secure country. For example, Australian Bureau of Statistics figures show that for 2004, investigations of some 35% of murders, 63% of kidnappings, and 80% of robberies remained incomplete after 30 days. Terrorism investigations are harder in that there is usually no initial crime to trigger an investigation. Any assistance our tools can provide will be of significant benefit to Australia.
Internet web page mining. This project aims to study the behaviour of internet search engines designed using a best-first search strategy, and to improve on existing designs. The outcome of the project will be a much better design of internet search engine that can be used to search for specific topics. The benefit to the Australian partner will be gaining skills in internet search engine design originating from the Italian partner's group. The benefit to the Italian partner will be gaining skills in research techniques, e.g., utilising a support vector machine as a classification tool and data mining techniques developed originally by the Australian partner's group, and in further developing these techniques with specific application to the internet web page mining problem.
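A best-first search strategy for crawling can be sketched as a priority queue that always expands the most promising unvisited page first. The graph and relevance scores below are invented placeholders; a real engine would fetch live pages and score them with a learned classifier such as the support vector machine mentioned above.

```python
import heapq

def best_first_crawl(start, links, score, budget=10):
    """Visit pages in order of estimated topical relevance (best-first search).

    links:  dict page -> list of outgoing links (stand-in for fetched HTML)
    score:  relevance estimator; higher means more promising
    budget: maximum number of pages to visit
    """
    frontier = [(-score(start), start)]   # max-heap via negated scores
    visited = set()
    order = []
    while frontier and len(order) < budget:
        _, page = heapq.heappop(frontier)
        if page in visited:
            continue
        visited.add(page)
        order.append(page)
        for nxt in links.get(page, []):
            if nxt not in visited:
                heapq.heappush(frontier, (-score(nxt), nxt))
    return order

# Toy web graph; relevance = hypothetical topical score for each page.
graph = {"home": ["sports", "search"], "search": ["search-engines"],
         "sports": [], "search-engines": []}
relevance = {"home": 1, "sports": 0, "search": 2, "search-engines": 3}
print(best_first_crawl("home", graph, relevance.get))
# → ['home', 'search', 'search-engines', 'sports']
```

Note how "search-engines", discovered late, jumps ahead of the low-relevance "sports" page: the frontier is ordered by score, not by discovery time.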
Extensions to the page scoring algorithm in internet search engine studies. This project proposes to study two extensions to the PageRank equation, which is one of the theoretical underpinnings of Google's web page scoring engine. In particular, we wish to explore ways to combine page connectivity and page characteristics in the scoring of web pages. This will be the first time a rational way of combining these two factors has been proposed. The expected outcome will be a deeper understanding of how these two factors affect the score of a web page in a search engine, and hence how they affect the visibility of the page in response to a query.
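The standard PageRank recurrence r = d·M·r + (1−d)·v offers one natural seam for such a combination: the matrix M carries page connectivity, while the teleport vector v can be derived from page characteristics instead of being uniform. The sketch below assumes exactly that combination rule for illustration; the project's actual extensions may differ, and the `quality` weights are invented.

```python
def pagerank(links, quality, d=0.85, tol=1e-10):
    """Power iteration for r = d*M*r + (1-d)*v, where the link structure
    determines M and v is derived from per-page characteristic weights."""
    pages = sorted(links)
    n = len(pages)
    total = sum(quality.values())
    v = {p: quality[p] / total for p in pages}   # characteristics -> teleport
    r = {p: 1.0 / n for p in pages}
    while True:
        nxt = {p: (1 - d) * v[p] for p in pages}
        for p in pages:
            outs = links[p]
            if outs:                             # spread rank over outlinks
                for q in outs:
                    nxt[q] += d * r[p] / len(outs)
            else:                                # dangling page: spread uniformly
                for q in pages:
                    nxt[q] += d * r[p] / n
        if sum(abs(nxt[p] - r[p]) for p in pages) < tol:
            return nxt
        r = nxt

# Toy three-page web; the quality weights are hypothetical page characteristics.
links = {"home": ["news", "shop"], "news": ["home"], "shop": ["home", "news"]}
quality = {"home": 2.0, "news": 1.0, "shop": 1.0}
ranks = pagerank(links, quality)
```

Because v remains a probability distribution, the scores still sum to one and the iteration converges as in ordinary PageRank; only the stationary distribution is tilted toward pages with favourable characteristics.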
Data structures which change with time: a machine learning approach. Visibility of web pages on the Internet, based on page importance, controls their accessibility by users, which is critical for e-commerce applications. The importance of a page depends on its contents and its link structure to other web pages, both of which can vary with time. This project proposes a novel model in which the time-varying aspects of changes to contents and link structures are captured, allowing a better understanding of how these influence page importance over time. It will also give us insight into how to improve the visibility of web pages.
Investigations in Learning Algorithms for Web Page Scoring Systems. We will study the modification of web page scores to satisfy requirements (e.g., that one page should have a higher score than another, or that a home page should have a higher score than any other page in the same site) using modifications of the forcing function and the link connectivity matrix, respectively, of the PageRank equation. Clustering web pages, either by rank or by score, will help overcome the issues of scale and complexity posed by the live World Wide Web. Outcomes will provide a rational basis, together with practical methods, for a web site administrator to modify web page scores.
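The forcing-function route can be sketched in isolation: in the recurrence r = d·M·r + (1−d)·v, reweighting the forcing (teleport) vector v changes the relative scores of pages without touching any link. The three-page site below is invented for illustration, and the boosted weights are arbitrary, not a learned solution.

```python
def scores(links, teleport, d=0.85, iters=200):
    """Fixed-count power iteration for r = d*M*r + (1-d)*teleport
    on a dict-of-lists link graph (all pages assumed to have outlinks)."""
    pages = list(teleport)
    r = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iters):
        nxt = {p: (1 - d) * teleport[p] for p in pages}
        for p, outs in links.items():
            for q in outs:
                nxt[q] += d * r[p] / len(outs)
        r = nxt
    return r

# Symmetric toy site: "home" links to two otherwise identical pages
# "a" and "b", which both link back to "home".
links = {"home": ["a", "b"], "a": ["home"], "b": ["home"]}
uniform = {"home": 1/3, "a": 1/3, "b": 1/3}
boosted = {"home": 1/3, "a": 1/2, "b": 1/6}   # forcing vector favouring "a"

r_uniform = scores(links, uniform)
r_boosted = scores(links, boosted)
# Under the uniform forcing vector the symmetric pages tie; boosting "a"'s
# teleport weight enforces score(a) > score(b) with the links unchanged.
```

This is the mechanism behind the "one page should score higher than another" requirement; the project's learning algorithms would search for such a forcing vector automatically rather than setting it by hand.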
Concept-based retrieval and interpretation for large data sets. Access to on-line information is growing at an exponential rate, fuelled by advances in computing and communications technologies. Current information retrieval methods are becoming ineffective due to their reliance on simple term-based methods, resulting in a massive number of matches, of which only a small proportion are relevant. We address this problem by developing new matching algorithms which understand the underlying meaning of documents in database repositories, by building semantic structures semi-automatically, and thus provide more relevant information to queries. This project will be of great benefit to a multitude of end-users in medicine, history, law and many other disciplines.
Investigations into Distributed Information Processing of the World Wide Web: Addressing Major Bottlenecks in Search Engine Design. The Internet is a global medium used increasingly for commercial purposes. Nationally provided commercial services and products, as well as general types of information, are made available globally via the Internet. Web search engines are the only method by which a common user can find a relevant service or piece of information on the Internet. The sheer size and dynamics of the Internet pose a significant challenge to search engines. This project proposes to address some major bottlenecks in search engine design (viz. the page rank computation). This may help future search engines to maintain a good level of Web penetration and, consequently, will help to ensure suitable coverage of nationally available services and information to the world.
Efficient data manipulation in document classification. Document classification has enormous relevance in an era where large amounts of textual information are available. It is based on statistical and machine learning techniques that model documents represented as points in a multidimensional space. The Computer Engineering Laboratory (CEL) has ongoing projects using neural networks and other techniques for document classification. We are building a development environment for large classification tasks, and Prof. Lee's work will focus on managing large amounts of data for them. Using his experience in data compression, databases and web applications, he will produce a set of tools for handling gigabytes of textual data in our classification environment.
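The "documents as points in a multidimensional space" view can be sketched with a toy nearest-centroid classifier: each document becomes a term-frequency vector over a shared vocabulary, and a new document is assigned to the class whose centroid it is closest to under cosine similarity. The labels, vocabulary and training texts here are invented; CEL's actual systems use neural networks and other techniques at far larger scale.

```python
import math
from collections import Counter

def vectorise(doc, vocab):
    """Map a document to a point in the |vocab|-dimensional term space."""
    counts = Counter(doc.lower().split())
    return [counts[t] for t in vocab]

def cosine(u, v):
    """Cosine similarity between two term vectors (0.0 if either is empty)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# One training document per class (toy data).
train = {"spam": "win money win prize", "ham": "meeting agenda project report"}
vocab = sorted({t for d in train.values() for t in d.split()})
centroids = {label: vectorise(doc, vocab) for label, doc in train.items()}

def classify(doc):
    """Assign doc to the class with the most similar centroid."""
    return max(centroids,
               key=lambda lbl: cosine(vectorise(doc, vocab), centroids[lbl]))

print(classify("win a prize"))   # → spam
```

Scaling this picture to gigabytes of text is exactly where the data-handling tools matter: the vocabulary, the sparse vectors and the centroids all need compressed, database-backed representations rather than in-memory lists.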