AllRight - Know It All and Know It Right: High Quality Knowledge Mining in the Web

From a user's point of view, our world offers an impressively large and rapidly changing supply of information, goods, and services. The average user, however, is overwhelmed and confused by this unmanageable number of offers, which leads to wrong decisions, wasted time, and frustration. Knowledge-based systems are therefore deployed to assist users and to simplify decision processes in a complex and fast-moving world. Because our environment changes so rapidly, the efficient acquisition and maintenance of knowledge bases is a key problem that must be solved before semantic systems can be applied to ease the lives of users.

The goal of the proposed project is to exploit the information available on the Web so that knowledge acquisition can be highly automated. Typically, this information is stored in semi-structured documents containing tables, lists, and natural language descriptions. The task of automatic knowledge acquisition is therefore to transform this semi-structured information into structured descriptions of concepts that can be processed by a reasoning system. Based on a general description of a concept (e.g. the definition of a digital camera), the proposed system will automatically discover instances of this concept (e.g. digital cameras of a particular brand), their specifications, and their relations to other instances (e.g. accessories). Since the quality of a knowledge-based system is tightly linked to the quality of its knowledge base, we aim at the highest possible recall and precision (both above 90%). To achieve this goal, we will explore a knowledge mining framework based on information extraction, natural language processing, machine learning, and model-based reasoning (i.e. deep domain models). This framework shall be highly adaptable to different domains, so that the effort for acquiring a new knowledge base is kept low (less than 3 days of work for a trained person). The central research task is to show that such a knowledge mining framework can be developed with reasonable effort. We will show this by developing new methods and algorithms and by performing an empirical evaluation.
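To make the transformation step and the quality target concrete, the following is a minimal Python sketch, not the project's actual framework. It assumes a concept is described by attribute names with label synonyms, maps the label/value rows of a specification table onto those attributes, and measures precision and recall of the extracted facts against a hand-built gold standard. All names (CAMERA_CONCEPT, extract_instance, the attribute names, and the example camera) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical concept description: attribute name -> accepted table labels.
CAMERA_CONCEPT = {
    "resolution_mp": ("resolution", "megapixels", "effective pixels"),
    "optical_zoom": ("optical zoom", "zoom"),
    "weight_g": ("weight",),
}


@dataclass
class Instance:
    """A structured description of one discovered concept instance."""
    name: str
    attributes: dict = field(default_factory=dict)


def extract_instance(name, rows, concept=CAMERA_CONCEPT):
    """Map (label, value) rows of a spec table onto concept attributes."""
    instance = Instance(name)
    for label, value in rows:
        key = label.strip().lower().rstrip(":")
        for attribute, synonyms in concept.items():
            if key in synonyms:
                instance.attributes[attribute] = value.strip()
                break  # rows with unknown labels are simply ignored
    return instance


def facts(instance):
    """Flatten an instance into a set of (name, attribute, value) facts."""
    return {(instance.name, a, v) for a, v in instance.attributes.items()}


def precision_recall(extracted, gold):
    """Precision and recall of extracted facts against a gold standard."""
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Illustrative input: rows as they might appear in a product table.
rows = [("Resolution:", "5.0 MP"), ("Optical Zoom", "3x"), ("Colour", "silver")]
camera = extract_instance("ExampleCam 500", rows)

gold = {
    ("ExampleCam 500", "resolution_mp", "5.0 MP"),
    ("ExampleCam 500", "optical_zoom", "3x"),
    ("ExampleCam 500", "weight_g", "180 g"),
}
precision, recall = precision_recall(facts(camera), gold)
meets_target = precision > 0.9 and recall > 0.9
```

In this toy run both extracted facts are correct, but the weight is missing from the source table, so recall falls below the 90% target. A plain synonym lookup like this is only a stand-in for the information extraction, natural language processing, and machine learning components the project proposes to investigate.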

Our group is involved in this project together with the University of Klagenfurt, ConfigWorks, and Lixto Software GmbH. Project duration: 1.1.2005 to 31.12.2006.

Funded by FFG Fit-IT.