NEXTWRAP - Next Generation Web Wrapper Technologies

This project aims to achieve significant scientific and technological improvements in Web information extraction and annotation technology. Current systems for automated Web information extraction allow an application designer to visually specify extraction patterns on sample HTML documents. Instances of these patterns are then automatically extracted from production documents and translated into XML. In this project we want to pave the way for a next generation of extraction technology by performing basic and experimental research towards the following goals:

All tasks are centred on tree-based wrapping and have the use of ontologies as their common denominator. As a team of researchers, we will build strong competence in establishing a common ontological framework, which we believe will form the basis of next-generation extraction technology.
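
To make the idea of tree-based wrapping more concrete, the following is a minimal sketch of how a pattern specified on a sample document might be applied to a page tree and translated into XML. It is an illustration only and not the technology developed in the project: the sample page, the PATTERN structure and the wrap function are hypothetical, and the sketch assumes well-formed XHTML input so that Python's standard xml.etree.ElementTree module can parse it, whereas real Web pages are often malformed.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page. Assumed to be well-formed XHTML; real wrappers
# must also cope with malformed HTML, which this sketch does not attempt.
SAMPLE_PAGE = """
<html><body>
  <table id="offers">
    <tr><td class="title">Used bicycle</td><td class="price">120 EUR</td></tr>
    <tr><td class="title">Desk lamp</td><td class="price">15 EUR</td></tr>
  </table>
</body></html>
"""

# Hypothetical extraction pattern: one path selecting each record node,
# plus one path per field, evaluated relative to the record node.
PATTERN = {
    "record": ".//table[@id='offers']/tr",
    "fields": {"title": "td[@class='title']", "price": "td[@class='price']"},
}


def wrap(page: str, pattern: dict) -> ET.Element:
    """Extract every pattern instance from the page tree into an XML tree."""
    tree = ET.fromstring(page.strip())
    result = ET.Element("records")
    for node in tree.findall(pattern["record"]):
        record = ET.SubElement(result, "record")
        for name, path in pattern["fields"].items():
            cell = node.find(path)
            if cell is not None:  # skip fields missing from this instance
                ET.SubElement(record, name).text = (cell.text or "").strip()
    return result


xml_output = ET.tostring(wrap(SAMPLE_PAGE, PATTERN), encoding="unicode")
print(xml_output)
```

In this sketch the pattern is plain data kept separate from the extraction code, so the same wrap function could be applied to other pages that share the structure of the sample document.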

Our group is involved in this project together with TU Graz and Lixto Software GmbH. Project duration: 1 January 2005 to 31 December 2006.

Funded by FFG Fit-IT.