
Talk Announcement: Mini-Symposium "Data Extraction and Data Mappings"

Mini-Symposium "Data Extraction and Data Mappings"

Date: Monday, October 24th 2005, 14:00 - 16:00
Location: Favoritenstraße 11, ground floor, red area, Zemanek Hörsaal

David W Embley:
Title: "Semantic Understanding: An Approach Based on Information Extraction Ontologies"

Abstract: Information is ubiquitous, and we're flooded with more than we can process. Somehow, we must rely less on visual processing, point-and-click navigation, and manual decision making and more on computer sifting and organization of information and automated negotiation and decision making. A resolution of these problems requires software agents with semantic understanding---a grand challenge of our time. More particularly, we must solve problems of information extraction, semantic annotation, question answering, service request satisfaction, automated interoperability, integration, and knowledge sharing. This talk addresses aspects of these problems and suggests the use of data-extraction ontologies as an approach that may help lead to semantic understanding.

Alan Nash:
Title: Composition of Mappings Given by Embedded Dependencies

Abstract: Composition of mappings between schemas is essential to support schema evolution, data exchange, data integration, and other data management tasks. In many applications, mappings are given by embedded dependencies. In this paper, we study the issues involved in composing such mappings.

Our algorithms and results extend those of Fagin et al. [FKPT04], who studied composition of mappings given by several kinds of constraints. In particular, they proved that full source-to-target tuple-generating dependencies (tgds) are closed under composition, but embedded source-to-target tgds are not. They introduced a class of second-order constraints, SO tgds, that is closed under composition and has desirable properties for data exchange.

We study constraints that need not be source-to-target and we concentrate on obtaining (first-order) embedded dependencies. As part of this study, we also consider full dependencies and second-order constraints that arise from Skolemizing embedded dependencies. For each of the three classes of mappings that we study, we provide (a) an algorithm that attempts to compute the composition and (b) sufficient conditions on the input mappings that guarantee that the algorithm will succeed.

In addition, we give several negative results. In particular, we show that full dependencies are not closed under composition, and that second-order dependencies that are not limited to be source-to-target are not closed under restricted composition. Furthermore, we show that determining whether the composition can be given by these kinds of dependencies is undecidable.
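
To make the notion of composition concrete, here is a minimal sketch for the simplest case mentioned above, full source-to-target tgds. The relation names R, S, T, the two dependencies, and the sample data are hypothetical and are not taken from the paper. The sketch only checks, on one small instance, that applying the two mappings in sequence yields the same target tuples as applying a single dependency in which the intermediate relation S has been eliminated.

```python
# Hypothetical mappings (illustration only, not from the paper):
#   M12:  R(x, y)              -> S(x, y)
#   M23:  S(x, y) AND S(y, z)  -> T(x, z)
# Candidate composition with S eliminated:
#   M13:  R(x, y) AND R(y, z)  -> T(x, z)


def apply_m12(r):
    """Minimal S instance required by R(x, y) -> S(x, y)."""
    return set(r)


def apply_m23(s):
    """Minimal T instance required by S(x, y) AND S(y, z) -> T(x, z)."""
    return {(x, z) for (x, y1) in s for (y2, z) in s if y1 == y2}


def apply_composed(r):
    """Minimal T instance required by R(x, y) AND R(y, z) -> T(x, z)."""
    return {(x, z) for (x, y1) in r for (y2, z) in r if y1 == y2}


r = {(1, 2), (2, 3), (3, 4)}
two_step = apply_m23(apply_m12(r))
direct = apply_composed(r)
print(two_step == direct)
```

Both dependencies here are full (no existential variables on the right-hand side), which is why plain substitution suffices. The negative results in the abstract concern richer settings where no such simple elimination is available.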

AllRight Group: Wolfgang Holzinger, Bernhard Krüpl

Title: "Project AllRight: a practical implementation of Web Information Extraction, work in progress"

Abstract: In this talk we present our work in progress on a Web information extraction platform (project AllRight). AllRight tries to automatically locate and extract data about products of a given domain described by a product ontology. In the talk, we will concentrate on two specific subtasks of the project, namely the Information Retrieval (IR) and Information Extraction (IE) stages:

- In the retrieval stage we proceed in two steps: we first gather a starting set of pages by querying an internet search engine with keywords derived from the domain knowledge. We then expand this starting set with a web crawler that searches the neighborhood of the initially found pages for similar content. We will present the algorithms and heuristics used in this approach.

- In the extraction stage, we follow an unorthodox approach: rather than analysing the HTML source code of a web page, we use a standard web browser to render the page and take advantage of the spatial information of text items as displayed on screen. We will give insights into this process and explain why we believe it to be superior to traditional HTML-based approaches.
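
The idea of working on rendered positions rather than on HTML source can be sketched as follows. This is not the AllRight implementation: the `TextItem` type, the row tolerance, the pairing heuristic, and the sample product attributes are all assumptions made for illustration. The sketch assumes a browser has already reported a bounding box for each text item, and it pairs each known attribute label with the nearest text item to its right on the same visual row.

```python
from dataclasses import dataclass


@dataclass
class TextItem:
    """A piece of text with its on-screen bounding box (hypothetical type)."""
    text: str
    x: float       # left edge in pixels
    y: float       # top edge in pixels
    width: float
    height: float


def same_row(a, b, tolerance=0.5):
    """Treat two items as being on one row if their vertical centers are close."""
    center_a = a.y + a.height / 2
    center_b = b.y + b.height / 2
    return abs(center_a - center_b) <= tolerance * min(a.height, b.height)


def pair_labels_with_values(items, labels):
    """For each label, pick the nearest item to its right on the same row."""
    pairs = {}
    for label in items:
        if label.text not in labels:
            continue
        candidates = [
            other for other in items
            if other is not label
            and same_row(label, other)
            and other.x >= label.x + label.width
        ]
        if candidates:
            nearest = min(candidates, key=lambda other: other.x)
            pairs[label.text] = nearest.text
    return pairs


# Made-up layout data, as a rendering engine might report it.
items = [
    TextItem("Resolution", 10, 100, 80, 16),
    TextItem("5 megapixels", 200, 101, 100, 16),
    TextItem("Weight", 10, 130, 50, 16),
    TextItem("180 g", 200, 131, 40, 16),
]
result = pair_labels_with_values(items, {"Resolution", "Weight"})
print(result)
```

The pairing depends only on coordinates, so it would give the same result whether the page lays out the attributes with a table, nested `div` elements, or absolutely positioned text.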