Number and Type: |
181130 VU WS 2006/07
|
Lecturer: |
Robert Baumgartner |
Keywords: |
Information Extraction, Approaches and methods for Wrapper Generation, Web Querying, Integration, XML. |
Preliminary Meeting: |
Thursday 5th of October, 9:00 (s.t.), Zemanek Hörsaal, Favoritenstrasse 11 |
Registration: |
Until 4th of October via TUWIS (the number of participants is limited). Please de-register in TUWIS if you decide not to take the course. I will try to accommodate as many participants as possible, but at some point I have to limit the number. |
Language: |
Slides in English; the lecture language depends on whether
non-German-speaking students from the Computational Logic program join |
Timetable: |
About every other Thursday, 10:15-12:15 (the first session,
on the 19th of October, starts at 9:15) |
Procedure: |
Lecture coupled with exercises and group work; two exercise
evaluation slots: one at 9:00 (groups 1-8, 16), one at 12:30 (groups 9-15, 17-18) on the lecture dates; lecture at 10:15 (on the first
lecture day at 9:15) |
Content: |
- Information Extraction: Setting, History, IE vs. IR
- Structured Data Extraction and Wrapping
- XML Transformation and Query Languages (in particular XPath and XSLT, with a very brief look at XQuery)
- Web Wrapper Languages
- Wrapper Generation Approaches
- Inductive Wrapper Generation: Machine Learning on Strings/Trees, Tree Edit Distances
- Automatic Data Extraction / Web Data Mining
- Supervised Wrapper Generation
- Deep Web Navigation Approaches
- Data Extraction from PDF documents
- Mediation and Integration Approaches
- Web Data Cleaning
- Lixto Visual Wrapper and Transformation Server
|
Fields of Study: |
This VU is a compulsory or compulsory elective course
in some bachelor's and master's programs; it is also part of the redesigned KfK Semantic Web and of the European
Master's Program in Computational Logic. |
Structure of the lecture and slides
|
Nr. |
Session |
Topics / Slides |
Lecture Time
(all groups)
|
Lecture Location |
G1-G8 and G16 Exercises
(Sem.room 1842)
|
G9-G15, G17-G18 Exercises
(Sem.room 1842)
|
Prelim.
|
Preliminary Meeting
|
|
5.10. 9:00-10:00 |
Zemanek HS |
|
|
1st
|
Session 1 |
|
19.10. 9:15-12:15 |
Vortmann HS |
|
|
2nd
|
Session 2 + Exercise Evaluation |
|
9.11. 10:15-12:15 |
Zemanek HS |
9:00-10:00 |
12:30-13:30 |
3rd
|
Session 3 + Exercise Evaluation |
Lixto Visual Wrapper and Transformation Server
Slides: 2 | 6 | Exercises |
16.11. 10:15-12:15 |
Zemanek HS |
9:00-10:00 |
12:30-13:30 |
4th
|
Session 4 + Exercise Evaluation |
Web Data Cleaning, Mediation and Integration
Slides: 2 | 6 | Exercises see below |
30.11. 10:15-12:15 |
Zemanek HS |
9:00-10:00 |
12:30-13:30 |
5th
|
Session 5 + Exercise Evaluation |
Inductive Wrapper Generation, Web Content Mining, Extr. from PDF
Slides | Exercises see below
|
14.12. 10:00-11:30 (*) |
Zemanek HS |
9:00-10:00, Zemanek room (*) |
9:00-10:00, Zemanek room (*)
|
6th
|
Session 6 + Exercise Evaluation |
Extraction Workflows, Meta Search Concepts, Extr. on Visual Rendition
Slides: 2 | 6 |
11.1. 10:15-12:15 |
Seminar room 1842 (#) |
9:00-10:00 |
12:30-13:30 |
7th
|
Group Presentations |
|
25.1. |
Seminar room 1842 |
9:00-12:00 |
12:00-15:00 |
Exercise Sheets 4 and 5 and Group Project Topics
Reference Solutions and Remarks on Exercise Sheets 1 and 2
|
Group Projects: Presentation and Paper Downloads |
Group Projects Unit 1: |
G1 (Web Mining), G3 (Stylus Studio), G4 (Protege), G5 (XMLDBMS), G7 (Castor), G8 (OpenKapow), G16 (MapForce) |
Group Projects Unit 2: |
G9 (XMLDBMS), G10 (Gate), G11 (Protege), G13 (Castor), G15 (MetaSearch), G17 (Web Mining), G18 (MapForce) |
(*) Note: On the 14th of December, due to time constraints of the lecturer, both exercise slots are combined into a single session, held in the Zemanek room from 9:00 to 10:00. Participants of Slot B who cannot attend at this time are excused. Each group will also receive e-mail feedback on its brainstorming/content slide set. The lecture starts a bit earlier than usual, at 10:00. Everyone is encouraged to attend the lecture, which features very interesting talks by PhD students.
(#) Note: Location changed! Everyone is encouraged to attend the lecture, which features very interesting talks by PhD students.
|