Applied Web Data Extraction and Integration

Number and Type: 181189 VU SS 2007

Lecturer: Robert Baumgartner (and maybe some guest lecturers for selected units)
Selected Keywords: Tools and methods for web data extraction and integration, Web Process Automation, Web Data for BI, Real-time Web monitoring
Preliminary Meeting: Thursday 15th of March, 9:00 (s.t.), Seminar room 1842, Favoritenstrasse 11
Registration: until 8th of March via TUWIS (limited number of participants). Please de-register in TUWIS if you decide not to take the course.
Language: Slides in English; the lecture language depends on whether non-German-speaking students join
Timetable: Thursday 9:00-11:00 (see below for details)
Procedure: Lecture coupled with exercises and group work
Content:
  • Web Data Extraction Frameworks and Scenarios: Commercial, Academic and Open Source
  • Data Integration Frameworks
  • Web Data for Web 2.0 Mashups and Situational Applications
  • Creation of more complex sample scenarios in some of the extraction/integration frameworks
  • Enterprise Application Integration vs. B2B
  • Wrapper Learning
  • Integration Broker and EDI
  • Web Process Automation
  • Web Data for Orchestration and Choreography Processes
  • Web ETL Connectors: Web Data for Business Intelligence and Data Warehouses
  • Sample Scenarios in vertical domains (energy, automotive, tourism)
  • Information Extraction and Form Mapping for real-time Meta-Searches
  • Leveraging Web Data for the Semantic Web and Semantic Annotation
  • Web Data Cleaning and Record Linkage
Fields of Study: This VU is a compulsory course or compulsory elective in some bachelor and master studies. It is also part of the re-designed KfK Semantic Web and of the European Master's Program in Computational Logic.

27-03-2007: Due to illness, the session on the 29th of March has been postponed to the 19th of April. The following sessions have been moved accordingly, so the final session is now on the 21st of June. Please see the new dates below. I am sorry for any inconvenience.

03-04-2007: I will soon send each group the email addresses of its other members.
Please note that I will put the first exercise sheet on the Web page around the 4th of April, in case some groups want to start during the Easter break.

26-04-2007: Due to space constraints, an alternative exercise evaluation slot from 10:55 to 11:40 is available. Let me know if your group wants to switch.

13-05-2007: Since every group wants to keep the Thursday 9:00-9:45 slot, we will switch to a bigger room (FH8) only for the final group presentations. For the exercises I will keep trying to provide enough chairs :-)

13-05-2007: I will soon put the exercise results for the first three exercise sets on the Web page so that you can also look at the results of other groups.

13-06-2007: The timeplan for the talks in the final session is available on the last slide of the 7th lecture. Each group will also receive feedback about their project paper from Max and me.

19-06-2007: Timeplan for final session updated.

Structure of the lecture and slides
| Nr. | Session | Topics / Slides | Date | Lecture Location | Exercises | Lecture |
|---|---|---|---|---|---|---|
| Prelim. | Preliminary Meeting | Course Overview | 15.3. | Seminar room 1842 | | 9:00-9:30 |
| 1st | Session 1 | Wrap the Web! Slides: 2 / 6; Exercise Sheet #1 | 19.4. (moved from 29.3.) | Seminar room 1842 | | 9:00-10:45 |
| 2nd | Session 2 + Exercise Evaluation | End User Mashups / MetaSearch. Slides: 2 / 6; Exercise Sheet #2 | 26.4. | Seminar room 1842 | 9:00-9:45 | 9:50-10:50 |
| 3rd | Session 3 + Exercise Evaluation | Wrapper Learning. Slides; Exercise Sheet #3 | 3.5. | Seminar room 1842 | 9:00-9:45 | 9:50-10:50 |
| 4th | Session 4 + Exercise Evaluation | Enterprise Data Integration. Slides: 2 / 6; Exercise Sheet #4 | 10.5. | Seminar room 1842 | 9:00-9:45 | 9:50-10:50 |
| 5th | Session 5 + Exercise Evaluation | SOA and Web Process Integration. Slides: 2 / 6; Exercise Sheet #5 | 24.5. | Seminar room 1842 | 9:00-9:45 | 9:50-10:50 |
| 6th | Session 6 + Exercise Evaluation | Web Data Cleaning. Slides: 2 / 6; Exercise Sheet #6 | 31.5. | Seminar room 1842 | 9:00-9:45 | 9:50-10:50 |
| 7th | Session 7 + Exercise Evaluation | Web ETL and Business Intelligence Scenarios. Slides: 2 / 6 | 14.6. | Seminar room 1842 | 9:00-9:45 | 9:50-10:50 |
| Final | Group Presentations | Group Project Topics; Timeplan; Projects (by group number): 1, 2, 3, 4/5, 6, 7, 8, 9, 10, 11 | 21.6. | FH8 Nöbauer HS | | 9:00-12:00 |

Staff
Robert Baumgartner, last modified on 1/7/2007