Applied Web Data Extraction and Integration

Lecture Overview
Number and Type: 181.189 VU SS 2013
Lecturer: Robert Baumgartner (exercises together with tutor Alexander Fischl)
Links: TISS | TUWEL
Selected Keywords: Overview of tools and methods for web data extraction and integration, Web Process Automation, Web Data for BI, Web Data Cleansing, Web Testing
Preliminary Meeting: Friday 8th of March, 16:00 (s.t.), EI 4 Reithoffer HS
Registration: Until 8th of March via TISS (limited number of participants). Please de-register in TISS if you decide not to take the course. ECML students who cannot yet register should send me a message so that a place can be reserved for them.
Language: Slides in English; the lecture language depends on whether non-German-speaking students join
Schedule: Friday 16:00-18:15 (Lecture 16:00, Exercises 17:00)
Timetable: 08/03, 15/03, 12/04, 19/04, 03/05, 17/05, 24/05, 07/06, 21/06 (Backup: 14/06)
Procedure: Lecture coupled with exercises and group work
Topics:
  • Web Data Extraction Frameworks and Scenarios: Commercial, Academic and Open Source
  • Data Integration and Mapping
  • Creation of more complex sample scenarios in some of the extraction/integration frameworks
  • Functional Web 2.0 Application Testing
  • Web Process Automation and SOA
  • Web ETL Connectors: Web Data for Business Intelligence
  • Sample Scenarios in vertical domains
  • Web Data Cleansing and Free Text Extraction
  • PDF Data Extraction
  • Elog Extraction Language
Fields of Study: This VU is part of the curriculum of several master's programs and of the European Master's Program in Computational Logic.


Structure of the Lecture and Slides
Session  Topics / Slides                                                                       Date   Lecture Time  Lecture Location    Exercises
1        Preliminary Meeting and Overview                                                      8.3.   16:00-17:00   EI 4 Reithoffer HS  -
2        Wrap the Web (6 in 1 | 1st Exercises)                                                 15.3.  16:00-18:00   EI 4 Reithoffer HS  -
3        Wrapper Languages (6 in 1 | 2nd Exercises | MVT Exercise (DOC) | MVT Exercise (PDF))  12.4.  16:00-17:00   EI 4 Reithoffer HS  17:00-18:00
4        Web Data Cleansing (6 in 1 | 3rd Exercises)                                           19.4.  16:00-17:00   EI 4 Reithoffer HS  17:00-18:00
5        Functional Web Application Testing (6 in 1 | 4th Exercises | Group Project Topics)    3.5.   16:00-17:00   EI 4 Reithoffer HS  17:00-18:00
6        Competitive Intelligence and Data Mining (6 in 1 | 5th Exercises)                     17.5.  16:00-17:00   EI 4 Reithoffer HS  17:00-18:00
7        Web Process Integration and Web Archiving (6 in 1 | 6th Exercises)                    24.5.  16:00-17:00   EI 4 Reithoffer HS  17:00-18:00
8        Visual Feature Detection (Group Projects Agenda)                                      7.6.   16:00-17:00   EI 4 Reithoffer HS  17:00-18:00
9        Group Project Presentations (Group Projects Download)                                 21.6.                EI 4 Reithoffer HS  16:00-19:00