Basic Spark Tutorial

DBAI Research Seminar, October 12th, 2017

This is a brief introduction to Spark and some related technologies. Spark is an engine for large-scale data processing and is often regarded as the successor of Hadoop MapReduce. Spark's key concepts are resilient distributed datasets (RDDs) and their lazy evaluation. The topics of this tutorial are: basic concepts of Spark, RDDs, DataFrames, Spark SQL, and GraphX.
We will use the Databricks community platform for the tutorial, so you will have to create an account at https://community.cloud.databricks.com/ to follow the practical part.
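
To give a first impression of RDDs and lazy evaluation, here is a minimal word-count sketch in the MapReduce style. It is not taken from the tutorial notebooks. It assumes a Databricks notebook or spark-shell session in which the SparkContext is already available as `sc`, and the two input lines are made up for illustration.

```scala
// Assumption: `sc` (the SparkContext) is predefined, as in a Databricks
// notebook or in spark-shell. The input data is hypothetical.
val lines = sc.parallelize(Seq(
  "spark is fast",
  "spark is lazy"
))

// Transformations: nothing is computed here, Spark only records
// how `counts` can be derived from `lines`.
val counts = lines
  .flatMap(_.split(" "))    // split each line into words
  .map(word => (word, 1))   // "map" step: emit a (word, 1) pair per word
  .reduceByKey(_ + _)       // "reduce" step: sum the counts per word

// Action: collect() triggers the actual computation and returns
// the result to the driver as an Array[(String, Int)].
val result = counts.collect().toMap
println(result)
```

The transformations `flatMap`, `map`, and `reduceByKey` only describe the computation. Spark runs it when an action such as `collect()` is called, which is what "lazy evaluation" refers to.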

You can download the slides here: Slides

Create a user account for the Community Edition of Databricks: Databricks
Go to your "Workspace" in Databricks, right-click, and import the following notebooks using their URLs:

Notebook Part 1: Basic Scala operations in Spark and your first MapReduce program
Notebook Part 2: Spark SQL and DataFrames
Notebook Part 3: GraphX and Pregel
