Basic Spark Tutorial
DBAI Research Seminar, October 12th 2017
Theresa Csar
This is a brief introduction to Spark and some related technologies. Spark is an engine for large-scale data processing and is often regarded as the successor of Hadoop MapReduce. Spark's key concepts are resilient distributed datasets (RDDs) and their lazy evaluation: transformations on an RDD are only recorded, and nothing is computed until an action requests a result. The topics of this tutorial are: Basic Concepts of Spark, RDDs, DataFrames, Spark SQL and GraphX.
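To make the idea of lazy evaluation concrete before the practical part, here is a minimal sketch in plain Python. It is a toy model, not Spark's API: the class `LazyDataset` and the names `square`, `evaluated` and `pending` are hypothetical and exist only for this illustration. The method names `map`, `filter` and `collect` are chosen to mirror the RDD operations of the same names, but the sketch has no distribution or fault tolerance.

```python
from typing import Callable, Iterable, Iterator, List, Optional


class LazyDataset:
    """Toy stand-in for an RDD (hypothetical; NOT part of Spark)."""

    def __init__(self, source: Iterable,
                 steps: Optional[List[Callable[[Iterator], Iterator]]] = None):
        self._source = source
        self._steps = steps or []

    def map(self, fn: Callable) -> "LazyDataset":
        # Transformation: only records the step, computes nothing.
        step = lambda it: (fn(x) for x in it)
        return LazyDataset(self._source, self._steps + [step])

    def filter(self, pred: Callable) -> "LazyDataset":
        # Transformation: also just recorded.
        step = lambda it: (x for x in it if pred(x))
        return LazyDataset(self._source, self._steps + [step])

    def collect(self) -> list:
        # Action: runs all recorded steps and returns the result.
        it = iter(self._source)
        for step in self._steps:
            it = step(it)
        return list(it)


evaluated = []  # records which inputs were actually processed


def square(x: int) -> int:
    evaluated.append(x)
    return x * x


# Building the pipeline does not call square() at all.
pending = LazyDataset(range(1, 6)).map(square).filter(lambda x: x % 2 == 1)
calls_before_action = len(evaluated)

# Only the action triggers the computation.
result = pending.collect()
```

In this sketch `square` is not called while the pipeline is being built; the work happens only when `collect()` is invoked. Spark follows the same pattern on a cluster, which lets it plan the whole chain of transformations before running it.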
We will use the Databricks community platform for the tutorial, so you will need to create an account at
https://community.cloud.databricks.com/ to follow the practical part.
You can download the slides here:
Slides
Create a user account for the Community Edition of Databricks:
Databricks
Go to your "Workspace" in Databricks and import the following notebooks (right-click) using the URLs: