On Windows
Go to the official Apache Spark download page and download Spark. It arrives as a '.tgz' archive, which you need to extract. Put the extracted files in a folder on the C drive (recommended).
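Download the 'winutils.exe' that matches your Spark version (more precisely, the Hadoop version your Spark build was made for) from GitHub. Put this file in a folder named Hadoop on the C drive (recommended). Hadoop looks for it at %HADOOP_HOME%\bin\winutils.exe, so place it in a bin subfolder, for example C:\Hadoop\bin.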
In this step, set the user variables and the system 'Path' variable for Spark and Hadoop.
a) Create a user variable named 'HADOOP_HOME' and set it to the Hadoop home directory (I used a folder named Hadoop), that is, the folder that contains bin\winutils.exe.
b) Create a user variable named 'SPARK_HOME' and set it to the folder where you extracted Apache Spark.
c) Add the path of Apache Spark's bin folder to the system 'Path' variable, so that you can run Spark from anywhere on your machine.
d) Open a new cmd/terminal window and run the 'spark-shell' command to check the setup.
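Before launching 'spark-shell', the folder layout from the steps above can be sanity-checked with a small script. This is only a sketch and assumes a POSIX shell is available on the Windows machine (Git Bash or WSL, for example), which the steps above do not require. The function name check_spark_layout is hypothetical.

```shell
# Hypothetical helper: verify that the two home folders contain what Spark expects.
# Usage: check_spark_layout "$SPARK_HOME" "$HADOOP_HOME"
check_spark_layout() {
    spark_home="$1"
    hadoop_home="$2"
    status=0

    # Spark keeps its launchers in <SPARK_HOME>/bin (spark-shell.cmd is the Windows one).
    if [ ! -e "$spark_home/bin/spark-shell" ] && [ ! -e "$spark_home/bin/spark-shell.cmd" ]; then
        echo "missing: $spark_home/bin/spark-shell" >&2
        status=1
    fi

    # Hadoop looks for winutils.exe in <HADOOP_HOME>/bin.
    if [ ! -e "$hadoop_home/bin/winutils.exe" ]; then
        echo "missing: $hadoop_home/bin/winutils.exe" >&2
        status=1
    fi

    return "$status"
}
```

Calling check_spark_layout "$SPARK_HOME" "$HADOOP_HOME" returns 0 when both files are found and prints the missing path otherwise.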
On Ubuntu
Execute the commands below one by one.
1. sudo apt update
2. sudo apt upgrade
3. sudo apt install default-jdk # install the JDK (Spark runs on Java)
4. wget <download URL for spark-3.0.3-bin-hadoop2.7.tgz> # download Apache Spark
5. tar xvf spark-3.0.3-bin-hadoop2.7.tgz # extract the archive
6. sudo mv spark-3.0.3-bin-hadoop2.7/ /opt/spark # move the extracted files to the /opt/spark directory
7. nano ~/.profile # open this file, paste the lines below at the end, then press Ctrl+O to save and Ctrl+X to exit
export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
export PYSPARK_PYTHON=/usr/bin/python3
8. source ~/.profile # reload the file so the Spark environment variables take effect
9. spark-shell # start the Spark shell
10. # start the standalone master server of Spark
# check at http://localhost:8080/
11. spark://ubuntu:7077 # master URL that the worker (slave) connects to; 'ubuntu' is your hostname here and differs from system to system
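Steps 7 and 8 can also be scripted instead of editing the file by hand. The following is a minimal sketch, assuming the same /opt/spark location as above. The helper names append_once and write_spark_profile are hypothetical, and the PATH line is an assumption that lets the shell find 'spark-shell' and the scripts in Spark's sbin folder.

```shell
# Hypothetical helper: append a line to a file only if that exact line is not
# already present, so running the setup twice does not duplicate the exports.
append_once() {
    file="$1"
    line="$2"
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Hypothetical helper: write the Spark environment lines into the given profile file.
write_spark_profile() {
    profile="$1"
    append_once "$profile" 'export SPARK_HOME=/opt/spark'
    # Assumption: put Spark's bin and sbin folders on PATH.
    append_once "$profile" 'export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin'
    append_once "$profile" 'export PYSPARK_PYTHON=/usr/bin/python3'
}
```

Running write_spark_profile ~/.profile followed by source ~/.profile would have the same effect as steps 7 and 8. Trying it on a scratch file first is a safe way to see what it writes.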
How to start and stop the master and slave with a single command - # to start master and slave # to stop master and slave
How to start and stop the master only - # to start master # to stop master
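A Spark distribution ships its standalone-cluster scripts in the sbin folder, including start-all.sh, stop-all.sh, start-master.sh and stop-master.sh. As a hedged sketch of how the start/stop cases above map onto those scripts, the hypothetical helpers below only print the script path and the master URL; they do not run anything. The fallback to /opt/spark and the names spark_sbin_script and spark_master_url are assumptions for illustration.

```shell
# Hypothetical helper: map an action (start|stop) and a target (all|master)
# to the matching script in Spark's sbin folder and print its path.
spark_sbin_script() {
    action="$1"
    target="$2"
    case "$action" in
        start|stop) ;;
        *) echo "unknown action: $action" >&2; return 1 ;;
    esac
    case "$target" in
        all|master) ;;
        *) echo "unknown target: $target" >&2; return 1 ;;
    esac
    # Assumption: fall back to /opt/spark when SPARK_HOME is not set.
    printf '%s/sbin/%s-%s.sh\n' "${SPARK_HOME:-/opt/spark}" "$action" "$target"
}

# Hypothetical helper: build the master URL a worker connects to.
# 7077 is the default port of the standalone master.
spark_master_url() {
    printf 'spark://%s:%s\n' "${1:-$(hostname)}" "${2:-7077}"
}
```

For example, sh "$(spark_sbin_script start all)" would start the master and the workers, and sh "$(spark_sbin_script stop master)" would stop only the master. Printing the path first makes it easy to check which script is about to run.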