Apache Spark Installation

Apache Spark Installation

Table of contents

On Windows

  1. Go to this page to download Apache Spark on its official page, this will download as a ' .tgz ' file, you need to unzip it. Put its extracted files inside a folder in C-Drive(recommended).

    https://spark.apache.org/downloads.html
    
  2. Download 'winutils.exe' as per your Spark version by going through the below link of GitHub. Put this file inside a folder named Hadoop in C-Drive(recommended).

    https://github.com/steveloughran/winutils/blob/master/hadoop-3.0.0/bin/winutils.exe
    
  3. In this step, you need to set the user variable and system variable path for spark and Hadoop.

    a) Copy the path of the 'winutils.exe' file(I kept it inside a folder named Hadoop) and add its home directory to the user variable as 'HADOOP_HOME'.

    b) Copy the path of the Apache spark and add its home directory to the user variable as 'SPARK_HOME'.

    c) Add the path of apache sparks' bin folder in the path of system variables, so that you can access it from anywhere on your machine.

    d) Check in your cmd/terminal by executing 'spark-shell' command.

On Ubuntu

Execute the below commands one by one

1. sudo apt update
2. sudo apt upgrade
3. sudo apt install default-jdk        # to download jdk for Java
4. wget https://archive.apache.org/dist/spark/spark-3.0.3/spark-3.0.3-bin-hadoop2.7.tgz                          # download apache spark
5. tar xvf spark-3.0.3-bin-hadoop2.7.tgz   # unzip it
6. sudo mv spark-3.0.3-bin-hadoop2.7/ /opt/spark  # move extracted files to opt/spark directory
7. sudo nano ~/.profile   # open this file and paste below codes at last and ctrl+S to save
   export SPARK_HOME=/opt/spark
   export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
   export PYSPARK_PYTHON=/usr/bin/python3
8. source ~/.profile   #Load the file to get the changes for Spark environment
9. spark-shell  # to start apache spark
10. start-master.sh  # start standalone master server of Spark
   # check at https://localhost:8080/ 
11. start-slave.sh spark://ubuntu:7077   # here ubuntu is your hostname and can differ system to system

How to start and stop the master and slave in single command -

start-all.sh    # to start master and slave

stop-all.sh     # to stop master and slave

How to start and stop master

start-master.sh   #  to start master

stop-master.sh    # to stop master

Did you find this article valuable?

Support Rohan Anand by becoming a sponsor. Any amount is appreciated!