3.1.5.1 Spark Interpreter in Local Mode

To start the Spark interpreter in local mode, follow these steps:
  1. Download spark-3.0.3-bin-hadoop2.7.tgz from the Apache Spark website.
  2. Extract the downloaded Spark archive into the following locations:
    • <COMPLIANCE_STUDIO_INSTALLATION_PATH>/deployed/mmg-home/mmg-studio/interpreter-server/spark-interpreter-<version>/extralibs
    • <COMPLIANCE_STUDIO_INSTALLATION_PATH>/mmg-home/mmg-studio/interpreter-server/spark-interpreter-<version>/extralibs
  3. Navigate to the <COMPLIANCE_STUDIO_INSTALLATION_PATH>/deployed/mmg-home/mmg-studio/bin directory.
  4. Open the startup.sh file and add the following line before the line containing “counter=1”:
    nohup "$DIR"/../interpreter-server/spark-interpreter-<version>/bin/spark-interpreter &>> <path_to_save_the_logs>/<log_file_name>.log &

    Figure 3-26 Snapshot of startup.sh file



  5. Save and close the file.
  6. Open the shutdown.sh file and add the following lines before the line containing “SL=”:
    I7014=`ps -eaf | grep java | grep RemoteInterpreterServer | grep 7014 | awk '{print $2}'`
    if [[ "" != "$I7014" ]]; then
      kill -9 $I7014
    fi

    Figure 3-27 Snapshot of shutdown.sh file



    Note:

    The above step assumes that the Spark interpreter uses port 7014, the default port set by the installer. If you use a different port, change the port number in these lines accordingly.
  7. Save and close the file.
  8. Navigate to the <COMPLIANCE_STUDIO_INSTALLATION_PATH>/deployed/mmg-home/mmg-studio/bin directory.
  9. Open the startup.sh file, go to line 29, and set the spark value to 7014.
    For example: . ./"$DIR"/datastudio --port 7008 --markdown 7009 --spark 7014 --python 7012 --jdbc 7011 --shell -1 --pgx 7022 --external
  10. Navigate to the <COMPLIANCE_STUDIO_INSTALLATION_PATH>/deployed/mmg-home/mmg-studio/bin directory.
  11. Open the config.sh file and update the following parameters:
    • MMG_SPARK_ENABLED=true
    • SPARK_HOME=<COMPLIANCE_STUDIO_INSTALLATION_PATH>/deployed/mmg-home/mmg-studio/interpreter-server/spark-interpreter-<version>/extralibs/spark-<version>-bin-hadoop<version>
    • HADOOP_HOME=##HADOOP_HOME##

      Note:

      Retain the placeholder as it is.
    • SPARK_MASTER=local
    • SPARK_DEPLOY_MODE=

      Note:

      Retain the SPARK_DEPLOY_MODE as blank.
    • DATASTUDIO_SPARK_INTERPRETER_PORT=7014
  12. Navigate to the <COMPLIANCE_STUDIO_INSTALLATION_PATH>/deployed/mmg-home/bin directory.
  13. Open the config.sh file and update the following parameters:
    • MMG_SPARK_ENABLED=true

      Note:

      By default, it is set to false. You can configure the following parameters only when MMG_SPARK_ENABLED is set to true.
    • SPARK_HOME=<COMPLIANCE_STUDIO_INSTALLATION_PATH>/deployed/mmg-home/mmg-studio/interpreter-server/spark-interpreter-<version>/extralibs/spark-<version>-bin-hadoop<version>
    • HADOOP_HOME=##HADOOP_HOME##

      Note:

      Retain the placeholder as it is.
    • SPARK_MASTER=local
    • SPARK_DEPLOY_MODE=

      Note:

      Retain the SPARK_DEPLOY_MODE as blank.
    • DATASTUDIO_SPARK_INTERPRETER_PORT=7014
  14. Navigate to the <COMPLIANCE_STUDIO_INSTALLATION_PATH>/deployed/mmg-home/mmg-studio/server/builtin/interpreters directory and open the spark.json file.
  15. Go to line 169 and set the port value to 7014.
  16. Set the default value to local for spark.master and to an empty string for spark.submit.deployMode.
    For example:
    "spark.master": {
      "envName": "MASTER",
      "propertyName": "spark.master",
      "defaultValue": "local",
      "description": "Spark master uri. ex) spark://masterhost:7077",
      "type": "string"
    },
    "spark.submit.deployMode": {
      "envName": null,
      "propertyName": "spark.submit.deployMode",
      "defaultValue": "",
      "description": "The deploy mode of Spark driver program, either 'client' or 'cluster'",
      "type": "string"
    },
  17. Navigate to the <COMPLIANCE_STUDIO_INSTALLATION_PATH>/bin directory.
  18. Restart Compliance Studio using the following command:
    ./compliance-studio.sh --restart
  19. Verify that the Spark interpreter has started using the following command:
    netstat -nltp | grep 7014
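
Because the same local-mode values must be set in two config.sh files (steps 11 and 13), a quick check before restarting can catch a missed edit. The following is a minimal sketch and not part of Compliance Studio: the function name check_spark_local_config is hypothetical, and it assumes each parameter appears on its own line exactly as KEY=value, with no "export" prefix, quotes, or trailing spaces. It does not check SPARK_HOME or HADOOP_HOME, whose values depend on the installation.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the product): verifies that a
# config.sh file contains the local-mode values listed in steps 11 and 13.
# Assumption: each setting is written on its own line as KEY=value.
check_spark_local_config() {
    local config_file="$1"
    local status=0
    local expected
    for expected in \
        "MMG_SPARK_ENABLED=true" \
        "SPARK_MASTER=local" \
        "SPARK_DEPLOY_MODE=" \
        "DATASTUDIO_SPARK_INTERPRETER_PORT=7014"; do
        # -x matches the whole line; -F treats the pattern as a fixed string.
        if ! grep -qxF -- "$expected" "$config_file"; then
            echo "MISSING: $expected"
            status=1
        fi
    done
    return "$status"
}

# Example usage (replace the placeholder with the actual installation path):
# check_spark_local_config "<COMPLIANCE_STUDIO_INSTALLATION_PATH>/deployed/mmg-home/bin/config.sh"
```

The function prints one MISSING line for each expected setting it does not find and returns a non-zero status in that case, so it can be run against each of the two config.sh files in turn. If a port other than 7014 is used, the expected port value in the list needs to be changed to match.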