A.26 Enable Additional Spark or PySpark Interpreter
An additional Spark or PySpark interpreter is required to connect to two different external clusters at the same time.
To set up an additional Spark or PySpark interpreter, follow these steps:
                  
- Create a start-script for the second Spark interpreter.
  Note: This is an optional step.
    - Navigate to the <COMPLIANCE_STUDIO_INSTALLATION_PATH>/deployed/interpreters/bin directory and create a new start-script called start-spark2-interpreter.sh using the following command:
        cp start-spark-interpreter.sh start-spark2-interpreter.sh
    - Edit the start-spark2-interpreter.sh file in the <COMPLIANCE_STUDIO_INSTALLATION_PATH>/deployed/interpreters/bin/ directory to update:
        - Port number to a new port number that is not in use (for example, 7030).
        - Log file name: search for the text .log and give the log a new name (for example, from spark.log to spark2.log).
    - Edit the start-all-interpreters.sh file in the <COMPLIANCE_STUDIO_INSTALLATION_PATH>/interpreters/bin/ directory as follows:
        - Search for the text sh "$DEPLOY_APP_HOME"/interpreters/bin/start-spark-interpreter.sh &
        - Add an additional entry with sh "$DEPLOY_APP_HOME"/interpreters/bin/start-spark2-interpreter.sh &
      Note: For the 2nd Spark interpreter variant, use start-spark2-interpreter.sh; when configuring a 3rd variant, use start-spark3-interpreter.sh, and so on.
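The start-script step above can be sketched as a short shell script. This is a minimal sketch under stated assumptions, not the product's own tooling: it runs in a scratch directory (BIN_DIR is a hypothetical stand-in for the interpreters bin directory), it creates stand-in script files whose contents are invented for illustration, and the sed patterns assume the port appears literally as 7017 (the port shown in the sample spark.json) and the log as spark.log. Inspect the real start-script before reusing any pattern.

```shell
#!/bin/sh
# Sketch only: operates on a scratch directory, not on a real installation.
set -e

BIN_DIR="${BIN_DIR:-$(mktemp -d)}"   # hypothetical stand-in for the interpreters bin directory
NEW_PORT=7030                        # example port from the text; pick one that is not in use
cd "$BIN_DIR"

# Stand-in files with invented contents, created only if absent.
[ -f start-spark-interpreter.sh ] || printf '%s\n' 'PORT=7017' 'LOG_FILE=spark.log' > start-spark-interpreter.sh
[ -f start-all-interpreters.sh ] || printf '%s\n' 'sh "$DEPLOY_APP_HOME"/interpreters/bin/start-spark-interpreter.sh &' > start-all-interpreters.sh

# Copy the existing start-script.
cp start-spark-interpreter.sh start-spark2-interpreter.sh

# Change the port and rename the log file in the copy (patterns are assumptions).
sed -e "s/7017/$NEW_PORT/" -e 's/spark\.log/spark2.log/' \
    start-spark2-interpreter.sh > start-spark2-interpreter.sh.tmp
cat start-spark2-interpreter.sh.tmp > start-spark2-interpreter.sh   # keeps the copy's file mode
rm start-spark2-interpreter.sh.tmp

# Register the new script in start-all-interpreters.sh, but only once.
ENTRY='sh "$DEPLOY_APP_HOME"/interpreters/bin/start-spark2-interpreter.sh &'
grep -qF "$ENTRY" start-all-interpreters.sh || printf '%s\n' "$ENTRY" >> start-all-interpreters.sh
```

The grep guard makes the last step safe to repeat, so rerunning the sketch does not add a duplicate entry.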
- Create the interpreter JSON for the additional Spark interpreter.
    - Navigate to the <COMPLIANCE_STUDIO_INSTALLATION_PATH>/deployed/interpreters/conf directory and create the new interpreter JSON called spark2.json using the following command:
        cp spark.json spark2.json
    - Edit the spark2.json file in the <COMPLIANCE_STUDIO_INSTALLATION_PATH>/deployed/interpreters/conf/ directory and update the following parameter values:
        - group: <new-spark-interpreter-name>
        - name: <new-spark-interpreter-name>
        - groupSettings.initialCodeCapability: <new-spark-interpreter-name>
        - port: 7030 (the port chosen for the start-script in the first step)
        - capabilities.name: <new-spark-interpreter-name>
        - capabilities.button.label: <new-spark-interpreter-name>
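The parameter updates for spark2.json could be scripted roughly as follows. This is a sketch under assumptions: it works in a scratch directory (CONF_DIR is a hypothetical stand-in for the interpreters conf directory) on a trimmed stand-in for spark.json, the name spark2 is a hypothetical value for <new-spark-interpreter-name>, and the sed patterns assume the key and value spacing used in the sample file. Editing the file by hand is equally valid; the same pattern would apply to pyspark2.json with its own values.

```shell
#!/bin/sh
# Sketch only: edits a scratch copy, not a real installation.
set -e

CONF_DIR="${CONF_DIR:-$(mktemp -d)}"  # hypothetical stand-in for the interpreters conf directory
NEW_NAME=spark2                       # hypothetical <new-spark-interpreter-name>
NEW_PORT=7030                         # port chosen for the second start-script
cd "$CONF_DIR"

# Trimmed stand-in for spark.json, created only if absent.
[ -f spark.json ] || cat > spark.json <<'EOF'
[ {
  "group": "spark",
  "name": "spark",
  "groupSettings": { "initialCode": "1+1", "initialCodeCapability": "spark" },
  "host": "localhost",
  "port": 7017,
  "capabilities": [ {
    "name": "spark",
    "button": { "label": "Spark" }
  } ]
} ]
EOF

# Copy the existing interpreter JSON.
cp spark.json spark2.json

# Update group, name, initialCodeCapability, label, and port in the copy.
sed -e "s/\"group\": *\"spark\"/\"group\": \"$NEW_NAME\"/g" \
    -e "s/\"name\": *\"spark\"/\"name\": \"$NEW_NAME\"/g" \
    -e "s/\"initialCodeCapability\": *\"spark\"/\"initialCodeCapability\": \"$NEW_NAME\"/g" \
    -e "s/\"label\": *\"Spark\"/\"label\": \"$NEW_NAME\"/g" \
    -e "s/\"port\": *[0-9][0-9]*/\"port\": $NEW_PORT/g" \
    spark2.json > spark2.json.tmp
mv spark2.json.tmp spark2.json
```

The "name" pattern is anchored on the quoted lowercase key, so it does not touch keys such as "envName" or "propertyName" in the full file.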
    - After the update, the file will look like the following:
        [
          {
            "group": "spark",
            "name": "spark",
            "className": "org.apache.zeppelin.spark.SparkInterpreter",
            "groupSettings": { "initialCode": "1+1", "initialCodeCapability": "spark" },
            "host": "localhost",
            "port": 7017,
            "capabilities": [
              {
                "name": "spark",
                "highlightLanguage": "scala",
                "formEscapeCharacter": "@",
                "button": { "defaultCode": "println(\"Hello, world\")", "icon": "fa fa-fw fa-building-o", "label": "Spark" }
              }
            ],
            "defaultInterpreter": true,
            "properties": {
              "spark.executor.memory": { "envName": null, "propertyName": "spark.executor.memory", "defaultValue": "", "description": "Executor memory per worker instance. ex) 512m, 32g", "type": "string" },
              "args": { "envName": null, "propertyName": null, "defaultValue": "", "description": "spark commandline args", "type": "textarea" },
              "zeppelin.spark.useHiveContext": { "envName": "ZEPPELIN_SPARK_USEHIVECONTEXT", "propertyName": "zeppelin.spark.useHiveContext", "defaultValue": true, "description": "Use HiveContext instead of SQLContext if it is true.", "type": "checkbox" },
              "spark.app.name": { "envName": "SPARK_APP_NAME", "propertyName": "spark.app.name", "defaultValue": "Zeppelin", "description": "The name of spark application.", "type": "string" },
              "spark.pyspark.python": { "envName": null, "propertyName": "spark.pyspark.python", "defaultValue": "python3", "description": "Python command to run pyspark workers with", "type": "string" },
              "zeppelin.spark.printREPLOutput": { "envName": null, "propertyName": "zeppelin.spark.printREPLOutput", "defaultValue": true, "description": "Print REPL output", "type": "checkbox" },
              "spark.cores.max": { "envName": null, "propertyName": "spark.cores.max", "defaultValue": "", "description": "Total number of cores to use. Empty value uses all available core.", "type": "number" },
              "zeppelin.spark.maxResult": { "envName": "ZEPPELIN_SPARK_MAXRESULT", "propertyName": "zeppelin.spark.maxResult", "defaultValue": "1000", "description": "Max number of Spark SQL result to display.", "type": "number" },
              "spark.master": { "envName": "MASTER", "propertyName": "spark.master", "defaultValue": "yarn", "description": "Spark master uri. ex) spark://masterhost:7077", "type": "string" },
              "spark.yarn.archive": { "envName": null, "propertyName": "spark.yarn.archive", "defaultValue": "", "description": "An archive containing needed Spark jars for distribution to the YARN cache", "type": "string" },
              "spark.driver.bindAddress": { "envName": "DRIVER_BIND_ADDRESS", "propertyName": "spark.driver.bindAddress", "defaultValue": "0.0.0.0", "description": "Hostname or IP address where to bind listening sockets.", "type": "string" },
              "zeppelin.spark.enableSupportedVersionCheck": { "envName": null, "propertyName": "zeppelin.spark.enableSupportedVersionCheck", "defaultValue": true, "description": "Do not change - developer only setting, not for production use", "type": "checkbox" },
              "zeppelin.spark.uiWebUrl": { "envName": null, "propertyName": "zeppelin.spark.uiWebUrl", "defaultValue": "", "description": "Override Spark UI default URL", "type": "string" },
              "zeppelin.spark.useNew": { "envName": null, "propertyName": "zeppelin.spark.useNew", "defaultValue": true, "description": "Whether use new spark interpreter implementation", "type": "checkbox" },
              "zeppelin.spark.ui.hidden": { "envName": null, "propertyName": "zeppelin.spark.ui.hidden", "defaultValue": false, "description": "Whether to hide spark ui in zeppelin ui", "type": "checkbox" },
              "zeppelin.interpreter.output.limit": { "envName": null, "propertyName": "zeppelin.interpreter.output.limit", "defaultValue": "102400", "description": "Output message from interpreter exceeding the limit will be truncated", "type": "number" }
            },
            "initialCode": [],
            "editor": { "language": "scala", "editOnDblClick": false }
          }
        ]
- Create the interpreter JSON for the second PySpark interpreter.
    - Navigate to the <COMPLIANCE_STUDIO_INSTALLATION_PATH>/deployed/interpreters/conf directory and create the new interpreter JSON called pyspark2.json using the following command:
        cp pyspark.json pyspark2.json
    - Edit the pyspark2.json file in the <COMPLIANCE_STUDIO_INSTALLATION_PATH>/deployed/interpreters/conf/ directory and update the following parameter values:
        - group: <new-spark-interpreter-name>
        - name: <new-spark-interpreter-name>
        - groupSettings.initialCodeCapability: <new-spark-interpreter-name>
        - port: 7030 (the port chosen for the start-script in the first step)
        - capabilities.name: <new-spark-interpreter-name>
        - capabilities.button.label: <new-spark-interpreter-name>
    - After the update, the file will look like the following:
        [
          {
            "group": "spark",
            "name": "pyspark",
            "className": "org.apache.zeppelin.spark.PySparkInterpreter",
            "host": "localhost",
            "port": 7017,
            "capabilities": [
              {
                "name": "pyspark",
                "highlightLanguage": "python",
                "button": { "defaultCode": "print('Hello World')", "icon": "icon-python", "label": "PySpark" },
                "formEscapeCharacter": "$"
              }
            ],
            "properties": {
              "zeppelin.pyspark.python": { "envName": "PYSPARK_PYTHON", "propertyName": null, "defaultValue": "python3", "description": "Python executable to run pyspark with", "type": "string" },
              "zeppelin.pyspark.useIPython": { "envName": null, "propertyName": "zeppelin.pyspark.useIPython", "defaultValue": false, "description": "whether use IPython when it is available", "type": "checkbox" },
              "zeppelin.interpreter.output.limit": { "envName": null, "propertyName": "zeppelin.interpreter.output.limit", "defaultValue": "102400", "description": "Output message from interpreter exceeding the limit will be truncated", "type": "number" }
            },
            "initialCode": []
          }
        ]
      Note: If you connect the two interpreters to different external clusters and set the SPARK_HOME and HADOOP_CONF_DIR environment variables as part of providing custom Spark libraries in YARN mode, ensure that you append those environment variables to the respective Spark interpreter start-scripts.
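For the note on SPARK_HOME and HADOOP_CONF_DIR, one way to give each interpreter its own values is to append export lines to each start-script. The sketch below assumes a scratch directory with stand-in scripts; BIN_DIR, the add_env helper, and the /opt/cluster1 and /opt/cluster2 paths are all hypothetical. It appends the lines as the note describes; if a real start-script launches the interpreter before its final line, the exports would presumably need to sit above that launch line instead.

```shell
#!/bin/sh
# Sketch only: appends per-cluster environment variables to stand-in start-scripts.
set -e

BIN_DIR="${BIN_DIR:-$(mktemp -d)}"   # hypothetical stand-in for the interpreters bin directory
cd "$BIN_DIR"

# Stand-in start-scripts, created only if absent.
for f in start-spark-interpreter.sh start-spark2-interpreter.sh; do
  [ -f "$f" ] || printf '%s\n' '# stand-in start-script' > "$f"
done

# add_env SCRIPT SPARK_HOME_VALUE HADOOP_CONF_DIR_VALUE (hypothetical helper)
add_env() {
  printf 'export SPARK_HOME=%s\nexport HADOOP_CONF_DIR=%s\n' "$2" "$3" >> "$1"
}

# Each interpreter points at its own cluster (paths are placeholders).
add_env start-spark-interpreter.sh  /opt/cluster1/spark /opt/cluster1/hadoop/conf
add_env start-spark2-interpreter.sh /opt/cluster2/spark /opt/cluster2/hadoop/conf
```

Keeping the variables in the individual start-scripts, rather than in a shared environment, is what lets the two interpreters resolve different Spark and Hadoop configurations.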
- Restart Compliance Studio. To do this, navigate to the <COMPLIANCE_STUDIO_INSTALLATION_PATH>/bin/ directory and run the ./compliance-studio.sh –restart or ./compliance-studio.sh –r script.