Submit a Spark Compute Job
post
/bdcsce/api/v1.1/clustermgmt/{identityDomainId}/instances/{clusterId}/jobs/spark
Request
Path Parameters
-
clusterId: string
Identifier for the Cluster
-
identityDomainId: string
Identity domain ID of the Oracle Cloud Service instance, used for authentication.
Definition of the Spark Job that you want to submit.
Root Schema : SparkJob
Type: object
-
applicationArchives(optional):
array applicationArchives
Archives to be uncompressed in the executor working directory (YARN mode only)
-
applicationArguments(optional):
array applicationArguments
Arguments to pass to the application main class or the application script.
-
applicationClass(optional):
string
For an application that runs inside the JVM, this specifies the fully qualified name of the class that is used to call spark-submit.
-
applicationFile:
string
The name of the file, script, or jar file that will be used to submit the Job.
-
applicationJarFiles(optional):
array applicationJarFiles
List of jars that will be downloaded and collocated for the given Job.
-
applicationName:
string
The name given to the application, mainly used for UI or Audit Logs.
-
applicationPyFiles(optional):
array applicationPyFiles
List of Python files that will be downloaded and collocated for the given Job.
-
applicationSupportFiles(optional):
array applicationSupportFiles
List of files that will be downloaded and collocated for the given Job.
-
applicationType(optional):
string
Allowed Values:
[ "hive", "mapreduce", "spark", "tez", "yarn", "unkown" ]
-
clusterId(optional):
string
The Cluster where the given Job is assigned to run.
-
driverClasspath(optional):
array driverClasspath
Driver classpath.
-
driverCores(optional):
integer(int32)
Number of cores used by the driver.
-
driverLibraryPath(optional):
string
Extra library path entries to pass to the driver.
-
driverMaxResults(optional):
string
Limit of total size of serialized results of all partitions for each Spark action (e.g. collect). Should be at least 1M, or 0 for unlimited. Jobs will be aborted if the total size is above this limit. Having a high limit may cause out-of-memory errors in the driver (depends on spark.driver.memory and the memory overhead of objects in the JVM). Setting a proper limit can protect the driver from out-of-memory errors. Example: 2G
-
driverMemory(optional):
string
Memory for the driver (e.g. 1000M, 2G). Example: 2G
-
excludePackages(optional):
array excludePackages
Comma-separated list of `groupId:artifactId` pairs to exclude while resolving the dependencies provided in `packages`.
-
executorCores(optional):
integer(int32)
Number of cores used by each executor.
-
executorEnv(optional):
object executorEnv
Additional Properties Allowed. Map of environment variables that should be available to the Executor process. For example, REQUEST_TIMEOUT: the maximum waiting period that the requester is willing to wait for the job to start running inside a cluster before the Job is considered failed.
-
executorExtraClasspath(optional):
array executorExtraClasspath
Extra classpath entries to prepend to the classpath of executors. This exists primarily for backwards-compatibility with older versions of Spark. Users typically should not need to set this option.
-
executorExtraLibraryPath(optional):
string
Set a special library path to use when launching executor JVMs.
-
executorMemory(optional):
string
Memory for each executor (e.g. 1000M, 2G). Example: 2G
-
extraJavaOptions(optional):
array extraJavaOptions
A list of extra JVM options to pass to executors and the driver, for instance GC settings or other logging. Note that it is illegal to set Spark properties or heap size settings with this option. Spark properties should be set using a SparkConf object or the spark-defaults.conf file used with the spark-submit script. Heap size settings can be set with spark.executor.memory.
-
extraListeners(optional):
array extraListeners
Extra listener classes to register with Spark.
-
id(optional):
string
Unique Identifier of this Job.
-
maxJobDurationInSecs(optional):
object maxJobDurationInSecs
Maximum duration of a Job in seconds. When a Job exceeds this amount of time, it is terminated.
-
maxSubmissionLatencyInSecs(optional):
object maxSubmissionLatencyInSecs
The maximum waiting period that the requester is willing to wait for the job to actually start running inside a cluster before the Job is considered failed (REQUEST_TIMEOUT).
-
numExecutors(optional):
integer(int32)
Number of executors.
-
packages(optional):
array packages
Comma-separated list of Maven coordinates of jars to include on the driver and executor classpaths. The local Maven repository is searched first, then Maven Central, and then any additional remote repositories given in `repositories`. The format for the coordinates is `groupId:artifactId:version`.
-
queue(optional):
string
Server queue to which the Job is submitted.
-
repositories(optional):
array repositories
Comma-separated list of additional remote repositories to search for the maven coordinates given with `packages`.
-
sparkConf(optional):
object sparkConf
Additional Properties Allowed. Map of Spark configuration options used to submit the Job.
-
sparkJobType:
string
Allowed Values:
[ "batch", "interactive", "streaming" ]
Type of the Spark Job.
-
sparkSessionType:
string
Allowed Values:
[ "spark", "pyspark", "sparkr" ]
Type of the Spark session used to run the Job.
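For illustration, a minimal SparkJob body for a PySpark batch job, written as a Python dict so it can feed the submission sketch shown later under Security. Every name and path below is a placeholder, not a value defined by the API:

```python
# A hypothetical SparkJob payload; all names and paths are illustrative.
spark_job = {
    "applicationName": "word-count-example",              # shown in UI and audit logs
    "applicationFile": "hdfs:///user/demo/wordcount.py",  # script handed to spark-submit
    "applicationArguments": ["hdfs:///user/demo/input.txt"],
    "sparkJobType": "batch",                              # batch | interactive | streaming
    "sparkSessionType": "pyspark",                        # spark | pyspark | sparkr
    "driverMemory": "2G",
    "executorMemory": "2G",
    "executorCores": 2,
    "numExecutors": 4,
}
```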
Nested Schema : applicationArchives
Type: array
Archives to be uncompressed in the executor working directory (YARN mode only)
Nested Schema : applicationArguments
Type: array
Arguments to pass to the application main class or the application script.
Nested Schema : applicationJarFiles
Type: array
List of jars that will be downloaded and collocated for the given Job.
Nested Schema : applicationPyFiles
Type: array
List of Python files that will be downloaded and collocated for the given Job.
Nested Schema : applicationSupportFiles
Type: array
List of files that will be downloaded and collocated for the given Job.
Nested Schema : excludePackages
Type: array
Comma-separated list of `groupId:artifactId` pairs to exclude while resolving the dependencies provided in `packages`.
Nested Schema : executorEnv
Type: object
Additional Properties Allowed
Map of environment variables that should be available to the Executor process. For example, REQUEST_TIMEOUT: the maximum waiting period that the requester is willing to wait for the job to start running inside a cluster before the Job is considered failed.
Nested Schema : executorExtraClasspath
Type: array
Extra classpath entries to prepend to the classpath of executors. This exists primarily for backwards-compatibility with older versions of Spark. Users typically should not need to set this option.
Nested Schema : extraJavaOptions
Type: array
A list of extra JVM options to pass to executors and the driver, for instance GC settings or other logging. Note that it is illegal to set Spark properties or heap size settings with this option. Spark properties should be set using a SparkConf object or the spark-defaults.conf file used with the spark-submit script. Heap size settings can be set with spark.executor.memory.
Nested Schema : maxJobDurationInSecs
Type: object
Maximum duration of a Job in seconds. When a Job exceeds this amount of time, it is terminated.
Nested Schema : maxSubmissionLatencyInSecs
Type: object
The maximum waiting period that the requester is willing to wait for the job to actually start running inside a cluster before the Job is considered failed (REQUEST_TIMEOUT).
Nested Schema : packages
Type: array
Comma-separated list of Maven coordinates of jars to include on the driver and executor classpaths. The local Maven repository is searched first, then Maven Central, and then any additional remote repositories given in `repositories`. The format for the coordinates is `groupId:artifactId:version`.
Nested Schema : repositories
Type: array
Comma-separated list of additional remote repositories to search for the Maven coordinates given with `packages`.
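To make the interplay of `packages`, `repositories`, and `excludePackages` concrete, a hypothetical fragment; the coordinates and repository URL are invented, and the fields are written as arrays because that is how the schema declares them:

```python
# Hypothetical dependency-resolution settings: include spark-avro from
# Maven Central, search one extra repository, and drop a transitive
# dependency that conflicts with the cluster's own version.
dependency_fragment = {
    "packages": ["org.apache.spark:spark-avro_2.11:2.4.0"],
    "repositories": ["https://repo.example.com/maven2"],
    "excludePackages": ["com.fasterxml.jackson.core:jackson-databind"],
}
```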
Nested Schema : sparkConf
Type: object
Additional Properties Allowed
Map of Spark configuration options used to submit the Job.
Security
-
basicAuth: basic
Type:
basic
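Putting the pieces together, a minimal submission sketch using Python's requests library with basic authentication. The host name, identity domain, cluster ID, and credentials are placeholders:

```python
import requests

# Placeholders: substitute your own service host, identity domain,
# cluster ID, and credentials.
HOST = "https://example.oraclecloud.com"
IDENTITY_DOMAIN = "myIdentityDomain"
CLUSTER_ID = "myCluster"

url = (f"{HOST}/bdcsce/api/v1.1/clustermgmt/"
       f"{IDENTITY_DOMAIN}/instances/{CLUSTER_ID}/jobs/spark")

resp = requests.post(
    url,
    json=spark_job,                       # SparkJob body sketched above
    auth=("jdoe@example.com", "secret"),  # basicAuth
)
if resp.ok:
    status = resp.json()                  # AsyncJobStatus, described below
    print(status.get("id"), status.get("progress"))
```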
Response
200 Response
Root Schema : AsyncJobStatus
Type: object
-
aggregatedLogs(optional):
object JobAggregatedLogs
-
aggregatedStoredLogs(optional):
object JobAggregatedStoredLogs
-
allocatedMB(optional):
integer(int32)
Sum of memory in MB allocated to the job's running containers.
-
allocatedOCores(optional):
object allocatedOCores
Read Only:
true
Sum of virtual cores allocated to the job's running containers.
-
allocatedVCores(optional):
integer(int32)
Sum of virtual cores allocated to the job's running containers.
-
applicationTags(optional):
array applicationTags
A collection of tags or labels associated with this job.
-
applicationType(optional):
string
Allowed Values:
[ "hive", "mapreduce", "spark", "tez", "yarn", "unkown" ]
The kind of application that this job represents.
-
completed(optional):
object completed
Read Only:
true
Specifies whether the job has been completed. It can be used to determine whether the client needs to keep polling the `progress` status.
-
completedPercentage(optional):
integer(int32)
A value between 0 and 100 that specifies the percentage that the job is complete.
-
connectors(optional):
array connectors
Connecting interfaces that this Job offers, for example the Spark UI.
-
containerLogs(optional):
array containerLogs
Location of log file(s) for a running job.
-
displayName(optional):
string
Read Only:
true
-
elapsedTime(optional):
integer(int64)
Time since the job started (in ms)
-
endTime(optional):
string(date-time)
Read Only:
true
Specifies the time at which the job has ended. This property is available only after a job has ended.
-
endTimeMillis(optional):
integer(int64)
Time in which job ended (in ms since epoch)
-
error(optional):
object Error
-
id(optional):
string
Read Only:
true
Unique Job Identifier.
-
intervalToPoll(optional):
integer(int64)
Specifies the number of milliseconds to wait before rechecking the status of a job.
-
jobType(optional):
string
Read Only:
true
-
links(optional):
array links
Hyperlinks to other associated resources.
-
memorySeconds(optional):
integer(int64)
The amount of memory the job has allocated (megabyte-seconds).
-
message(optional):
string
Human-readable message that describes the current processing status.
-
name(optional):
string
Name of the Job, used for visual identification. It is not unique.
-
nameAlias(optional):
string
Read Only:
true
-
progress(optional):
string
Allowed Values:
[ "aborted", "aborting", "accepted", "failed", "paused", "pending", "processing", "succeeded", "undefined" ]
Current progress of the Job.
-
queue(optional):
string
Job Queue where this job was scheduled to run.
-
runningContainers(optional):
integer(int32)
Number of containers currently running for the job.
-
sessionType(optional):
string
Read Only:
true
-
snapshotTime(optional):
integer(int64)
-
startTime(optional):
string(date-time)
Read Only:
true
Specifies the time at which the job started.
-
startTimeMillis(optional):
integer(int64)
Time in which job started (in ms since epoch)
-
vcoreSeconds(optional):
integer(int64)
The amount of CPU resources the job has allocated (virtual core-seconds).
Nested Schema : allocatedOCores
Type: object
Read Only: true
Sum of virtual cores allocated to the job's running containers.
Nested Schema : applicationTags
Type: array
A collection of tags or labels associated with this job.
Nested Schema : completed
Type: object
Read Only: true
Specifies whether the job has been completed. It can be used to determine whether the client needs to keep polling the `progress` status.
Nested Schema : connectors
Type: array
Connecting interfaces that this Job offers, for example the Spark UI.
Nested Schema : Error
Type: object
-
detail(optional):
string
detail
-
instance(optional):
string
instance
-
o:errorCode(optional):
string
Read Only:
true
error code
-
o:errorDetails(optional):
array o:errorDetails
Read Only:
true
error details
-
o:errorPath(optional):
string
Read Only:
true
error path
-
status:
integer(int32)
status
-
title:
string
title
-
type(optional):
string
Read Only:
true
RFC Link
Nested Schema : JobConnectorReference
Type: object
-
description:
string
The description of the interface.
-
name:
string
The name of the interface, e.g. Spark UI.
-
rel:
string(uri)
Related URI.
-
type(optional):
string
Read Only:
true
The media type to apply to the URI.
-
uris:
array uris
The URIs of the linked resource.
Nested Schema : JobConnectorResourceIdentifier
Type: object
-
description:
string
The description of the interface.
-
id:
string
The id of the interface, e.g. 1.
-
uri:
string(uri)
The URI of the linked resource
Nested Schema : JobContainerLogs
Type: object
-
containerId:
string
Unique Identifier of the Container that is generating the Log.
-
files:
array files
Log files with path relative to the Container.
Nested Schema : ModelLink
Type: object
-
href:
string
The URI of the linked resource
-
mediaType(optional):
string
The media type to apply to the URI
-
method(optional):
string
The method to apply to the URI
-
profile(optional):
string
The profile
-
rel:
string
Relation link
-
templated(optional):
object templated
Whether the URI is a template.
Nested Schema : templated
Type: object
Whether the URI is a template.
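The `completed`, `intervalToPoll`, and `progress` properties are designed for client-side polling. A sketch of that loop follows; it assumes the status document can be re-fetched through an entry in `links` whose `rel` is "self", which is a guess rather than something this page documents:

```python
import time
import requests

def wait_for_job(session: requests.Session, status: dict) -> dict:
    """Poll an AsyncJobStatus until the job reports completion."""
    while not status.get("completed"):
        # Honor the server's suggested polling interval (milliseconds).
        time.sleep(status.get("intervalToPoll", 5000) / 1000.0)
        # Assumed: a "self" hyperlink points back at this job's status.
        self_href = next(link["href"] for link in status.get("links", [])
                         if link.get("rel") == "self")
        status = session.get(self_href).json()
        print(status.get("completedPercentage", 0), status.get("progress"))
    return status  # final progress: succeeded, failed, aborted, ...
```

The session passed in would carry the same basic-auth credentials as the submission call, e.g. `session.auth = ("jdoe@example.com", "secret")`.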
400 Response
List of errors related to the request.
404 Response
The Compute Job was not found.
500 Response
An internal error occurred.
Root Schema : Error
Type: object
-
detail(optional):
string
detail
-
instance(optional):
string
instance
-
o:errorCode(optional):
string
Read Only:
true
error code
-
o:errorDetails(optional):
array o:errorDetails
Read Only:
true
error details
-
o:errorPath(optional):
string
Read Only:
true
error path
-
status:
integer(int32)
status
-
title:
string
title
-
type(optional):
string
Read Only:
true
RFC Link
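The Error schema appears to follow the RFC 7807 problem-detail shape (`type`, `title`, `status`, `detail`, `instance`) plus Oracle-prefixed extensions. A hypothetical handling fragment for the 400, 404, and 500 responses above, continuing the submission sketch:

```python
# Inspect the error body when the submission did not succeed.
if not resp.ok:
    err = resp.json()
    print(err["status"], err["title"])  # required fields
    print(err.get("detail"))            # optional human-readable detail
    print(err.get("o:errorCode"))       # Oracle-specific error code
```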