Forge flag options reference

The following table lists the flag options that Forge accepts.

The usage of Forge is as follows:

forge [-bcdinov] [--options] <Pipeline-XML-File>

<Pipeline-XML-File> can be a relative path or use the file://[hostname]/ protocol.

Forge takes the following options:

Important: All flags are case-sensitive.

Option

Description

-b <cache-num>

Specify the maximum number of records that the record caches should buffer. This may be set individually in the Maximum Records field of the Record Cache editor in Developer Studio.

-c <name=value>

Forge has a set of XML entity definitions whose values can be overridden at the command line, such as current_date, current_time, and end_of_line. You can specify a replacement string for the default entity values using the -c option, or in an .ini file specified with -i (described below).

The format is:

<configValName=configVal>

For example:

end_of_line="\n"

which would be specified on the command line with:

-c end_of_line="\n"

or included as a line in an .ini file specified with -i.

This allows you to assign pipeline values to Forge at the command line. In the above example, you would specify &end_of_line; in your pipeline file instead of hard-coding "\n", then invoke Forge with the -c option shown above. Forge would substitute "\n" whenever it encountered &end_of_line;.

For a complete list of entities and their default values, see the ENTITY definitions in Endeca_Root/conf/dtd/common.dtd.
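
As a rough illustration of the two ways to supply an entity override, the sketch below writes a one-line .ini file and then prints, rather than runs, the two corresponding Forge invocations. The .ini file name and location are hypothetical, and the single quotes around the -c value are an assumption about how a POSIX shell passes the value through unchanged; adjust the quoting for your shell.

```sh
# Dry-run sketch: writes the .ini file, then prints the commands
# instead of running Forge. The .ini file name is hypothetical.
ini_file="${TMPDIR:-/tmp}/forge-overrides.ini"

# One entity replacement per line, in name=value form.
printf '%s\n' 'end_of_line="\n"' > "$ini_file"

# First line: pass the override directly with -c.
# Second line: point Forge at the .ini file with -i.
cat <<EOF
forge -c 'end_of_line="\n"' Pipeline.epx
forge -i $ini_file Pipeline.epx
EOF
```

Keeping overrides in an .ini file is usually easier to maintain when several entities change together, while -c suits a one-off override.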

-d <dtd-path>

Specify the directory containing DTDs (overrides the DOCTYPE directive in XML).

-i <ini-filename>

Specify an .ini file that contains XML entity string replacements. Each line must be in this form:

<configValName=configVal>

See the description of the -c option for details.

-n <parse-num>

Specify the number of records to pull through the pipeline. This option is ignored by the record cache component.

-o <filename>

Specify an output file for messages.

-v[f|e|w|i|d]

Set the global log level. See --logLevel for corresponding information.

If the -v option is omitted, the global log level defaults to d (DEBUG) or the value set in the EDF_LOG_LEVEL environment variable. If the -v option is used without a level, it defaults to d (DEBUG).

f = FATAL messages only.

e = ERROR and FATAL messages.

w = WARNING, ERROR, and FATAL messages.

i = INFO, WARNING, ERROR, and FATAL messages.

d = DEBUG, INFO, WARNING, ERROR, and FATAL messages.

Note: Options -v[a|q|s|t|v] have been deprecated.
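
The cumulative nature of the levels can be summarized with a small helper. This is only an illustrative sketch for wrapper scripts, not part of Forge; the function name forge_v_levels is hypothetical.

```sh
# Map a -v level letter to the message severities it includes,
# following the list above. Returns 1 for unrecognized letters.
forge_v_levels() {
  case "$1" in
    f) echo "FATAL" ;;
    e) echo "ERROR FATAL" ;;
    w) echo "WARNING ERROR FATAL" ;;
    i) echo "INFO WARNING ERROR FATAL" ;;
    d|"") echo "DEBUG INFO WARNING ERROR FATAL" ;;  # -v with no level acts like d
    *) return 1 ;;
  esac
}

forge_v_levels w
```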

--client <server:port>

Run as a client and connect to a Forge server in a Parallel Forge environment.

--clientNum <num>

Direct a Forge server to use <num> instead of assigning a client number. Useful when the client number must remain consistent (that is, it must start from zero and be sequential for all clients). Requires the --client option.

--combineWarnCount <num>

Specify the number of records that can be combined (via a Combine join or a record cache with the Combine Records setting enabled) before Forge issues a warning that performance may be slow. The default is 100; a value of 0 disables the warnings.

--compression <num> | off

Instruct Forge to compress the output to a level of <num>, which is 0 to 9 (where 0 = minimum, 9 = maximum). Specify off to turn off compression.
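
A wrapper script might check the value before passing it to Forge. The following is a minimal sketch under that assumption; the function name valid_compression is hypothetical and is not something Forge provides.

```sh
# Accept "off" or a single digit from 0 to 9, as described above.
valid_compression() {
  case "$1" in
    off|[0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

for level in 0 9 off 10; do
  if valid_compression "$level"; then
    echo "$level: ok"
  else
    echo "$level: rejected"
  fi
done
```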

--connectRetries <num>

Specify the number of retries (-1 to 100) when connecting to the server. The default is 12; a value of -1 means retry forever. Requires the --client option.

--encryptKey [user:]<password>

Deprecated. Encrypt a key pair so that only Forge can read it.

--help [option]

Print full help if used with no options. Print specific help when used with one of these options (option names and arguments are case-sensitive):
  • expression = Prints help on expression syntax.
  • expression:TYPE = Prints help on the syntax for a specific expression type, which can be DVAL, FLOAT, INTEGER, PROPERTY, STREAM, STRING, or VOID.
  • config = Prints help on configuration options.

--idxCompression [<num> | off]

Set the compression of the IndexerAdapter output to a level of <num>, which is 0 to 9 (where 0 = minimum, 9 = maximum). Specify off to turn off compression.

--ignoreState

Instruct Forge to ignore any state files on startup. The state files are ignored only during the startup process. After startup, Forge creates state files during an update and overwrites the existing state files.

--indexConfigDir <path>

Instruct Forge to copy index configuration files from the specified directory to its output directory.

--inputDir <path>

Instruct Forge to load input data from this directory. <path> must be an absolute path and will be used as a base path for the pipeline. Any relative paths in the pipeline will be relative to this base path.

Note: If the pipeline uses absolute paths, Forge ignores this flag.
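
The base-path behavior described above can be pictured with the following sketch. It only models the rule (absolute paths are left alone, relative paths are joined to the base) and assumes Unix-style paths; it is not Forge's actual implementation, and the function name and example paths are hypothetical.

```sh
# Sketch of how a pipeline path combines with --inputDir:
# absolute paths are used as-is; relative paths are joined to the base.
resolve_input_path() {
  base="$1"
  path="$2"
  case "$path" in
    /*) printf '%s\n' "$path" ;;
    *)  printf '%s\n' "${base%/}/$path" ;;
  esac
}

resolve_input_path /data/incoming products.xml           # prints /data/incoming/products.xml
resolve_input_path /data/incoming /archive/products.xml  # prints /archive/products.xml
```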

--input-encoding <encoding>

Deprecated. Specify the encoding of non-XML input files.

--javaArgument <java_arg>

Prepend the given Java option to the Java command line used to start a Java virtual machine (JVM).

--javaClasspath <classpath>

Override the value of the Class path field on the General tab of the Record adapter, if one is specified.

If the Record adapter has a Format setting with JDBC selected, then Class path indicates the JDBC driver.

If the Record adapter has a Format setting with Java Adapter selected, then Class path indicates the absolute path to the custom record adapter’s .jar file.

--javaHome <java_home>

Specify the location of the Java runtime engine (JRE). This option overrides the value of the Java home field on the General tab of a Record adapter, if one is specified.

The --javaHome setting requires Java 2 Platform Standard Edition 5.0 (aka JDK 1.5.0) or later.

--logDir <path>

Instruct Forge to write logs to this directory, overriding any directories specified in the pipeline.

--logLevel (<topicName> =) <logLevel>

Set the global log level and/or topic-specific log level.

If this option is omitted, the value defaults to INFO or to that set in the EDF_LOG_LEVEL environment variable.

For corresponding information, see the -v option.

Possible log levels are:
  • FATAL = FATAL messages only.
  • ERROR = ERROR and FATAL messages.
  • WARN = WARN, ERROR, and FATAL messages.
  • INFO = INFO, WARN, ERROR, and FATAL messages.
  • DEBUG = DEBUG, INFO, WARN, ERROR, and FATAL messages.
Possible topics for Forge are:
  • baseline
  • update
  • config
  • webservice
  • metrics

--noAutoGen

Do not generate new dimension value IDs (for incremental updates when batch processing is running).

--numClients <num>

Specify the number of Parallel Forge clients that will connect. Required with the --server option.

--numPartitions <num>

Specify the number of Dgidx instances available to Forge. This number corresponds to the number of Dgraphs, which in turn corresponds to the number of file sets Forge creates.

This option overrides the value of the NUM_IDX attribute in the ROLLOVER element of your project’s Pipeline.epx file, if one is specified.

--outputDir <path>

Instruct Forge to save output data to this directory, overriding any directories specified in the pipeline.

--outputPrefix <prefix>

Override the value specified in the Output prefix field of the Indexer Adapter or Update Adapter editors in your Developer Studio pipeline.

--perllib <dir>

Add <dir> to Perl's library path. May be repeated.

--pidfile <pidfile-path>

Specify the file in which to store the process ID (PID).

--printRecords [number]

Print records as they are produced by each pipeline component. If number is specified, start printing after that number of records have been processed.

--pruneAutoGen

Instructs Forge to remove from the AutoGen state any dimensions that have been promoted as internal dimensions. When a pipeline developer promotes a dimension that was automatically generated, the dimension is copied into the dimensions.xml file and is removed from the AutoGen state file.

--retryInterval <num>

Specify the number of seconds (0 to 60) to sleep between connection attempts. The default is 5. Requires the --client option.

--server <portNum>

Run as a server and listen on the specified port. Requires the --numClients option.

--spiderThrottle <wait>:<expression_type>:<expression>

Deprecated. During a crawl, throttle the rate at which URLs are fetched by the spider, where:

<wait> is the fetch interval in seconds.

<expression_type> specifies the type of regular or host expression to use:
  • url-regex
  • url-wildcard
  • host-regex
  • host-wildcard

<expression> is the corresponding expression.

Example:

--spiderThrottle 10:url-wildcard:*.html

This would make all URLs that match the wildcard “*.html” wait 10 seconds between fetches.
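
To make the three-part argument format concrete, the sketch below splits a --spiderThrottle value into its fields and checks the expression type against the four listed values. Treating only the first two colons as separators is an assumption made here so that the expression may itself contain colons; Forge's own parsing may differ. The function name is hypothetical.

```sh
# Split a --spiderThrottle argument into its three fields.
# Only the first two colons are treated as separators (an assumption).
parse_spider_throttle() {
  arg="$1"
  wait_secs="${arg%%:*}"      # text before the first colon
  rest="${arg#*:}"            # text after the first colon
  expr_type="${rest%%:*}"     # text before the second colon
  expression="${rest#*:}"     # everything after the second colon
  case "$expr_type" in
    url-regex|url-wildcard|host-regex|host-wildcard) ;;
    *) echo "unknown expression type: $expr_type" >&2; return 1 ;;
  esac
  printf 'wait=%s type=%s expression=%s\n' "$wait_secs" "$expr_type" "$expression"
}

parse_spider_throttle '10:url-wildcard:*.html'
```

Quoting the argument in single quotes keeps the shell from expanding the * wildcard before the value is parsed.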

--sslcafile <CAcertfile-path>

Specify the path of the eneCA.pem Certificate Authority file that the Forge server and Forge clients will use to authenticate each other.

--sslcertfile <certfile-path>

Specify the path of the eneCert.pem certificate file that will be used by the Forge server and Forge client for SSL communications.

--sslcipher <cipher>

Set a cipher string (such as RC4-SHA) that specifies the minimum cryptographic algorithm the Forge server/client will use during the SSL negotiation.

Note: This setting is ignored by the --wsport flag, even when it uses SSL to secure its communications.

--stateDir <path>

Instruct Forge to persist data in this directory, overriding any directories specified in the pipeline.

--tmpDir <path>

Instruct Forge to write temporary files in the specified directory, overriding any directories specified by environment variables. The <path> value is interpreted as being based in Forge’s working directory, not in the directory containing Pipeline.epx.

--time <comp>

Report timing statistics (comp = time each component).

--timeout <num>

Specify the number of seconds (from -1 to 300) that the server waits for clients to connect. The default is 60; a value of -1 means wait forever. Requires the --server option.
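
The Parallel Forge options (--server, --numClients, --timeout, --client, --clientNum, --connectRetries, and --retryInterval) are typically used together. The following dry-run sketch prints one server command line and a matching set of client command lines without running Forge. The host name, port, and client count are hypothetical, and passing the same Pipeline.epx file to the server and to every client is an assumption based on the usage line at the top of this reference.

```sh
# Dry-run sketch: prints Parallel Forge command lines instead of running them.
# Host name, port, and client count are hypothetical values.
server_host="forgehost"
port=8000
num_clients=3

parallel_forge_commands() {
  # Server: --server requires --numClients; --timeout is in seconds.
  printf 'forge --server %s --numClients %s --timeout 120 Pipeline.epx\n' \
    "$port" "$num_clients"

  # Clients: client numbers start at zero and are sequential.
  n=0
  while [ "$n" -lt "$num_clients" ]; do
    printf 'forge --client %s:%s --clientNum %s --connectRetries 12 --retryInterval 5 Pipeline.epx\n' \
      "$server_host" "$port" "$n"
    n=$((n + 1))
  done
}

parallel_forge_commands
```

The --connectRetries and --retryInterval values shown are the documented defaults and are spelled out only to show where they go.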

--version

Print out the current version information.

--wsport <portNum>

Start the Forge Metrics Web service, which is off by default. It listens on the port specified.