Forge

A Forge element launches the Forge (Data Foundry) software, which transforms source data into tagged Endeca records.

Every Application Controller component contains the following attributes:

Attribute Description
component-id Required. The name of this instance of the component.
host-id Required. The alias of the host upon which the component is running.
properties Optional. A list of properties, each consisting of a required name and an optional value.

The Forge element contains the following sub-elements:

Sub-element Description
args Command-line flags to pass to Forge, expressed as a set of arg sub-elements. If an argument takes a value, the argument and its value must be given in separate arg sub-elements, on separate lines in the provisioning file. For example:
<args>
    <arg>--threads</arg>
    <arg>3</arg>
</args>
input-dir The path to the Forge input.
log-file Name of the Forge log file. If log-file is not specified, it defaults to the component name followed by “.log”, located in the component's working directory.
output-prefix-name The implementation-specific prefix name, without any associated path information.
output-dir Directory where the output from the Forge process will be stored.
pipeline-file Required. Name of the Pipeline.epx file to pass to Forge.
num-partitions The number of partitions.
working-dir Working directory for the process that is launched. If specified, it must be an absolute path. Any relative paths in the other properties of this component are interpreted relative to the working directory. If working-dir is not specified, it defaults to $ENDECA_CONF/work/<appName>/<componentName> on UNIX, or %ENDECA_CONF%\work\<appName>\<componentName> on Windows.
state-dir The directory where the state file is located.
temp-dir The temporary directory that Forge uses.
web-service-port The port on which the Forge metrics Web service listens.
ssl-configuration Both parallel Forge and the Forge metrics Web service can secure their communications with SSL. The ssl-configuration element contains three sub-elements of its own:
  • cert-file: The path of the eneCert.pem certificate file that Forge processes present to any client. The Application Controller Agent should also present this certificate when communicating with Forge. The file name can be a path relative to the component’s working directory.
  • ca-file: The path of the eneCA.pem Certificate Authority file that Forge processes use to authenticate communications with other Endeca components. The file name can be a path relative to the component’s working directory.
  • cipher: An optional cipher string (such as RC4-SHA) that specifies the minimum cryptographic algorithm that parallel Forge processes use during the SSL negotiation. If you omit this setting, the SSL software tries an internal list of ciphers, beginning with AES256-SHA. The Forge metrics Web service does not use the cipher sub-element.
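
As a rough illustration of the ssl-configuration sub-elements described above, the fragment below sketches how they might appear inside a Forge element. The port number and the .\ssl directory are hypothetical values chosen for this sketch, and the order of the sub-elements is an assumption; check it against your provisioning schema.

```xml
<forge component-id="wine_forge" host-id="wine_indexer">
	<pipeline-file>.\data\forge_input\pipeline.epx</pipeline-file>
	<!-- Hypothetical port for the Forge metrics Web service -->
	<web-service-port>8090</web-service-port>
	<ssl-configuration>
		<!-- Hypothetical relative paths, resolved against the component's working directory -->
		<cert-file>.\ssl\eneCert.pem</cert-file>
		<ca-file>.\ssl\eneCA.pem</ca-file>
		<!-- Optional; used by parallel Forge only, not by the metrics Web service -->
		<cipher>AES256-SHA</cipher>
	</ssl-configuration>
</forge>
```

Omitting the cipher sub-element leaves the choice to the SSL software's internal list, as described above.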

Example

The following example provisions a Forge component for use with the sample wine data:

<forge component-id="wine_forge" host-id="wine_indexer">
	<args>
		<arg>-vw</arg>
	</args>
	<num-partitions>1</num-partitions>
	<working-dir>C:\Endeca\PlatformServices\reference\sample_wine_data</working-dir>
	<pipeline-file>.\data\forge_input\pipeline.epx</pipeline-file>
	<input-dir>.\data\forge_input</input-dir>
	<output-dir>.\data\partition0\forge_output</output-dir>
	<state-dir>.\data\partition0\state</state-dir>
	<log-file>.\logs\wine_forge.log</log-file>
	<output-prefix-name>wine</output-prefix-name>
</forge>
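
For comparison, the following is a minimal sketch of how the same component might be provisioned on a UNIX host, this time passing a flag that takes a value and setting a temporary directory. The working directory /opt/endeca/apps/wine and the ./data/temp path are hypothetical, and the thread count simply reuses the value from the args description above.

```xml
<forge component-id="wine_forge" host-id="wine_indexer">
	<args>
		<arg>-vw</arg>
		<!-- A flag and its value go in separate arg elements -->
		<arg>--threads</arg>
		<arg>3</arg>
	</args>
	<num-partitions>1</num-partitions>
	<!-- Hypothetical absolute path; working-dir may not be relative -->
	<working-dir>/opt/endeca/apps/wine</working-dir>
	<!-- The relative paths below resolve against working-dir -->
	<pipeline-file>./data/forge_input/pipeline.epx</pipeline-file>
	<input-dir>./data/forge_input</input-dir>
	<output-dir>./data/partition0/forge_output</output-dir>
	<state-dir>./data/partition0/state</state-dir>
	<temp-dir>./data/temp</temp-dir>
	<log-file>./logs/wine_forge.log</log-file>
	<output-prefix-name>wine</output-prefix-name>
</forge>
```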