Fetch brick

A Fetch brick is used to retrieve raw data for processing. You must use a separate Fetch brick for each raw data source.

Fetch brick settings

The Fetch brick supports the following settings.

source

A URL specifying where and how to retrieve the data. Supported protocols are file, HTTP, and FTP, as well as the secure protocols files and HTTPS.

The file and files protocols either move or copy files, depending on the value of remove_source.

File protocol paths can be relative (for example, file:///foo) or absolute (for example, file:////foo).

The file and files protocols support file retrieval from remote machines via the Endeca JCD (see “Fetching files from remote machines” below).

For the file, files, and FTP protocols, if the source contains wildcards like “*” or “?”, then all files matching that pattern are retrieved. In this case, the dest setting must name a directory.

Note: HTTP and HTTPS URLs cannot use wildcards, because these protocols provide no standard way to list the files available on a server.
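
As an illustration, a Fetch brick that copies every .txt file from a local directory might look like the following sketch. The brick name, directory paths, and pattern are hypothetical. The source uses the four-slash absolute path form described above, and because it contains a wildcard, dest names a directory, which must already exist.

```
fetch_text_files : Fetch
	source = file:////incoming/*.txt
	dest = /raw_data
```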

username

Required for FTP URLs. Optional for HTTP and HTTPS URLs. Not used for file and files URLs.

password

Required for FTP URLs. Optional for HTTP and HTTPS URLs. Not used for file and files URLs.
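
For example, a brick that retrieves a single file over HTTPS from a server that requires authentication might be written as follows. The brick name, host, path, and credentials are placeholders. For a server that allows anonymous access, username and password can be left out, since they are optional for HTTP and HTTPS URLs.

```
fetch_https_data : Fetch
	source = https://data.example.com/exports/ourdata.zip
	username = endeca
	password = endeca
	dest = /raw_data/ourdata.zip
```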

dest

The file or directory in which the fetched files should be stored. If dest names a directory, the directory must already exist.

When a file, files, or FTP URL uses wildcards, dest must be a directory. For a URL without wildcards, dest must be the same kind of object as the source: if the source specifies a directory, dest must also be a directory; if the source specifies a file, dest must also be a file.
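
A minimal sketch of the non-wildcard case: the hypothetical brick below copies one local file, so dest also names a file rather than a directory. The brick name and paths are placeholders.

```
copy_local_file : Fetch
	source = file:////incoming/ourdata.zip
	dest = /raw_data/ourdata.zip
```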

remove_source

If this Boolean setting is specified, the source is deleted after it has been fully and successfully retrieved. The remove_source setting is supported only for files on the local machine; it is not supported for HTTP, HTTPS, or FTP URLs.
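
The following hypothetical brick moves a local file instead of copying it. This sketch assumes that a Boolean setting is enabled by placing its name on a line by itself, which is one reading of "specifying this Boolean setting"; the exact syntax is not stated in this description, so confirm it for your version. The brick name and paths are placeholders.

```
move_local_file : Fetch
	source = file:////incoming/ourdata.zip
	dest = /raw_data/ourdata.zip
	remove_source
```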

recursive

If this Boolean setting is specified, directories are retrieved recursively. This setting is supported only for file and files URLs, both local and remote.
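
As a sketch, a hypothetical brick that copies a local directory tree pairs a directory source with a directory dest (which must already exist) and adds recursive. It assumes that a Boolean setting is enabled by placing its name on a line by itself; that syntax is not confirmed by this description. The brick name and paths are placeholders.

```
copy_directory_tree : Fetch
	source = file:////incoming/batch01
	dest = /raw_data/batch01
	recursive
```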

stdout

Where to redirect stdout for the brick. By default, stdout is sent to the screen. Specifying a value for stdout overrides the stdout_base setting.

stderr

Where to redirect stderr for the brick. By default, stderr is sent to the screen. Specifying a value for stderr overrides the stderr_base setting.
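
For example, both output streams of a brick might be sent to log files instead of the screen, as in this sketch. It assumes that stdout and stderr take a file path as their value; the brick name, FTP URL, credentials, and log paths are hypothetical.

```
fetch_logged_data : Fetch
	source = ftp://ftp.example.com/ourdata.zip
	username = endeca
	password = endeca
	dest = /raw_data/ourdata.zip
	stdout = /logs/fetch_logged_data.out
	stderr = /logs/fetch_logged_data.err
```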

Fetching files from remote machines

Fetch bricks support file retrieval from remote machines via the Endeca JCD. To use this functionality, however, you must configure the following settings correctly:

Fetch brick settings:
  • source must specify a remote file URL. If the Endeca JCD on the remote machine is configured to use SSL, you must use the files protocol.
  • dest must specify either a filename (for retrieving a single file) or a directory name (if you are using wildcards in your source setting).
Default settings:
  • endeca_bin must be set to the pathname of the bin directory in the Endeca Platform Services installation you are using.
  • jcd_port must be set to the Endeca JCD port on the remote machine.
  • perl_binary specifies which Perl interpreter to use on the destination machine. (The Endeca software includes and requires version 5.8.3 of Perl.)
  • sslcertfile specifies the path to the eneCert.pem certificate file on the destination machine.

The source, dest, endeca_bin, and jcd_port settings are required. The sslcertfile setting is required if the Endeca JCD is configured to use SSL. The perl_binary setting is optional but highly recommended.

Note: Although it is not required, Oracle strongly recommends that you specify endeca_bin, jcd_port, perl_binary, and sslcertfile as global default settings.

The following is a UNIX example of a Fetch brick that uses the FTP protocol:
fetch_data : Fetch
	source = ftp://ftp.example.com/ourdata.zip
	username = endeca
	password = endeca
	dest = /raw_data/ourdata.zip

The following is a UNIX example of a Fetch brick that fetches data from a remote machine, using the Endeca JCD in a secure environment:

Note: This brick example assumes that endeca_bin, jcd_port, perl_binary, and sslcertfile have been set globally.
fetch_remote_data : Fetch
	source = \
			files://idx01/raw_data/ourdata.zip
	dest = /endeca/current/raw_data/ourdata.zip
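
If the Endeca JCD on the remote machine is not configured to use SSL, the file protocol can be used instead and sslcertfile is not needed. The following sketch mirrors the files URL in the example above and also sets endeca_bin, jcd_port, and perl_binary on the brick itself rather than globally; it assumes these settings are accepted at the brick level, as the note recommending (but not requiring) global defaults suggests. The brick name, installation path, port number, and Perl location are hypothetical placeholders, not documented defaults.

```
fetch_remote_plain : Fetch
	source = file://idx01/raw_data/ourdata.zip
	dest = /endeca/current/raw_data/ourdata.zip
	endeca_bin = /usr/local/endeca/PlatformServices/bin
	jcd_port = 8088
	perl_binary = /usr/local/endeca/PlatformServices/perl/bin/perl
```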