A Fetch brick is used to retrieve raw data for processing. You must use a separate Fetch brick for each raw data source.

Setting | Description |
---|---|
source | A URL specifying where and how to retrieve the data. Supported protocols are file, HTTP, and FTP; supported secure protocols are files and HTTPS. The file and files protocols either move or copy files, depending on the value of remove_source. File protocol paths can be relative (for example, file:///foo) or absolute (for example, file:////foo). The file and files protocols support file retrieval from remote machines via the Endeca JCD (see “Fetching files from remote machines” below). For the file, files, and FTP protocols, if the source contains wildcards such as “*” or “?”, all files matching that pattern are retrieved; in this case, the dest setting must name a directory. Note: HTTP and HTTPS URLs cannot use wildcards, since those protocols provide no standard way to list the files that match a pattern. |
username | Required for FTP URLs. Optional for HTTP and HTTPS URLs. Not used for file and files URLs. |
password | Required for FTP URLs. Optional for HTTP and HTTPS URLs. Not used for file and files URLs. |
dest | The file or directory in which the fetched files are stored. If dest names a directory, that directory must already exist. For file, files, and FTP URLs that use wildcards, dest must be a directory. For any URL that does not use wildcards, dest must be the same type as the source: if the source specifies a directory, dest must also be a directory, and if the source specifies a file, dest must also be a file. |
remove_source | If this Boolean setting is specified, the source is deleted after it has been fully and successfully downloaded. remove_source is supported only for removing files from the local machine; it is not supported for HTTP, HTTPS, or FTP URLs. |
recursive | If this Boolean setting is specified, directories are downloaded recursively. This setting is supported only for file and files URLs, both local and remote. |
stdout | Where to redirect stdout for the brick. By default, stdout is sent to the screen. Specifying a value for stdout overrides the stdout_base setting. |
stderr | Where to redirect stderr for the brick. By default, stderr is sent to the screen. Specifying a value for stderr overrides the stderr_base setting. |

Fetch bricks support file retrieval from remote machines via the Endeca JCD. To use this functionality, you must configure the following brick settings:
The source, dest, endeca_bin, and jcd_port settings are required. The sslcertfile setting is required if the Endeca JCD is configured to use SSL. The perl_binary setting is optional but highly recommended.
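The requirements for a remote fetch can be expressed as a simple checklist. The following Python sketch treats a brick's settings as a plain dictionary; the function name check_remote_fetch and the jcd_uses_ssl flag are hypothetical, and the sketch only checks whether each setting is present, not whether its value is valid for your installation.

```python
# Settings named in the remote-fetch requirements above.
REQUIRED_REMOTE = ("source", "dest", "endeca_bin", "jcd_port")


def check_remote_fetch(settings: dict, jcd_uses_ssl: bool):
    """Return (missing required settings, recommended settings that are unset)."""
    missing = [key for key in REQUIRED_REMOTE if not settings.get(key)]
    # sslcertfile becomes required only when the JCD is configured for SSL.
    if jcd_uses_ssl and not settings.get("sslcertfile"):
        missing.append("sslcertfile")
    # perl_binary is optional, so it is reported separately as a recommendation.
    recommended = [] if settings.get("perl_binary") else ["perl_binary"]
    return missing, recommended


if __name__ == "__main__":
    brick = {
        "source": "files://idx01/raw_data/ourdata.zip",
        "dest": "/endeca/current/raw_data/ourdata.zip",
    }
    print(check_remote_fetch(brick, jcd_uses_ssl=True))
    # → (['endeca_bin', 'jcd_port', 'sslcertfile'], ['perl_binary'])
```

Separating missing required settings from unset recommended ones mirrors the distinction the text draws between "required" and "optional but highly recommended".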
The following is an example of a Fetch brick that retrieves a ZIP file via FTP:

    fetch_data : Fetch
        source = ftp://ftp.example.com/ourdata.zip
        username = endeca
        password = endeca
        dest = /raw_data/ourdata.zip

The following is a UNIX example of a Fetch brick that fetches data from a remote machine, using the Endeca JCD in a secure environment:

    fetch_remote_data : Fetch
        source = \
            files://idx01/raw_data/ourdata.zip
        dest = /endeca/current/raw_data/ourdata.zip
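To make the structure of these brick definitions explicit, the following Python sketch parses one into its name, type, and settings. It is not Endeca's own parser. The function name parse_brick is hypothetical, and the sketch assumes a layout in which the first line is "name : Type", each following line holds one "key = value" setting, and a trailing backslash continues a value onto the next line.

```python
def parse_brick(text: str):
    """Parse a brick definition into (name, brick_type, settings).

    Hypothetical layout: the first line is "name : Type", each following line
    is "key = value", and a trailing backslash continues a line.
    """
    logical, pending = [], ""
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith("\\"):
            # Drop the backslash and join with the next physical line.
            pending += line[:-1].strip() + " "
            continue
        line = (pending + line).strip()
        pending = ""
        if line:  # skip blank lines
            logical.append(line)
    if pending.strip():
        logical.append(pending.strip())
    if not logical:
        raise ValueError("empty brick definition")

    header, *body = logical
    name, sep, brick_type = header.partition(":")
    if not sep:
        raise ValueError(f"expected 'name : Type', got {header!r}")
    settings = {}
    for line in body:
        # Split on the first "=" only, so values such as URLs keep any later "=".
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected 'key = value', got {line!r}")
        settings[key.strip()] = value.strip()
    return name.strip(), brick_type.strip(), settings


if __name__ == "__main__":
    example = r"""
fetch_remote_data : Fetch
    source = \
        files://idx01/raw_data/ourdata.zip
    dest = /endeca/current/raw_data/ourdata.zip
"""
    print(parse_brick(example))
```

Joining continuation lines before splitting on "=" means a long URL wrapped with a backslash is treated exactly like one written on a single line.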