A regular expression is a character string in which some characters have special meaning for pattern matching. This section gives basic guidelines on using regular expressions with the Batch Adapter.
Regular expressions allow you to specify patterns for file names and directory names. Regular expressions are used for “get” operations (receiving or source), as opposed to name patterns which are used for “put” operations (sending or destination).
The BatchFTP, BatchFTPOverSSL, BatchSFTP, BatchLocalFile, and BatchInbound OTDs allow you to use regular expressions, for example, to access all files with a specific extension.
Regular expressions operate as follows:
The directory/file names can be defined as either:
Actual file names (everywhere)
Name patterns (all names for “put” operations and pre/post transfer names for “get” operations)
Regular expressions (target names for “get” operations)
The difference between the regular expressions and name patterns is:
Regular expressions are used to match existing names on the FTP server or the local file system.
Name patterns are used to create names by replacing the special characters in the pattern.
For more information on name patterns using special characters, see Using Name Patterns With the Batch Adapter.
You can specify an extension, for example, .*\.dat$. Then, each time the get() method is called, the adapter gets the next file with a .dat extension. The adapter then retrieves each file into the OTD’s Payload node and updates the working file-name attribute with the name of the file currently being accessed.
As another example, you can use the pattern data\.00[1-9] to get the files data.001, then data.002, and so on. The “.” is escaped so that it matches a literal period, which is consistent with regular-expression syntax. The pattern also matches xyzdata.001 and xyz.data.001, because it does not exclude anything before “data”. To make “data” the exact start of the match, use ^data\.00[1-9] or \Adata\.00[1-9].
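The effect of anchoring can be checked outside the adapter. The following is a minimal sketch that uses Python's re module as a stand-in; the adapter's own regular-expression engine may differ in details, and the file names in the list are made up for illustration.

```python
import re

# Hypothetical directory listing used only for illustration.
names = ["data.001", "data.002", "xyzdata.001", "xyz.data.001", "data.010"]

unanchored = re.compile(r"data\.00[1-9]")
anchored = re.compile(r"^data\.00[1-9]")       # \Adata\.00[1-9] behaves the same here

# re.search looks for the pattern anywhere in the name (partial match).
loose = [n for n in names if unanchored.search(n)]
strict = [n for n in names if anchored.search(n)]

print(loose)   # names that contain data.001 through data.009 anywhere
print(strict)  # names that start with data.001 through data.009
```

With this listing, the unanchored pattern also selects xyzdata.001 and xyz.data.001, while the anchored pattern selects only data.001 and data.002.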
The use of regular expressions is an advanced feature and must be implemented carefully. An improperly formed regular expression can cause the wrong data to be retrieved or even cause data loss. You must have a clear understanding of regular-expression syntax and construction before attempting to use this feature. Test such configurations thoroughly before moving them to production.
You can enter a regular expression for the FTP or local file name in a variety of ways, for example, .*\.dat$ or ^xyz.*\.dat$. The first case indicates all files with an extension of .dat. The second case indicates all file names with an extension of .dat whose names start with xyz.
Another example is file[0-9]\.dat. This expression matches file0.dat, file1.dat, file2.dat, and so on, through file9.dat. It also matches xyz.file0.dat, xyz.file1.dat, and so on, because this type of expression does not exclude anything in front of “file”. To exclude any characters before “file” (that is, to make “file” the exact beginning), use ^file[0-9]\.dat or \Afile[0-9]\.dat.
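Escaping the period matters in the anchored form as well, because an unescaped “.” matches any character. The following is a small sketch in Python's re module (an assumption; the adapter's engine may differ) with made-up file names that shows the difference.

```python
import re

# Hypothetical file names used only for illustration.
names = ["file0.dat", "file9.dat", "xyz.file0.dat", "file0xdat", "file10.dat"]

escaped = re.compile(r"^file[0-9]\.dat")    # period must be a literal "."
unescaped = re.compile(r"^file[0-9].dat")   # "." matches any single character

with_escape = [n for n in names if escaped.search(n)]
without_escape = [n for n in names if unescaped.search(n)]

print(with_escape)
print(without_escape)
```

In this sketch the unescaped form additionally accepts file0xdat, which is usually not intended.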
These types of regular expression patterns can be used for a get operation.
The adapter provides a File Name Is Pattern or Directory Name Is Pattern configuration parameter after every property that allows a regular expression as an option. This parameter lets you specify whether the value entered is a regular expression or static text to be interpreted literally.
Regular expressions resolve even on a partial match. The resolution process searches for the pattern within the file name rather than requiring the whole file name to match, so use ^ and $ when the entire name must match.
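The interaction between the pattern setting and partial matching can be modeled as follows. This is a hypothetical sketch, not the adapter's implementation: the function name_matches and its is_pattern argument are invented for illustration, and Python's re.search stands in for the adapter's partial-match behavior.

```python
import re

def name_matches(name: str, entry: str, is_pattern: bool) -> bool:
    """Hypothetical model of the File Name Is Pattern setting."""
    if is_pattern:
        # Regular expression: a match anywhere in the name is enough.
        return re.search(entry, name) is not None
    # Static text: the entry is compared literally with the name.
    return name == entry

print(name_matches("report.dat.bak", r"\.dat", True))     # partial match
print(name_matches("report.dat.bak", r"\.dat$", True))    # anchored at the end
print(name_matches("reportXdat", "report.dat", False))    # literal comparison
print(name_matches("reportXdat", "report.dat", True))     # "." matches any character
```

The same entry, report.dat, selects different names depending on whether it is treated as a pattern or as static text, which is why the setting should be chosen deliberately.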
There are special considerations you must be aware of when you use regular expressions for directory names. This section describes these restrictions and provides some examples.
The following restrictions apply when using regular expressions as directory names:
The directory root, the drive name, and directory separators must be expressed literally. That is, do not express any of these elements as a regular expression. Only folder names are expected to appear as regular expressions.
A regular expression must not span over the directory separators. If you use a regular expression between two directory separators, it must be one whole expression.
Escape all directory separators in a directory pattern if the separator conflicts with a regular expression special character (that is, * [ ] ( ) | + { } : . ^ $ ? \). The backslash (\) is the special character used to escape other special characters in regular expressions. For Windows platforms, the directory separator is the backslash, so it must be escaped as \\.
For the Windows Universal Naming Convention (UNC), the directory root (including the computer name and the shared root folder name) must be expressed literally. That is, do not express the computer name and shared root folder as a regular expression.
Directory separators are platform dependent, for example:
For Windows platforms, the directory path follows this pattern:
drive:\\regexp1\\regexp2\\regexp3 ...
or for Windows UNC notation, the directory path follows this pattern:
\\\\machineName\\shared_folder\\regexp1\\regexp2\\regexp3 ...
For UNIX platforms, including mounted directories, the directory path follows this pattern:
/regexp1/regexp2/regexp3 ...
The following are several examples of regular expression directory name usage:
Windows:
c:\\$\\^client\\collab\D\\ ...
The expression \D indicates any non-digit character.
d:\\a.b\\c.d\\e.f\\g.h\\[0-9]\\ ...
The symbol “.” means any character.
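The drive-letter form can be illustrated with a short sketch. It assumes Python's re module as a stand-in for the adapter's engine, assumes partial matching applies to each folder name, and uses two hypothetical helpers (split_windows_pattern and windows_dir_matches). It does not handle UNC roots or a trailing separator.

```python
import re

def split_windows_pattern(pattern):
    # In the pattern, each directory separator is an escaped backslash,
    # that is, two backslash characters in a row.
    return pattern.split("\\\\")

def windows_dir_matches(path, pattern):
    drive, *folder_patterns = split_windows_pattern(pattern)
    path_drive, *folders = path.split("\\")
    # The drive name is compared literally; it is never a regular expression.
    if drive.lower() != path_drive.lower():
        return False
    if len(folders) != len(folder_patterns):
        return False
    # Each folder name is checked against its own expression only,
    # so no expression spans a directory separator.
    return all(re.search(p, f) for p, f in zip(folder_patterns, folders))

pattern = r"d:\\a.b\\c.d\\[0-9]"
print(split_windows_pattern(pattern))
print(windows_dir_matches(r"d:\axb\c-d\7", pattern))
print(windows_dir_matches(r"d:\axb\c-d\x", pattern))
```

Splitting on the escaped separator first is a design choice of this sketch: it keeps the drive and separators literal and confines each regular expression to a single folder name, in line with the restrictions listed earlier.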
Windows UNC notation:
\\\\My_Machine\\public\\xyz$\\^abc
The prefix for Windows UNC notation is \\. After escaping, it becomes \\\\.
UNIX:
/abc\d/def/ghi/ ...
The expression \d means any digit character.
/^PRE[0-9]{5}\.dat$/ ...
This expression matches names that begin with PRE, followed by a five-digit number, and end with a .dat extension. The sequence \. matches a literal period instead of any character. Therefore, PRE12345.dat matches, but PRE123456dat does not.
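The UNIX examples can be checked segment by segment in the same spirit. The following sketch assumes Python's re module as a stand-in for the adapter's engine and partial matching on each path element; the helper unix_path_matches and the sample paths are hypothetical.

```python
import re

def unix_path_matches(path, pattern):
    # "/" is always a literal separator, so the pattern is split on it
    # and each element is matched on its own.
    patterns = pattern.strip("/").split("/")
    parts = path.strip("/").split("/")
    if len(parts) != len(patterns):
        return False
    return all(re.search(p, part) for p, part in zip(patterns, parts))

print(unix_path_matches("/abc7/def/ghi", r"/abc\d/def/ghi"))         # \d needs a digit
print(unix_path_matches("/abcx/def/ghi", r"/abc\d/def/ghi"))
print(unix_path_matches("/PRE12345.dat", r"/^PRE[0-9]{5}\.dat$"))    # five digits, literal "."
print(unix_path_matches("/PRE123456dat", r"/^PRE[0-9]{5}\.dat$"))
```

Because each element is matched with a search, an unanchored element such as def would also accept a folder named xdefx; add ^ and $ where an exact folder name is required.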