Endeca Control System directory structure

Before you start building your instance configuration, you must create a directory structure to support your data processing back end. The structure depends on which mechanism you have chosen to control your Endeca environment: the Endeca Control System or the Endeca Application Controller.

If you are using the Endeca Control System to control your environment, you must create a directory structure to hold source data, control scripts, system-generated files, log files, and so on. The example below shows the directory structure used for the sample_wine_data reference implementation:
instance_root
	data
		forge_input
		incoming
		partition0
			dgidx_output
			dgraph_input
			forge_output
			state
	etc
	logs
	reports

The table below describes the contents of each directory:

Directory	Description
instance_root	Contains all required subdirectories for this instance of your Endeca implementation.
data	Contains subdirectories for your instance configuration, source data extracts, and system-generated files.
forge_input	Contains the baseline pipeline file (typically named pipeline.epx), the partial updates pipeline file (typically named partial_pipeline.epx, and present only if you are running partial updates), and the index configuration files (*.xml).
incoming	Contains data ready for processing by Forge. On a production site, the files in this directory may have been created by a data extraction process on the customer's database or picked up from an FTP server.
partition0	Contains subdirectories for system-generated files, such as Forge output, Dgidx output, and Dgraph input.
state	Contains any state information that must be saved between runs of the Data Foundry, such as auto-generated dimension IDs.
forge_output	Contains data that has been processed by Forge and is ready for indexing.
dgidx_output	Contains indices that Dgidx has generated in MDEX Engine format.
dgraph_input	Contains a copy of the MDEX Engine indices stored in dgidx_output. When you start the MDEX Engine (Dgraph) process, point it at this copy of the indices. Keeping a separate copy isolates your working MDEX Engine indices from those that are being updated.
etc	Contains system-level configuration for your Endeca implementation, such as control scripts.
logs	Contains log files generated by the various Endeca components.
reports	Contains any reports you choose to generate for your implementation.
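
The separation between dgidx_output and dgraph_input can be illustrated with a short Python sketch. This is not part of the Endeca tooling; the function name refresh_dgraph_input and the temporary directory name dgraph_input.new are hypothetical, and the sketch assumes that no Dgraph process is reading dgraph_input while the copy is swapped in.

```python
import shutil
from pathlib import Path


def refresh_dgraph_input(partition_dir):
    """Replace dgraph_input with a fresh copy of dgidx_output.

    Illustrative only: assumes the Dgraph is not reading dgraph_input
    during the swap. Starting and stopping the Dgraph is out of scope.
    """
    partition = Path(partition_dir)
    source = partition / "dgidx_output"
    target = partition / "dgraph_input"
    staging = partition / "dgraph_input.new"  # hypothetical temporary name

    # Copy into a staging directory first so a failed copy never leaves
    # a half-written dgraph_input behind.
    if staging.exists():
        shutil.rmtree(staging)
    shutil.copytree(source, staging)

    # Swap the fresh copy into place.
    if target.exists():
        shutil.rmtree(target)
    staging.rename(target)
    return target
```

Because dgraph_input is a full copy, a later Dgidx run can rewrite dgidx_output without touching the indices the running MDEX Engine was started against.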

While you can structure your directories in any way you want, Oracle recommends that you mirror the directory structure of the sample_wine_data reference implementation to maximize reuse of code, configuration settings, and control scripts.
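
As a minimal sketch, the sample_wine_data layout shown earlier could be created with a few lines of Python. The function name create_instance_tree is hypothetical, and the directory list simply restates the tree above; adjust it if your implementation uses more than one partition.

```python
from pathlib import Path

# Relative paths taken from the sample_wine_data layout shown above.
INSTANCE_DIRS = [
    "data/forge_input",
    "data/incoming",
    "data/partition0/dgidx_output",
    "data/partition0/dgraph_input",
    "data/partition0/forge_output",
    "data/partition0/state",
    "etc",
    "logs",
    "reports",
]


def create_instance_tree(instance_root):
    """Create the directory layout under instance_root.

    Safe to run more than once: existing directories are left untouched.
    Returns the list of directory paths that make up the layout.
    """
    root = Path(instance_root)
    created = []
    for rel in INSTANCE_DIRS:
        path = root / rel
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling create_instance_tree("instance_root") from the parent directory of your choice produces the nine leaf directories and their parents.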

After creating your directory structure, you should:
  • Copy your source data extracts to instance_root/data/incoming.
  • Copy any control scripts you want to use or modify to the etc directory. You can find reference control scripts in the etc directory of the sample_wine_data reference implementation.
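
The two copy steps above can be sketched in Python as follows. The helper name stage_files and the source paths in the usage comments are hypothetical; substitute the locations of your own data extracts and of the sample_wine_data reference implementation.

```python
import shutil
from pathlib import Path


def stage_files(source_dir, dest_dir, pattern="*"):
    """Copy regular files matching pattern from source_dir into dest_dir.

    Subdirectories are skipped. Returns the names of the copied files.
    """
    src, dest = Path(source_dir), Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for item in sorted(src.glob(pattern)):
        if item.is_file():
            shutil.copy2(item, dest / item.name)
            copied.append(item.name)
    return copied


# Example usage (hypothetical source paths):
#   stage_files("/path/to/extracts", "instance_root/data/incoming")
#   stage_files("/path/to/sample_wine_data/etc", "instance_root/etc")
```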