The partial update script defined in the DataIngest.xml document for a Dgraph deployment is included in this section, with numbered steps indicating the actions performed at each point in the script.
<script id="PartialUpdate">
  <bean-shell-script>
    <![CDATA[
1. Obtain lock. The partial update attempts to set an "update_lock" flag in the EAC to serve as a lock or mutex. If the flag is already set, this step fails, ensuring that the update cannot be started more than once simultaneously, as this would interfere with data processing. The flag is removed in the case of an error or when the script completes successfully. A sketch of this acquire-and-release pattern appears after the listing.

    log.info("Starting partial update script.");

    // obtain lock
    if (LockManager.acquireLock("update_lock")) {
2. Validate data readiness. Test that the EAC contains at least one flag with the prefix "partial_extract::". One of these flags should be created for each successfully and completely extracted file, with the prefix "partial_extract::" prepended to the extracted file name (e.g., "partial_extract::adds.txt.gz"). These flags are deleted during data processing and must be created as new files are extracted; a sketch of how such a flag might be created also follows the listing.

      // test if data is ready for processing
      if (PartialForge.isPartialDataReady()) {
3. Archive partial logs. The logs/partial directory is archived, to create a fresh logging directory for the partial update process and to save the previous run's logs.

        // archive logs
        PartialForge.archiveLogDir();
4. Clean processing directories. Files from the previous update are removed from the data/partials/processing, data/partials/forge_output, and data/temp directories.

        // clean directories
        PartialForge.cleanDirs();
5. Move data and config to processing directory. Extracted files in data/partials/incoming with matching "partial_extract::" flags in the EAC are moved to data/partials/processing. Configuration files are copied from config/pipeline to data/processing.

        // fetch extracted data files to forge input
        PartialForge.getPartialIncomingData();

        // fetch config files to forge input
        PartialForge.getConfig();
6. Forge. The partial update Forge process executes.

        // run ITL
        PartialForge.run();
7. Apply timestamp to updates. The output XML file generated by the partial update pipeline is renamed to include a timestamp, to ensure that it is processed in the correct order relative to files generated by previous or following partial update processes. A sketch of one way such a timestamp suffix can be produced follows the listing.

        // timestamp partial, save to cumulative partials dir
        PartialForge.timestampPartials();
8. Copy updates to the cumulative updates directory. The timestamped XML file is copied into the cumulative updates directory.

        PartialForge.fetchPartialsToCumulativeDir();
9. Distribute update to each server. A single copy of the partial update file is distributed to each server specified in the configuration.

        // distribute partial update, update Dgraphs
        DgraphCluster.copyPartialUpdateToDgraphServers();
10. Update MDEX Engines. The Dgraph processes are updated; engines are updated according to the updateGroup property specified for each Dgraph. An illustrative per-Dgraph updateGroup setting appears after the listing.

        DgraphCluster.applyPartialUpdates();
11. Archive cumulative updates. The newly generated update file (and files generated by all partial updates processed since the last baseline) are archived on the indexing server.

        // archive partials
        PartialForge.archiveCumulativePartials();
12. Release lock. The "update_lock" flag is removed from the EAC, indicating that another update may be started.

      } // end of data-ready block

      // release lock
      LockManager.releaseLock("update_lock");

      log.info("Partial update script finished.");
    } else {
      log.warning("Failed to obtain lock.");
    }
    ]]>
  </bean-shell-script>
</script>
When running the partial update script, you may see a Java exception similar to this example:
INFO: Starting copy utility 'copy_partial_update_to_host_MDEXHost1'.
Oct 20, 2008 11:46:37 AM org.apache.axis.encoding.ser.BeanSerializer serialize
SEVERE: Exception:
java.io.IOException: Non nillable element 'fromHostID' is null.
...
If this occurs, make sure that the following properties are defined in the AppConfig.xml configuration file:
<dgraph-defaults>
  <properties>
    ...
    <property name="srcPartialsDir" value="./data/partials/forge_output" />
    <property name="srcPartialsHostId" value="ITLHost" />
    <property name="srcCumulativePartialsDir" value="./data/partials/cumulative_partials" />
    <property name="srcCumulativePartialsHostId" value="ITLHost" />
    ...
  </properties>
  ...
</dgraph-defaults>
The exception occurs because the script obtains the fromHostID value from this section of the configuration.