The partial update script defined in the DataIngest.xml document for a Dgraph deployment is included in this section, with numbered steps indicating the actions performed at each point in the script.
<script id="PartialUpdate">
  <bean-shell-script>
    <![CDATA[
1. Obtain lock. The partial update attempts to set an "update_lock" flag in the EAC to serve as a lock or mutex. If the flag is already set, this step fails, ensuring that the update cannot be started more than once simultaneously, as this would interfere with data processing. The flag is removed in the case of an error or when the script completes successfully. A sketch of this acquire-and-release pattern appears after the listing.

    log.info("Starting partial update script.");

    // obtain lock
    if (LockManager.acquireLock("update_lock")) {
2. Validate data readiness. Test that the EAC contains at least one flag with the prefix "partial_extract::". One of these flags should be created for each successfully and completely extracted file, with the prefix "partial_extract::" prepended to the extracted file name (e.g., "partial_extract::adds.txt.gz"). These flags are deleted during data processing and must be created as new files are extracted; a sketch of how such a flag might be created also follows the listing.

      // test if data is ready for processing
      if (PartialForge.isPartialDataReady()) {
3. Archive partial logs. The logs/partial directory is archived, to create a fresh logging directory for the partial update process and to save the previous run's logs.

        // archive logs
        PartialForge.archiveLogDir();
4. Clean processing directories. Files from the previous update are removed from the data/partials/processing, data/partials/forge_output, and data/temp directories.

        // clean directories
        PartialForge.cleanDirs();
5. Move data and config to processing directory. Extracted files in data/partials/incoming with matching "partial_extract::" flags in the EAC are moved to data/partials/processing. Configuration files are copied from config/pipeline to data/processing.

        // fetch extracted data files to forge input
        PartialForge.getPartialIncomingData();

        // fetch config files to forge input
        PartialForge.getConfig();
6. Forge. The partial update Forge process executes.

        // run ITL
        PartialForge.run();
7. Apply timestamp to updates. The output XML file generated by the partial update pipeline is renamed to include a timestamp, to ensure that it is processed in the correct order relative to files generated by previous or following partial update processes. A sketch of one way such a timestamp suffix can be produced follows the listing.

        // timestamp partial, save to cumulative partials dir
        PartialForge.timestampPartials();
8. Copy updates to the cumulative updates directory. The timestamped XML file is copied into the cumulative updates directory.

        PartialForge.fetchPartialsToCumulativeDir();
9. Distribute update to each server. A single copy of the partial update file is distributed to each server specified in the configuration.

        // distribute partial update, update Dgraphs
        DgraphCluster.copyPartialUpdateToDgraphServers();
10. Update MDEX Engines. The Dgraph processes are updated; engines are updated according to the updateGroup property specified for each Dgraph. An illustrative per-Dgraph updateGroup setting appears after the listing.

        DgraphCluster.applyPartialUpdates();
11. Archive cumulative updates. The newly generated update file (and files generated by all partial updates processed since the last baseline) are archived on the indexing server.

        // archive partials
        PartialForge.archiveCumulativePartials();
12. Release lock. The "update_lock" flag is removed from the EAC, indicating that another update may be started.

      } // end of data-ready block

      // release lock
      LockManager.releaseLock("update_lock");

      log.info("Partial update script finished.");
    } else {
      log.warning("Failed to obtain lock.");
    }
    ]]>
  </bean-shell-script>
</script>
When running the partial update script, you may see a Java exception similar to this example:
INFO: Starting copy utility 'copy_partial_update_to_host_MDEXHost1'.
Oct 20, 2008 11:46:37 AM org.apache.axis.encoding.ser.BeanSerializer serialize
SEVERE: Exception:
java.io.IOException: Non nillable element 'fromHostID' is null.
...
If this occurs, make sure that the following properties are defined in the AppConfig.xml configuration file:
<dgraph-defaults>
  <properties>
    ...
    <property name="srcPartialsDir" value="./data/partials/forge_output" />
    <property name="srcPartialsHostId" value="ITLHost" />
    <property name="srcCumulativePartialsDir" value="./data/partials/cumulative_partials" />
    <property name="srcCumulativePartialsHostId" value="ITLHost" />
    ...
  </properties>
  ...
</dgraph-defaults>
The exception occurs because the script obtains the fromHostID value from this section of the configuration.