To locate your application's Dgidx logs, consult the Dgidx definition in the AppConfig.xml file of the Deployment Template.
By default, if your application name is MyApp and your Dgidx process name is Dgidx1, the Dgidx logs are located in MyApp/logs/dgidxs/Dgidx1.
For example, the following Dgidx definition from the AppConfig.xml file lists the location of the Dgidx logs:
<!--
# Dgidx
#
-->
<dgidx id="Dgidx1" host-id="ITLHost">
  <properties>
    ...
  </properties>
  <directories>
    <directory name="incomingDataDir">./data/forge_output</directory>
    <directory name="configDir">./data/forge_output</directory>
  </directories>
  <args>
    <arg>-v</arg>
  </args>
  <log-dir>./logs/dgidxs/Dgidx1</log-dir>
  <input-dir>./data/dgidxs/Dgidx1/dgidx_input</input-dir>
  <output-dir>./data/dgidxs/Dgidx1/dgidx_output</output-dir>
  <data-prefix>Test-part0</data-prefix>
  <temp-dir>./data/dgidxs/Dgidx1/temp</temp-dir>
  <run-aspell>true</run-aspell>
</dgidx>
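As a minimal sketch of how the log location could be read programmatically, the following Python snippet looks up the log-dir value for a given Dgidx component. The app-config wrapper element, the SAMPLE string, and the helper names are hypothetical and exist only so that the fragment parses on its own; a real AppConfig.xml may use a different root element and XML namespaces, which is why the lookup compares local tag names only.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath

# Trimmed copy of the Dgidx definition, wrapped in a hypothetical root
# element (an assumption) so that the fragment is well-formed by itself.
SAMPLE = """
<app-config>
  <dgidx id="Dgidx1" host-id="ITLHost">
    <log-dir>./logs/dgidxs/Dgidx1</log-dir>
  </dgidx>
</app-config>
"""


def local_name(tag):
    """Reduce a namespaced tag such as '{uri}dgidx' to 'dgidx'."""
    return tag.rsplit("}", 1)[-1]


def dgidx_log_dir(xml_text, dgidx_id, app_root="."):
    """Return the log directory of the named Dgidx component, or None."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if local_name(elem.tag) == "dgidx" and elem.get("id") == dgidx_id:
            for child in elem:
                if local_name(child.tag) == "log-dir" and child.text:
                    # Join the relative log-dir onto the application root.
                    return str(PurePosixPath(app_root) / child.text.strip())
    return None


print(dgidx_log_dir(SAMPLE, "Dgidx1", app_root="MyApp"))
```

With the sample definition and an application root of MyApp, the lookup resolves to MyApp/logs/dgidxs/Dgidx1, which matches the default location described above.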
The following examples list some of the typical items in a Dgidx log file and explain them:
Note
You may notice that Dgidx also creates three properties of type admin on each record, named Endeca.DataSize, Endeca.NumAssigns, and Endeca.NumWords. These properties are visible in the Dgidx log and in the key properties in the Dgraph. Because these properties may not be supported in future releases, Oracle recommends that you ignore these properties in the log and avoid building front-end application logic around them.
Example 1
=== DGIDX: Finished phase "Read raw dimensions, properties, and records" === Phase Time: 19 minutes, 44.11 seconds
This log entry indicates that Dgidx has finished the phase in which it reads in all data and creates all indexes. The phase time shows how long that phase took.
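To compare phase times across indexing runs, it can help to convert an entry like the one in Example 1 into seconds. The following Python sketch does this for the format shown in that example. The function name is hypothetical, and support for an hours unit is an assumption, since the example only shows minutes and seconds.

```python
import re

# Matches the "Finished phase" entry shown in Example 1.
PHASE_RE = re.compile(
    r'=== DGIDX: Finished phase "(?P<phase>[^"]+)" ===\s*'
    r"Phase Time:\s*(?P<time>.+)"
)
# "hour" is an assumed unit; the example only shows minutes and seconds.
UNIT_SECONDS = {"hour": 3600.0, "minute": 60.0, "second": 1.0}
AMOUNT_RE = re.compile(r"(\d+(?:\.\d+)?)\s+(hour|minute|second)s?")


def parse_phase_line(line):
    """Return (phase name, duration in seconds), or None if no match."""
    match = PHASE_RE.search(line)
    if match is None:
        return None
    total = 0.0
    for amount, unit in AMOUNT_RE.findall(match.group("time")):
        total += float(amount) * UNIT_SECONDS[unit]
    return match.group("phase"), total


LINE = (
    '=== DGIDX: Finished phase "Read raw dimensions, properties, and records" '
    "=== Phase Time: 19 minutes, 44.11 seconds"
)
print(parse_phase_line(LINE))
```

For the entry in Example 1, the duration works out to 19 × 60 + 44.11 seconds, or roughly 1184 seconds.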
Example 2
$->tail Dgidx.log
Sorting... 22.16 seconds
Writing cycle 255 to temporary file
Parsing text fields...
...
179,600,000 text fields, 5,726,985,428 elements
179,700,000 text fields, 5,730,167,391 elements
...
This log entry indicates the following:
text fields. Text fields are individual entries in a record that Dgidx adds to its index for dimension search, record search, or both. The Dgidx output log lists the total number of text fields in each dimension or property of the record, then periodically outputs how many text fields have been processed during text search indexing. Because text search indexing operates on text fields from all dimensions or properties, the totals printed periodically can be greater than the total for any single dimension or property.
Text fields contain one or more terms. The large difference between the number of text fields and the number of elements is due to records containing large numbers of terms per property and/or dimension.
elements. Elements represent the number of individual terms or term-related objects sent to the index. Elements are sorted and stored for text search (including dimension search, if applicable).
For example, consider an employee record:
Name: John Lee
Age: 24
Hired: 2008-08-14
Description: Permanent
If Name is a dimension enabled for dimension search, and Description is a property enabled for text search, then Dgidx would represent this record in the log as having 2 text fields and 3 elements.
Note
These numbers are approximate and reflect the magnitude of items in the index. Do not interpret these numbers as the exact number of unique terms in the data corpus. Among other considerations, a single input word generates multiple index elements, and Dgidx uses different types of unique elements for its different types of indexes.
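To get a feel for the magnitude that these progress lines describe, you could extract the counters and compute the average number of elements per text field. The following Python sketch assumes the line format shown in Example 2. The function name is hypothetical, and the resulting ratio is only a rough indicator, for the reasons given in the note above.

```python
import re

# Matches progress lines such as:
#   179,600,000 text fields, 5,726,985,428 elements
PROGRESS_RE = re.compile(r"(\d[\d,]*) text fields, (\d[\d,]*) elements")


def parse_progress(log_text):
    """Return a list of (text_fields, elements) tuples found in the log."""
    pairs = []
    for fields, elements in PROGRESS_RE.findall(log_text):
        pairs.append(
            (int(fields.replace(",", "")), int(elements.replace(",", "")))
        )
    return pairs


SAMPLE_LOG = """\
Parsing text fields...
179,600,000 text fields, 5,726,985,428 elements
179,700,000 text fields, 5,730,167,391 elements
"""

pairs = parse_progress(SAMPLE_LOG)
latest_fields, latest_elements = pairs[-1]
# Average number of index elements per text field at the latest checkpoint.
print(round(latest_elements / latest_fields, 2))
```

For the two lines in Example 2, the ratio is a little under 32 elements per text field, which illustrates the large gap between the two counters.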
You can examine the text search indexing portion of your Dgidx logs and use the information in this topic to identify which items in the log contribute to indexing time.
Stemming and spelling do not affect the log numbers in the text search indexing portion of Dgidx logs. However, wildcard search increases the number of entries made to the index.
The following items related to text search indexing appear in the Dgidx log:
Dgidx log item | Description |
---|---|
| Corresponds to the actual number of records and dimensions listed in the |
| Corresponds to the total number of pairs that are available for text search. (Pairs are associations between a dimension or property and their corresponding values.) |
| Corresponds to the total number of entries that were made to the index. Note: If wildcard search is enabled, this increases the number for |
| Reflects the standard index. |
| Reflects the wildcard index that is created in addition to the standard index. |
When Dgidx processes records with missing or duplicate record specifier (or spec) values, it completes successfully, but produces a very large log file.
The log contains WARN-level messages that print entire records. These warning messages appear in the Dgidx log because records are improperly assigned property values from the project's configured record spec property.
If the application has a record spec property defined, each record must contain a single unique value from that property. If a record contains no record spec property value, Dgidx prints the "record... has no value assigned to it from any record specifier property" warning, as in the following example:
WARN 08/16/09 15:49:23.897 UTC DGIDX {dgidx,baseline}: The record with the following properties has no value assigned to it from any record specifier property. This record cannot be modified with rapid updates:
[Record Id=4]
Dimension[6200,"Wine Type"]: Value[8013] "White"
Dimension[8,"Region"]: Value[4294967254] "Mendocino Lake"
[...]
Property["P_Body"]: Value[0xce1bd0] "Ripe"
Property["P_DateReviewed"]: Value[0xce1470] "02/28/95"
[...]
If a record contains a record spec property value that is already in use by another record (a non-unique value), Dgidx prints the "Two records cannot share the value... for specifier property" warning, as in the following example:
WARN 08/16/09 15:49:23.897 UTC DGIDX {dgidx,baseline}: Two records cannot share the value "34699" for specifier property "P_WineID"; removing this record:
[Record Id=2]
Dimension[6200,"Wine Type"]: Value[8013] "White"
Dimension[8,"Region"]: Value[4294967282] "Sonoma"
[...]
Property["P_Body"]: Value[0xce1870] "Crisp"
Property["P_WineID"]: Value[0xcd6e40] "34699"
[...]
In either case, Dgidx prints the record's property and dimension values into the log so that it can be identified and corrected in a future update.
There is no way to suppress this display of the full record in the cases mentioned above. Instead, you should correct the record spec problems noted in the log by modifying the project's Forge pipeline or its record spec property selection. Assign record spec property values to records that lack them, and ensure that each record is assigned a unique record spec value so that duplicates do not occur. You can use the record details printed into the Dgidx log to identify the affected records even if they do not have unique record spec property values.
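Because these warnings can make the log very large, a short script can help summarize which records are affected before you fix the pipeline. The following Python sketch scans log text for the two warning messages quoted above and collects the record IDs. The function name and the abbreviated sample log are hypothetical, and the patterns assume that the message wording and the [Record Id=N] marker appear as in the examples.

```python
import re

# Warning for a record with no record spec value; the record ID follows it.
MISSING_RE = re.compile(
    r"has no value assigned to it from any record\s+specifier property\."
    r".*?\[Record Id=(\d+)\]",
    re.DOTALL,
)
# Warning for a record whose record spec value is already in use.
DUPLICATE_RE = re.compile(
    r'Two records cannot share the value "([^"]*)" for specifier\s+property '
    r'"([^"]*)"; removing this record:\s*\[Record Id=(\d+)\]'
)


def summarize_record_spec_warnings(log_text):
    """Collect record IDs with missing or duplicate record spec values."""
    missing = [int(rid) for rid in MISSING_RE.findall(log_text)]
    duplicates = [
        (prop, value, int(rid))
        for value, prop, rid in DUPLICATE_RE.findall(log_text)
    ]
    return {"missing": missing, "duplicates": duplicates}


# Abbreviated from the two warnings shown above.
SAMPLE_LOG = """\
WARN 08/16/09 15:49:23.897 UTC DGIDX {dgidx,baseline}: The record with the following properties has no value assigned to it from any record specifier property. This record cannot be modified with rapid updates:
[Record Id=4]
WARN 08/16/09 15:49:23.897 UTC DGIDX {dgidx,baseline}: Two records cannot share the value "34699" for specifier property "P_WineID"; removing this record:
[Record Id=2]
"""

print(summarize_record_spec_warnings(SAMPLE_LOG))
```

For the sample, the summary lists record 4 as missing a record spec value and record 2 as a duplicate of value 34699 for P_WineID. The property and dimension values that follow each warning in the real log are still needed to identify the source records.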
When you analyze Dgidx logs, you may notice that indexing times are periodically longer than you might expect.
The indexing operation may look like a normal addition of records, with no major or minor shift in the character of the existing records that would explain the change in indexing time.
Dgidx periodically goes through a merging process of many indexing generations. In particular, when the number of generation files becomes large, Dgidx merges them together to reduce the number of open files. Dgidx does this extra merge step when the number of generation files exceeds 200.