Valid characters for ingest

A character is valid for ingest only if it is a legal character as defined by the XML 1.0 specification.

See the Second Edition of the XML 1.0 Specification for details about valid characters.
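
As a rough illustration of that rule, the sketch below checks each character against the Char production of XML 1.0 (tab, line feed, carriage return, U+0020 to U+D7FF, U+E000 to U+FFFD, and U+10000 to U+10FFFF). It is a standalone Python sketch, not part of Integrator or the Endeca Server, and the function names `is_xml10_char` and `find_invalid` are hypothetical. The server's own check may differ in detail.

```python
def is_xml10_char(ch: str) -> bool:
    """Return True if ch is within the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)          # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF   # supplementary planes
    )


def find_invalid(text: str) -> list:
    """List (index, code point) pairs for characters outside the Char production."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if not is_xml10_char(ch)
    ]
```

Running `find_invalid` over a sample of source values before ingest is one way to see which records are likely to be rejected.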

If the Endeca Server detects an invalid character, it rejects the record and returns the following message to Integrator:
Error: Character <c> is not legal in XML 1.0

The error message is added to the log for the run.

Only the record that includes the invalid character is rejected. The rest of the ingest operation continues.

To clean your data before ingest, add a Reformat component to the graph, upstream of this component, and use the following code:
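//#CTL2

// Transforms input record into output record.
function integer transform() {
   // Matches any character outside the ranges kept here (tab, LF, CR,
   // U+0020-U+D7FF, U+E000-U+FFFD), plus U+0092 and U+007F.
   string regex = "([^\\u0009\\u000a\\u000d\\u0020-\\uD7FF\\uE000-\\uFFFD]|[\\u0092\\u007F]+)";
   // Replace the placeholder field names with the fields in your metadata.
   $0.YourDataCleanData = replace($0.YourDatawithInvalidPattern, regex, "");

   return ALL;
}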

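Compatibility characters are also not valid. The second part of the pattern in the code above removes the compatibility characters U+007F and U+0092; if your data contains others, extend that character class to cover them.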
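
To try the pattern on sample values outside Integrator, a minimal Python sketch such as the following can help. It uses the same character classes as the CTL2 code above. The names `INVALID_XML_CHARS` and `clean` are hypothetical, and Python's `re` module is only assumed to behave like the CTL2 `replace` function for this pattern.

```python
import re

# Same character classes as the CTL2 regex above, written for Python's re module.
INVALID_XML_CHARS = re.compile(
    r"[^\u0009\u000a\u000d\u0020-\uD7FF\uE000-\uFFFD]|[\u0092\u007F]+"
)


def clean(value: str) -> str:
    """Remove every character matched by the pattern from value."""
    return INVALID_XML_CHARS.sub("", value)


if __name__ == "__main__":
    # NUL, DEL (U+007F), and U+0092 are stripped; letters are kept.
    print(clean("ab\x00c\x7fd\u0092e"))
```

In this sketch, characters above U+FFFD are stripped as well, because the pattern has no range for U+10000 to U+10FFFF. If your data contains such characters and you need to keep them, the pattern would need an additional range.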