Troubleshooting Data Corruption

Oracle Responsys tracks many user-initiated behavioral events (for example, opens and clicks) via URLs sent back to Oracle Responsys. These URLs can become corrupted before Oracle Responsys receives them, and the corrupted data is saved as-is into the Oracle Responsys event logs. The URL corruption might occur for a variety of reasons that Oracle Responsys does not cause and cannot fix.

If your CED output files contain corrupted data values, these values might cause errors in the data processes that ingest the CED output. The most common corruption occurs when data values in the CUSTOMER_ID field include non-alphanumeric characters (such as &, @, *). We recommend that you always pre-process the CED output files: identify all invalid rows, direct the valid rows to a “.valid” file, and direct the invalid rows to an “.invalid” file. You can then use the “.valid” file as the source for further processing.

To identify and redirect invalid rows, you usually need to perform a global search of the incoming file using a regular expression, and then print the output. On Unix/Linux operating systems, use the awk command-line utility to search plain-text data sets for lines matching a regular expression. This utility is typically located at /bin/awk or /usr/bin/awk.
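
For example, the following one-line sketch (using the same ced_click_file.csv input as the examples in this appendix) prints every row whose fifth field contains non-alphanumeric characters:

awk 'BEGIN { FS="\",\"" } $5 !~ /^[0-9a-zA-Z]*$/' ced_click_file.csv

The examples below package this pattern-matching logic into a reusable filter.awk script.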

Although Oracle Responsys does not cause the data corruption and cannot fix it, we provide a solution for identifying the problem and redirecting valid and invalid rows.

The examples in this appendix provide the following solutions:

Example 1: Identify invalid rows in the target file; do not redirect either valid or invalid rows.

Example 2: Redirect valid rows from the source file to the “.valid” file; do nothing with invalid rows.

Example 3: Redirect invalid rows to the “.invalid” file and valid rows to the “.valid” file. Use this example if the CUSTOMER_ID column contains alphanumeric characters.

Example 4: Redirect invalid rows to the “.invalid” file and valid rows to the “.valid” file. Use this example if the CUSTOMER_ID column contains only numeric characters.

All examples assume that:

  • CUSTOMER_ID is the fifth column in the CED file

  • The file uses the comma (,) as a delimiter
    NOTE: If your file uses a different separator, replace the comma (in the FS variable) with the character your file uses. For example, if your file uses the semicolon (;), specify FS="\";\"", as shown in the sketch below.
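
For example, a filter.awk BEGIN block for a semicolon-delimited, quoted file would look like the following sketch:

BEGIN {
# Fields are separated by the three-character sequence ";" (quote, semicolon, quote)
FS="\";\""
}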

 

Example 1

This example checks whether the fifth column (CUSTOMER_ID) in any row contains non-alphanumeric characters and prints the invalid rows.

 

Example: vi filter.awk
BEGIN {
# Define the field separator
#
# awk reads and parses each line from the input file using the built-in input field separator variable FS
#
# The CED file uses the comma (,) as the separator.
# If your file uses a different separator, replace the comma with the separator you use.
# Since the quote character (") has special meaning in awk strings, we escape it with \
FS="\",\""
 
}
{
# Validate whether column 5 in the incoming row contains non-alphanumeric data
 if ( $5 !~ /^[0-9a-zA-Z]*$/ ) {

# Print the invalid rows
 print $0 ;
 }
}

 

Usage: awk -f filter.awk ced_click_file.csv
Output: Rows that contain non-alphanumeric data in the fifth column. 
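
To keep the flagged rows for later review, you can redirect the output to a file of your choice (the .review suffix here is only an example):

awk -f filter.awk ced_click_file.csv > ced_click_file.csv.review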

Example 2

This example directs valid rows (rows whose fifth column contains only alphanumeric characters) to the “.valid” file and does nothing with invalid rows.

Example: In filter.awk
BEGIN {
 FS="\",\""
}
 
# Clean up any existing .valid file
#
# If this is the first row in the source file (FNR == 1), remove the existing ".valid" file,
# recreate it, and write the header row to it
{
if (FNR == 1) { system("rm -f "FILENAME".valid") ; system("touch "FILENAME".valid")
                        print $0 >> FILENAME".valid"}
 else
# Validate whether incoming rows contain only alphanumeric data in column 5
if ( $5 ~ /^[0-9a-zA-Z]*$/ ) {
 
# Redirect valid rows to the .valid file
 print $0 >> FILENAME".valid" ;
 }
}

 

Usage: awk -f filter.awk ced_click_file.csv

 

Output: The source file is preserved and valid rows are directed to the “.valid” file.
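
As a quick sanity check (assuming the source file is named ced_click_file.csv, as in the Usage line above), you can compare line counts; the difference between the source file and the “.valid” file is the number of invalid rows that were not carried over:

wc -l ced_click_file.csv ced_click_file.csv.valid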

 

Example 3

This example directs invalid rows (rows whose fifth column contains non-alphanumeric characters) to the “.invalid” file and directs valid rows to the “.valid” file.

 

Example: In filter.awk
BEGIN {
 FS="\",\""
}
 
# Clean up any existing ".valid" and ".invalid" files
#
# If this is the first row in the source file (FNR == 1),
# remove the existing ".valid" and ".invalid" files, recreate them,
# and write the header row to the ".valid" file
{
if (FNR == 1) { system("rm -f "FILENAME".invalid") ; system("touch "FILENAME".invalid") }
if (FNR == 1) { system("rm -f "FILENAME".valid") ; system("touch "FILENAME".valid")
   print $0 >> FILENAME".valid"
 }
 
# Validate whether incoming rows contain only alphanumeric data in column 5
if (FNR > 1) { if ( $5 ~ /^[0-9a-zA-Z]*$/ ) {
 
# Redirect valid rows to the .valid file
print $0 >> FILENAME".valid" ;
               } else
               {
# Redirect invalid rows to the .invalid file
                 print $0 >> FILENAME".invalid" ;
               }
             }
}

 

Usage: awk -f filter.awk ced_click_file.csv

 

Output: The source file is preserved, valid rows are directed to a “.valid” file, and invalid rows are directed to the “.invalid” file.
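
As a quick check, the line counts of the “.valid” and “.invalid” files should add up to the line count of the source file, because every row (including the header row) is written to exactly one of the two files:

wc -l ced_click_file.csv ced_click_file.csv.valid ced_click_file.csv.invalid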

 

Example 4

This example directs valid rows (rows whose fifth column contains only numeric characters) to the “.valid” file and directs invalid rows to the “.invalid” file.

 

Example: In filter.awk
BEGIN {
 FS="\",\""
 
}
{
# If this is the first row in the source file (FNR == 1), recreate the .invalid and .valid files and write the header row to the .valid file
if (FNR == 1) { system("rm -f "FILENAME".invalid") ; system("touch "FILENAME".invalid") }
if (FNR == 1) { system("rm -f "FILENAME".valid") ; system("touch "FILENAME".valid")
   print $0 >> FILENAME".valid"
  }
 
# Validate whether incoming rows contain only numeric data 
 if (FNR > 1) { if ( $5 ~ /^[0-9]*$/ ) {
                  print $0 >> FILENAME".valid" ;
               } else
               {
                  print $0 >> FILENAME".invalid" ;
               }
             }
}

 

Usage: awk -f filter.awk ced_click_file.csv
 
Output: The source file is preserved, valid rows are directed to a “.valid” file, and invalid rows are directed to the “.invalid” file.

Learn more

Configuring a Contact Event Data export job

About Contact Event Data output files