Troubleshooting Data Corruption

Oracle Responsys tracks many user-initiated behavioral events (for example, opens and clicks) via URLs sent back to Oracle Responsys. These URLs can become corrupted before Oracle Responsys receives them, and the corrupted data is saved as-is into the Oracle Responsys event logs. The URL corruption might occur for a variety of reasons that Oracle Responsys does not cause and cannot fix.

If your CED output files contain corrupted data values, these values might cause errors in the data processes that ingest the CED output. The most common corruption occurs when data values in the CUSTOMER_ID field include non-alphanumeric characters (such as &, @, *). We recommend that you always pre-process the CED output files: identify all invalid rows, direct the valid rows to a “.valid” file, and direct the invalid rows to an “.invalid” file. You can then use the “.valid” file as the source for further processing.

To identify and redirect invalid rows, you usually need to perform a global search of the incoming file using a regular expression, and then print the output. On Unix/Linux operating systems, use the awk command-line utility to search plain-text data sets for lines matching a regular expression. This utility is typically located at /bin/awk or /usr/bin/awk.
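
For example, the following one-line sketch (using the same ced_click_file.csv input as the examples in this appendix) prints every row whose fifth field contains non-alphanumeric characters:

awk 'BEGIN { FS="\",\"" } $5 !~ /^[0-9a-zA-Z]*$/' ced_click_file.csv

The examples below package this pattern-matching logic into a reusable filter.awk script.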

Although Oracle Responsys does not cause the data corruption and cannot fix it, we provide a solution for identifying the problem and redirecting valid and invalid rows.

The examples in this appendix provide the following solutions:

Example 1: Identify invalid rows in the target file; do not redirect either valid or invalid rows.

Example 2: Redirect valid rows from the source file to the “.valid” file; do nothing with invalid rows.

Example 3: Redirect invalid rows to the “.invalid” file and valid rows to the “.valid” file. Use this example if the CUSTOMER_ID column contains alphanumeric characters.

Example 4: Redirect invalid rows to the “.invalid” file and valid rows to the “.valid” file. Use this example if the CUSTOMER_ID column contains only numeric characters.

All examples assume that:

  • CUSTOMER_ID is the fifth column in the CED file

  • The file uses the comma (,) as a delimiter
    NOTE: If your file uses a different separator, replace the comma (in the FS variable) with the character your file uses. For example, if your file uses the semicolon (;), specify FS="\";\"", as shown in the sketch below.
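
For example, a filter.awk BEGIN block for a semicolon-delimited, quoted file would look like the following sketch:

BEGIN {
# Fields are separated by the three-character sequence ";" (quote, semicolon, quote)
FS="\";\""
}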

 

Example 1

This example checks whether the fifth column (CUSTOMER_ID) in any row contains non-alphanumeric characters and prints the invalid rows.

 

Example: vi filter.awk
BEGIN {
# Define the field separator
#
# awk reads and parses each line from the input file using the built-in input field separator variable FS
#
# The CED file uses the comma (,) as the separator.
# If your file uses a different separator, replace the comma with the separator you use.
# Since the quote character (") has special meaning in awk strings, we escape it with \
FS="\",\""
 
}
{
# Validate whether column 5 in the incoming row contains non-alphanumeric data
 if ( $5 !~ /^[0-9a-zA-Z]*$/ ) {

# Print the invalid rows
 print $0 ;
 }
}

 

Usage: awk -f filter.awk ced_click_file.csv
Output: Rows that contain non-alphanumeric data in the fifth column. 
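
To keep the flagged rows for later review, you can redirect the output to a file of your choice (the .review suffix here is only an example):

awk -f filter.awk ced_click_file.csv > ced_click_file.csv.review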

Example 2

This example directs valid rows (rows whose fifth column contains only alphanumeric characters) to the “.valid” file and does nothing with invalid rows.

Example: In filter.awk
BEGIN {
 FS="\",\""
}
 
# Clean up any existing .valid file
#
# If this is the first row in the source file (FNR == 1), remove the existing ".valid" file,
# recreate it, and write the header row to it
{
if (FNR == 1) { system("rm -f "FILENAME".valid") ; system("touch "FILENAME".valid")
                        print $0 >> FILENAME".valid"}
 else
# Validate whether incoming rows contain only alphanumeric data in column 5
if ( $5 ~ /^[0-9a-zA-Z]*$/ ) {
 
# Redirect valid rows to the .valid file
 print $0 >> FILENAME".valid" ;
 }
}

 

Usage: awk -f filter.awk ced_click_file.csv

 

Output: The source file is preserved and valid rows are directed to the “.valid” file.
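
As a quick sanity check (assuming the source file is named ced_click_file.csv, as in the Usage line above), you can compare line counts; the difference between the source file and the “.valid” file is the number of invalid rows that were not carried over:

wc -l ced_click_file.csv ced_click_file.csv.valid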

 

Example 3

This example directs invalid rows (rows whose fifth column contains non-alphanumeric characters) to the “.invalid” file and directs valid rows to the “.valid” file.

 

Example: In filter.awk
BEGIN {
 FS="\",\""
}
 
# Clean up any existing ".valid" and ".invalid" files
#
# If this is the first row in the source file (FNR == 1),
# remove the existing ".valid" and ".invalid" files, recreate them,
# and write the header row to the ".valid" file
{
if (FNR == 1) { system("rm -f "FILENAME".invalid") ; system("touch "FILENAME".invalid") }
if (FNR == 1) { system("rm -f "FILENAME".valid") ; system("touch "FILENAME".valid")
   print $0 >> FILENAME".valid"
 }
 
# Validate whether incoming rows contain only alphanumeric data in column 5
if (FNR > 1) { if ( $5 ~ /^[0-9a-zA-Z]*$/ ) {
 
# Redirect valid rows to the .valid file
print $0 >> FILENAME".valid" ;
               } else
               {
# Redirect invalid rows to the .invalid file
                 print $0 >> FILENAME".invalid" ;
               }
             }
}

 

Usage: awk -f filter.awk ced_click_file.csv

 

Output: The source file is preserved, valid rows are directed to a “.valid” file, and invalid rows are directed to the “.invalid” file.
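
As a quick check, the line counts of the “.valid” and “.invalid” files should add up to the line count of the source file, because every row (including the header row) is written to exactly one of the two files:

wc -l ced_click_file.csv ced_click_file.csv.valid ced_click_file.csv.invalid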

 

Example 4

This example directs valid rows (rows whose fifth column contains only numeric characters) to the “.valid” file and directs invalid rows to the “.invalid” file.

 

Example: In filter.awk
BEGIN {
 FS="\",\""
 
}
{
# If this is the first row in the source file (FNR == 1), recreate the .invalid and .valid files and write the header row to the .valid file
if (FNR == 1) { system("rm -f "FILENAME".invalid") ; system("touch "FILENAME".invalid") }
if (FNR == 1) { system("rm -f "FILENAME".valid") ; system("touch "FILENAME".valid")
   print $0 >> FILENAME".valid"
  }
 
# Validate whether incoming rows contain only numeric data 
 if (FNR > 1) { if ( $5 ~ /^[0-9]*$/ ) {
                  print $0 >> FILENAME".valid" ;
               } else
               {
                  print $0 >> FILENAME".invalid" ;
               }
             }
}

 

Usage: awk -f filter.awk ced_click_file.csv
 
Output: The source file is preserved, valid rows are directed to a “.valid” file, and invalid rows are directed to the “.invalid” file.

Learn more

Configuring a Contact Event Data export job

About Contact Event Data output files