| Oracle9i Database Globalization Support Guide Release 2 (9.2) Part Number A96529-01 | 
 | 
This chapter introduces the Character Set Scanner, a globalization support utility for checking data before migrating character sets. The topics in this chapter include:
The Character Set Scanner provides an assessment of the feasibility and potential issues in migrating an Oracle database to a new database character set. The Character Set Scanner checks all character data in the database and tests for the effects and problems of changing the character set encoding. At the end of the scan, it generates a summary report of the database scan. This report shows the scope work required to convert the database to a new character set.
Based on the information in the summary report, you can decide on the most appropriate method to migrate the database's character set. The methods are:
ALTER DATABASE CHARACTER SET statementALTER DATABASE CHARACTER SET statement with selective Export and Import 
The Character Set Scanner reads the character data and tests for the following conditions on each data cell:
The Character Set Scanner reads and tests for data in CHAR, VARCHAR2, LONG, CLOB, NCHAR, NVARCHAR2, and NCLOB columns only. The Character Set Scanner does not perform post-conversion column size testing for LONG, CLOB, and NCLOB columns.
To use the Character Set Scanner, you must have DBA privileges on the Oracle database.
All the character-based data in CHAR, VARCHAR2, LONG, and CLOB columns is stored in the database character set, which is specified with the CREATE DATABASE statement when the database was first created. However, in some configurations, it is possible to store data in a different character set from the database character set either intentionally or unintentionally. This happens most often when the NLS_LANG character set is the same as the database character set, because in such cases Oracle sends and receives data as is, without any conversion or validation. But it can also happen if one of the two character sets is a superset of the other, in which case many of the code points appear as if they were not converted. For example, if NLS_LANG is set to WE8ISO8859P1 and the database character set is WE8MSWIN1252, all code points except the range 128-159 are preserved through the client/server conversion.
Although a database that contains data not in its database character set cannot be converted to another character set by the three methods listed in "What is the Character Set Scanner?", you can still use the Character Set Scanner to test the effect of the conversion that would take place if the data were in the database character set.
The encoding for different character sets can use the same code point for different characters.There is no automatic method to detect what the intended character is. Most European character sets share liberal use of the 8-bit range to encode native characters, so it is very possible for a cell to be reported as convertible but for the wrong reasons.
For example, this can occur when the Character Set Scanner is used with the FROMCHAR parameter set to WE8MSWIN1252. This single-byte character set encodes a character in every available code point so that no matter what data is being scanned, the scanner always identifies a data cell as being available in the source character set.
When you set FROMCHAR, you are assuming that all character data is in that character set but that the Character Set Scanner is not able to accurately determine the validity. Set the FROMCHAR parameter carefully.
The Character Set Scanner does not support the scanning of the VARRAY collection type.
If a database contains data from more than one character set, the Character Set Scanner cannot accurately test the effects of changing the database character set on the database because it cannot differentiate character sets easily. If the data can be divided into two separate tables, one for each character set, then the Character Set Scanner can perform two single table scans to verify the validity of the data.
For each scan, a different value of the FROMCHAR parameter can be used to tell the Character Set Scanner to treat all target columns in the table as if they were in the specified character set.
If a database contains data not in the database character set, but still in only one character set, the Character Set Scanner can perform a full database scan. Use the FROMCHAR parameter to tell the Character Set Scanner what character set the data is in.
The Character Set Scanner provides three modes of database scan:
The Character Set Scanner reads and verifies the character data of all tables belonging to all users in the database including the data dictionary (SYS user), and it reports on the effects of the simulated migration to the new database character set. It scans all schema objects including stored packages, procedures and functions, and object names.
To understand the feasibility of migration to a new database character set, you need to perform a full database scan.
The Character Set Scanner reads and verifies character data of all tables belonging to the specified user and reports on the effects on the tables of changing the character set.
The Character Set Scanner does not test for table definitions such as table names and column names. To see the effects on the schema definitions, you need to perform a full database scan.
The Character Set Scanner reads and verifies the character data of the specified table, and reports the effects on the table of changing the character set.
The Character Set Scanner does not test for table definitions such as table name and column name. To see the effects on the schema definitions, you need to perform a full database scan.
This section describes how to use the Character Set Scanner, including the steps you need to perform before scanning and the procedures on how to invoke the Character Set Scanner. The topics discussed are:
To use the Character Set Scanner, you must run the csminst.sql script on the database that you plan to scan. The csminst.sql script needs to be run only once. The script performs the following tasks to prepare the database for scanning:
CSMIGCSMIGCSMIGCSMIGCSMIGThe SYSTEM tablespace is assigned to CSMIG by default, so you need to ensure there is sufficient storage space available in the SYSTEM tablespace before scanning the database. The amount of space required depends on the type of scan and the nature of the data in the database.
You can modify the default tablespace for CSMIG by editing the csminst.sql script. Modify the following statement in csminst.sql to assign your preferred tablespace to CSMIG as follows:
ALTER USER csmig DEFAULT TABLESPACE tablespace_name;
 
Then run csminst.sql using these commands and SQL statements:
% cd $ORACLE_HOME/rdbms/admin % sqlplus "system/manager as sysdba" SQL> START csminst.sql
The Character Set Scanner is certified with Oracle databases on any platforms running under the same release except that you cannot mix ASCII-based and EBCDIC-based platforms. For example, the Oracle9i release 2 (9.2) versions of the Character Set Scanner on any ASCII-based client platforms are certified to run with any Oracle9i release 2 (9.2) databases on any ASCII-based platforms, while EBCDIC-based clients are certified to run with any Oracle9i database on EBCDIC platforms.
Oracle Corporation recommends that you run the Character Set Scanner in the same Oracle home as the database when possible.
You can invoke the Character Set Scanner by one of these methods:
csscan system/manager PARFILE=filename
 
PARFILE is a file containing the Character Set Scanner parameters you typically use.
csscan system/manager full=y tochar=utf8 array=10240 process=3
csscan system/manager
In an interactive session, the Character Set Scanner prompts you for the following parameters:
The Character Set Scanner provides online help. Enter csscan help=y on the command line to invoke the help screen.
You can let the Character Set Scanner prompt you for parameters by entering the CSSCAN command followed by your username and password. For example:
CSSCAN SYSTEM/MANAGER
Alternatively, you can control how the Character Set Scanner runs by entering the CSSCAN command followed by various parameters. To specify parameters, use keywords. For example:
CSSCAN SYSTEM/MANAGER FULL=y TOCHAR=utf8 ARRAY=102400 PROCESS=3
The following is a list of keywords for the Character Set Scanner:
Keyword Default Prompt Description ---------- ------- ------ ------------------------------------------------- USERID yes username/password FULL N yes scan entire database USER yes user name of the table to scan TABLE yes list of tables to scan EXCLUDE list of tables to exclude from scan TOCHAR yes new database character set name FROMCHAR current database character set name TONCHAR new NCHAR character set name FROMNCHAR current NCHAR character set name ARRAY 10240 yes size of array fetch buffer PROCESS 1 yes number of scan process MAXBLOCKS split table if larger than MAXBLOCKS CAPTURE N capture convertible data SUPPRESS suppress error log by N per table FEEDBACK feedback progress every N rows BOUNDARIES list of column size boundaries for summary report LASTRPT N generate report of the previous database scan LOG scan base name of log files PARFILE parameter file name PRESERVE N preserve existing scan results HELP N show help screen
The parameter file enables you to specify Character Set Scanner parameters in a file where they can be easily modified or reused. Create a parameter file using any flat file text editor. The command line option PARFILE=filename tells the Character Set Scanner to read the parameters from a specified file rather than from the command line. For example:
csscan parfile=filename
 
or
csscanusername/passwordparfile=filename
The syntax for parameter file specifications is one of the following:
KEYWORD=value KEYWORD=(value1, value2, ...)
The following is an example of a parameter file:
USERID=system/manager USER=HR # scan HR's tables TOCHAR=utf8 ARRAY=40960 PROCESS=2 # use two concurrent scan processes FEEDBACK=1000
You can add comments to the parameter file by preceding them with the pound (#) sign. All characters to the right of the pound sign are ignored.
The following topics are included in this section:
The following formula estimates the number of rows fetched at a time:
(rows in array) = (ARRAY buffer size) / (sum of the CHAR and VARCHAR2 column sizes of a given table)
If the sum of the CHAR and VARCHAR2 column sizes exceeds the array buffer size, then the Character Set Scanner fetches only one row at a time. Tables with LONG, CLOB, or NCLOB columns are fetched only one row at a time.
This parameter affects the duration of a database scan. In general, the larger the size of the array buffer, the shorter the duration time. Each scan process will allocate the specified size of array buffer.
For example, if you specify a BOUNDARIES value of (10, 100, 1000), then the application data conversion summary report produces a breakdown of the CHAR data into the following groups by their column length, CHAR(1..10), CHAR(11..100) and CHAR(101..1000). The behavior is the same for the VARCHAR2, NCHAR, and NVARCHAR2 datatypes.
| Default value: | None | 
| Purpose: | Specifies the names of the tables to be excluded from the scan | 
When this parameter is specified, the Character Set Scanner excludes the specified tables from the scan. You can specify the following when you specify the name of the table:
schemaname specifies the name of the user's schema from which to exclude the tabletablename specifies the name of the table or tables to be excludedFor example, the following command scans all of the tables that belong to the hr sample schema except for the employees and departments tables:
cssan system/manager USER=HR EXCLUDE=(HR.EMPLOYEES , HR.DEPARTMENTS) ...
| Default value: | None | 
| Minimum value: | 
 | 
| Maximum value: | 
 | 
| Purpose: | Specifies that the Character Set Scanner should display a progress meter in the fort of a dot for every N number of rows scanned | 
For example, if you specify FEEDBACK=1000, then the Character Set Scanner displays a dot for every 1000 rows scanned. The FEEDBACK value applies to all tables being scanned. It cannot be set for individual tables.
Use this parameter to override the default database character set definition for CHAR, VARCHAR2, LONG, and CLOB data in the database.
Use this parameter to override the default database character set definition for NCHAR, NVARCHAR2, and NCLOB data in the database.
| See Also: "Scan Modes in the Character Set Scanner" for more information about full database scans | 
| Default value: | 
 | 
| Range of values: | 
 | 
| Purpose: | Displays a help message with the descriptions of the Character Set Scanner parameters | 
| Default value: | 
 | 
| Range of values: | 
 | 
| Purpose: | Indicates whether to regenerate the Character Set Scanner reports based on statistics gathered from the previous database scan | 
If LASTRPT=Y is specified, then the Character Set Scanner does not scan the database, but creates the report files using the information left by the previous database scan session instead.
If LASTRPT=Y is specified, then only the USERID, BOUNDARIES, and LOG parameters take effect.
| Default value: | 
 | 
| Purpose: | Specifies a base file name for the following Character Set Scanner report files: | 
By default, the Character Set Scanner generates the three text files, scan.txt, scan.err, and scan.out in the current directory.
For example, if the MAXBLOCKS parameter is set to 1000, then any tables that are greater than 1000 blocks in size will be divided into n chunks, where
n=CEIL(table block size/1000).
Dividing large tables into smaller pieces will be beneficial only when the number of processes set with PROCESS is greater than 1. If the MAXBLOCKS parameter is not set, then the Character Set Scanner attempts to split up large tables based on its own optimization rules.
| Default value: | None | 
| Purpose: | Specifies a filename for a file that contains a list of Character Set Scanner parameters | 
| Default value: | 
 | 
| Range of values: | 
 | 
| Purpose: | Indicates whether to preserve the statistics gathered from the previous scan session | 
If PRESERVE=Y is specified, then the Character Set Scanner preserves all the statistics from the previous scan. It adds (if PRESERVE=Y) or overwrites (if PRESERVE=N) the new statistics for the tables being scanned in the current scan request.
| Default value: | 
 | 
| Minimum value: | 
 | 
| Maximum value: | 
 | 
| Purpose: | Specifies the number of concurrent scan processes to utilize for the database scan | 
| Default value: | Unset (results in unlimited number of rows) | 
| Minimum value: | 
 | 
| Maximum value: | Unlimited | 
| Purpose: | Specifies the maximum number of data exceptions being logged for each table | 
The Character Set Scanner inserts individual exceptional record information into the CSM$ERRORS table when an exception is found in a data cell. The table grows depending on the number of exceptions reported.
This parameter is used to suppress the logging of individual exception information after a specified number of exceptions are inserted for each table. For example, if SUPPRESS is set to 100, then the Character Set Scanner records a maximum of 100 exception records for each table.
| Default value: | None | 
| Purpose: | Specifies the names of the tables to scan | 
You can specify the following when you specify the name of the table:
schemaname specifies the name of the user's schema from which to scan the tabletablename specifies the name of the table or tables to be scannedFor example, the following command scans the employees and departments tables in the hr sample schema:
csscan system/manager TABLE=(HR.EMPLOYEES , HR.DEPARTMENTS) ...
| Default value: | None | 
| Purpose: | Specifies a target database character set name for the  | 
| Default value: | None | 
| Purpose: | Specifies a target database character set name for the  | 
If you do not specify a value for TONCHAR, then the Character Set Scanner does not scan NCHAR, NVARCHAR2, and NCLOB data.
| Default value: | None | 
| Purpose: | Specifies the owner of the tables to be scanned | 
If the parameter USER is specified, then the Character Set Scanner scans all tables belonging to the user. For example, the following statement scans all tables belonging to the user hr:
csscan system/manager USER=hr ...
The following examples are all valid:
username/passwordusername/password@connect_stringusernameusername@connect_string
The following examples show you how to use the command-line and parameter-file methods to use Full Database, User, and Table scan modes.
The following example shows how to scan the full database to see the effects of migrating it to UTF8. This example assumes that the current database character set is WE8ISO8859P1 (or anything other than UTF8).
% csscan system/manager parfile=param.txt
The param.txt file contains the following information:
full=y tochar=utf8 array=40960 process=4
% csscan system/manager full=y tochar=utf8 array=40960 process=4 Scanner Messages Database Scanner: Release 9.2.0.0 - Production (c) Copyright 2001 Oracle Corporation. All rights reserved. Connected to: Oracle9i Enterprise Edition Release 9.2.0.0 - Production With the Objects option PL/SQL Release 9.2.0.0 - Production Enumerating tables to scan... . process 1 scanning SYSTEM.REPCAT$_RESOLUTION . process 1 scanning SYS.AQ$_MESSAGE_TYPES . process 1 scanning SYS.ARGUMENT$ . process 2 scanning SYS.AUD$ . process 3 scanning SYS.ATTRIBUTE$ . process 4 scanning SYS.ATTRCOL$ . process 2 scanning SYS.AUDIT_ACTIONS . process 2 scanning SYS.BOOTSTRAP$ . process 2 scanning SYS.CCOL$ . process 2 scanning SYS.CDEF$ : : . process 3 scanning SYSTEM.REPCAT$_REPOBJECT . process 1 scanning SYSTEM.REPCAT$_REPPROP . process 2 scanning SYSTEM.REPCAT$_REPSCHEMA . process 3 scanning MDSYS.MD$DIM . process 1 scanning MDSYS.MD$DICTVER . process 2 scanning MDSYS.MD$EXC . process 3 scanning MDSYS.MD$LER . process 1 scanning MDSYS.MD$PTAB . process 2 scanning MDSYS.MD$PTS . process 3 scanning MDSYS.MD$TAB Creating Database Scan Summary Report... Creating Individual Exception Report... Scanner terminated successfully.
The following example shows how to scan the user tables to see the effects of migrating them to UTF8. This example assumes the current database character set is US7ASCII, but the actual data stored is in Western European WE8MSWIN1252 encoding.
% csscan system/manager parfile=param.txt
The param.txt file contains the following information:
user=hr fromchar=we8mswin1252 tochar=utf8 array=40960 process=1
% csscan system/manager user=hr fromchar=we8mswin1252 tochar=utf8 array=40960 process=1
Database Scanner: Release 9.2.0.0 - Production (c) Copyright 2001 Oracle Corporation. All rights reserved. Connected to: Oracle9i Enterprise Edition Release 9.2.0.0 - Production With the Objects option PL/SQL Release 9.2.0.0 - Production Enumerating tables to scan... . process 1 scanning HR.JOBS . process 1 scanning HR.DEPARTMENTS . process 1 scanning HR.JOB_HISTORY . process 1 scanning HR.EMPLOYEES Creating Database Scan Summary Report... Creating Individual Exception Report... Scanner terminated successfully.
The following example shows how to scan a single table to see the effects on migrating it to WE8MSWIN1252. This example assumes the current database character set is in US7ASCII.
% csscan system/manager parfile=param.txt
The param.txt file contains the following information:
table=employees tochar=we8mswin1252 array=40960 process=1 supress=100
% csscan system/manager table=employees tochar=we8mswin1252 array=40960 process=1 supress=100 Scanner Messages Database Scanner: Release 9.2.0.0 - Production (c) Copyright 2001 Oracle Corporation. All rights reserved. Connected to: Oracle9i Enterprise Edition Release 9.2.0.0 - Production With the Objects option PL/SQL Release 9.2.0.0 - Production . process 1 scanning HR.EMPLOYEES Creating Database Scan Summary Report... Creating Individual Exception Report... Scanner terminated successfully.
The Character Set Scanner generates two reports for each scan:
A Database Scan Summary Report consists of the following sections:
The information available for each section depends on the type of scan and the parameters you select.
This section describes the parameters selected and the type of scan you chose. The following is an example:
Parameter Value ---------------------------------------- ------------------------------ Scan type Full database Scan CHAR data? YES Current database character set WE8ISO8859P1 New database character set UTF8 Scan NCHAR data? NO Array fetch buffer size 102400 Number of processes 4 ---------------------------------------- ------------------------------
This section describes the current database size. The following is an example:
TABLESPACE Total(MB) Used(MB) Free(MB) ----------------------------- --------------- --------------- --------------- APPS_DATA 1,340.000 1,331.070 8.926 CTX_DATA 30.000 3.145 26.852 INDEX_DATA 140.000 132.559 7.438 RBS_DATA 310.000 300.434 9.563 SYSTEM_DATA 150.000 144.969 5.027 TEMP_DATA 160.000 159.996 TOOLS_DATA 35.000 22.148 12.848 USERS_DATA 220.000 142.195 77.801 ----------------------------- --------------- --------------- --------------- Total 2,385.000 2,073.742 311.227
This report indicates the feasibility of the database character set migration. There are two basic criteria that determine the feasibility of the character set migration of the database. One is the condition of the data dictionary and the other is the condition of the application data.
The Scan Summary section consists of two status lines. The scan mode and the result determines the status that is printed for the data dictionary and application data.
The following is sample output:
All character type data in the data dictionary remains the same in the new character set All character type application data remains the same in the new character set
When all data remains the same in the new character set, it means that the data encoding of the original character set is identical to the target character set. In this case, the character set can be migrated using the ALTER DATABASE CHARACTER SET statement.
If all the data is convertible to the new character set, it means that the data can be safely migrated using the Export and Import utilities. However, the migrated data may or may not have the same encoding as the original character set.
| See Also: 
 | 
This section contains the statistics on the conversion summary of the data dictionary. It reports the statistics by datatype. Table 11-2 describes the types of status that can be reported.
This information is available only when a full database scan is performed. The following is sample output:
Datatype Changeless Convertible Exceptional Total --------- ---------------- ---------------- ---------------- ---------------- VARCHAR2 971,300 1 0 971,301 CHAR 7 0 0 7 LONG 60,325 0 0 60,325 CLOB --------- ---------------- ---------------- ---------------- ---------------- Total 1,031,632 1 0 1,031,633
If the numbers in both the Convertible and Exceptional columns are zero, it means that all the data in the data dictionary will remain the same in the new character set.
If the numbers in the Exceptional column are zero and some numbers in the Convertible columns are not zero, it means that all data in the data dictionary is convertible to the new character set. During import, the relevant data will be converted.
If the numbers in the Exceptional column are not zero, it means that there is data in the data dictionary that is not convertible. Therefore, it is not feasible to migrate the current database to the new character because the export and import process cannot convert the data into the new character set. For example, you might have a table name with invalid characters or a PL/SQL procedure with a comment line that includes data that cannot be mapped to the new character set. These changes to schema objects must be corrected manually before migration to a new character set.
This section contains the statistics on conversion summary of the application data. The statistics are reported by datatype. Table 11-3 describes the types of status that can be reported.
The following is sample output:
Datatype Changeless Convertible Exceptional Total ---------- ---------------- ---------------- ---------------- ---------------- VARCHAR2 23,213,745 1,324 0 23,215,069 CHAR 423,430 0 0 423,430 LONG 8,624 33 0 8,657 CLOB 58,839 11,114 28 69,981 ---------- ---------------- ---------------- ---------------- ---------------- Total 23,704,638 12,471 28 23,717,137
This section contains the conversion summary of the CHAR and VARCHAR2 application data. The statistics are reported by column size boundaries specified by the BOUNDARIES parameter. Table 11-4 describes the types of status available.
This information is available only when the BOUNDARIES parameter is specified.
The following is sample output:
Datatype Changeless Convertible Exceptional Total ------------------- ------------- ------------- --------------- ---------------- VARCHAR2(1..10) 1,474,825 0 0 1,474,825 VARCHAR2(11..100) 9,691,520 71 0 9,691,591 VARCHAR2(101..4000) 12,047,400 1,253 0 12,048,653 ------------------- ------------- ------------- --------------- ---------------- CHAR(1..10) 423,413 0 0 423,413 CHAR(11..100) 17 0 0 17 CHAR(101..4000) ------------------- ------------- ------------- --------------- ---------------- Total 23,637,175 1,324 0 23,638,499
This report shows how Convertible and Exceptional data is distributed within the database. The statistics are reported by table. If the list contains only a few rows, it means the Convertible data is localized. If the list contains many rows, it means the Convertible data occurs throughout the database.
The following is sample output:
USER.TABLE Convertible Exceptional ------------------------------------------------ ---------------- ---------------- SMG.SOURCE 1 0 SMG.HELP 12 0 SMG.CLOSE_LIST 16 0 SMG.ATTENDEES 8 0 SGT.DR_010_I1T1 7 0 SGT.DR_011_I1T1 7 0 SGT.MRK_SRV_PROFILE 2 0 SGT.MRK_SRV_PROFILE_TEMP 2 0 SGT.MRK_SRV_QUESTION 3 0 ------------------------------------------------ ---------------- ----------------
This report shows how Convertible and Exceptional data is distributed within the database. The statistics are reported by column. The following is an example:
USER.TABLE|COLUMN Convertible Exceptional ----------------------------------------------- ---------------- ---------------- SMG.SOURCE|SOURCE 1 0 SMG.HELP|INFO 12 0 SMG.CLOSE_LIST|FNAME 1 0 SMG.CLOSE_LIST|LNAME 1 0 SMG.CLOSE_LIST|COMPANY 1 0 SMG.CLOSE_LIST|STREET 8 0 SMG.CLOSE_LIST|CITY 4 0 SMG.CLOSE_LIST|STATE 1 0 SMG.ATTENDEES|ATTENDEE_NAME 1 0 SMG.ATTENDEES|ADDRESS1 3 0 SMG.ATTENDEES|ADDRESS2 2 0 SMG.ATTENDEES|ADDRESS3 2 0 SGT.DR_010_I1T1|WORD_TEXT 7 0 SGT.DR_011_I1T1|WORD_TEXT 7 0 SGT.MRK_SRV_PROFILE|FNAME 1 0 SGT.MRK_SRV_PROFILE|LNAME 1 0 SGT.MRK_SRV_PROFILE_TEMP|FNAME 1 0 SGT.MRK_SRV_PROFILE_TEMP|LNAME 1 0 SGT.MRK_SRV_QUESTION|ANSWER 3 0
This generates a list of all the indexes that are affected by the database character set migration. These can be rebuilt after the data has been imported. The following is an example:
USER.INDEX on USER.TABLE(COLUMN) ---------------------------------------------------------------------------- CD2000.COMPANY_IX_PID_BID_NNAME on CD2000.COMPANY(CO_NLS_NAME) CD2000.I_MASHINE_MAINT_CONT on CD2000.MACHINE(MA_MAINT_CONT#) CD2000.PERSON_NEWS_SABUN_CONT_CONT on CD2000.PERSON_NEWS_SABUN_CONT(CONT_BID) CD2000.PENEWSABUN3_PEID_CONT on CD2000.PE_NEWS_SABUN_3(CONT_BID) PMS2000.CALLS_IX_STATUS_SUPPMGR on PMS2000.CALLS(SUPPMGR) PMS2000.MAILQUEUE_CHK_SUB_TOM on PMS2000.MAIL_QUEUE(TO_MAIL) PMS2000.MAILQUEUE_CHK_SUB_TOM on PMS2000.MAIL_QUEUE(SUBJECT) PMS2000.TMP_IX_COMP on PMS2000.TMP_CHK_COMP(COMP_NAME) ----------------------------------------------------------------------------
An Individual Exception Report consists of the following summaries:
This section describes the parameters and the type of scan chosen. The following is an example:
Parameter Value ---------------------------------------- ------------------------------ Scan type Full database Scan CHAR data? YES Current database character set we8mswin1252 New database character set utf8 Scan NCHAR data? NO Array fetch buffer size 102400 Number of rows to heap up for insert 10 Number of processes 1 ---------------------------------------- ------------------------------
This section identifies the data dictionary data that is either convertible or has exceptions. There are two types of exceptions:
The following is an example of output about a data dictionary that contains convertible data:
User : SYS Table : METASTYLESHEET Column: STYLESHEET Type : CLOB Number of Exceptions : 0 Max Post Conversion Data Size: 0 ROWID Exception Type Size Cell Data(first 30 bytes) ------------------ ------------------ ----- ------------------------------ AAAAHMAABAAAAs+AAA convertible AAAAHMAABAAAAs+AAB convertible ------------------ ------------------ ----- ------------------------------
| See Also: "Application Data Individual Exceptions" for more information about exceptions | 
This report identifies the data that has exceptions so that this data can then be modified if necessary.
There are two types of exceptions:
exceed column size 
The column size should be extended if the maximum column width has been surpassed. If not, data truncation occurs.
lossy conversion 
The data must be corrected before migrating to the new character set, or else the invalid characters will be converted to a replacement character. Replacement characters are usually specified as ? or ¿ or as a character that is linguistically similar.
The following is an example of an individual exception report that illustrates some possible problems when changing the database character set from WE8ISO8859P1 to UTF8:
User: HR Table: EMPLOYEES Column: LAST_NAME Type: VARCHAR2(10) Number of Exceptions: 2 Max Post Conversion Data Size: 11 ROWID Exception Type Size Cell Data(first 30 bytes) ------------------ ------------------ ----- ------------------------------ AAAA2fAAFAABJwQAAg exceed column size 11 Ährenfeldt AAAA2fAAFAABJwQAAu lossy conversion órâclë8TM AAAA2fAAFAABJwQAAu exceed column size 11 órâclë8TM ------------------ ------------------ ----- ------------------------------
The values Ährenfeldt and órâclë8TM exceed the column size (10 bytes) because each of the characters Ä, ó, â, and ë occupies one byte in WE8ISO8859P1 but two bytes in UTF8. The value órâclë8TM has lossy conversion to UTF8 because the trademark sign TM (code 153) is not a valid WE8ISO8859P1 character. It is a WE8MSWIN1252 character, which is a superset of WE8ISO8859P1.
You can view the data that has an exception by issuing a SELECT statement:
SELECT last_name FROM hr.employees WHERE ROWID='AAAA2fAAFAABJwQAAu';
You can modify the data that has the exception by issuing an UPDATE statement:
UPDATE hr.employees SET last_name = 'Oracle8 TM' WHERE ROWID='AAAA2fAAFAABJwQAAu';
This section describes storage and performance issues in the Character Set Scanner. It contains the following topics:
This section describes the size and the growth of the Character Set Scanner's system tables, and explains the approach to maintain them. There are three system tables that can increase rapidly depending on the nature of the data stored in the database.
You may want to assign a large tablespace to the user CSMIG by amending the csminst.sql script. By default, the SYSTEM tablespace is assigned to the user CSMIG.
This section includes the following topics:
The Character Set Scanner enumerates all tables that need to be scanned into the CSM$TABLES table.
You can look up the number of tables (to get an estimate of how large CSM$TABLES can become) in the database by issuing the following SQL statement:
SELECT COUNT(*) FROM DBA_TABLES;
The Character Set Scanner stores statistical information for each column scanned into the CSM$COLUMNS table.
You can look up the number of character type columns (to get an estimate of how large CSM$COLUMNS can become) in the database by issuing the following SQL statement:
SELECT COUNT(*) FROM DBA_TAB_COLUMNS WHERE DATA_TYPE IN ('CHAR', 'VARCHAR2', 'LONG', 'CLOB');
When exceptions are detected with cell data, the Character Set Scanner inserts individual exception information into the CSM$ERRORS table. This information then appears in the Individual Exception Report and facilitates identifying records to be modified if necessary.
If your database contains a lot of data that is signaled as Exceptional or Convertible (when the parameter CAPTURE=Y is set), then the CSM$ERRORS table can grow very large. You can prevent the CSM$ERRORS table from growing unnecessarily large by using the SUPPRESS parameter.
The SUPPRESS parameter applies to all tables. The Character Set Scanner suppresses inserting individual Exceptional information after the specified number of exceptions is inserted. Limiting the number of exceptions to be recorded may not be useful if the exceptions are spread over different tables.
This section describes ways to increase performance when scanning the database.
If you plan to scan a relatively large database, for example, over 50GB, you might want to consider using multiple scan processes. This shortens the duration of database scans by using hardware resources such as CPU and memory available on the machine. A guideline for determining the number of scan processes to use is to set the number equal to the CPU_COUNT initialization parameter.
The Character Set Scanner fetches multiple rows at a time when an array fetch is allowed. Generally, you will improve performance by letting the Character Set Scanner use a bigger array fetch buffer. Each process allocates its own fetch buffer.
The Character Set Scanner inserts individual Exceptional and Convertible (when CAPTURE=Y) information into the CSM$ERRORS table. In general, insertion into the CSM$ERRORS table is more costly than data fetching. If your database has a lot of data that is signaled as Exceptional or Convertible, then the Character Set Scanner issues many insert statements, causing performance degradation. Oracle Corporation recommends setting a limit on the number of exception rows to be recorded using the SUPRESS parameter.
This section contains the following reference material:
The Character Set Scanner uses the following views.
This view contains statistical information about columns that were scanned.
This view contains statistical information about columns that were scanned.
This view contains individual exception information for cell data and object definitions.
This view contains individual exception information for indexes.
This view contains information about database tables to be scanned. The Character Set Scanner enumerates all tables to be scanned into this view.
The Character Set Scanner has the following error messages:
CSS-00100 failed to allocate memory size of number An attempt was made to allocate memory with size 0 or bigger than the maximum size. This is an internal error. Contact Oracle Customer Support. CSS-00101 failed to release memory An attempt was made to release memory with invalid pointer. This is an internal error. Contact Oracle Customer Support. CSS-00102 failed to release memory, null pointer given An attempt was made to release memory with null pointer. This is an internal error. Contact Oracle Customer Support. CSS-00105 failed to parse BOUNDARIES parameterBOUNDARIESparameter was specified in an invalid format. Refer to the manual for the correct syntax. CSS-00106 failed to parse SPLIT parameterSPLITparameter was specified in an invalid format. Refer to the manual for the correct syntax. CSS-00107 Character set migration utility schem not installedCSM$VERSIONtable not found in the database. RunCSMINST.SQLon the database. CSS-00108 Character set migration utility schema not compatible IncompatibleCSM$*tables found in the database. RunCSMINST.SQLon the database. CSS-00110 failed to parse useridUSERIDparameter was specified in an invalid format. Refer to the manual for the correct syntax. CSS-00111 failed to get RDBMS version Failed to retrieve the value of the Version of the database. This is an internal error. Contact Oracle Customer Support. CSS-00112 database version not supported The database version is older than release 8.0.5.0.0. Upgrade the database to release 8.0.5.0.0 or later, then try again. CSS-00113 user %s is not allowed to access data dictionary The specified user cannot access the data dictionary. SetO7_DICTIONARY_ACCESSIBILITYparameter toTRUE, or use SYS user. CSS-00114 failed to get database character set name Failed to retrieve value ofNLS_CHARACTERSETorNLS_NCHAR_CHARACTERSETparameter fromNLS_ DATABASE_PARAMETERSview. This is an internal error. Contact Oracle Customer Support. CSS-00115 invalid character set name %s The specified character set is not a valid Oracle character set.
| See Also: Appendix A, "Locale Data" for the correct character set name | 
CSS-00116 failed to reset NLS_LANG/NLS_NCHAR parameter Failed to forceNLS_LANGcharacter set to be same as database character set. This is an internal error. Contact Oracle Customer Support. CSS-00117 failed to clear previous scan log Failed to delete all rows fromCSM$*tables. This is an internal error. Contact Oracle Customer Support. CSS-00118 failed to save command parameters Failed to insert rows intoCSM$PARAMETERStable. This is an internal error. Contact Oracle Customer Support. CSS-00119 failed to save scan start time Failed to insert a row intoCSM$PARAMETERStable. This is an internal error. Contact Oracle Customer Support. CSS-00120 failed to enumerate tables to scan Failed to enumerate tables to scan intoCSM$TABLEStable. This is an internal error. Contact Oracle Customer Support. CSS-00121 failed to save scan complete time Failed to insert a row intoCSM$PARAMETERStable. This is an internal error. Contact Oracle Customer Support. CSS-00122 failed to create scan report Failed to create database scan report. This is an internal error. Contact Oracle Customer Support. CSS-00123 failed to check if user %s exist Select statement that checks if the specified user exists in the database failed. This is an internal error. Contact Oracle Customer Support. CSS-00124 user %s not found The specified user does not exist in the database. Check the user name. CSS-00125 failed to check if table %s.%s exist Select statement that checks if the specified table exists in the database failed. This is an internal error. Contact Oracle Customer Support. CSS-00126 table %s.%s not found The specified table does not exist in the database. Check the user name and table name. CSS-00127 user %s does not have DBA privilege The specified user does not have DBA privileges, which are required to scan the database. Choose a user with DBA privileges. CSS-00128 failed to get server version string Failed to retrieve the version string of the database. None. CSS-00130 failed to initialize semaphore Unknown. This is an internal error. Contact Oracle Customer Support. CSS-00131 failed to spawn scan process %d Unknown. This is an internal error. Contact Oracle Customer Support. CSS-00132 failed to destroy semaphore Unknown. This is an internal error. Contact Oracle Customer Support. CSS-00133 failed to wait semaphore Unknown. This is an internal error. Contact Oracle Customer Support. CSS-00134 failed to post semaphore Unknown. This is an internal error. Contact Oracle Customer Support. CSS-00140 failed to scan table (tid=%d, oid=%d) Data scan on this particular table failed. This is an internal error. Contact Oracle Customer Support. CSS-00141 failed to save table scan start time Failed to update a row in theCSM$TABLEStable. This is an internal error. Contact Oracle Customer Support. CSS-00142 failed to get table information Failed to retrieve various information from user id and object id of the table. This is an internal error. Contact Oracle Customer Support. CSS-00143 failed to get column attributes Failed to retrieve column attributes of the table. This is an internal error. Contact Oracle Customer Support. CSS-00144 failed to scan table %s.%s Data scan on this particular table was not successful. This is an internal error. Contact Oracle Customer Support. CSS-00145 failed to save scan result for columns Failed to insert rows intoCSM$COLUMNStable. This is an internal error. Contact Oracle Customer Support. CSS-00146 failed to save scan result for table Failed to update a row ofCSM$TABLEStable. This is an internal error. Contact Oracle Customer Support. CSS-00147 unexpected data truncation Scanner allocates the exactly same size of memory as the column byte size for fetch buffer, resulting in unexpected data truncation. This is an internal error. Contact Oracle Customer Support. CSS-00150 failed to enumerate table Failed to retrieve the specified table information. This is an internal error. Contact Oracle Customer Support. CSS-00151 failed to enumerate user tables Failed to enumerate all tables that belong to the specified user. This is an internal error. Contact Oracle Customer Support. CSS-00152 failed to enumerate all tables Failed to enumerate all tables in the database. This is an internal error. Contact Oracle Customer Support. CSS-00153 failed to enumerate character type columns Failed to enumerate allCHAR,VARCHAR2,LONG, andCLOBcolumns of tables to scan. This is an internal error. Contact Oracle Customer Support. CSS-00154 failed to create list of tables to scan Failed to enumerate the tables intoCSM$TABLEStable. This is an internal error. Contact Oracle Customer Support. CSS-00155 failed to split tables for scan Failed to split the specified tables. This is an internal error. Contact Oracle Customer Support. CSS-00156 failed to get total number of tables to scan Select statement that retrieves the number of tables to scan failed. This is an internal error. Contact Oracle Customer Support. CSS-00157 failed to retrieve list of tables to scan Failed to read all table ids into the scanner memory. This is an internal error. Contact Oracle Customer Support. CSS-00158 failed to retrieve index defined on column Select statement that retrieves index defined on the column fails. This is an internal error. Contact Oracle Customer Support. CSS-00160 failed to open summary report file File open function returned error. Check if you have create/write privilege on the disk and check if the file name specified for theLOGparameter is valid. CSS-00161 failed to report scan elapsed time Unknown. This is an internal error. Contact Oracle Customer Support. CSS-00162 failed to report database size information Unknown. This is an internal error. Contact Oracle Customer Support. CSS-00163 failed to report scan parameters Unknown. This is an internal error. Contact Oracle Customer Support. CSS-00164 failed to report Scan summary Unknown. This is an internal error. Contact Oracle Customer Support. CSS-00165 failed to report conversion summary Unknown. This is an internal error. Contact Oracle Customer Support. CSS-00166 failed to report convertible data distribution Unknown. This is an internal error. Contact Oracle Customer Support. CSS-00167 failed to open exception report file File open function returned error. Check if you have create/write privilege on the disk and check if the file name specified forLOGparameter is valid. CSS-00168 failed to report individual exceptions Unknown. This is an internal error. Contact Oracle Customer Support. CSS-00170 failed to retrieve size of tablespace % Unknown. This is an internal error. Contact Oracle Customer Support. CSS-00171 failed to retrieve free size of tablespace %s Unknown. This is an internal error. Contact Oracle Customer Support. CSS-00172 failed to retrieve total size of tablespace %s Unknown. This is an internal error. Contact Oracle Customer Support. CSS-00173 failed to retrieve used size of the database Unknown. This is an internal error. Contact Oracle Customer Support. CSS-00174 failed to retrieve free size of the database Unknown. This is an internal error. Contact Oracle Customer Support. CSS-00175 failed to retrieve total size of the database Unknown. This is an internal error. Contact Oracle Customer Support. CSS-00176 failed to enumerate user tables in bitmapped tablespace Failed to enumerate tables in bitmapped tablespace. This is an internal error. Contact Oracle Customer Support.
| 
 |  Copyright © 1996, 2002 Oracle Corporation. All Rights Reserved. | 
 |