17 Troubleshooting
You can enable tracing, locate log and trace files, and troubleshooting common issues.
Troubleshooting Tips
Use these tips to discover information about the sharded database that you need to help you troubleshoot issues.
Pre-Deployment Network Validation
Several GDSCTL commands have a -validate_network
option to
detect network configuration issues as early as possible during the specification and
deployment of sharded databases.
The -validate_network
can be used in following GDSCTL commands for
sharded databases:
-
add {invitednode | invitedsubnet}
-
add shard
-
deploy
-
start gsm
-
validate
(also includes-show_errors
)
Checking the Sharding Method
Run gdsctl config sdb
to check which sharding method, also
known as the shard type, is used in the sharded database.
The sharding method can be system-managed, composite, or user-defined.
The sharding method is shown under "Shard type" in the output of gdsctl
config sdb
as shown here.
gdsctl> config sdb
GDS Pool administrators
------------------------
Replication Type
------------------------
Data Guard
Shard type
------------------------
System-managed
Shard spaces
------------------------
shd1
Services
------------------------
srv1
Checking the Replication Type
Run gdsctl config sdb
to check which method is used for
shard replication in the sharded database.
The replication type is shown under "Replication Type" in the output of
gdsctl config sdb
as shown here.
gdsctl> config sdb
GDS Pool administrators
------------------------
Replication Type
------------------------
Data Guard
Shard type
------------------------
System-managed
Shard spaces
------------------------
shd1
Services
------------------------
srv1
Table 17-1 Replication types in config sdb output
Replication Type | Value Shown in Output |
---|---|
Oracle Data Guard | Data Guard |
Raft |
Checking the Oracle Data Guard Protection Mode
You can run gdsctl config shardspace
on a given shardspace
to check the Oracle Data Guard protection mode in your GDSCTL session, rather than
switching to DGMGRL.
Data Guard can be configured in three different protection modes: MaxProtection, MaxAvailability, and MaxPerformance.
The Data Guard protection mode is shown under PROTECTION MODE in the
gdsctl config shardspace
command output, as
shown here.
GDSCTL> config shardspace -shardspace shd1
Shard Group Region Role
----------- ------ ----
dbs1 east Primary
PROTECTION_MODE Chunks
--------------- ------
MaxProtection 6
Checking Which Shards Are Mapped to a Key
You can run gdsctl config
chunks -key
to check which shards are mapped to a sharding
key.
Example 1: Single Table Family
In the following example, there is only one table family in the sharded database configuration, and the table is partitioned (sharded) on data type number.
In this example, the user is checking which chunk sharding key value "2" is mapped to. In the output it shows sharding key 2 is mapped to chunk "3" and is present in the database "aime1b".
GDSCTL> config chunks -key 2
Range Definition
------------------------
Chunks Range Definition
------ ----------------
3 1431655764-2147483646
Databases
------------------------
aime1b
Similarly, this can be done for any data type sharding is done on. Also, a multiple column sharding key can be checked with comma separated values.
The range definition is the range of hash values and can be ignored.
Example 2: Multiple Table Families
In a multiple table family configuration, add the option
-table_family
to specify the table family
to which the specified sharding key belongs.
The config chunks
command lists shards from all
shardgroups in the topology. This example also lists a Data Guard
standby shardgroup, as shown by the addition of "aime1e" to the
Databases (shards) list.
GDSCTL> config chunks -key 1 -table_family testuserfam3.customersfam1
Range Definition
------------------------
Chunks Range Definition
------ ----------------
1 0-357913941
Databases
------------------------
aime1b
aime1e
Example 3: Specifying a Multiple Column Sharding Key
When a table is sharded by multiple columns, specify the sharding key value as a comma-separated list as shown here.
GDSCTL> config chunks -key 10,mary,2010-04-04
Range Definition
------------------------
Chunks Range Definition
------ ----------------
4 1288490187-1717986916
Databases
------------------------
aime1b
aime1e
Checking Shard Operation Mode (Read-Only or Read-Write)
You can check whether shards are running in read-only or read-write mode by
running gdsctl config chunks -cross_shard
.
The gdsctl config chunks -cross_shard
command output
shows which shards, listed under "Database", are running in
read-only and read-write modes, as shown below. The command also
lists the chunk ranges on those shards.
gdsctl config chunks -cross_shard
Read-Only cross shard targets
------------------------
Database From To
-------- ---- --
tst3b_cdb2_pdb1 1 3
tst3c_cdb3_pdb1 9 10
tst3d_cdb2_pdb1 4 5
tst3e_cdb3_pdb1 6 8
Chunks not offered for cross-shard
------------------------
Shard space From To
----------- ---- --
Read-Write cross-shard targets
------------------------
Database From To
-------- ---- --
tst3b_cdb2_pdb1 1 5
tst3c_cdb3_pdb1 6 10
Chunks not offered for Read-Write cross-shard activity
------------------------
Data N/A
Checking DDL Text
Run gdsctl show ddl -ddl ddl_id
to get the
text for the specified DDL.
The DDL numeric identifier is specified with -ddl
ddl_id
to get the text
and other details of a particular DDL, as shown here.
gdsctl show ddl -ddl 5
DDL Text: CREATE SHARDED TABLE Customers ( CustNo NUMBER NOT NULL, Name VARCHAR2(50), Address VARCHAR2(250), Location VARCHAR2(20), Class VARCHAR2(3), CONSTRAINT RootPK PRIMARY KEY(CustNo)) PARTITION BY CONSISTENT HASH (CustNo) PARTITIONS AUTO TABLESPACE SET ts1
Owner: TESTUSER1
Object name: CUSTOMERS
DDL type: C
Obsolete: 0
Failed shards:
Note:
Theshow
ddl
command output might be truncated. You can run
SELECT ddl_text FROM
gsmadmin_internal.ddl_requests
on the shard catalog
to see the full text of the statements.
Checking Chunk Migration Status
Run gdsctl config chunks -show_reshard
to check the
status of chunk migration.
A chunk move is a long running operation, whether user-initiated or
internal (during incremental deploy), so if you need to check the
status, the gdsctl config chunks -show_reshard
provides the following status indicators as the move progresses.
-
empty - indicates no chunk migration in progress
-
scheduled - chunk is pending movement, which could be because it is waiting on another chunk move to complete, or the move didn't initiate due to some error
-
running - current in progress
-
failed - chunk move failed. Check GSM traces and source and target database traces for details.
In the following example, chunk move status is shown in the "Ongoing chunk movement" table in the command output.
gdsctl config chunks -show_reshard
Chunks
------------------------
Database From To
-------- ---- --
tst3b_cdb2_pdb1 1 6
tst3c_cdb3_pdb1 7 10
tst3d_cdb2_pdb1 1 6
tst3e_cdb3_pdb1 7 10
Ongoing chunk movement
------------------------
Chunk Source Target status
----- ------ ------ ------
7 tst3c_cdb3_pdb1 tst3b_cdb2_pdb1 Running
8 tst3c_cdb3_pdb1 tst3b_cdb2_pdb1 scheduled
9 tst3c_cdb3_pdb1 tst3b_cdb2_pdb1 scheduled
10 tst3c_cdb3_pdb1 tst3b_cdb2_pdb1 scheduled
Checking Table Type (Sharded or Duplicated)
You can check whether tables are sharded or duplicated in
dba/all/user_tables using SELECT TABLE_NAME,SHARDED,DUPLICATED FROM
user_tables;
.
In the following example, column "S" indicates whether a table is sharded, and column "D" indicates whether a table is duplicated.
SQL> select TABLE_NAME,SHARDED,DUPLICATED from user_tables;
TABLE_NAME S D
--------------- - -
CUSTOMERS Y N
DUP1 N Y
LINEITEMS Y N
MLOG$_DUP1 N N
ORDERS Y N
Checking User Type (Local or ALL_SHARD)
You can find out which users are created as local users and which are sharded database users by selecting the username and ALL_SHARD column in dba/all/user_users.
SQL> select USERNAME,ALL_SHARD from users_users where username='TESTUSER1';
USERNAME ALL_SHARD
--------------- ---------
TESTUSER1 YES
Identifying Tables Created as Sharded Tablespaces
You can find out whether tablespaces are used for a sharded table by selecting the TABLESPACE_NAME and CHUNK_TABLESPACE columns in dba/all/user_tablespaces.
The value in the CHUNK_TABLESPACE column is Y in dba/all/user_tablespaces if it is a tablespace for a sharded table.
SQL> select TABLESPACE_NAME,CHUNK_TABLESPACE from user_tablespaces;
TABLESPACE_NAME C
------------------------------ -
SYSTEM N
SYSAUX N
TEMP N
SYSEXT N
TS1 Y
Checking if Shard DDL is Enabled or Disabled
You can check if Shard DDL is enabled or disabled in the current SQL session.
These examples show the result of checking Shard DDL status after enabling and disabling Shard DDL.
SQL> alter session enable shard ddl;
Session altered.
SQL> select shard_ddl_status from v$session where AUDSID = userenv('SESSIONID');
SHARD_DD
--------
ENABLED
SQL> alter session disable shard ddl;
Session altered.
SQL> select shard_ddl_status from v$session where AUDSID = userenv('SESSIONID');
SHARD_DD
--------
DISABLED
Filtering Data by Sharding Key
You can set the SHARD_QUERIES_RESTRICTED_BY_KEY
parameter
to enable or disable data filtering by a specified sharding key.
The parameter SHARD_QUERIES_RESTRICTED_BY_KEY
can be set
with ALTER
at the system or session level. If
enabled, DMLs will only display select data for specified
SHARDING_KEY
set in the client
connection.
In the following example, the client connection is established with a
shard with SHARDING_KEY
specified as "1". However,
when the client runs a SELECT
on the customers
table, all of the rows in that table in the shard are displayed.
connection established for client with sharding_key=1
SQL> select * from customers order by custno;
CUSTNO NAME ADDRESS LOCATION CLA
---------- ---------- ---------- ---------- ---
1 John Oracle KM Bangalore A
50 Larry Oracle HQ SFO B
2 rows selected.
SQL>
Now, as shown below, we enable session level filtering, and the result of
the same SELECT
statement is restricted to only the
single row that matches the SHARD_KEY
specified in
the client connection.
SQL> alter session set shard_queries_restricted_by_key = true;
Session altered.
SQL> select current_shard_key from dual;
CURRENT_SHARD_KEY
-----------------
1
1 row selected.
SQL> select * from customers;
CUSTNO NAME ADDRESS LOCATION CLA
---------- ---------- ---------- ---------- ---
1 John Oracle KM Bangalore A
Gathering Optimizer Statistics on Sharded Tables
You can gather statistics on sharded tables from the coordinator database.
The statistic preference parameter COORDINATOR_TRIGGER_SHARD
, when
set to TRUE
on all of the shards, allows the coordinator database
to import the statistics gathered on the shards.
The PL/SQL procedures DBMS_STATS.GATHER_SCHEMA_STATS()
and DBMS_STATS.GATHER_TABLE_STATS()
gather statistics on sharded
tables and duplicated tables in the shards and in the coordinator database. See
also, REPORT_GATHER_TABLE_STATS Function.
Manual Statistics Gathering
-
Set
COORDINATOR_TRIGGER_SHARD
toTRUE
on all of the shards.This step is performed only one time and only on the shards. If, for example, you have a schema named
sharduser
:connect / as sysdba EXECUTE DBMS_STATS.SET_SCHEMA_PREFS('SHARDUSER','COORDINATOR_TRIGGER_SHARD','TRUE');
-
Gather statistics across the shards.
The user should be an all-shards user and needs to have privileges to access dictionary tables.
- On the shards run the
following.
connect sharduser/password EXEC DBMS_STATS.GATHER_SCHEMA_STATS(ownname => 'SHARDUSER', options => 'GATHER');
- When all shards are completed, to pull aggregated
statistics run the following on the
coordinator.
connect sharduser/password EXEC DBMS_STATS.GATHER_SCHEMA_STATS(ownname => 'SHARDUSER', options => 'GATHER');
- Check the statistics on all of the
shards.
connect sharduser/password ALTER SESSION SET nls_date_format='DD-MON-YYYY HH24:MI:SS'; col TABLE_NAME form a40 set pagesize 200 linesize 200 SELECT TABLE_NAME, NUM_ROWS, sharded, duplicated, last_analyzed FROM user_tables WHERE table_name not like 'MLOG%' and table_name not like 'RUPD%' and table_name not like 'USLOG%';
- On the shards run the
following.
Automatic Statistics Gathering
-
Set
COORDINATOR_TRIGGER_SHARD
toTRUE
on all of the shards.This step is performed only one time and only on the shards. If, for example, you have a schema named
sharduser
:connect / as sysdba EXECUTE DBMS_STATS.SET_SCHEMA_PREFS('SHARDUSER','COORDINATOR_TRIGGER_SHARD','TRUE');
-
Schedule a job to pull aggregated statistics on the shards and on the coordinator database.
The user should be an all-shards user and must have privileges to access dictionary tables.
Start the following job on the shards:
connect sharduser/password BEGIN DBMS_SCHEDULER.CREATE_JOB ( job_name => 'Gather_Stats_2', job_type => 'PLSQL_BLOCK', job_action => 'BEGIN DBMS_STATS.GATHER_SCHEMA_STATS(ownname => ''DEMO'', options => ''GATHER''); END;', start_date => SYSDATE, repeat_interval => 'freq=daily;byday=MON,TUE,WED,THU,FRI,SAT,SUN;byhour=14;byminute=10;bysecond=00', end_date => NULL, enabled => TRUE, comments => 'Gather table statistics'); END; /
After the job on all of the shards is finished, start the following job on the coordinator.
connect sharduser/password BEGIN DBMS_SCHEDULER.CREATE_JOB ( job_name => 'Gather_Stats_2', job_type => 'PLSQL_BLOCK', job_action => 'BEGIN DBMS_STATS.GATHER_SCHEMA_STATS(ownname => ''DEMO'', options => ''GATHER''); END;', start_date => SYSDATE, repeat_interval => 'freq=daily;byday=MON,TUE,WED,THU,FRI,SAT,SUN;byhour=15;byminute=10;bysecond=00', end_date => NULL, enabled => TRUE, comments => 'Gather table statistics'); END; /
Generate HTML SQL Monitor Output for a Query Running from the Shard Catalog
To generate an HTML SQL monitor output, you can follow these steps:
-
Add hint to query:
SELECT /*+ MONITOR */ ...
For example, a cross-shard query from the shard catalog:
select /*+ MONITOR */ count(*) from CUSTOMER;
-
Get
SQL_ID
of the query fromv$sql
.SELECT SQL_ID, SQL_FULLTEXT FROM V$SQL WHERE UPPER(SQL_FULLTEXT) LIKE '%CUSTOMER%' AND LAST_ACTIVE_TIME > sysdate -1 ORDER BY LAST_ACTIVE_TIME DESC;
-
Generate a report in a file ( for example,
report.html
in either the default or a specific folder with the same or different name).SET LONG 1000000 SET LONGCHUNKSIZE 1000000 SET LINESIZE 1000 SET PAGESIZE 0 SET TRIM ON SET TRIMSPOOL ON SET ECHO OFFSET FEEDBACK OFF spool report.html -- replace sql_id values with sql_id of the query SELECT DBMS_SQLTUNE.report_sql_monitor( sql_id => 'dfj5upfq6w50j', type => 'ACTIVE', report_level => 'ALL') AS report FROM dual; spool off;
-
Find the generated SQL Monitor report and view it in a browser or any HTML viewer tool.
Tracing and Debug Information
You can enable tracing for your sharded database and find information in any of several trace files. GDSCTL also has several commands that can display status and error information.
Enabling Tracing
Enable PL/SQL tracing to track down issues in the sharded database.
To get full tracing, set the GWM_TRACE
level as shown here. The following statement provides immediate tracing, but the trace is disabled after a database restart.
ALTER SYSTEM SET EVENTS 'immediate trace name GWM_TRACE level 263';
To disable the GWM_TRACE
, issue:
ALTER SYSTEM SET EVENTS 'immediate trace name GWM_TRACE level 200';
The following statement enables tracing that continues in perpetuity, but only after restarting the database.
ALTER SYSTEM SET EVENT='10798 trace name context forever, level 7' SCOPE=spfile;
It is recommended that you set both of the above traces to be thorough.
To trace everything in the sharded database environment, you must enable tracing on the shard catalog and all of the shards. The traces are written to the RDBMS session trace file for either the GDSCTL session on the shard catalog, or the session(s) created by the shard director (also called GSM) on the individual shards.
Where to Find Alert Logs and Trace Files
There are several places to look for trace and alert logs in the sharded database environment.
Standard RDBMS trace files located in diag/rdbms/.. will contain trace output.
Output from ‘deploy’ will go to job queue trace files db_unique_name_jXXX_PID.trc.
Output from other GDSCTL commands will go to either a shared server trace file db_unique_name_sXXX_PID.trc or dedicated trace file db_unique_name_ora_PID.trc depending on connect strings used.
Shared servers are typically used for many of the connections to the catalog and shards, so the tracing is in a shared server trace file named SID_s00*.trc.
GDSCTL has several commands that can display status and error information.
Use GDSCTL STATUS GSM
to view locations for shard director (GSM) trace and log files.
GDSCTL> status
Alias SHARDDIRECTOR1
Version 18.0.0.0.0
Start Date 25-FEB-2016 07:27:39
Trace Level support
Listener Log File /u01/app/oracle/diag/gsm/slc05abw/sharddirector1/alert/log.xml
Listener Trace File /u01/app/oracle/diag/gsm/slc05abw/sharddirector1/trace/
ora_10516_139939557888352.trc
Endpoint summary (ADDRESS=(HOST=shard0)(PORT=1571)(PROTOCOL=tcp))
GSMOCI Version 2.2.1
Mastership N
Connected to GDS catalog Y
Process Id 10535
Number of reconnections 0
Pending tasks. Total 0
Tasks in process. Total 0
Regional Mastership TRUE
Total messages published 71702
Time Zone +00:00
Orphaned Buddy Regions: None
GDS region region1
Network metrics:
Region: region2 Network factor:0
The non-XML version of the alert.log file can be found in the /trace directory as shown here.
/u01/app/oracle/diag/gsm/shard-director-node/sharddirector1/trace/alert*.log
To decrypt log output in GSM use the following command.
GDSCTL> set _event 17 -config_only
Primary shard director (GSM) trace/alert files include status and errors on any and all asynchronous commands or background tasks (move chunk, split chunk, deploy, shard registration, Data Guard configuration, shard DDL processing, etc.)
To find pending AQ requests for the shard director, including error status, use GDSCTL CONFIG.
To see ongoing and scheduled chunk movement, use GDSCTL CONFIG CHUNKS -show_reshard
To see shards with failed DDLs, use GDSCTL SHOW DDL -failed_only
To see the DDL error information for a given shard, use GDSCTL CONFIG SHARD -shard shard_name
Common Error Patterns and Resolutions
Troubleshoot common errors in Oracle Globally Distributed Database.
Shard Director Fails to Start
If you encounter issues starting the shard director, try the following:
To start Scheduler you must be inside ORACLE_HOME on each shard server.
GDSCTL>start gsm -gsm shardDGdirector
GSM-45054: GSM error
GSM-40070: GSM is not able to establish connection to GDS catalog
GSM alert log, /u01/app/oracle/diag/gsm/shard1/sharddgdirector/trace/alert_gds.log
GSM-40112: OCI error. Code (-1). See GSMOCI trace for details.
GSM-40122: OCI Catalog Error. Code: 12514. Message: ORA-12514: TNS:listener does not
currently know of service requested in connect descriptor
GSM-40112: OCI error. Code (-1). See GSMOCI trace for details.
2017-04-20T22:50:22.496362+05:30
Process 1 in GSM instance is down
GSM shutdown is successful
GSM shutdown is in progress
NOTE : if not message displayed in the GSM log then enable GSM trace level to 16
while adding GSM itself.
-
Remove the newly created shard director (GSM) that failed to start.
GDSCTL> remove gsm -gsm shardDGdirector
-
Add the shard director using trace level 16.
GDSCTL> add gsm -gsm shardDGdirector -listener port_num -pwd gsmcatuser_password -catalog hostname:port_num:shard_catalog_name -region region1 -trace_level 16
-
If the shard catalog database is running on a non-default port (other than 1521), set the remote listener.
SQL> alter system set local_listener='(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp) (HOST=hostname)(PORT=port_num)))';
Tablespace Set Creation Fails
A failure in tablespace set creation may be due to DB_FILES
parameter set too low.
DB_FILES
parameter default setting is 200. This may be too low for
sharded datbases with a large number of shards and chunks. You may also require a larger
number of data files in a Raft replication scenario
To calculate the number of database files created for sharding objects on a given shard:
Sharded database files required = (Number of CREATE
TABLESPACE SET
SQL statements executed using SHARD DDL
) *
(Number of chunks present on the shard + 1)
DB_FILES
must be set to at least the number of files used
by sharding (above) PLUS non-sharding database files (system, sysaux, and so on)
PLUS any extra needed by generic RDBMS code (5); therefore:
DB_FILES
required in each shard = (Number of sharded
database files required, as calculated above) + Number of default database files(6) +
5
To check free DB_FILES
and parameter setting:
SQL> select count(*) from v$datafile;
COUNT(*)
----------
XxxXX
SQL> show parameter db_files
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
db_files integer 200
Issues Using Deploy Command
GDSCTL> deploy
GSM-45029: SQL error
ORA-29273: HTTP request failed
ORA-06512: at "SYS.DBMS_ISCHED", line 3715
ORA-06512: at "SYS.UTL_HTTP", line 1267
ORA-29276: transfer timeout
ORA-06512: at "SYS.UTL_HTTP", line 651
ORA-06512: at "SYS.UTL_HTTP", line 1257
ORA-06512: at "SYS.DBMS_ISCHED", line 3708
ORA-06512: at "SYS.DBMS_SCHEDULER", line 2609
ORA-06512: at "GSMADMIN_INTERNAL.DBMS_GSM_POOLADMIN", line 14284
ORA-06512: at line 1
Solution : Check the $ORACLE_HOME/data/pendingjobs for the exact error. ORA-1017 is thrown if any issues on wallet.
-
On problematic Shard host stop the remote scheduler agent.
schagent -stop
-
rename wallet direcotry on Database home
mv $ORACLE_HOME/data/wallet $ORACLE_HOME/data/wallet.old
-
start the remote scheduler agent and it will create new wallet directory
schagent -start schagent -status echo welcome | schagent -registerdatabase 10.10.10.10 8080
Issues Moving Chunks
If you encounter issues with MOVE CHUNK
, try the
following:
Issue: Initialization parameter
remote_dependencies_mode
has a default value of
timestamp
; therefore, because prvtgwmut.plb
is run
and DBMS_GSM_UTILITY
recompiled durning upgrade, GDSCTL MOVE
CHUNK
runs into ORA-04062 errors similar to the following.
GSM Errors:
server:ORA-03749: Chunk move cannot be performed at this time.
ORA-06512: at "SYS.DBMS_SYS_ERROR", line 79
ORA-06512: at "GSMADMIN_INTERNAL.DBMS_GSM_DBADMIN", line 5497
ORA-04062: timestamp of package "GSMADMIN_INTERNAL.DBMS_GSM_UTILITY" has been
changed
ORA-06512: at line 1
ORA-06512: at "GSMADMIN_INTERNAL.DBMS_GSM_DBADMIN", line 5366
ORA-06512: at line 1 (ngsmoci_execute)
Workaround 1: Restart the source and target shards after upgrade.
Workaround 2:
ALTER SYSTEM SET remote_dependencies_mode=signature
on both source and
target.
Issues with Oracle Database Vault
Do not enable Oracle Database Vault on your sharded databases. Oracle Globally Distributed Database does not support Oracle Database Vault.
Issue During Deployment of Role-Separated Environment
The GSM-45029: SQL ERROR NO MORE DATA TO READ FROM SOCKET
error occurs when you perform administrative operations for Oracle Globally Distributed Database or for Oracle Global Data Services (GDS) and connect through a listener that runs
in the Oracle Real Application Clusters (Oracle RAC) or Oracle Restart account in a
role-separated environment. The error occurs where the Oracle RAC or Oracle Restart
account is different from the Oracle Database account.
Solution:
Start a listener in the Oracle Database account on the sharded catalog database and on each shard, if it is not already running.
The listener can be used to connect and perform administrative operations.
This listener can also be used when you provide an Oracle Database Transparent Network Substrate (TNS) address, when it is required for administrative commands, such as add shard
.