DDL Failure and Recovery Examples

The following examples demonstrate the steps to issue a DDL, monitor its status, and what to do when errors are encountered.

When a DDL fails on a shard, all further DDLs on that shard are blocked until the failure is resolved and the GDSCTL recover shard command is run.

Note that you must have GSM_ADMIN privileges to run these GDSCTL commands.

The following examples demonstrate the case when a DDL is issued using SQL*Plus, but the same status checking and corrective actions apply when using the GDSCTL SQL command.

Example 6-1 A DDL processing error on the shard catalog

In this example the user makes a typo in the CREATE USER command.

SQL> alter session enable shard ddl;
Session altered.

SQL> CREATE USER example_user IDENTRIFIED BY out_standing1;
CREATE USER example_user IDENTRIFIED BY out_Standing1
                   *
ERROR at line 1:
ORA-00922: missing or invalid option

The DDL fails to run on the shard catalog and, as expected, the GDSCTL show ddl command shows that no DDL was run on any of the shards:

GDSCTL> show ddl
id      DDL Text                         Failed shards 
--      --------                         -------------

Then the user repeats the command with the correct spelling. Note that there is no need to run alter session enable shard ddl again because the same session is used.

SQL> CREATE USER example_user IDENTIFIED BY out_Standing1;
User created.

Now show ddl shows that the DDL has been successfully run on the shard catalog database and it did not fail on any shards that are online.

GDSCTL> show ddl
id      DDL Text                                     Failed shards 
--      --------                                     ------------- 
1       create user example_user identified by *****

Note:

For any shard that is down at the time of the DDL processing, the DDL is automatically applied when the shard is back up.

Example 6-2 Recovery from an error on a shard by issuing a corrective action on that shard

In this example, the user attempts to create a tablespace set for system-managed sharded tables. But the data file directory on one of the shards is not writable, so the DDL is successfully run on the catalog, but fails on the shard.

SQL> connect example_user/out_Standing1
Connected

SQL> create tablespace set tbsset;
Tablespace created.

Note that there is no need to run alter session enable shard ddl because the user example_user was created as the sharded database user and shard ddl is enabled by default.

Check status using GDSCTL show ddl:

GDSCTL> show ddl
id      DDL Text                                      Failed shards 
--      --------                                      ------------- 
1       create user example_user identified by *****
2       create tablespace set tbsset                  shard01 

The command output shows that the DDL failed on the shard shard01. Run the GDSCTL config shard command to get detailed information:

GDSCTL> config shard -shard shard01

Conversion = ':'Name: shard01
Shard Group: dbs1
Status: Ok
State: Deployed
Region: east
Connection string: (DESCRIPTION=(ADDRESS=(HOST=shard01-host)(PORT=1521)(PROTOCOL=tcp))
(CONNECT_DATA=(SID=shard01)))
SCAN address: 
ONS remote port: 0
Disk Threshold, ms: 20
CPU Threshold, %: 75
Version: 18.0.0.0
Failed DDL: create tablespace set tbsset
DDL Error: ORA-02585: create tablepsace set failure, one of its tablespaces not created
ORA-01119: error in creating database file \'/ade/b/3667445372/oracle/rdbms/dbs/
SHARD01/datafile/o1_mf_tbsset_%u_.dbf\'
ORA-27040: file create error, unable to create file
Linux-x86_64 Error: 13: Permission denied
Additional information: 1 \(ngsmoci_execute\) 
Failed DDL id: 2
Availability: ONLINE

The text beginning with “Failed DDL:” indicates the problem. To resolve it, the user must log in to the shard database host and make the directory writable.

Display the permissions on the directory:

cd $ORACLE_HOME/rdbms/dbs 
 ls –l ../ | grep dbs
dr-xr-xr-x  4 oracle dba    102400 Jul 20 15:41 dbs/

Change the directory to writable:

chmod +w .
ls –l ../ | grep dbs
drwxrwxr-x  4 oracle dba    102400 Jul 20 15:41 dbs/

Go back to the GDSCTL console and issue the recover shard command:

GDSCTL> recover shard -shard shard01

Check the status again:

GDSCTL> show ddl
id      DDL Text                                      Failed shards 
--      --------                                      ------------- 
1       create user example_user identified by *****
2       create tablespace set tbsset

GDSCTL> config shard -shard shard01

Conversion = ':'Name: shard01
Shard Group: dbs1
Status: Ok
State: Deployed
Region: east
Connection string: (DESCRIPTION=(ADDRESS=(HOST=shard01-host)(PORT=1521)(PROTOCOL=tcp))
(CONNECT_DATA=(SID=shard01)))
SCAN address: 
ONS remote port: 0
Disk Threshold, ms: 20
CPU Threshold, %: 75
Version: 18.0.0.0
Last Failed DDL: 
DDL Error: ---
DDL id: 
Availability: ONLINE

As shown above, the failed DDL error no longer appears.

Example 6-3 Recovery from an error on a shard by issuing a corrective action on all other shards

In this example, the user attempts to create another tablespace set, tbs_set, but the DDL fails on a shard because there is already an existing local tablespace with the same name.

On the shard catalog:

SQL> create tablespace set tbs_set;
Tablespace created.

Check status using the GDSCTL show ddl command:

GDSCTL> show ddl
id      DDL Text                                      Failed shards 
--      --------                                      ------------- 
1       create user example_user identified by *****
2       create tablespace set tbsset 
3       create tablespace set tbs_set                 shard01  

GDSCTL> config shard -shard shard01
Conversion = ':'Name: shard01
……
Failed DDL: create tablespace set tbs_set
DDL Error: ORA-02585: create tablespace set failure, one of its tablespaces not created
ORA-01543: tablespace \'TBS_SET\' already exists \(ngsmoci_execute\)

A solution to this problem is to login to shard01 as a local database administrator, drop the tablespace TBS_SET, and then run GDSCTL recover shard -shard shard01. But suppose you want to keep this tablespace, and instead choose to drop the newly created tablespace set that has the name conflict and create another tablespace set with a different name, such as tbsset2. The following example shows how to do that on the shard catalog:

SQL> drop tablespace set tbs_set;
SQL> create tablespace set tbs_set2;

Check status using GDSCTL:

GDSCTL> show ddl
id      DDL Text                                      Failed shards 
--      --------                                      ------------- 
1       create user example_user identified by *****
2       create tablespace set tbsset             
3       create tablespace set tbs_set                 shard01  
4       drop tablespace set tbs_set
5       create tablespace set tbsset2 

You can see that DDLs 4 and 5 are not attempted on shard01 because DDL 3 failed there. To make this shard consistent with the shard catalog, you must run the GDSCTL recover shard command. However, it does not make sense to run DDL 3 on this shard because it will fail again and you actually do not want to create tablespace set tbs_set anymore. To skip DDL 3 run recover shard with the –ignore_first option:

GDSCTL> recover shard -shard shard01 –ignore_first
GSM Errors: dbs1 shard01:ORA-00959: tablespace \'TBS_SET\' does not exist
 (ngsmoci_execute)

GDSCTL> show ddl
id      DDL Text                                Failed shards 
--      --------                                ------------- 
1       create user sidney identified by *****
2       create tablespace set tbsset             
3       create tablespace set tbs_set            
4       drop tablespace set tbs_set             shard01  
5       create tablespace set tbsset2 

There is no failure with DDL 3 this time because it was skipped. However, the next DDL (4 - drop tablespace set tbs_set) was applied and resulted in the error because the tablespace set to be dropped does not exist on the shard.

Because the –ignore_first option only skips the first DDL, you need to run recover shard again to skip the drop statement as well:

GDSCTL> recover shard -shard shard01 –ignore_first

GDSCTL> show ddl
id      DDL Text                                Failed shards 
--      --------                                -------------
1       create user sidney identified by *****
2       create tablespace set tbsset             
3       create tablespace set tbs_set          
4       drop tablespace set tbs_set
5       create tablespace set tbsset2 

Note that there are no longer any failures shown, and all of the DDLs were applied successfully on the shards.

When recover shard is run with the –ignore_first option, the failed DDL is marked to be ignored during incremental deployment. Therefore, DDL numbers 3 and 4 are skipped when a new shard is added to the sharded database, and only DDL numbers 1, 2, and 5 are applied.