D Diagnostic Tips and Details
The following is a collection of notes that can be useful for troubleshooting and for general understanding of Oracle Big Data SQL.
D.1 Running Diagnostics with bdschecksw
On the Oracle Database server, you can use the Big Data SQL Diagnostics Collection tool, bdschecksw, to do a simple sanity test of Oracle Big Data SQL. This script gathers and analyzes diagnostic information about the Oracle Big Data SQL installation from both the Oracle Database and the Hadoop cluster sides of the installation. The script is in $ORACLE_HOME/bin on the database server.
Syntax
You can run this diagnostic check manually at any time. At installation time, it is also run by bds-database-install.sh, the database-side installer.
bdschecksw Required_Params [Options]
The table below describes the required and optional parameters used with bdschecksw.
Table D-1 bdschecksw Parameters
Parameter | Description | Required or Optional |
---|---|---|
-h, --help | Display command help and exit. | Optional. |
-d, --dbhome ORACLE_HOME | The path to the Oracle installation on the Oracle Database server. | Required only if the ORACLE_HOME environment variable is not defined on the Oracle Database server where bdschecksw is executed. |
-s, --sid=[ORACLE_SID] | The SID of the Oracle Database server. | Required only if the ORACLE_SID environment variable is not defined on the Oracle Database server where bdschecksw is executed. |
-g, --gihome=Grid_Infrastructure_home Oracle_Database_node_IP_address | The Grid Infrastructure path on the Oracle Database server. | Required only if the GI_HOME environment variable is not defined on the Oracle Database server where bdschecksw is executed. |
-y, --giuser Oracle_Database_node_IP_address Grid_Infrastructure_home | GI_HOME administrator name or owner (OS user name) of GI_HOME. | |
-q, --sqlplus Oracle_Database_node_IP_address username | SQL*Plus username on the Oracle Database server. The user is prompted for the password. | Required. |
-c, --cell DNS_short_name [...n] | The Hadoop cluster cell nodes. Use DNS short names (FQDN minus domain name) with a space as a delimiter. IP addresses are not recommended, because a node may exist on multiple networks. | Required for Hortonworks HDP only. |
-u, --uname Hadoop_cluster_node_username | Credentials to run remote commands on the Hadoop cluster from the Oracle Database server. This is usually the oracle user. | The username and password are always required. |
-p, --pdb=PDB_CONTAINER | The Pluggable Database container name on the Oracle Database server. | Required only if the Pluggable Database (PDB) container is configured on the Oracle Database server where bdschecksw is executed. |
-k, --key SSH_identity_key_file | Path to an SSH (Secure Shell) key file. | Optional. The SSH identity (or key) file is used in addition to -u and -p to select the file from which the private key for public key authentication is read. |
-r, --cluster Hadoop_cluster_name | Hadoop cluster name. | Optional. |
-t, --trace | Turn on extproc and log4j tracing during test execution. | Optional. |
-f, --file=file_path | Redirect output to the specified file. | Optional. |
-i, --interactive | Enter command-line arguments interactively (rather than all at once). | Optional. |
-x | Extensive mode. | Optional. Requires root privilege. |
-v, --verbose | Verbose reporting mode. | Optional. (Recommended for full details in the report.) |
Exit Status
The bdschecksw script returns one of the following status codes.
Table D-2 Status Codes for bdschecksw
Status | Description |
---|---|
0 | Success |
1 | Minor problem (for example, no response in interactive mode). |
2 | Significant problem (for example, an invalid command line argument). |
Example
$ ./bdschecksw -d /u03/app/oracle/product/$ORACLE_HOME/dbhome_1 -s orcl -p pdborcl -g /u03/app/oracle/product/$ORACLE_HOME/grid -q sys -u oracle -v
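When scripting bdschecksw (for example, from a scheduled health check), the exit statuses in Table D-2 can be checked programmatically. The following is a minimal sketch; the wrapper function name and wording of the messages are illustrative and not part of the tool.

```shell
#!/bin/sh
# Map a bdschecksw exit status to the meanings given in Table D-2.
# (Illustrative helper; not part of Oracle Big Data SQL.)
bds_status_msg() {
  case "$1" in
    0) echo "Success" ;;
    1) echo "Minor problem (for example, no response in interactive mode)" ;;
    2) echo "Significant problem (for example, an invalid command line argument)" ;;
    *) echo "Unknown status: $1" ;;
  esac
}

# Example usage (requires bdschecksw in $ORACLE_HOME/bin):
# "$ORACLE_HOME/bin/bdschecksw" -s orcl -q sys -u oracle -v
# bds_status_msg $?
```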
D.2 How to do a Quick Test
Here is an all-around series of basic checks to ensure that Oracle Big Data SQL is working.
1. On the Oracle Database server, source the environment using the hadoop_<hcluster>.env file in $ORACLE_HOME/bigdatasql.
2. If Kerberos is enabled, kinit as the oracle Linux user on the Oracle Database server. If possible, also kinit on each of the Big Data SQL datanodes as the oracle user.
   Note: You can run this test without running kinit on the datanodes, but then offloading in the test will not work. You will eventually need to kinit on the datanodes in order to verify that offloading is working.
3. Create a text file and add several lines of random text.
4. Copy the text file into HDFS as /user/oracle/test.txt.
   $ hadoop fs -put test.txt /user/oracle/test.txt
5. Define an Oracle external table of type ORACLE_HDFS, query it, and check the offload statistics:
   - CREATE TABLE bds_test (line VARCHAR2(4000)) ORGANIZATION EXTERNAL ( TYPE ORACLE_HDFS DEFAULT DIRECTORY DEFAULT_DIR LOCATION ('/user/oracle/test.txt') ) REJECT LIMIT UNLIMITED;
   - SELECT * FROM bds_test;
   - select n.name, s.value /* , s.inst_id, s.sid */ from v$statname n, gv$mystat s where n.name like '%XT%' and s.statistic# = n.statistic#;
6. Define a Hive table:
   - Connect to Hive via Hue, the Hive/Beeline command line, or Oracle SQL Developer with a Hive JDBC driver.
   - CREATE TABLE bds_test_hive (line string);
   - LOAD DATA INPATH '/user/oracle/test.txt' OVERWRITE INTO TABLE bds_test_hive;
7. Define an external ORACLE_HIVE table:
   CREATE TABLE bds_test_hive (line VARCHAR2(4000)) ORGANIZATION EXTERNAL ( TYPE ORACLE_HIVE DEFAULT DIRECTORY DEFAULT_DIR ACCESS PARAMETERS (com.oracle.bigdata.tablename=default.bds_test_hive) ) REJECT LIMIT UNLIMITED;
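The file-creation and HDFS copy at the start of this test can be scripted. Below is a small sketch; the line count and text are arbitrary, and the hadoop command is commented out because it requires a configured Hadoop client with HDFS access.

```shell
#!/bin/sh
# Create a small text file for the quick test (several lines of text),
# matching the /user/oracle/test.txt example used in this section.
seq 1 10 | awk '{ print "sample text line " $1 }' > test.txt
wc -l test.txt

# On a system with a configured Hadoop client and HDFS access:
# hadoop fs -put test.txt /user/oracle/test.txt
```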
D.3 Oracle Big Data SQL Database Objects
Familiarity with the various Oracle Big Data SQL database objects can be helpful in troubleshooting.
Table D-3 Database Objects in Big Data SQL
Type | Object |
---|---|
Directories | DEFAULT_DIR (the default directory used by the external table examples in this appendix). |
Database Links (public). These allow Big Data SQL to reach the MTA (multi-threaded agent). | BDSQL$_DEFAULT_CLUSTER and BDSQL$_<hcluster> (see Section D.6). |
Data Dictionary Views | all_hive_tables and the related *_hive_* views (see Section D.7). |
Functions and Procedures for Hive Data Dictionary | GetHiveTable, a PL/SQL pipelined table function implemented by the HiveMetadata type through the SYS.DBMSHADOOPLIB library (see Section D.7). |
Tables | SYS.HIVE_URI$ – security table for non-DBA users. |
Statistics | The "%XT%" statistics visible in v$statname and gv$mystat, such as "cell XT granules requested for predicate offload" (see Sections D.2 and D.6). |
D.4 Other Database-Side Artifacts
This section describes directories, files, and external procedure agents on the database side that are relevant to Oracle Big Data SQL.
Table D-4 $ORACLE_HOME/bigdatasql Directory
Subdirectory or Filename | Description of Contents |
---|---|
clusters | Contains settings related to all clusters installed on this ORACLE_HOME. It includes a subdirectory for each cluster, which contains that cluster's configuration files. |
databases | Contains settings related to all databases running on this ORACLE_HOME. It includes a subdirectory for each database running on this ORACLE_HOME. Each database subdirectory contains a bigdata_config directory (see Section D.6). |
jdk | Soft link to the JDK installation installed by Oracle Big Data SQL. |
jlib | The Oracle Big Data SQL Java JAR directory. |
default_dir | This directory is usually empty. |
log | Contains Java log files from the two types of extprocs. Use the PID that is part of the filename to identify which extproc you are looking at (only one log file will have the PID of the currently running extproc). |
hadoop_<cluster>.env | Sets the Hadoop client environment. There is one of these .env files for each cluster installation. |
orahivedp | A soft link to the installed cp2hadoop (Copy to Hadoop) toolkit. |
Table D-5 External Procedure Agents
Agent | Description |
---|---|
extproc | Runs the external procedure code used by the Hive data dictionary views and functions (see Section D.7). |
extprocbds_<dbname>_<hcluster> | The BDS multi-threaded agent that is used when querying external tables. This is started and stopped by Oracle Clusterware. If you do not have this extprocbds_<dbname>_<hcluster> agent running, then you probably did not run bds-database-install.sh on every database server in your RAC cluster. The C portion's log files are in $ORACLE_HOME/hs/log (but you need to edit $ORACLE_HOME/hs/admin/initbds – add TRACE_LEVEL=ON and then restart to see logging). The JVM portion's log files go into $ORACLE_HOME/bigdatasql/log (but you need to set up bigdata-log4j.properties and restart). The recommended way to restart (the quicker, but not recommended, way is to run kill -9 on the process) is: $ crsctl stop res bds_<dbname>_<hcluster> followed by $ crsctl start res bds_<dbname>_<hcluster> (see Section D.8). |
Note:
Both of the external procedure agents in the table above make callbacks to the database, which can be blocked by use of the Secure External Password Store feature. If you use the Secure External Password Store (SQLNET.WALLET_OVERRIDE=TRUE), see Document 2126903.1 in My Oracle Support.
Table D-6 Log Files and Trace Files
Directory or Filename | Description |
---|---|
$ORACLE_HOME/bigdatasql/log | Contains log files from Java code run via the extprocbds_<dbname>_<hcluster> agent. |
$ORACLE_HOME/hs/log | Contains log files from the C code of the extprocbds_<dbname>_<hcluster> agent. |
Database diag directory | Contains log files from the database session. These can yield good information. Use the v$diag_info query in Section D.8 to find the trace file for the current session. |
bigdata-log4j.properties | Controls logging of the JVM (see Section D.7). If you want to get additional logging, raise the log level here and restart. |
cellinit.ora | Records the IP address and subnet range for the database server. For Oracle Big Data SQL on commodity servers, this file also includes parameters which switch the protocol away from InfiniBand RDS to TCP/UDP. |
Kerberos files: | If your Hadoop cluster uses Kerberos, you will need to set up Kerberos on the database and have a mechanism (such as crontab) to keep a valid Kerberos ticket at all times for the oracle Linux user. The Oracle Big Data SQL installer now provides a directive in the Jaguar configuration file that will automatically set up a cron job for this on both the Hadoop cluster and the Oracle Database system. See the description of the configuration file in the installation guide. |
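One common mechanism for keeping a valid ticket, as noted above, is a cron job that re-runs kinit from a keytab. The schedule, principal, and keytab path below are placeholder assumptions, not values from the product; prefer the Jaguar configuration file directive where available.

```shell
#!/bin/sh
# Hypothetical crontab entry for the oracle user: renew the Kerberos
# ticket every four hours from a keytab. The principal and keytab path
# are placeholders only.
CRON_LINE='0 */4 * * * /usr/bin/kinit oracle@EXAMPLE.COM -k -t /home/oracle/oracle.keytab'

# A crontab entry has five schedule fields followed by the command.
echo "$CRON_LINE" | awk '{ print "schedule fields:", $1, $2, $3, $4, $5 }'

# To install it (as the oracle user): echo "$CRON_LINE" | crontab -
```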
D.5 Hadoop Datanode Artifacts
The table below identifies objects on the Hadoop server that can provide helpful information for troubleshooting Big Data SQL.
Table D-7 Hadoop-side Datanode Artifacts That are Useful for Troubleshooting
Datanode Artifact | Description |
---|---|
bigdata.properties | The BDSQL offload server configuration, read from /opt/oracle/bigdatasql/bdcell-<cell version>/bigdata.properties (see Section D.6). |
Log files | See /opt/oracle/bigdatasql/bdcell-<cell version>/log, in particular bigdata-log4j.log (see Section D.8). |
Other datanode artifacts | |
D.6 Step-by-Step Process for Querying an External Table
This section describes the events that occur during a query of an external table.
1. User issues a SELECT query involving an Oracle Big Data SQL external table.
2. Database sees that one of the objects in the SQL is an external table of type ORACLE_HIVE.
3. Database identifies the cluster name from the com.oracle.bigdata.cluster parameter on the external table definition, else uses the default cluster.
4. Database identifies the Hive table name from the com.oracle.bigdata.tablename parameter, else assumes the Hive table name is the same as the Oracle table name.
5. Database knows that the ORACLE_HIVE external table implementation uses an external procedure which is invoked through the extprocbds_<dbname>_<hcluster> multi-threaded agent.
   Note: The first phase of the query requires getting the Hive metadata. If you get an error during this first phase, you'll likely see an error that begins as follows. Notice the "OPEN" in ODCIEXTTABLEOPEN:
   ORA-29913: error in executing ODCIEXTTABLEOPEN callout
6. Database uses the public database link BDSQL$_DEFAULT_CLUSTER or BDSQL$_<hcluster> to find the connect string to ask the listener to connect the database session to a thread of the extprocbds_<dbname>_<hcluster> multi-threaded agent.
   - extprocbds_<dbname>_<hcluster> was previously started by Oracle Clusterware and is using configuration information from the $ORACLE_HOME/bigdatasql/databases/<database name>/bigdata_config directory.
   - extprocbds_<dbname>_<hcluster> has spawned a JVM running Hadoop client libraries using the above configuration information. The Hadoop client libraries were copied from the Oracle Big Data Appliance to the Oracle Database server when you ran the bds-exa-install.sh script.
7. extprocbds_<dbname>_<hcluster> uses its JVM and the Hive metastore client library to call the Hive metastore (using a URL such as thrift://<hostname>:9083) to get metadata (columns, inputformat, serde, other table properties) for the Hive table.
   - At this point, if the Hive metastore is protected by Kerberos authentication, the Hive client libraries running in the extprocbds JVM on the Oracle Database server will try to send the local Kerberos ticket to the Hive server. This will be the ticket owned by the oracle Linux user account that is running the database.
8. extprocbds_<dbname>_<hcluster> calls the Hive metastore to get a list of input paths that hold the data behind the Hive table.
9. extprocbds_<dbname>_<hcluster> converts the list of input paths into a list of splits/blocks using Hadoop MapReduce libraries and logic. Then it asks the HDFS namenode for the location (including replicas) of all of the splits/blocks.
   - Again, if HDFS is protected by Kerberos, the Kerberos ticket from the oracle Linux user account on the database will need to be used.
   - If compression is used, at this point the JVM might have to load specific compression Java or native libraries. If these are non-standard libraries, you will need to install them on both the Oracle Database server and the Hadoop side. For instance, LZO compression requires an additional install and configuration performed on both the database side and the Hadoop side.
   At this point, the "description" phase is done and the database knows the structure of the Hive table as well as the location of all of the blocks of data (including replicas). This information is also known as the metadata payload. We now begin the "fetch" phase.
10. The database intelligent storage layer, KCFIS (Kernel Cache File Intelligent Storage), which is also used on Oracle Exadata systems, compares the hostnames of where the blocks of data are stored to a list of active BDSQL server hosts being maintained by the Grid's diskmon process. (You can see diskmon's list of BDSQL server hosts in V$CELL.)
    Note: The second phase of the query requires fetching the data. If you get an error during this second phase, you'll likely see an error that begins as follows. Notice the "FETCH" in ODCIEXTTABLEFETCH:
    ORA-29913: error in executing ODCIEXTTABLEFETCH callout
11. Assuming that the list of datanode hostnames matches the list of BDSQL hostnames, the database sends a list of local blocks (also called granules) to each of the BDSQL servers. The database also sends the BDSQL servers metadata about the table, columns, and structure it is accessing. It does this in parallel and asynchronously for performance.
    Note: The database statistics "cell XT granules requested for predicate offload" and "cell XT granule bytes requested for predicate offload" are updated at this point.
12. The BDSQL process running on the datanodes checks the SQL_ID against its local list of quarantined SQL_IDs. If the SQL_ID matches the quarantine list, then the BDSQL process on the datanode will return an error. However, the user should not see this error. Instead, the database will first try another cell, then try to do the work itself. (See Steps 15 and 16.)
13. Assuming the SQL_ID is not quarantined by the BDSQL process on the datanode, the BDSQL process will do its SmartScan work against the list of blocks/granules sent to it.
    Tip: See the blog entry Big Data SQL Quick Start. Storage Indexes - Part 10 in The Data Warehouse Insider for details about Storage Indexes and other aspects of SmartScan processing.
    - The BDSQL offload process has previously read its configuration information from /opt/oracle/bigdatasql/bdcell-<cell version>/bigdata.properties.
    - The BDSQL process has previously loaded a JVM based on the properties defined in the above configuration.
    - If the Hive table has special InputFormat or SerDe classes, the JVM will load those classes, assuming it can find them based on the classpath defined in the above configuration. For some common InputFormats (such as delimited text), Oracle has written C code that can handle those formats faster than regular Java code.
    - If Kerberos authentication is used, then the BDSQL's JVM will send its local Kerberos ticket to the HDFS datanode process. This is the Kerberos ticket associated with the oracle Linux user on the datanode where BDSQL is running.
    - If Sentry authorization is used, the oracle Linux user's Kerberos ticket's identity needs to have been granted access to the Hive table and underlying HDFS data.
    - The BDSQL server will update statistics like "cell XT granule IO bytes saved by StorageIndex" as it runs.
14. The database KCFIS layer will collect results as they are returned from the BDSQL processes on the datanodes and send the next batch of blocks/granules to the BDSQL processes.
    - The database will update the "cell interconnect bytes returned by XT smart scan" statistic as bytes are returned.
15. If there are issues with a BDSQL process for a given block, the database will try to send the work to a different BDSQL process (it will pick a location that has a replica of the block that failed).
    - The database will update the "cell XT granule predicate offload retries" statistic.
16. If the database is unable to get the BDSQL processes to successfully offload a block even after retrying, then the database will "fall back" and have the JVM in the extprocbds_<db>_<cluster> do the work.
    - This will be slower, as the raw data will need to be moved to the database server for processing.
    - If the Hive table has any special InputFormats or SerDes, the extprocbds_<db>_<cluster>'s JVM will need to load them based on the classpath configuration defined in the database's bigdata.properties file.
17. The results from the external table data source continue to be collected until all input paths/blocks/granules are handled.
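The retry and fallback behavior in Steps 15 and 16 amounts to: try each cell that holds a replica of the granule, and if every offload attempt fails, process the granule in the extprocbds JVM on the database server. The sketch below is a toy model of that control flow only; the function names and messages are illustrative and are not actual database code.

```shell
#!/bin/sh
# Toy model of granule retry/fallback (Steps 15-16). try_cell is a stub
# standing in for a real offload attempt; here only "cell2" succeeds.
try_cell() { [ "$1" = "cell2" ]; }

process_granule() {
  granule="$1"; shift
  for cell in "$@"; do              # cells holding a replica of the granule
    if try_cell "$cell"; then
      echo "granule $granule: offloaded to $cell"
      return 0
    fi
    # corresponds to the "cell XT granule predicate offload retries" statistic
    echo "granule $granule: retry after failure on $cell"
  done
  echo "granule $granule: fallback to extprocbds JVM on the database server"
}

process_granule g1 cell1 cell2   # retries once, then offloads to cell2
process_granule g2 cell3         # all replicas fail, falls back
```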
D.7 Step-by-Step for a Hive Data Dictionary Query
This section describes the events that occur in a query over a Hive Data Dictionary.
1. User queries one of the Oracle Big Data SQL data dictionary views, such as all_hive_tables.
   In Oracle Big Data SQL 2.0 and earlier, if this was the user_hive_* view and the user was not a DBA, then the user needed to be listed in the SYS.HIVE_URI$ table. Oracle Big Data SQL 3.0 removed the HIVE_URI$ check.
2. The view accesses the GetHiveTable PL/SQL pipelined table function.
3. The GetHiveTable function is implemented by the HiveMetadata type, which is implemented as an external procedure using the SYS.DBMSHADOOPLIB library.
4. The Oracle Database spawns a new instance of the "extproc" for this database session. The extproc reads the $ORACLE_HOME/hs/admin/extproc.ora file for settings.
   You can set TRACE_LEVEL=ON for tracing of the C code. Log files will be written to $ORACLE_HOME/hs/log.
   By default, there may be an error in extproc.ora, causing an "Error in assignment statement" message in the log. The statement "SET EXTPROC_DLLS=" (with no value after the equal sign) is not valid. Comment this line out if you want to use TRACE_LEVEL=ON.
5. The extproc attaches the libkubsagt.so library (as in SYS.DBMSHADOOPLIB).
6. libkubsagt12.so initiates a JVM and loads HiveMetadata.jar.
   - The JVM uses the configuration information in $ORACLE_HOME/bigdatasql/bigdata_config/ to identify the list of clusters and their Hive metastore connection information.
   - Logging for the JVM is based on $ORACLE_HOME/bigdatasql/bigdata_config/bigdata-log4j.properties. Log files will be written to $ORACLE_HOME/bigdatasql/log. There will be a new log file for each database session.
7. The Java code in HiveMetadata.jar uses the Hive metastore client libraries to connect to the Hive metastore to retrieve data about all of the databases and all of the tables.
   - If the Hive metastore is protected by Kerberos, the JVM will try to send the Kerberos ticket of the oracle Linux user who is running the database.
8. The Java code returns the requested data back to the database.
D.8 Key Administration Tasks for Oracle Big Data SQL
These are notes about some important administrative tasks.
- Restarting the extprocbds_<db>_<hcluster> agent:
  $ crsctl stop res bds_<dbname>_<hcluster>
  $ crsctl start res bds_<dbname>_<hcluster>
  A quick way, but not the best way, is to kill the extprocbds_* process and wait for it to come back.
- Restarting the extproc: begin a new database session (each database session gets its own extproc).
- Restarting the Oracle Big Data SQL software on the datanodes:
  - Use Cloudera Manager or the Ambari Web UI.
  - A quick way, but not the best way, is to kill the bdsqloflsrv process and wait for it to come back.
  - Command line method on an Oracle Big Data Appliance (logged on as root on node1):
    $ bdacli stop big_data_sql_cluster
    $ bdacli start big_data_sql_cluster
- Checking for Oracle Big Data SQL quarantines on a single datanode:
  $ bdscli -e list quarantine detail
  To check for quarantines on all datanodes:
  $ dcli -g cells.lst bdscli -e list quarantine detail
- Clearing Oracle Big Data SQL quarantines on a single datanode:
  $ bdscli -e drop quarantine all
  To clear quarantines on all datanodes:
  $ dcli -g cells.lst bdscli -e drop quarantine all
- Checking statistics for proper offloading:
  - Use the SQL Monitor hint: /*+ MONITOR */.
  - Query XT statistics. Ensure that "retries" is zero and "bytes returned" is greater than zero.
- Looking for log files on the datanodes:
  - Clear quarantines on all datanodes.
  - Set the log property in /opt/oracle/bigdatasql/bdcell-12.1/bigdata-log4j.properties on the datanodes.
  - Restart bdsqloflsrv on the datanodes.
  - cd to the log file directory: /opt/oracle/bigdatasql/bdcell-12.1/log.
  - tail -f bigdata-log4j.log
  - Ensure that your query has data on the node you are looking at (that is, your query should need to access files with many blocks; if you only touch a small number of blocks, your datanode may not be asked to do any work).
  - Start a new database session (to reset XT statistics) and run the query.
    Tip: Use the /*+ MONITOR */ hint if you want to be sure to see the query in SQL Monitor.
    You should see new entries in the datanode's bigdata-log4j.log.
  - On the Oracle Database server, query XT statistics and check that retries=0 and bytes returned>0.
- Looking for log files on the database server:
  - Clear quarantines on all datanodes.
  - Start a new database session (to reset XT statistics).
  - Find out which instance your session is connected to (in case you were load-balanced to a different database server than the one you logged on to):
    select host_name from v$instance;
  - Log in to that instance's database server at the Linux level.
  - Set log properties in $ORACLE_HOME/bigdatasql/bigdata-log4j.properties.
  - Restart extprocbds_<db>_<hcluster> on that instance to pick up the log property changes.
  - Turn on XT tracing:
    This command turns on external table logging:
    alter session set "_xt_trace"="low","compilation","execution";
    This command adds additional tracing:
    alter session set events 'trace[KCFIS] disk high, memory high';
  - Run the query.
    Tip: Use the /*+ MONITOR */ hint if you want to be sure to see the query in SQL Monitor.
  - Query XT statistics and see if retries=0 and bytes returned>0.
    select n.name, s.value /* , s.inst_id, s.sid */ from v$statname n, gv$mystat s where n.name like '%XT%' and s.statistic# = n.statistic#;
  - Look at the JVM log files in $ORACLE_HOME/bigdatasql/log. (Look for the one with the same PID as the extprocbds_* process.)
  - Look at the database trace_file:
    select value from v$diag_info where name = 'Default Trace File';
D.9 Additional Java Diagnostics
The following are some additional Java diagnostics.
- You can add JVM properties to the bigdata.properties file as shown below. This can be good for hard-to-spot low-level Kerberos issues.
  java.options=-Dsun.security.krb5.debug=true
- The extproc and extprocbds_<dbname>_<hcluster> processes run the JVMs on the database server, and the bdsqloflsrv process runs the JVM on the datanode. You can see this by running the "jps" command:
  $ORACLE_HOME/bigdatasql/jdk*/bin/jps
- If you are very comfortable with your Java skills, you can also use Oracle JVisualVM or Oracle JConsole to connect to the JVMs.
D.10 Checking for Correct Oracle Big Data SQL Patches
Patch and Datapatch errors can have a number of different causes and effects. One thing you can do is check to ensure that the expected patches are loaded.
If you see "wrong number or types of arguments in a call to 'FETCH_OPEN'" in the error stack, the expected patches may not be loaded. Here is an example of an error stack that may warrant a query of DBA_REGISTRY_SQLPATCH to determine if the correct patches are loaded:
ORA-29913: error in executing ODCIEXTTABLEFETCH callout
ORA-29400: data cartridge error
ORA-06550: line 1, column 25:
PLS-00306: wrong number or types of arguments in call to 'FETCH_OPEN'
ORA-06550: line 1, column 14:
PL/SQL: Statement ignored
This may indicate one of several problems.
- A bundle patch was not applied correctly.
- Datapatch did not run, and therefore a patch required by the installed version of Oracle Big Data SQL is not installed.
In this case, run the following query to determine whether the patches identified as requirements for this installation in the Oracle Big Data SQL Master Compatibility Matrix (Doc ID 2119369.1 in My Oracle Support) have been applied.
select PATCH_ID, PATCH_UID, VERSION, STATUS, DESCRIPTION from DBA_REGISTRY_SQLPATCH order by BUNDLE_SERIES;
For example, for Oracle Big Data SQL 3.0.1 with BP 12.1.0.2.160419 (22806133), the query should return these results.
PATCH_ID   PATCH_UID  VERSION   STATUS   DESCRIPTION
---------- ---------- --------- -------- --------------------------------------------------
22806133   19983161   12.1.0.2  SUCCESS  DATABASE BUNDLE PATCH: 12.1.0.2.160419 (22806133)
If the query fails or the correct patch for the installed bundle is not found, see Document 2335899.2 in My Oracle Support for more information about troubleshooting Datapatch.
D.11 Debugging SQL*Net Issues
The following suggestions help to solve possible SQL*Net issues.
Misconfiguration of SQL*Net can result in blocked external procedure calls. If execution of these calls returns ORA errors such as "ORA-28579: network error during callback from external procedure agent", check the following My Oracle Support note and confirm that your configuration is correct:
Setting up Oracle Big Data SQL and Oracle Secure External Password Store (2126903.1)
My Oracle Support note 1598247.1 describes the symptoms of the problem.