This topic does not apply to Oracle Big Data SQL Cloud Service.
Note:
This section contains diagnostic information that is relevant to on-premises Oracle Big Data SQL. These details may be of interest to Oracle Big Data SQL Cloud Service users, but they are not needed in order to use the service.
On the Oracle Database server, you can use the Big Data SQL Diagnostics Collection tool, bdschecksw, to do a simple sanity test of Oracle Big Data SQL. This script gathers and analyzes diagnostic information about the Oracle Big Data SQL installation from both the Oracle Database and the Hadoop cluster sides of the installation. The script is in $ORACLE_HOME/bin.
Syntax
bdschecksw Required_Params [Options]
The table below describes the required and optional parameters used with bdschecksw.
Table C-1 bdschecksw Parameters
Parameter | Description | Required or Optional |
---|---|---|
-h, --help | Display command help and exit. | Optional |
-d, --dbhome ORACLE_HOME | The path to the Oracle installation on the Oracle Database server. | Required only if the ORACLE_HOME environment variable is not defined on the Oracle Database server where bdschecksw is executed. |
-s, --sid=[ORACLE_SID] | The SID of the Oracle Database server. | Required only if the ORACLE_SID environment variable is not defined on the Oracle Database server where bdschecksw is executed. |
-g, --gihome=Grid_Infrastructure_home Oracle_Database_node_IP_address | The Grid Infrastructure path on the Oracle Database server. | Required only if the GI_HOME environment variable is not defined on the Oracle Database server where bdschecksw is executed. |
-q, --sqlplus Oracle_Database_node_IP_address username | SQL*Plus username on the Oracle Database server. The user is prompted for the password. | Required. |
-c, --cell DNS short name [...n] | The Hadoop cluster cell nodes. Use DNS short names (FQDN minus domain name) with a space as a delimiter. IP addresses are not recommended, because a node may exist on multiple networks. | Required for Hortonworks HDP only. |
-u, --uname Hadoop_cluster_node_username | Credentials to run remote commands on the Hadoop cluster from the Oracle Database server. This is usually the oracle user. | The username and password are always required. |
-p, --pdb=PDB_CONTAINER | The Pluggable Database container name on the Oracle Database server. | Required only if the Pluggable Database (PDB) container is configured on the Oracle Database server where bdschecksw is executed. |
-k, --key SSH_identity_key_file | Path to an SSH (Secure Shell) key file. | Optional. The SSH identity (or key) file is used on top of -u and -p to select a file from which the identity (private key) for public key authentication is read. |
-r, --cluster Hadoop_cluster_name | Hadoop cluster name. | Optional. |
-t, --trace | Turn on extproc and log4j tracing during test execution. | Optional. |
-f, --file=file_path | Redirect output to the file. | Optional. |
-i, --interactive | Enter command line arguments interactively (rather than all at once). | Optional. |
-v, --verbose | Verbose reporting mode. | Optional. (Recommended for full details in the report.) |
Exit Status
The bdschecksw script returns one of the following status codes.
Table C-2 Status Codes for bdschecksw
Status | Description |
---|---|
0 | Success |
1 | Minor problem (for example, no response in interactive mode). |
2 | Significant problem (for example, an invalid command line argument). |
Example
$ ./bdschecksw -d /u03/app/oracle/product/12.1.0/dbhome_1 -s orcl -p pdborcl -g /u03/app/oracle/product/12.1.0/grid -q sys -u oracle -v
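After the script completes, you can check its exit status against the codes in Table C-2 using standard shell syntax:
$ echo $?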
The following is a series of basic checks to ensure that Oracle Big Data SQL is working.
On the Oracle Database server, source the environment using the hadoop_<hcluster>.env file in $ORACLE_HOME/bigdatasql.
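For example (the exact filename depends on your Hadoop cluster name):
$ source $ORACLE_HOME/bigdatasql/hadoop_<hcluster>.env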
Run kinit as the oracle Linux user on the Oracle Database server. If possible, also run kinit on each of the Big Data SQL datanodes as the oracle user.
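A minimal sketch, assuming a site-specific principal for the oracle user (your principal, realm, and keytab path will differ):
$ kinit oracle@YOUR.REALM.COM
Or, non-interactively with a hypothetical keytab:
$ kinit -kt /home/oracle/oracle.keytab oracle@YOUR.REALM.COM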
Note:
You can run this test without running kinit on the datanodes, but then offloading in the test will not work. You will eventually need to run kinit on the datanodes in order to verify that offloading is working.
Create a text file and add several lines of random text.
Copy the text file into HDFS as /user/oracle/test.txt.
$ hadoop fs -put test.txt /user/oracle/test.txt
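You can confirm that the file landed where expected using standard Hadoop shell commands:
$ hadoop fs -ls /user/oracle/test.txt
$ hadoop fs -cat /user/oracle/test.txt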
Define an Oracle external table of type ORACLE_HDFS:
CREATE TABLE bds_test (line VARCHAR2(4000))
  ORGANIZATION EXTERNAL
    ( TYPE ORACLE_HDFS
      DEFAULT DIRECTORY DEFAULT_DIR
      LOCATION ('/user/oracle/test.txt')
    )
  REJECT LIMIT UNLIMITED;
Query the table, and then check the XT statistics:
SELECT * FROM bds_test;
select n.name, s.value /* , s.inst_id, s.sid */ from v$statname n, gv$mystat s where n.name like '%XT%' and s.statistic# = n.statistic#;
Define a Hive table:
Connect to Hive via Hue, the Hive/Beeline command line, or using Oracle SQL Developer with a Hive JDBC driver.
CREATE TABLE bds_test_hive (line string);
LOAD DATA INPATH '/user/oracle/test.txt' OVERWRITE INTO TABLE bds_test_hive;
Define an external ORACLE_HIVE table:
CREATE TABLE bds_test_hive (line VARCHAR2(4000))
  ORGANIZATION EXTERNAL
    ( TYPE ORACLE_HIVE
      DEFAULT DIRECTORY DEFAULT_DIR
      ACCESS PARAMETERS (com.oracle.bigdata.tablename=default.bds_test_hive)
    )
  REJECT LIMIT UNLIMITED;
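To complete the round trip, query the new external table and re-run the XT statistics query from the ORACLE_HDFS step; rows should come back and "bytes returned" should be greater than zero:
SELECT * FROM bds_test_hive;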
Table C-3 Database Objects in Big Data SQL
Type | Object |
---|---|
Directories | |
Database Links (public). These allow Big Data SQL to reach the MTA (multi-threaded agent). | |
Data Dictionary Views | |
Functions and Procedures for Hive Data Dictionary. See | |
Tables | SYS.HIVE_URI$ – security table for non-DBA users. |
Statistics | |
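As a quick check that these objects are present and usable, you can query one of the data dictionary views (all_hive_tables appears later in this appendix) and list the directory objects. A sketch; directory object names vary by installation:
SELECT table_name FROM all_hive_tables WHERE rownum <= 10;
SELECT directory_name, directory_path FROM all_directories WHERE directory_name LIKE '%BIGDATA%' OR directory_name = 'DEFAULT_DIR';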
Table C-4 $ORACLE_HOME/bigdatasql Directory
Subdirectory or Filename | Description of Contents |
---|---|
 | This directory is usually empty. |
 | Contains Java log files from the two types of extprocs. Use the PID that is part of the filename to identify which extproc you are looking at (only one log file will have the PID of the currently running |
 | You can source this file and then run |
Table C-5 External Procedure Agents
Agent | Description |
---|---|
extproc | Runs the external procedure code used by the Oracle Big Data SQL data dictionary views (see the extproc flow later in this section). |
extprocbds_<dbname>_<hcluster> | The BDS Multi-threaded Agent that is used when querying external tables. This is started/stopped by Oracle Clusterware, which in turn runs the …. If you don’t have this extprocbds_<dbname>_<hcluster> running, then you probably didn’t run bds-exa-install.sh on every database server in your RAC cluster. The C portion’s log files are in $ORACLE_HOME/hs/log (but you need to edit $ORACLE_HOME/hs/admin/initbds – add TRACE_LEVEL=ON and then restart to see logging). The JVM portion’s log files go into $ORACLE_HOME/bigdatasql/log (but you need to set up bigdata-log4j.properties and restart). This is the recommended way to restart (although the quicker way is to run kill -9 on the process):
$ crsctl stop resource bds_<dbname>_<hcluster>
$ crsctl start resource bds_<dbname>_<hcluster> |
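To check whether Clusterware considers the agent resource to be up, standard crsctl usage applies:
$ crsctl status resource bds_<dbname>_<hcluster>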
Note:
Both of the external procedures in the table above make callbacks to the database, which can be blocked by the use of the Secure External Password Store feature. If you use Secure External Password Store (SQLNET.WALLET_OVERRIDE=TRUE), see Document 2126903.1 in My Oracle Support.
Table C-6 Log Files and Trace Files
Directory or Filename | Description |
---|---|
 | Contains log files from Java code run via the |
 | Contains log files from the C code of the |
Database | Contains log files from the database session. These can yield good information. |
 | Contains … If you want to get additional tracing, stop CRS, export the trace-level variables, and restart CRS: crsctl stop has; export CELLCLIENT_TRACE_LEVEL="all,4"; export CELLCLIENT_AUTOFLUSH_LEVEL="all,4"; crsctl start has |
 | Records the IP address and subnet range for the database server. For Oracle Big Data SQL on commodity servers, this file also includes parameters that switch the protocol away from InfiniBand RDS to TCP/UDP ( |
Kerberos files: | If your Hadoop cluster uses Kerberos, you’ll need to set up Kerberos on the database server and have a mechanism (such as crontab) to keep a valid Kerberos ticket at all times for the oracle Linux user. |
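A sketch of such a crontab entry, assuming a hypothetical keytab path and principal (here renewing every 8 hours; adjust to your site's ticket lifetime):
0 */8 * * * /usr/bin/kinit -kt /home/oracle/oracle.keytab oracle@YOUR.REALM.COM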
The table below identifies objects on the Hadoop server that can provide helpful information for troubleshooting Big Data SQL.
Table C-7 Hadoop-side Datanode Artifacts That are Useful for Troubleshooting
Datanode Artifact | Description |
---|---|
Log files | |
Other datanode artifacts | |
The following steps describe how a query against an Oracle Big Data SQL external table is processed.
The user issues a SELECT query involving an Oracle Big Data SQL external table.
The database sees that one of the objects in the SQL is an external table of type ORACLE_HIVE.
The database identifies the cluster name from the com.oracle.bigdata.cluster parameter in the external table definition; otherwise it uses the default cluster.
The database identifies the Hive table name from the com.oracle.bigdata.tablename parameter; otherwise it assumes the Hive table name is the same as the Oracle table name.
The database knows that the ORACLE_HIVE external table implementation uses an external procedure, which is invoked through the extprocbds_<dbname>_<hcluster> multi-threaded agent.
Note:
The first phase of the query requires getting the Hive metadata. If you get an error during this first phase, you’ll likely see an error that begins as follows. Notice the “OPEN” in ODCIEXTTABLEOPEN:
ORA-29913: error in executing ODCIEXTTABLEOPEN callout
The database uses the public database link BDSQL$_DEFAULT_CLUSTER or BDSQL$_<hcluster> to find the connect string, and then asks the listener to connect the database session to a thread of the extprocbds_<dbname>_<hcluster> multi-threaded agent.
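You can confirm that these links exist by querying the data dictionary (DBA_DB_LINKS requires appropriate privileges; ALL_DB_LINKS also works):
SELECT db_link, host FROM dba_db_links WHERE db_link LIKE 'BDSQL$%';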
extprocbds_<dbname>_<hcluster> was previously started by Oracle Clusterware and is using configuration information from the $ORACLE_HOME/bigdatasql/bigdata_config directory.
extprocbds_<dbname>_<hcluster> has spawned a JVM running Hadoop client libraries using the above configuration information. The Hadoop client libraries were copied from the Oracle Big Data Appliance to the Oracle Database server when you ran the bds-exa-install.sh script.
extprocbds_<dbname>_<hcluster> uses its JVM and the Hive metastore client library to call the Hive metastore (using a URL such as thrift://<hostname>:9083) to get metadata (columns, InputFormat, SerDe, and other table properties) for the Hive table.
At this point, if the Hive metastore is protected by Kerberos authentication, the Hive client libraries running in the extprocbds JVM on the Oracle Database server will try to send the local Kerberos ticket to the Hive server. This will be the ticket owned by the oracle Linux user account that runs the database.
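To see whether that ticket exists and is still valid, run klist as the oracle user on the database server:
$ klist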
extprocbds_<dbname>_<hcluster> calls the Hive metastore to get a list of input paths that hold the data behind the Hive table.
extprocbds_<dbname>_<hcluster> converts the list of input paths into a list of splits/blocks using Hadoop MapReduce libraries and logic, and then asks the HDFS namenode for the location (including replicas) of all of the splits/blocks.
Again, if HDFS is protected by Kerberos, the Kerberos ticket from the oracle Linux user account on the database will need to be used.
If compression is used, at this point the JVM might have to load specific compression Java or native libraries. If these are non-standard libraries, you will need to install them on both the Oracle Database server and the Hadoop side. For instance, LZO compression requires an additional installation and configuration on both the database side and the Hadoop side.
At this point, the “description” phase is done and the database knows the structure of the Hive table as well as the location of all of the blocks of data (including replicas). This information is also known as the metadata payload. We now begin the “fetch” phase.
The database intelligent storage layer, KCFIS (Kernel Cache File Intelligent Storage), which is also used on Oracle Exadata systems, compares the hostnames of where the blocks of data are stored to a list of active BDSQL server hosts being maintained by the Grid’s diskmon process. (You can see diskmon’s list of BDSQL server hosts in V$CELL).
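You can view diskmon's list of BDSQL server hosts from the database; V$CELL is the same view used on Exadata:
SELECT cell_path, cell_hashval FROM v$cell;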
Note:
The second phase of the query requires fetching the data. If you get an error during this second phase, you’ll likely see an error that begins as follows. Notice the “FETCH” in ODCIEXTTABLEFETCH:
ORA-29913: error in executing ODCIEXTTABLEFETCH callout
Assuming that the list of datanode hostnames matches the list of BDSQL hostnames, the database sends a list of local blocks (also called granules) to each of the BDSQL servers. The database also sends the BDSQL servers metadata about the table, columns, and structure it is accessing. It does this in parallel and asynchronously for performance.
Note:
The database statistics “cell XT granules requested for predicate offload” and “cell XT granule bytes requested for predicate offload” are updated at this point.
The BDSQL process running on the datanodes checks the SQL_ID against its local list of quarantined SQL_IDs. If the SQL_ID matches the quarantine list, the BDSQL process on the datanode returns an error. However, the user should not see this error; the database will first try another cell, and then try to do the work itself. (See Steps 15 and 16.)
Assuming the SQL_ID is not quarantined by the BDSQL process on the datanode, the BDSQL process will do its SmartScan work against the list of blocks/granules sent to it.
Tip:
See the blog entry Big Data SQL Quick Start. Storage Indexes - Part 10 in The Data Warehouse Insider for details about Storage Indexes and other aspects of SmartScan processing.
The BDSQL offload process has previously read its configuration information from /opt/oracle/bigdatasql/bdcell-12.1/bigdata.properties.
The BDSQL process has previously loaded a JVM based on the properties defined in the above configuration.
If the Hive table has special InputFormat or Serde classes, the JVM will load those classes assuming it can find them based on the classpath defined in the above configuration. For some common InputFormats (such as delimited text), Oracle has written C code that can handle those formats faster than regular Java code.
If Kerberos authentication is used, then the BDSQL JVM will send its local Kerberos ticket to the HDFS datanode process. This is the Kerberos ticket associated with the oracle Linux user on the datanode where BDSQL is running.
If Sentry authorization is used, the identity in the oracle Linux user’s Kerberos ticket needs to have been granted access to the Hive table and the underlying HDFS data.
The BDSQL server will update statistics like “cell XT granule IO bytes saved by StorageIndex” as it runs.
The database KCFIS layer collects results as they are returned from the BDSQL processes on the datanodes and sends the next batch of blocks/granules to the BDSQL processes.
The database updates the “cell interconnect bytes returned by XT smart scan” statistic as bytes are returned.
If there are issues with a BDSQL process for a given block, the database will try to send the work to a different BDSQL process (it will pick a location that has a replica of the block that failed).
The database will update the “cell XT granule predicate offload retries” statistic.
If the database is unable to get the BDSQL processes to successfully offload a block even after retrying, then the database will “fall back” and have the JVM in the extprocbds_<db>_<cluster> agent do the work. This will be slower, as the raw data must be moved to the database server for processing.
If the Hive table has any special InputFormats or SerDes, the extprocbds_<db>_<cluster> JVM will need to load them based on the classpath configuration defined in the database’s bigdata.properties file.
The results from the external table data source continue to be collected until all input paths/blocks/granules are handled.
The following steps describe how a query against an Oracle Big Data SQL data dictionary view is processed.
The user queries one of the Oracle Big Data SQL data dictionary views, such as all_hive_tables.
In Oracle Big Data SQL 2.0 and earlier, if this was a user_hive_* view and the user was not a DBA, then the user needed to be listed in the SYS.HIVE_URI$ table. Oracle Big Data SQL 3.0 removed the HIVE_URI$ check.
The view accesses the GetHiveTable PL/SQL pipelined table function.
The GetHiveTable function is implemented by the HiveMetadata type, which is implemented as an external procedure using the SYS.DBMSHADOOPLIB library.
The Oracle Database spawns a new instance of the extproc for this database session. The extproc reads the $ORACLE_HOME/hs/admin/extproc.ora file for settings.
You can set TRACE_LEVEL=ON for tracing of the C code. The log file is written to $ORACLE_HOME/hs/log.
By default, there may be an error in extproc.ora that causes an “Error in assignment statement” message in the log. The statement “SET EXTPROC_DLLS=” (with no value after the equal sign) is not valid. Comment this line out if you want to use TRACE_LEVEL=ON.
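A minimal sketch of the resulting extproc.ora edits, assuming the default file layout (the comment syntax is an assumption; the key point is the empty SET EXTPROC_DLLS= assignment must not remain active):
# $ORACLE_HOME/hs/admin/extproc.ora
# SET EXTPROC_DLLS=
TRACE_LEVEL=ON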
The extproc attaches the libkubsagt.so library (as in SYS.DBMSHADOOPLIB). libkubsagt12.so initiates a JVM and loads HiveMetadata.jar.
The JVM uses the configuration information in $ORACLE_HOME/bigdatasql/bigdata_config/ to identify the list of clusters and their Hive metastore connection information.
Logging for the JVM is based on $ORACLE_HOME/bigdatasql/bigdata_config/bigdata-log4j.properties. Log files are written to $ORACLE_HOME/bigdatasql/log. There is a new log file for each database session.
The Java code in HiveMetadata.jar uses the Hive metastore client libraries to connect to the Hive metastore and retrieve data about all of the databases and all of the tables.
If the Hive metastore is protected by Kerberos, the JVM will try to send the Kerberos ticket of the oracle Linux user who is running the database.
The Java code returns the requested data to the database.
Restarting the extprocbds_<db>_<hcluster>:
$ crsctl stop res bds_<dbname>_<hcluster>
$ crsctl start res bds_<dbname>_<hcluster>
Quick way, but not the best way: kill the extprocbds_* process and wait for it to come back.
Restarting the extproc: start a new database session (a new extproc is spawned for each database session).
Restarting the Oracle Big Data SQL software on the datanodes:
Use Cloudera Manager or the Ambari Web UI.
Quick way, but not the best way: kill the bdsqloflsrv process and wait for it to come back.
Command line method on an Oracle Big Data Appliance (logged on as root on node1):
$ bdacli stop big_data_sql_cluster
$ bdacli start big_data_sql_cluster
Checking for Oracle Big Data SQL quarantines on a single datanode:
$ bdscli -e list quarantine detail
To check for quarantines on all datanodes:
$ dcli -g cells.lst bdscli -e list quarantine detail
Clearing Oracle Big Data SQL quarantines on a single datanode:
$ bdscli -e drop quarantine all
To clear quarantines on all datanodes:
$ dcli -g cells.lst bdscli -e drop quarantine all
Checking statistics for proper offloading:
Use the SQL Monitor hint: /*+ MONITOR */.
.
Query the XT statistics. Ensure that “retries” is zero and “bytes returned” is greater than zero, as shown in the query below.
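The same XT statistics query used in the sanity test earlier in this appendix applies here:
select n.name, s.value /* , s.inst_id, s.sid */ from v$statname n, gv$mystat s where n.name like '%XT%' and s.statistic# = n.statistic#;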
Looking for log files on the datanodes:
Clear quarantines on all datanodes
Set the log property in /opt/oracle/bigdatasql/bdcell-12.1/bigdata-log4j.properties on the datanodes.
Restart bdsqloflsrv on the datanodes.
cd to the log file directory: /opt/oracle/bigdatasql/bdcell-12.1/log.
$ tail -f bigdata-log4j.log
Ensure that your query has data on the node you are looking at. (Your query should need to access files with many blocks; if it touches only a small number of blocks, this datanode may not be asked to do any work.)
Make a new database session (to reset the XT statistics) and run the query.
Tip:
Use the /*+ MONITOR */ hint if you want to be sure to see the query in SQL Monitor.
You should see new entries in the datanode’s bigdata-log4j.log.
On the Oracle Database server, query the XT statistics and check that retries=0 and bytes returned>0.
Looking for log files on the database:
Clear quarantines on all data nodes
Make a new database session (to reset XT statistics)
Determine which instance you are connected to:
select host_name from v$instance;
Log in to that instance’s database server at the Linux level.
Set the log properties in $ORACLE_HOME/bigdatasql/bigdata-log4j.properties.
Restart extprocbds_<db>_<hcluster> on that instance to pick up the log property changes.
Turn on XT tracing:
This command turns on external table logging:
alter session set "_xt_trace"="low","compilation","execution";
This command adds additional tracing:
alter session set events 'trace[KCFIS] disk high, memory high';
Run the query.
Tip:
Use the /*+ MONITOR */ hint if you want to be sure to see the query in SQL Monitor.
Query the XT statistics and see whether retries=0 and bytes returned>0:
select n.name, s.value /* , s.inst_id, s.sid */ from v$statname n, gv$mystat s where n.name like '%XT%' and s.statistic# = n.statistic# ;
Look at the JVM log file in $ORACLE_HOME/bigdatasql. (Look for the one with the same PID as the extprocbds_* process.)
Look at the database trace_file:
select value from v$diag_info WHERE name = 'Default Trace File';
You can add JVM properties to the bigdata.properties file as shown below. This can be good for hard-to-spot low-level Kerberos issues.
java.options=-Dsun.security.krb5.debug=true
The extproc and extprocbds_<dbname>_<hcluster> processes run the JVMs on the database, and the bdsqloflsrv process runs the JVM on the datanode. You can see this by running the jps command:
$ORACLE_HOME/bigdatasql/jdk*/bin/jps
If you are very comfortable with your Java skills, you can also use Oracle JVisualVM or Oracle JConsole to connect to the JVMs.
Patch and Datapatch errors can have a number of different causes and effects. One thing you can do is check to ensure that the expected patches are loaded.
If you see "wrong number or types of arguments in a call to 'FETCH_OPEN' in the error stack
Here is an example of an error stack that may warrant a query of DBA_REGISTRY_SQLPATCH
to determine if the correct patches are loaded:
ORA-29913: error in executing ODCIEXTTABLEFETCH callout
ORA-29400: data cartridge error
ORA-06550: line 1, column 25:
PLS-00306: wrong number or types of arguments in call to 'FETCH_OPEN'
ORA-06550: line 1, column 14:
PL/SQL: Statement ignored
This may indicate one of several problems:
A Bundle Patch was not applied correctly.
Datapatch did not run, and therefore a patch required by the installed version of Oracle Big Data SQL is not installed.
In either case, run the following query to determine whether the patches identified as requirements for this installation in the Oracle Big Data SQL Master Compatibility Matrix (Doc ID 2119369.1 in My Oracle Support) have been applied.
select PATCH_ID, PATCH_UID, VERSION, STATUS, DESCRIPTION from DBA_REGISTRY_SQLPATCH order by BUNDLE_SERIES;
For example, for Oracle Big Data SQL 3.0.1 with BP 12.1.0.2.160419 (22806133), the query should return these results.
PATCH_ID   PATCH_UID  VERSION   STATUS   DESCRIPTION
---------- ---------- --------- -------- -------------------------------------------------
22806133   19983161   12.1.0.2  SUCCESS  DATABASE BUNDLE PATCH: 12.1.0.2.160419 (22806133)
If the query fails or the correct patch for the installed bundle is not found, see 1609718.1 in My Oracle Support for more information about issues with Datapatch.