C Diagnostic Tips and Details

Not Big Data SQL Cloud Service Topic This topic does not apply to Oracle Big Data SQL Cloud Service.

Note:

This section contains diagnostic information that is relevant to on-premises Oracle Big Data SQL. These details may be of interest to Oracle Big Data SQL Cloud Service users, but they are not needed in order to use the service.

C.1 Running Diagnostics with bdschecksw

On the Oracle Database server, you can use the Big Data SQL Diagnostics Collection tool, bdschecksw, to do a simple sanity test of Oracle Big Data SQL. This script gathers and analyzes diagnostic information about the Oracle Big Data SQL installation from both the Oracle Database and the Hadoop cluster sides of the installation. The script is in $ORACLE_HOME/bin.

Syntax

bdschecksw Required_Params [Options]

The table below describes the requireed and optional parameters used with bdschecksw.

Table C-1 bdschecksw Parameters

Parameter Description Required or Optional
-h, --help Display command help and exit. Optional
-d, --dbhome ORACLE_HOME The path to the Oracle installation on the Oracle Database server. Required only if the ORACLE_HOME environment variable is not defined on the Oracle Database server where bdschecksw is executed.
 -s, --sid=[ORACLE_SID] The SID of the Oracle Database server. Required only if the ORACLE_SID environment variable is not defined on the Oracle Database server where bdschecksw is executed.
 -g, --gihome=Grid_Infrastructure_home Oracle_Database_node_IP_address The Grid Infrastructure path on the Oracle Database server. Required only if the GI_HOME environment variable is not defined on the Oracle Database server where bdschecksw is executed.
-q, --sqlplus Oracle_Database_node_IP_address username SQLPlus username on the Oracle Database server. The user is prompted for the password. Required.
-c, --cell DNS short name [...n] The Hadoop cluster cell nodes. Use DNS short names (FQDN minus domain name) with a space as a delimiter. IP addresses are not recommended, because a node may exist on multiple networks. Required for Hortonworks HDP only.
-u, --uname Hadoop_cluster_node_username Credentials to run remote commands on the Hadoop cluster from the Oracle Database server. This is usually the oracle user. The username and password are always required.
-p,--pdb=PDB_CONTAINER The Pluggable Database container name on the Oracle Database server. Required only if the Pluggable Database (PDB) container is configured on the Oracle Database server where bdschecksw is executed.
  -k, --key SSH_identity_key_file Path to an SSH (Secure Shell) key file. The optional SSH identity (or key) file is used on top of -u and -p to selects a file from which the identity (private key) for public key authentication is read.
  -r, --cluster Hadoop_cluster_name Hadoop cluster name. Optional.
-t, --trace Turn on extproc and log4j tracing during test execution. Optional.
  -f, --file=file_path Redirect output to the file. Optional.
  -i, --interactive Enter command line arguments interactively (rather than all at once). Optional.
  -v, --verbose Verbose reporting mode. Optional. (Recommended for full details in the report.)

Exit Status

The bdschecksw script returns one of the following status codes.

Table C-2 Status Codes for bdschecksw

Status Description
0 Success
1 Minor problem (for example, no response in interactive mode).
2 Significant problem (for example, an invalid command line argument.

Example

$ ./bdschecksw -d /u03/app/oracle/product/12.1.0/dbhome_1 -s orcl -p pdborcl -g /u03/app/oracle/product/12.1.0/grid -q sys -u oracle -v

C.2 How to do a Quick Test

Here is an all-around series of basic checks to ensure that Oracle Big Data SQL is working.

  1. On the Oracle Database server, source the environment using the hadoop_<hcluster>.env file in $ORACLE_HOME/bigdatasql.

  2. If Kerberos is enabled, kinit as the oracle Linux user on the Oracle Database server. If possible, also kinit on each of the Big Data SQL datanodes as the oracle user.

    Note:

    You can run this test without running kinit on the datanodes, but then offloading in the test will not work. You will eventually need to kinit on the datanodes in order to verify that offloading is working.
  3. Create a text file and add several of lines of random text.

  4. Copy the text file into hdfs as /user/oracle/test.txt.

    $ hadoop fs -put test.txt /user/oracle/test.txt
    
  5. Define an Oracle external table of type ORACLE_HDFS:

    1. CREATE TABLE bds_test (line VARCHAR2(4000)) 
        ORGANIZATION EXTERNAL
           ( TYPE ORACLE_HDFS DEFAULT DIRECTORY DEFAULT_DIR LOCATION ('/user/oracle/test.txt') ) 
        REJECT LIMIT UNLIMITED;  
      
    2. Select * from bds_test;

    3. select n.name, s.value /* , s.inst_id, s.sid */ from v$statname n, gv$mystat s where n.name like '%XT%' and s.statistic# = n.statistic#;

  6. Define a Hive table:

    1. Connect to Hive via Hue, the Hive/Beeline command line, or using Oracle SQL Developer with a Hive JDBC driver.

    2. CREATE TABLE bds_test_hive (line string);

    3. LOAD DATA INPATH '/user/oracle/test.txt' OVERWRITE INTO TABLE bds_test_hive;

  7. Define an external ORACLE_HIVE table:

    CREATE TABLE bds_test_hive (line VARCHAR2(4000))   
        ORGANIZATION EXTERNAL
        ( TYPE ORACLE_HIVE DEFAULT DIRECTORY DEFAULT_DIR
          ACCESS PARAMETERS
            (com.oracle.bigdata.tablename=default.bds_test_hive)  
        )
      REJECT LIMIT UNLIMITED;  
    

C.3 Oracle Big Data SQL Database Objects

Table C-3 Database Objects in Big Data SQL

Type Object

Directories

  • DEFAULT_DIR – points to $ORACLE_HOME/bigdatasql/default_dir.

  • ORACLE_BIGDATA_CONFIG – points to $ORACLE_HOME/bigdatasql/bigdata_config.

  • ORA_BIGDATA_CL_<hcluster> – expected to have a null value for its path.

    This is a way of limiting access. There always must be a directory object associated with an external table. Because the directory object is used for privilege checking, this is a requirement even for Hive/HDFS, where the files do not reside under the directory.

Database Links (public)

These allow Big Data SQL to reach the MTA (multi-threaded agent)

  • BDSQL$_DEFAULT_CLUSTER – the connect string’s SID should equal bds_<dbname>_<hcluster> . And the <hcluster> should be the default cluster (as defined by bigdata.cluster.default) in $ORACLE_HOME/bigdatasql/bigdata_config /bigdata.properties.

  • BDSQL$_<hcluster> - the connect string’s SID should equal bds_<dbname>_<hcluster>.

Data Dictionary Views

  • User_hive_tables, all_hive_tables, dba_hive_tables – queries all Hive tables for all Hive databases for all Hadoop clusters.

  • User_hive_databases, all_hive_databases, dba_hive_databases – queries all Hive databases for all Hadoop clusters.

  • User_hive_columns, all_hive_columns, dba_hive_columns – queries all hHive tables for all Hive databases for all Hadoop clusters.

  • V$cell – the Oracle Big Data SQL server processes running on datanodes will appear here (if properly detected by the diskmon).

Functions and Procedures for Hive Data Dictionary

See cathive.sql, dbmshadp.sql.

  • DBMS_HADOOP

    • Create_extddl_for_hive()

  • GetHiveTable – pipeline function that returns data back from the extproc external procedure. Used by the *_hive_[tables/databases/columns] views and DBMS_HADOOP.

    • HiveMetadata – ODCI framework defining the external procedureGetHiveTable.

    • SYS.DBMSHADOOPLIB (libkubsagt12.so) – C library for the external procedure.

    • HiveMetadata.jar – java library called by libkubsagt12.so.

Tables

SYS.HIVE_URI$ – security table for non-DBA users.

Statistics

  • All statistics have %XT% in the name:

    • cell XT granules requested for predicate offload

    • cell XT granule bytes requested for predicate offload

    • cell interconnect bytes returned by XT smart scan

    • cell XT granule predicate offload retries

    • cell XT granule IO bytes saved by storage index

  • Use this query:

    select n.name, s.value /* , s.inst_id, s.sid */ from v$statname n, v$mystat s
    where n.name like '%XT%' and s.statistic# = n.statistic# ;
    

    If needed, also use: grant select any catalog to <dbuser>;

C.4 Other Database-Side Artifacts

Table C-4 $ORACLE_HOME/bigdatasql Directory

Subdirectory or Filename Description of Contents

bigdata_config directory

  • bigdata.properties – Defines the location of JAR files. If your Hive tables use non-standard Java libraries, you may need to copy those libraries to the database and update the classpath entries in this file. Also defines the Hadoop cluster list and default cluster. Restart the extprocbds_<dbname>_<hcluster> after changing.

  • bigdata-log4j.properties - This controls the logging of the Java pieces, such as the metadata discovery phase of querying external tables or the query fetch phase if the cells are unavailable. Change log4j.logger.oracle.hadoop.sql to INFO to log more. Restart the extprocbds_<dbname>_<hcluster> after changing.

  • <hcluster> directory – contains the client configuration files (like hive-site.xml) copied from the Hadoop cluster.

default_dir directory

This directory is usually empty.

log directory

Contains Java log files from the two type of extprocs. Use the PID that is part of the filename to identify which extproc you are looking at (only one log file will have the PID of the currently running extprocbds_ <dbname>_<hcluster> process).

hadoop_<hcluster>.env

You can source this file and then run hadoop fs commands to quickly test Hadoop connectivity

Table C-5 External Procedure Agents

Agent Description

extproc

Runs the external procedure code used by the *_hive_tables, *_hive_databases views and the DBMS_HADOOP procedure.

$ORACLE_HOME/hs/admin/extproc.ora configures the extproc. You can add TRACE_LEVEL=ON to get more trace (but may need to first comment outSET EXTPROC_DLLS= to fix the “Error in assignment statement” message). The C portion’s log files are in $ORACLE_HOME/hs/log/orcl_agt* , but these are usually not interesting for diagnostic purposes. The JVM portion’s log files are written to $ORACLE_HOME/bigdatasql/log (but you need to set up bigdata-log4j.properties first).

extprocbds_<dbname>_<hcluster>

The BDS Multi-threaded Agent that is used when querying external tables. This is started/stopped by Oracle Clusterware which in turn runs the mtactl utility. This is registered to Oracle Clusterware when bds_exa_install.sh runs on the last database server node.

If you don’t have this extprocbds_<dbname>_<hcluster> running, then you probably didn’t run bds_exa_install.sh on every database server in your RAC cluster. The C portion’s log files are in $ORACLE_HOME/hs/log (but you need to edit $ORACLE_HOME/hs/admin/initbds – add TRACE_LEVEL=ON and then restart to see logging). The JVM portion’s log files go into $ORACLE_HOME/bigdatasql/log (but you need to setup bigdata-log4j.properties and restart). This is the recommended way to restart (although the quicker way is to run kill -9 on the process):
$ crsctl stop resource bds_<dbname>_<hcluster>
$ crsctl start resource bds_<dbname>_<hcluster>

Note:

Both of External Procedures in the table above make callbacks to the database, which can be blocked by the use of the Secure External Password Store feature. If you use Secure External Password Store (SQLNET.WALLET_OVERRIDE=TRUE), see Document 2126903.1 in My Oracle Support.

Table C-6 Log Files and Trace Files

Directory or Filename Description

$ORACLE_HOME/bigdatasql/log

Contains log files from Java code run via the extprocbds_<dbname>_<hcluster> – one shared file with PID equal to the extprocbds_<dbname>_<hcluster> PID and extproc (one per session if the session uses *_hive_tables or DBMS_HADOOP). Tip: This is good diagnostic information.

$ORACLE_HOME/hs/log

Contains log files from the C code of the extproc processes (one per session) and the multi-threadedextbds_<dbname>_<hcluster> process. The exproc is usually not interesting for diagnostic purposes. The extprocbds_* has a bit more interesting information (but you need to set TRACE_LEVEL=ON in initbds_*.ora).

Database diag directory

Contains log files from the database session. These can yield good information.

  • To identify the exact database session log file location:

    select value from v$diag_info WHERE name = 'Default Trace File';
    
  • To turn on external table logging:

    alter session set "_xt_trace"="low","compilation","execution"; 
    
  • To turn on additional logging:

    alter session set events 'trace[KCFIS] disk high, memory high';
    

/u01/oracle/diag/crs/<hostname>/crs/trace/diskmon.trc

Contains diskmon logs and errors. In a commodity Oracle Database server to commodity Hadoop environment (support in Oracle Big Data SQL 3.0 and greater), check this trace file for communication errors or fencing (ENTITY_FENCED). Restart diskmon if needed (use crsctl). In a commodity-to-commodity environment, you can simply kill the diskmon process, but do not do that in an Oracle Exadata Database Machine environment.

If you want to get additional diskmon tracing, you can set environment parameters before you invoke the crsctl command to start the cluster. Since the cluster is likely already running, you’ll first have to shut the cluster down. Then, set the environment and then start it back up. Here is how you do it in the Oracle Big Data SQL 3.x commodity database server scenario using Oracle Restart. (Note that the crsctl commands will be different if using RAC, ASM, and/or Exadata):

crsctl stop has export CELLCLIENT_TRACE_LEVEL="all,4"
export CELLCLIENT_AUTOFLUSH_LEVEL="all,4"
crsctl start has

/etc/oracle/cell/network-config/cellinit.ora

/etc/oracle/cell/network-config/celliniteth.ora

Record the IP address and subnet range for the database server. For Oracle Big Data SQL on commodity servers, this file also includes parameters which switch the protocol away from InfiniBand RDS to TCP/UDP (_skgxp_dynamic_protocol=2). On commodity servers , the database server’s diskmon (running out of the Oracle Grid home) communicates with the BDS processes on the data nodes listening on TCP port 5042.

Kerberos files: kinit, klist, etc/krb5.conf, krb5-workstation*.rpm

If your Hadoop cluster uses Kerberos, you’ll need to setup Kerberos on the database and have a mechanism (such as crontab) to keep a valid Kerberos ticket at all times for the oracle Linux user. You will need a similar ticket renewal mechanism on the BDS datanodes as well.

C.5 Hadoop Datanode Artifacts

The table below identifies objects on the Hadoop server that can provide helpful information for troubleshooting Big Data SQL.

Table C-7 Hadoop-side Datanode Artifacts That are Useful for Troubleshooting

Datanode Artifact Description

bdscli command

  • List quarantine detail

  • Drop quarantine all

  • List alerthistory

  • Drop alerthistory

  • List bdsql detail

Log files

  • /var/log/bigdatasql/DM – installer log files

  • /var/log/bigdatasql/cloudera or /var/log/bigdatasql/ambari – Ambari or CM service log files.

  • /opt/oracle/bigdatasql/bdcell-12.1/bigdata.properties

  • /opt/oracle/bigdatasql/bdcell-12.1/bigdata-log4j.properties

    • This defaults to logging off. Change tolog4j.logger.oracle.hadoop.sql=INFO and restart.

  • /opt/oracle/bigdatasql/bdcell-12.1/log directory

    • bigdata-log4j.log – logs entries from the JVM pieces of Big Data SQL (logging defaults to off, so edit bigdata-log4j.properties first and restart). This can be particularly useful information.

  • /var/log/oracle/diag/bdsql/cell/<hostname>/trace/ – general cell trace files for the Management Server, Restart Server, and Monitor Server. The alert.log file will have details about quarantine and de-quarantine events.

  • /var/log/oracle/diag/bdsql/cell/SYS_*/trace/ – Oracle Big Data SQL offload server trace files for the C portion. These are not useful for troubleshooting in most cases.

Other datanode artifacts

/opt/oracle/cell/cellsrv/deploy/config/cellinit.ora – records the cell’s IP address.

C.6 Step-by-Step Process for Querying an External Table

  1. User issues a SELECT query involving an Oracle Big Data SQL external table.

  2. Database sees that one of the objects in the SQL is an External table of type ORACLE_HIVE

  3. Database identifies the cluster name from the com.oracle.bigdata.cluster parameter on the External table definition else uses the default cluster.

  4. Database identifies the Hive table name from the com.oracle.bigdata.tablename parameter, else assumes the Hive table name is the same as the Oracle table name.

  5. Database knows that the ORACLE_HIVE External table implementation uses an external procedure which is invoked through the extprocbds_<dbname>_<hcluster> multi-threaded agent.

    Note:

    The first phase of the query requires getting the Hive metadata. If you get an error during this first phase, you’ll likely see an error that begins as follows. Notice the “OPEN” in ODCIEXTTABLEOPEN)

    ORA-29913: error in executing ODCIEXTTABLEOPEN callout

  6. Database uses the public database link BDSQL$_DEFAULT_CLUSTER or BDSQL$_<hcluster> to find the connect string to ask the listener to connect the database session to a thread of the extprocbds_dbname>_hcluster> multi-threaded agent

    1. extprocbds_<dbname>_<hcluster> was previously started by Oracle Clusterware and is using configuration information from the $ORACLE_HOME/bigdatasql/bigdata_config directory.

    2. extprocbds_<dbname>_<hcluster> has spawned a JVM running Hadoop client libraries using the above configuration information. The Hadoop client libraries were copied from the Oracle Big Data Appliance to the Oracle Database server when you ran the bds-exa-install.sh script.

  7. extprocbds_<dbname>_<hcluster> uses its JVM and the Hive metastore client library to call the Hive metastore (using a URL such as thrift://hostname>:9083) to get metadata (columns, inputformat, serde, other table properties) for the Hive table.

    1. At this point, if the Hive metastore is protected by Kerberos authentication, the Hive client libraries running in the extprocbds JVM on the Oracle Database server will try to send the local Kerberos ticket to the Hive server. This will be the ticket owned by the oracle Linux user account who is running the database

  8. extprocbds_<dbname>_<hcluster> calls the Hive metastore to get a list of input paths that hold the data behind the Hive table.

  9. extprocbds_<dbname>_<hcluster> converts the list of input paths into a list of splits/blocks using Hadoop MapReduce libraries and logic. Then it asks the HDFS namenode for the location (including replicas) of all of the splits /blocks.

    1. Again, if HDFS is protected by Kerberos, the Kerberos ticket from the oracle Linux user account on the database will be need to be used.

    2. If compression is used, at this point the JVM might have to load specific compression Java or native libraries. If these are non-standard libraries, you will need to install them on both the Oracle Database server and the Hadoop side. For instance, LZO compression requires an additional install and configuration performed on both the database-side on the Hadoop-side.

    At this point, the “description” phase is done and the database knows the structure of the Hive table as well as the location of all of the blocks of data (including replicas). This information is also known as the metadata payload. We now begin the “fetch” phase.

  10. The database intelligent storage layer, KCFIS (Kernel Cache File Intelligent Storage), which is also used on Oracle Exadata systems, compares the hostnames of where the blocks of data are stored to a list of active BDSQL server hosts being maintained by the Grid’s diskmon process. (You can see diskmon’s list of BDSQL server hosts in V$CELL).

    Note:

    The second phase of the query requires fetching the data. If you get an error during this second phase, you’ll likely see an error that begins as folllows. Notice the “FETCH” in ODCIEXTTABLEFETCH) :
    ORA-29913: error in executing ODCIEXTTABLEFETCH callout
    
  11. Assuming that the list of datanode hostnames matches the list of BDSQL hostnames, the database sends a list of local blocks (also called Granules) to each of the BDSQL servers. The database also sends the BDSQL servers metadata about the table, columns, and structure it is accessing. It does this in parallel and asynchronously for performance

    Note:

    The database statistics “cell XT granules requested for predicate offload” and “cell XT granule bytes requested for predicate offload” are updated at this point
  12. The BDSQL process running on the data nodes checks the SQL_ID against its local list of quarantined SQL_IDs. If the SQL_ID matches the quarantine list, then the BDSQL process on the datanode will return an error. However, the user should not see this error. Instead, the database will first try another cell, then try to do the work itself. (See Steps 15 and 16).

  13. Assuming the SQL_ID is not quarantined by the BDSQL process on the datanode, the BDSQL process will do its SmartScan work against the list of blocks/granules sent to it.

    Tip:

    See the blog entry Big Data SQL Quick Start. Storage Indexes - Part10 in The Data Warehouse Insider for details about Storage Indexes and other aspects of SmartScan processing.
    1. The BDSQL offload process has previously read its configuration information from /opt/oracle/bigdatasql/bdcell-12.1/bigdata.properties.

    2. The BDSQL process has previously loaded a JVM based on the properties defined in the above configuration.

    3. If the Hive table has special InputFormat or Serde classes, the JVM will load those classes assuming it can find them based on the classpath defined in the above configuration. For some common InputFormats (such as delimited text), Oracle has written C code that can handle those formats faster than regular Java code.

    4. If Kerberos authentication is used, then the BDSQL’s JVM will send its local Kerberos ticket to the HDFS datanode process. This is the Kerberos ticket associated with the oracle Linux user on the datanode where BDSQL is running.

    5. If Sentry authorization is used, the oracle Linux user’s Kerberos ticket’s identity needs to have been granted access to the Hive table and underlying HDFS data.

    6. The BDSQL server will update statistics like “cell XT granule IO bytes saved by StorageIndex” as it runs.

  14. The database kcfis layer will collect results as they are returned from the BDSQL processes on the datanodes and send the next batch of blocks/granules to the BDSQL processes.

    1. The database will update the “cell interconnect bytes returned by XT smart scan” statistic as bytes are returned

  15. If there are issues with a BDSQL process for a given block, the database will try to send the work to a different BDSQL process (it will pick a location that has a replica of the block that failed).

    1. The database will update the “cell XT granule predicate offload retries” statistic.

  16. If the database is unable to get the BDSQL processes to successfully offload a block even after retrying, then the database will “fallback” and have the JVM in the extprocbds_<db>_<cluster> do the work.

    1. This will be slower as the raw data will need to be moved to the database server for processing.

    2. If the Hive table has any special InputFormats or Serdes, the extprocbds_<db>_<cluster>’s JVM will need to load them based on the classpath configuration defined on the database’s bigdata.properties file.

  17. The results from the external table data source continue to be collected until all input paths/blocks/granules are handled.

C.7 Step-by-Step for a Hive Data Dictionary Query

  1. User queries one of the Oracle Big Data SQL data dictionary views, such as all_hive_tables.

    In Oracle Big Data SQL 2.0 and earlier, if this was the user_hive_* view and the user was not a DBA, then the user needed to be listed in the SYS.HIVE_URI$ table. Oracle Big Data SQL 3.0 removed the HIVE_URI$ check.

  2. The view accesses the GetHiveTable pl/sql pipelined table function.

  3. The GetHiveTable function is implemented by the HiveMetadata type which is implemented as an external procedure using the SYS.DBMSHADOOPLIB library.

  4. The Oracle Database spawns a new instance of the “extproc” for this database session. The extproc reads the $ORACLE_HOME/hs/admin/extproc.ora file for settings.

    You can set TRACE_LEVEL=ON for tracing of the C code. Log file will be written to $ORACLE_HOME/hs/log.

    By default, there may be an error in the extproc.ora, causing an “Error in assignment statement” message in the log . The statement “SET EXTPROC_DLLS=” (with no value after the equal sign) is not valid. Comment this line out if you want to use TRACE_LEVEL=ON .

  5. The extproc attaches the libkubsagt.so library (as in SYS.DBMSHADOOPLIB).

  6. Libkubsagt12.so initiates a JVM and loads the HiveMetadata.jar.

    1. The JVM uses the configuration information in $ORACLE_HOME/bigdatasql/bigdata_config/ to identify the list of clusters and their Hive metastore connection information.

    2. Logging for the JVM is based on $ORACLE_HOME/bigdatasql/bigdata_config/bigdata-log4j.properties. Log files will be written to $ORACLE_HOME/bigdatasql/log . There will be a new log file for each database session.

  7. The Java code in HiveMetadata.jar uses the Hive metastore client libraries to connect to the Hive metastore to retrieve data about all of the databases and all of the tables.

    1. If the Hive metastore is protected by Kerberos, the JVM will try to send the Kerberos ticket of the oracle Linux user who is running the database

  8. The Java code returns the request data back to the database.

C.8 Key Adminstration Tasks for Oracle Big Data SQL

  • Restarting the extprocbds_<db>_<hcluster>:

    $  crsctl stop res bds_<dbname>_<hcluster>
    

    Quick way, but not the best way: kill the extprocbds_* process and wait for it to come back

  • Restarting the extproc.

    This begins a new database session.

  • Restarting the Oracle Big Data SQL software on the datanodes:

    • Use Cloudera Manager or the Ambari Web UI.

    • Quick way, but not the best way: kill the bdsqloflsrv process and wait for it to come back.

    • Command line method on an Oracle Big Data Appliance (logged on as root on node1):

      $ bdacli stop big_data_sql_cluster
      $ bdacli start big_data_sql_cluster
      
  • Checking for Oracle Big Data SQL quarantines on a single datanode:

    $ bdscli -e list quarantine detail 
    

    To check for quarantines on all datanodes:

    $ dcli -g cells.lst bdscli -e list quarantine detail
    
  • Clearing Oracle Big Data SQL quarantines on a single datanode:

     $ bdscli -e drop quarantine all
    

    To clear quarantines on all datanodes:

    $ dcli -g cells.lst bdscli -e drop quarantine all
  • Checking statistics for proper offloading:

    • Use the Sql Monitor hint: /*+ MONITOR*/.

    • Query XT statistics. Ensure that “retries” is zero and “bytes returned” is greater than zero.

  • Looking for log files on the datanodes:

    1. Clear quarantines on all datanodes

    2. Set Log property in /opt/oracle/bigdatasql/bdcell-12.1/bigdata-log4j.properties on datanodes.

    3. Restart bdsqloflsrv on datanodes.

    4. Cd to the log file directory: /opt/oracle/bigdatasql/bdcell-12.1/log.

    5. tail -f bigdata-log4j.log

    6. Ensure that your query has data on the node you are looking at (i.e. your query should need to access files with many blocks. If you only touch a small number of blocks, the result may be that your datanode is not be asked to do any work)

    7. Make a new database session (to reset XT statistics) and Run query.

      Tip:

      Use the /*+MONITOR*/ hint if you want to be sure to see it in SQL Monitor.

      You should see new entries in the datanode’s bigdata-log4j.log.

    8. On the Oracle Database server, query XT statistics and check that retries=0 and bytes returned>0.

  • Looking for log files on the database:

    1. Clear quarantines on all data nodes

    2. Make a new database session (to reset XT statistics)

    3. Find out what instance your session is connected to (in case you got load-balanced to a different database server than the one you logged on to):
      select host_name from v$instance;
      
    4. Log in to that instance’s database server at the Linux level.

    5. Set log properties in $ORACLE_HOME/bigdatasql/bigdata-log4j.properties.

    6. Restart extprocbds_<db>_<hcluster> on that instance to pick up the log property changes

    7. Turn on XT tracing:

      This command turns on external table logging:

      alter session set "_xt_trace"="low","compilation","execution";
      

      This command adds additional tracing:

      alter session set events 'trace[KCFIS] disk high, memory high';
      
    8. Run the query.

      Tip:

      Use the /*+ MONITOR */ hint if you want to be sure to see it in SQL Monitor.
    9. Query XT statistics and see if retries=0 and bytes returned>0.

      select n.name, s.value /* , s.inst_id, s.sid */ from v$statname n, gv$mystat s where n.name like '%XT%' and s.statistic# = n.statistic# ;
      
    10. Look at JVM log file: $ORACLE_HOME/bigdatasql. (Look for the one with the same PID as the extprocbds_* process.)

    11. Look at database trace_file:

       select value from v$diag_info WHERE name = 'Default Trace File';
      

C.9 Additional Java Diagnostics

  • You can add JVM properties to the bigdata.properties file as shown below. This can be good for hard-to-spot low-level Kerberos issues.

    java.options=-Dsun.security.krb5.debug=true
    
  • The extproc and extprocbds_<dbname>_<hcluster> processes run the JVMs on the database and the bdsqloflsrv process runs the JVM on the datanode. You can see this by running the “jps” command:

    $ORACLE_HOME/bigdatasql/jdk*/bin/jps
    
  • If you are very comfortable with your Java skills, you can also use Oracle JVisualVM or Oracle JConsole to connect to the JVMs.

C.10 Checking for Correct Oracle Big Data SQL Patches

Patch and Datapatch errors can have a number of different causes and effects. One thing you can do is check to ensure that the expected patches are loaded.

If you see "wrong number or types of arguments in a call to 'FETCH_OPEN' in the error stack

Here is an example of an error stack that may warrant a query of DBA_REGISTRY_SQLPATCH to determine if the correct patches are loaded:

ORA-29913: error in executing ODCIEXTTABLEFETCH callout
ORA-29400: data cartridge error
ORA-06550: line 1, column 25:
PLS-00306: wrong number or types of arguments in call to 'FETCH_OPEN'
ORA-06550: line 1, column 14:PL/SQL: Statement ignored

This may indicate one of several problems.

  • A Bundle Patch was not applied correctly

  • Datapatch did not run and therefore a patch required by the installed version of Oracle Big Data SQL is not installed

In this case, run the following query to determine if patches identified as requirements for this installation in theOracle Big Data SQL Master Compatibility Matrix (Doc ID 2119369.1 in My Oracle Support) have been applied.

select PATCH_ID, PATCH_UID, VERSION, STATUS, DESCRIPTION from DBA_REGISTRY_SQLPATCH order by BUNDLE_SERIES

For example, for Oracle Big Data SQL 3.0.1 with BP  12.1.0.2.160419 (22806133),  the query should return these results.

 PATCH_ID    PATCH_UID   VERSION              STATUS           DESCRIPTION                                                                                        
----------  ----------  -------  ----  ------- --  ------------  
22806133     19983161    12.1.0.2             SUCCESS          DATABASE BUNDLE PATCH: 12.1.0.2.160419 (22806133)                                                  

If the query fails or the correct patch for the installed bundle is not found, see 1609718.1 in My Oracle Support for more information about issues with Datapatch.