Using Support Bundles

Support bundles are files of diagnostic data collected from the Private Cloud Appliance that are used to evaluate and fix problems.

Support bundles can be uploaded to Oracle Support automatically or manually. Support bundles are uploaded securely and contain the minimum required data: system identity (not IP addresses), problem symptoms, and diagnostic information such as logs and status.

Support bundles can be created and not uploaded. You might want to create a bundle for your own use. Creating a support bundle is a convenient way to collect related data.

Support bundles are created and uploaded in the following ways:

Oracle Auto Service Request (ASR)

ASR automatically creates a service request and support bundle when certain hardware faults occur. The service request and support bundle are automatically sent to Oracle Support, and the Private Cloud Appliance administrator is notified. See Using Oracle Auto Service Request.

asrInitiateBundle

The asrInitiateBundle command is a PCA-ADMIN command that creates a support bundle, attaches it to an existing service request, and uploads it to Oracle Support. See Using the asrInitiateBundle Command.

support-bundles

The support-bundles command is a management node command that creates a support bundle of a specified type. Oracle Support might ask you to run this command to collect more data related to a service request, or you might want to collect this data for your own use. See Using the support-bundles Command.

Manual upload to Oracle Support

Several methods are available for uploading support bundles or other data to Oracle Support. See Uploading Support Bundles to Oracle Support.

Using the asrInitiateBundle Command

The asrInitiateBundle command takes three parameters, all required:

PCA-ADMIN> asrInitiateBundle mode=triage sr=SR_number bundleType=auto

A triage support bundle is collected and automatically attached to service request SR_number. For more information about the triage support bundle, see Triage Mode.

If the ASR service is enabled, bundleType=auto uploads the bundle to Oracle Support using the Phone Home service. For information about the Phone Home service, see Registering Private Cloud Appliance for Oracle Auto Service Request.

Using the support-bundles Command

The support-bundles command collects various types of bundles, or modes, of diagnostic data such as health check status, command outputs, and logs. This topic describes the available modes. The following is the recommended way to use this command:

  1. Start data collection by specifying triage mode to understand the preliminary status of the Private Cloud Appliance.

  2. If NOT_HEALTHY appears in the triage mode results, then do one of the following:

    • Use time_slice mode to collect data by time slots. These results can be further narrowed by specifying pod name, job, and k8s_app label.

    • Use smart mode to query data from specific health-checkers.

The support-bundles command requires a mode (-m) option. Some modes have additional options.

The following table lists the options that are common to all modes of the support-bundles command.

Option Description Required

-m mode

The type of bundle.

yes

-sr SR_number

--sr_number SR_number

The service request number.

no

For most modes, the support-bundles command produces a single archive file. The output archive file is named [SR_number_]pca-support-bundle.current-time.tgz. The SR_number is used if you provided the -sr option. If you are creating the support bundle for a service request, you should specify the SR_number.

For native mode, the support-bundles command produces a directory of archive files.

The archive files are stored in /nfs/shared_storage/support_bundles/ on the management node.
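As a small illustration of the naming scheme above, the following shell sketch lists the archives that belong to one service request. The function name list_sr_bundles is hypothetical and is not part of the appliance software. The optional second argument is an addition made only so the function can be tried against another directory.

```shell
# Hypothetical helper, not part of the appliance software.
# List the support bundle archives for one service request, newest first.
# Usage: list_sr_bundles SR_number [directory]
list_sr_bundles() {
    local sr="$1"
    local dir="${2:-/nfs/shared_storage/support_bundles}"
    # Archive names follow [SR_number_]pca-support-bundle.current-time.tgz,
    # so the SR number prefix is enough to select the matching files.
    ls -1t "$dir"/"${sr}"_pca-support-bundle.*.tgz 2>/dev/null
}
```

For example, list_sr_bundles 3-xxxxxxxxxxx prints the matching archive paths, and returns a non-zero status if none exist.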

Log in to the Management Node

To use the support-bundles command, log in as root to the management node that is running Pacemaker resources. Collect data first from the management node that is running Pacemaker resources, then from other management nodes as needed.

If you do not know which management node is running Pacemaker resources, log in to any management node and check the Pacemaker cluster status. The following command output shows that the Pacemaker cluster resources are running on pcamn01.

[root@pcamn01 ~]# pcs status
Cluster name: mncluster
Stack: corosync
Current DC: pcamn01
...
Full list of resources:

scsi_fencing (stonith:fence_scsi): Stopped (disabled)
Resource Group: mgmt-rg
vip-mgmt-int (ocf::heartbeat:IPaddr2): Started pcamn01
vip-mgmt-host (ocf::heartbeat:IPaddr2): Started pcamn01
vip-mgmt-ilom (ocf::heartbeat:IPaddr2): Started pcamn01
vip-mgmt-lb (ocf::heartbeat:IPaddr2): Started pcamn01
vip-mgmt-ext (ocf::heartbeat:IPaddr2): Started pcamn01
l1api (systemd:l1api): Started pcamn01
haproxy (ocf::heartbeat:haproxy): Started pcamn01
pca-node-state (systemd:pca_node_state): Started pcamn01
dhcp (ocf::heartbeat:dhcpd): Started pcamn01
hw-monitor (systemd:hw_monitor): Started pcamn01

Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
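If this check needs to be scripted, the node name can be read from the resource lines. The following is a minimal sketch, assuming the pcs status output has the layout shown above, where each running resource line ends with "Started" followed by the node name. The function name active_mgmt_node is hypothetical.

```shell
# Hypothetical helper, not part of the appliance software.
# Read `pcs status` output on stdin and print the node named on the
# first resource line that reports "Started <node>".
active_mgmt_node() {
    awk '
        NF >= 2 && $(NF - 1) == "Started" && node == "" { node = $NF }
        END { if (node == "") exit 1; print node }
    '
}

# Example (run on a management node):
#   pcs status | active_mgmt_node
```

The function exits with a non-zero status when no resource is reported as started, which a calling script can treat as "unknown".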

Triage Mode

In triage mode, Prometheus platform_health_check is queried for both HEALTHY and NOT_HEALTHY status. If NOT_HEALTHY is found, use time_slice mode to get more detail.

[root@pcamn01 ~]# support-bundles -m triage

The following files are in the output archive file.

File Description

header.json

Time stamp and command line to generate this bundle.

compute_node_info.json

Pods running in the compute node.

management_node_info.json

Pods running in the management node.

rack_info.json

Rack installation time and build version.

loki_search_results.log.n

Chunk files in JSON format.
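To decide whether a follow-up time_slice or smart collection is needed, the triage archive can be searched for the NOT_HEALTHY status. The following is a sketch only. It assumes that the status appears as the literal string NOT_HEALTHY somewhere in the files inside the archive, and the function name bundle_has_unhealthy is hypothetical.

```shell
# Hypothetical helper, not part of the appliance software.
# Return success (0) if any file in the bundle archive contains NOT_HEALTHY.
# Usage: bundle_has_unhealthy path_to_archive.tgz
bundle_has_unhealthy() {
    local archive="$1"
    # -O writes member contents to stdout, so nothing is unpacked on disk.
    tar -xzOf "$archive" | grep -q 'NOT_HEALTHY'
}

# Example:
#   if bundle_has_unhealthy /nfs/shared_storage/support_bundles/archive_name.tgz; then
#       echo "NOT_HEALTHY found: collect more detail with time_slice or smart mode"
#   fi
```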

Time Slice Mode

In time slice mode, data is collected by specifying start and end timestamps.

If you do not specify the -j, --all, or --k8s_app option, then data is collected from all health checker jobs.

You can narrow the data collection by specifying any of the following:

  • Loki job label

  • Loki k8s_app label

  • Pod name

[root@pcamn01 ~]# support-bundles -m time_slice -j flannel-checker -s 2021-05-29T22:40:00.000Z \
-e 2021-06-29T22:40:00.000Z -l INFO

See more examples below.

The time slice mode of the support-bundles command has the following options in addition to the mode and service request number options listed at the beginning of this topic.

  • Only one of --job_name, --all, and --k8s_app can be specified.

  • If none of --job_name, --all, or --k8s_app is specified, the pod filtering will occur on the default (.+checker).

  • The --all option can collect a huge amount of data. You might want to limit your time slice to 48 hours.

Option Description Required

-j job_name

--job_name job_name

Loki job name. Default value: .+checker

See Label List Query below.

no

--all

Queries all job names except for jobs known for too much logging, such as audit, kubernetes-audit, and vault-audit, and the k8s_app label pcacoredns.

no

--k8s_app label

The k8s_app label value to query within the k8s-stdout-logs job.

See Label List Query below.

no

-l level

--levelname level

Message level

no

-s timestamp

--start_date timestamp

Start date in format yyyy-mm-ddTHH:mm:ss

The minimum argument is yyyy-mm-dd

yes

-e timestamp

--end_date timestamp

End date in format yyyy-mm-ddTHH:mm:ss

The minimum argument is yyyy-mm-dd

yes

--pod_name pod_name

The pod name (such as kube or network-checker) to filter output based on the pod. Only the starting letters are necessary.

no

Label List Query

Use the label list query to list the available job names and k8s_app label values.

[root@pcamn01 ~]# support-bundles -m label_list
2021-10-14T23:19:18.265 - support_bundles - INFO - Starting Support Bundles
2021-10-14T23:19:18.317 - support_bundles - INFO - Locating filter-logs Pod
2021-10-14T23:19:18.344 - support_bundles - INFO - Executing command - ['python3', 
'/usr/lib/python3.6/site-packages/filter_logs/label_list.py']
2021-10-14T23:19:18.666 - support_bundles - INFO -
Label:  job
Values: ['admin', 'api-server', 'asr-client', 'asrclient-checker', 'audit', 'cert-checker', 'ceui', 
'compute', 'corosync', 'etcd', 'etcd-checker', 'filesystem', 'filter-logs', 'flannel-checker', 
'his', 'hms', 'iam', 'k8s-stdout-logs', 'kubelet', 'kubernetes-audit', 'kubernetes-checker', 
'l0-cluster-services-checker', 'messages', 'mysql-cluster-checker', 'network-checker', 'ovm-agent', 
'ovn-controller', 'ovs-vswitchd', 'ovsdb-server', 'pca-healthchecker', 'pca-nwctl', 'pca-platform-l0', 
'pca-platform-l1api', 'pca-upgrader', 'pcsd', 'registry-checker', 'sauron-checker', 'secure', 
'storagectl', 'uws', 'vault', 'vault-audit', 'vault-checker', 'zfssa-checker', 'zfssa-log-exporter']
 
Label:  k8s_app
Values: ['admin', 'api', 'asr-client', 'asrclient-checker', 'brs', 'cert-checker', 'compute', 
'default-http-backend', 'dr-admin', 'etcd', 'etcd-checker', 'filesystem', 'filter-logs', 
'flannel-checker', 'fluentd', 'ha-cluster-exporter', 'has', 'his', 'hms', 'iam', 'ilom', 
'kube-apiserver', 'kube-controller-manager', 'kube-proxy', 'kubernetes-checker', 
'l0-cluster-services-checker', 'loki', 'loki-bnr', 'mysql-cluster-checker', 'mysqld-exporter', 
'network-checker', 'pcacoredns', 'pcadnsmgr', 'pcanetwork', 'pcaswitchmgr', 'prometheus', 'rabbitmq', 
'registry-checker', 'sauron-api', 'sauron-checker', 'sauron-grafana', 'sauron-ingress-controller', 
'sauron-mandos', 'sauron-operator', 'sauron-prometheus', 'sauron-prometheus-gw', 
'sauron-sauron-exporter', 'sauron.oracledx.com', 'storagectl', 'switch-metric', 'uws', 'vault-checker', 
'vmconsole', 'zfssa-analytics-exporter', 'zfssa-csi-nodeplugin', 'zfssa-csi-provisioner', 'zfssa-log-exporter']
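Before passing a name to -j or --k8s_app, it can be useful to confirm that the name is in the list. The following is a sketch that extracts the values for one label from the label_list output. It assumes the output layout shown above (a "Label:" line followed by a quoted list of values), and the function name label_values is hypothetical.

```shell
# Hypothetical helper, not part of the appliance software.
# Read `support-bundles -m label_list` output on stdin and print the
# values of one label (job or k8s_app), one value per line.
label_values() {
    local label="$1"
    awk -v want="$label" '
        /^Label:/ { active = ($2 == want); next }   # start or stop collecting
        active { print }                            # lines that hold the values
    ' | tr -d '\n' | grep -o "'[^']*'" | tr -d "'"
}

# Example (run on a management node):
#   support-bundles -m label_list | label_values job | grep -qx flannel-checker \
#       && echo "flannel-checker is a valid job name"
```

Line breaks are removed before the quoted values are extracted, so a value list that wraps across several lines is handled the same way as a single long line.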

Examples:

No job label, no k8s_app label, collect log from all health checkers.

[root@pcamn01 ~]# support-bundles -m time_slice -sr 3-xxxxxxxxxxx -s "2022-01-11T00:00:00" -e "2022-01-12T23:59:59"

One job ceui.

[root@pcamn01 ~]# support-bundles -m time_slice -sr 3-xxxxxxxxxxx -j ceui -s "2022-01-11T00:00:00" -e "2022-01-12T23:59:59"

One k8s_app network-checker.

[root@pcamn01 ~]# support-bundles -m time_slice -sr 3-xxxxxxxxxxx --k8s_app network-checker -s "2022-01-11T00:00:00" -e "2022-01-12T23:59:59"

All health checkers, with start and end dates computed by the date command.

[root@pcamn01 ~]# support-bundles -m time_slice -sr 3-xxxxxxxxxxx -s `date -d "2 days ago" -u +"%Y-%m-%dT%H:%M:%S.000Z"` -e `date -u +"%Y-%m-%dT%H:%M:%S.000Z"`

All jobs.

[root@pcamn01 ~]# support-bundles -m time_slice -sr 3-xxxxxxxxxxx --all -s "2022-01-11T00:00:00" -e "2022-01-12T23:59:59"

The following files are in the output archive file.

File Description

header.json

Time stamp and command line to generate this bundle.

loki_search_results.log.n

Chunk files in JSON format.
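Because the --all option can collect a large amount of data, it can help to generate the start and end timestamps for a bounded window rather than typing them. The following is a minimal sketch that assumes GNU date (for the -d option) and assumes that timestamps are given in UTC, as in the date example above. The function name time_slice_window is hypothetical.

```shell
# Hypothetical helper, not part of the appliance software.
# Print "-s START -e END" covering the last N hours (default 48), in UTC,
# using the yyyy-mm-ddTHH:mm:ss format. Requires GNU date.
time_slice_window() {
    local hours="${1:-48}"
    local fmt='+%Y-%m-%dT%H:%M:%S'
    local start end
    end=$(date -u "$fmt")
    start=$(date -u -d "${hours} hours ago" "$fmt")
    printf -- '-s %s -e %s\n' "$start" "$end"
}

# Example (run on a management node); the unquoted substitution is
# intentional so that the four words become four arguments:
#   support-bundles -m time_slice --all $(time_slice_window 48)
```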

Smart Mode

In smart mode, health checkers are queried for recent NOT_HEALTHY status. By default, two days of logs are collected. If you need more than two days of logs, specify the --force option. Use the -hc option to specify a health checker.

[root@pcamn01 ~]# support-bundles -m smart

See more examples below.

The smart mode of the support-bundles command has the following options in addition to the mode and service request number options listed at the beginning of this topic.

If only the end date is given, the query covers the two days before the given end date. If only the start date is given, the query covers the two days after the given start date. If only the start date is given and it is within the two-day time range, the default end date (the most recent unhealthy time) is used.

Option Description Required

-hc health_checker_name

--health_checker health_checker_name

Loki health checker name.

See the health checker log files table below.

no

--errors_only

Level name filtering takes place only on Error, Critical, and Severe.

no

--force

Force the start date to override the two-day time range limit.

no

-s timestamp

--start_date timestamp

Start date in format yyyy-mm-ddTHH:mm:ss

The minimum argument is yyyy-mm-dd

Default value: End date minus 2 days

no

-e timestamp

--end_date timestamp

End date in format yyyy-mm-ddTHH:mm:ss

The minimum argument is yyyy-mm-dd

Default value: Most recent unhealthy time

no

The following table lists the log files for each health checker.

Health Checker Supporting Log Files

L0_hw_health-checker

  • pca.log, pca.health.log, pca.l1api.log, pacemaker.log

  • pca-platform-l1api

  • pca-healthchecker

  • pacemaker

  • pca-platform-l0

cert-checker

No logs - only certificate and expiry date (from the checker)

etcd-checker

  • etcd-container.log

flannel-checker

k8s-stdout-logs: filter by pod (flannel), node, and container

kubernetes-checker

k8s-stdout-logs: filter by pod (kube-apiserver), node, and container

l0-cluster-services-checker

  • pacemaker.log, corosync.log

  • corosync

  • pcsd

mysql-cluster-checker

  • mysqld

network-checker

  • HMS

registry-checker

messages (registry itself does not produce logs)

vault-checker

  • hc-vault-audit.log

zfssa-checker

  • zfssa-checker

  • zfssa-log-exporter (log = alert | audit | pcalog)

Examples:

No -hc. Query unhealthy data from all health checkers.

[root@pcamn01 ~]# support-bundles -m smart -sr 3-xxxxxxxxxxx

Use -hc to specify one health checker.

[root@pcamn01 ~]# support-bundles -m smart -sr 3-xxxxxxxxxxx -hc network-checker

Timestamps with --force.

[root@pcamn01 ~]# support-bundles -m smart -sr 3-xxxxxxxxxxx -s "2022-01-11T00:00:00" -e "2022-01-15T23:59:59" --force
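Whether --force is needed depends on the length of the requested range. The following is a sketch of that check, assuming GNU date and assuming that "two days" means exactly 48 hours. The function name needs_force is hypothetical.

```shell
# Hypothetical helper, not part of the appliance software.
# Return success (0) if the range START..END is longer than two days,
# which is when smart mode needs the --force option.
# Timestamps use the yyyy-mm-ddTHH:mm:ss format. Requires GNU date.
needs_force() {
    local start_s end_s
    start_s=$(date -u -d "$1" +%s) || return 2   # 2 = unparsable timestamp
    end_s=$(date -u -d "$2" +%s) || return 2
    [ $(( end_s - start_s )) -gt $(( 2 * 24 * 3600 )) ]
}

# Example:
#   needs_force "2022-01-11T00:00:00" "2022-01-15T23:59:59" && echo "add --force"
```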

The following files are in the output archive file.

File Description

header.json

Time stamp and command line to generate this bundle.

loki_search_results.log.n

Chunk files in JSON format.

Native Mode

Unlike other support bundle modes, the native bundle command returns immediately and the bundle collection runs in the background. Native bundles might take hours to collect. Collection progress information is provided in the native_collection.log in the bundle directory.

Also unlike other support bundle modes, the output of native bundles is not a single archive file. Instead, a bundle directory is created in the /nfs/shared_storage/support_bundles/ area on the management node. The directory contains the native_collection.log file and a number of tar.gz files.

[root@pcamn01 ~]# support-bundles -m native -t bundle_type [-c component_name] [-sr SR_number]

The native mode of the support-bundles command has the following options in addition to the mode and service request number options listed at the beginning of this topic.

Option Description Required

-t bundle_type

--type bundle_type

Bundle type: sosreport or zfs-bundle

yes

-c component_name

--component component_name

Component name

This option only applies to type sosreport.

no

ZFS Bundle

When type is zfs-bundle, a ZFS support bundle collection starts on both ZFS nodes and downloads the new ZFS support bundles into the bundle directory.

[root@pcamn01 ~]# support-bundles -m native -t zfs-bundle
2021-11-16T22:49:30.982 - support_bundles - INFO - Starting Support Bundles
2021-11-16T22:49:31.037 - support_bundles - INFO - Locating filter-logs Pod
2021-11-16T22:49:31.064 - support_bundles - INFO - Executing command - ['python3', '/usr/lib/python3.6/site-packages/filter_logs/native.py', '-t', 'zfs-bundle']
2021-11-16T22:49:31.287 - support_bundles - INFO - LAUNCHING COMMAND: ['python3', '/usr/lib/python3.6/site-packages/filter_logs/native_app.py', '-t', 'zfs-bundle', '--target_directory', '/support_bundles/zfs-bundle_20211116T224931267']
ZFS native bundle collection running to /nfs/shared_storage/support_bundles/zfs-bundle_20211116T224931267
Monitor /nfs/shared_storage/support_bundles/zfs-bundle_20211116T224931267/native_collection.log for progress.
 
2021-11-16T22:49:31.287 - support_bundles - INFO - Finished running Support Bundles
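Because native collection runs in the background, a script typically needs the path of the progress log. The following is a sketch that reads that path from the command output, assuming the "Monitor ... for progress." line has the form shown above. The function name native_log_path is hypothetical.

```shell
# Hypothetical helper, not part of the appliance software.
# Read the output of a native mode collection on stdin and print the
# path of the native_collection.log file named in the "Monitor" line.
native_log_path() {
    sed -n 's/^Monitor \(.*\/native_collection\.log\) for progress\.$/\1/p'
}

# Example (run on a management node); 2>&1 is used because it is an
# assumption that the message is written to standard output:
#   log=$(support-bundles -m native -t zfs-bundle 2>&1 | native_log_path)
#   tail -f "$log"
```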

SOS Report Bundle

When type is sosreport, the component_name is a management node or compute node. If component_name is not specified, the report is collected from all management and compute nodes.

[root@pcamn01 ~]# support-bundles -m native -t sosreport -c pcacn003 -sr SR_number

Uploading Support Bundles to Oracle Support

After you create a support bundle using the support-bundles command as described in Using the support-bundles Command, you can use the methods described in this topic to upload the support bundle to Oracle Support.

To use these methods, you must satisfy the following requirements:

  • You must have a My Oracle Support user ID with Create and Update SR permissions granted by the appropriate Customer User Administrator (CUA) for each Support Identifier (SI) being used to upload files.

  • For file uploads to existing service requests, the Support Identifier associated with the service request must be in your profile.

  • To upload files larger than 2 GB, sending machines must have network access to connect to the My Oracle Support servers at transport.oracle.com to use FTPS and HTTPS.

    The Oracle FTPS service is a "passive" implementation. With an implicit configuration, the initial connection is from the client to the service on a control port of 990 and the connection is then switched to a high port to exchange data. Oracle defines a possible range of the data port of 32000-42000, and depending upon your network configuration you may need to enable outbound connections on both port 990 and 32000-42000. TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256 is the only encryption method enabled.

    The Oracle HTTPS diagnostic upload service uses the standard HTTPS port of 443 and does not require any additional ports to be opened.

    When using command line protocols, do not include your password in the command. Enter your password only when prompted.

  • Oracle requires the use of TLS 1.2+ for all file transfers.

  • Do not upload encrypted or password-protected files, standalone or within an archive. A Service Request update will note this as a corrupted file or reject the upload as disallowed file types were found. Files are encrypted when you use FTPS and HTTPS; additional protections are not required.

  • Do not upload files with file type extensions exe, bat, asp, or com, either standalone or within an archive. A Service Request update will note that a disallowed file type was found.
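The file type restriction in the last requirement can be checked before uploading. The following is a minimal sketch for a gzip-compressed tar archive. The function name archive_has_disallowed_files is hypothetical, and the check covers only the file name extensions listed above, not encrypted or password-protected content.

```shell
# Hypothetical helper, not part of the appliance software.
# Return success (0) if the .tgz archive contains a member whose name
# ends in one of the disallowed extensions: exe, bat, asp, or com.
archive_has_disallowed_files() {
    local archive="$1"
    # -t lists member names without extracting them.
    tar -tzf "$archive" | grep -Eiq '\.(exe|bat|asp|com)$'
}

# Example:
#   if archive_has_disallowed_files bundle.tgz; then
#       echo "Remove disallowed file types before uploading"
#   fi
```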

Uploading Files 2 GB or Smaller

Use the SR file upload utility on the My Oracle Support Portal.

  1. Log in to My Oracle Support with your My Oracle Support user name and password.

  2. Do one of the following:

    • Create a new service request and in the next step, select the Upload button.

    • Select and open an existing service request.

  3. Click the Add Attachment button located at the top of the page.

  4. Click the Choose File button.

  5. Navigate and select the file to upload.

  6. Click the Attach File button.

You can also use the methods described in the next section for larger files.

Uploading Files Larger Than 2 GB

You cannot upload a file larger than 200 GB. See Splitting Files.
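The size limits in this topic can be summarized as a small decision function. This is a sketch only. It assumes that 1 GB means 1024 x 1024 x 1024 bytes, which this topic does not state, and the function name and the three result words are hypothetical.

```shell
# Hypothetical helper, not part of the appliance software.
# Print the applicable upload approach for a file of the given size in bytes:
#   portal    - 2 GB or smaller: My Oracle Support portal (FTPS/HTTPS also work)
#   transport - larger than 2 GB, up to 200 GB: FTPS or HTTPS
#   split     - larger than 200 GB: split the file, then upload over HTTPS
upload_method_for_size() {
    local bytes="$1"
    local gib=$(( 1024 * 1024 * 1024 ))   # assumption: 1 GB = 2^30 bytes
    if [ "$bytes" -le $(( 2 * gib )) ]; then
        echo "portal"
    elif [ "$bytes" -le $(( 200 * gib )) ]; then
        echo "transport"
    else
        echo "split"
    fi
}

# Example:
#   upload_method_for_size "$(wc -c < bundle.tgz)"
```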

FTPS

Syntax:

Be sure to include the / character after the service request number.

$ curl -T path_and_filename -u MOS_user_ID ftps://transport.oracle.com/issue/SR_number/

Example:

$ curl -T /u02/files/bigfile.tar -u MOSuserID@example.com ftps://transport.oracle.com/issue/3-1234567890/

HTTPS

Syntax:

Be sure to include the / character after the service request number.

$ curl -T path_and_filename -u MOS_user_ID https://transport.oracle.com/upload/issue/SR_number/

Example:

$ curl -T D:\data\bigfile.tar -u MOSuserID@example.com https://transport.oracle.com/upload/issue/3-1234567890/

Renaming the file during send

$ curl -T D:\data\bigfile.tar -u MOSuserID@example.com https://transport.oracle.com/upload/issue/3-1234567890/NotSoBig.tar

Using a proxy

$ curl -k -T D:\data\bigfile.tar -x proxy.example.com:80 -u MOSuserID@example.com https://transport.oracle.com/upload/issue/3-1234567890/

Splitting Files

You can split a large file into multiple parts and upload the parts. Oracle Transport will concatenate the segments when you complete uploading all the parts.

Only HTTPS protocol can be used. Only the UNIX split utility can be used. The Microsoft Windows split utility produces an incompatible format.

To reduce upload times, compress the original file prior to splitting.

  1. Split the file.

    The following command splits the file file1.tar into 2 GB parts named file1.tar.partaa and file1.tar.partab.

    Important:

    Specify the .part extension exactly as shown below.

    $ split -b 2048m file1.tar file1.tar.part
  2. Upload the resulting file1.tar.partaa and file1.tar.partab files.

    Important:

    Do not rename these output part files.

    $ curl -T file1.tar.partaa -u MOSuserID@example.com https://transport.oracle.com/upload/issue/SR_number/
    $ curl -T file1.tar.partab -u MOSuserID@example.com https://transport.oracle.com/upload/issue/SR_number/
  3. Send the command to put the parts back together.

    The split files will not be attached to the service request. Only the final concatenated file will be attached to the service request.

    $ curl -X PUT -H X-multipart-total-size:original_size -u MOSuserID@example.com https://transport.oracle.com/upload/issue/SR_number/file1.tar?multiPartComplete=true

    In the preceding command, original_size is the size of the original unsplit file as shown by a file listing.

  4. Verify the size of the newly-attached file.

    Note:

    This verification command must be executed immediately after the concatenation command in Step 3. Otherwise, the file will have begun processing and will no longer be available for this command.

    $ curl -I -u MOSuserID@example.com https://transport.oracle.com/upload/issue/SR_number/file1.tar
        X-existing-file-size: original_size
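The four steps above can be combined into one helper. The following sketch splits the file and then prints, rather than runs, the curl commands for steps 2 through 4, so that they can be reviewed and executed one at a time. The function name plan_split_upload and the optional part size argument are hypothetical additions; the curl commands follow the syntax shown in the steps.

```shell
# Hypothetical helper, not part of Oracle tooling.
# Split FILE and print (not run) the upload commands from steps 2 to 4.
# Usage: plan_split_upload FILE SR_number MOS_user_ID [part_size]
plan_split_upload() {
    local file="$1" sr="$2" user="$3" part_size="${4:-2048m}"
    local name url size part
    name=$(basename "$file")
    url="https://transport.oracle.com/upload/issue/${sr}/"
    # Size in bytes of the original, unsplit file.
    size=$(wc -c < "$file" | tr -d ' ')
    # Step 1: split the file; the .part suffix must be used exactly.
    split -b "$part_size" "$file" "${file}.part" || return 1
    # Step 2: one upload per part; the glob expands in alphabetical order.
    for part in "${file}".part*; do
        echo "curl -T ${part} -u ${user} ${url}"
    done
    # Step 3: ask Oracle Transport to concatenate the parts.
    echo "curl -X PUT -H X-multipart-total-size:${size} -u ${user} '${url}${name}?multiPartComplete=true'"
    # Step 4: verify the size of the attached file immediately afterward.
    echo "curl -I -u ${user} ${url}${name}"
}
```

The URL in the step 3 command is printed in single quotes so that the shell does not treat the ? character as a file name pattern. Part files left over from an earlier run with the same prefix would also be listed, so start from a directory that contains no old part files.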

Resuming an Interrupted HTTPS Upload

You can resume a file upload that terminated abnormally. Resuming is supported only with HTTPS and does not work with FTPS. When an upload is interrupted, start by retrieving the size of the partially uploaded file.

  1. Determine how much of the file has already been uploaded.

    $ curl -I -u MOSuserID@example.com https://transport.oracle.com/upload/issue/SR_number/myinfo.tar
    HTTP/1.1 204 No Content
    Date: Tue, 15 Nov 2022 22:53:54 GMT
    Content-Type: text/plain
    X-existing-file-size: already_uploaded_size
    X-Powered-By: Servlet/3.0 JSP/2.2
  2. Resume the file upload.

    Note the file size returned in “X-existing-file-size” in Step 1. Use that file size after the -C switch and in the -H “X-resume-offset:” switch.

    $ curl -Calready_uploaded_size -H "X-resume-offset: already_uploaded_size" -T myinfo.tar -u MOSuserID@example.com https://transport.oracle.com/upload/issue/SR_number/myinfo.tar
  3. Verify the final file size.

    $ curl -I -u MOSuserID@example.com https://transport.oracle.com/upload/issue/SR_number/myinfo.tar
    X-existing-file-size: original_size

    In the preceding command, original_size is the size of the original file as shown by a file listing.
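Steps 1 and 2 can be connected in a script by reading the X-existing-file-size header and inserting its value into the resume command. The following is a minimal sketch, assuming the response headers have the form shown in Step 1. The function names existing_file_size and resume_command are hypothetical, and the resume command is printed rather than run.

```shell
# Hypothetical helpers, not part of Oracle tooling.

# Read `curl -I` response headers on stdin and print the value of the
# X-existing-file-size header (the number of bytes already uploaded).
existing_file_size() {
    # HTTP header lines end in CR LF, so strip the carriage returns first.
    tr -d '\r' | awk -F': *' 'tolower($1) == "x-existing-file-size" { print $2 }'
}

# Print (not run) the resume command from Step 2.
# Usage: resume_command FILE SR_number MOS_user_ID OFFSET
resume_command() {
    local file="$1" sr="$2" user="$3" offset="$4"
    echo "curl -C${offset} -H \"X-resume-offset: ${offset}\" -T ${file} -u ${user} https://transport.oracle.com/upload/issue/${sr}/$(basename "$file")"
}

# Example:
#   offset=$(curl -I -u MOSuserID@example.com \
#       https://transport.oracle.com/upload/issue/SR_number/myinfo.tar | existing_file_size)
#   resume_command myinfo.tar SR_number MOSuserID@example.com "$offset"
```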