Using Support Bundles

Support bundles are files of diagnostic data collected from the Private Cloud Appliance that are used to evaluate and fix problems.

Support bundles can be uploaded to Oracle Support automatically or manually. Support bundles are uploaded securely and contain the minimum required data: system identity (not IP addresses), problem symptoms, and diagnostic information such as logs and status.

Support bundles can also be created without being uploaded. You might want to create a bundle for your own use; creating a support bundle is a convenient way to collect related data.

Support bundles are created and uploaded in the following ways:

Oracle Auto Service Request (ASR)

ASR automatically creates a service request and support bundle when certain hardware faults occur. The service request and support bundle are automatically sent to Oracle Support, and the Private Cloud Appliance administrator is notified. See Using Oracle Auto Service Request.

asrInitiateBundle

The asrInitiateBundle command is a PCA-ADMIN command that creates a support bundle, attaches the support bundle to an existing service request, and uploads the support bundle to Oracle Support. See Using the asrInitiateBundle Command.

support-bundles

The support-bundles command is a management node command that creates a support bundle of a specified type. Oracle Support might ask you to run this command to collect more data related to a service request, or you might want to collect this data for your own use. See Using the support-bundles Command.

Manual upload to Oracle Support

Several methods are available for uploading support bundles or other data to Oracle Support. See Uploading Support Bundles to Oracle Support.

Using the asrInitiateBundle Command

The asrInitiateBundle command takes three parameters, all required:

PCA-ADMIN> asrInitiateBundle mode=triage sr=SR_number bundleType=auto

A triage support bundle is collected and automatically attached to service request SR_number. For more information about the triage support bundle, see Triage Mode.
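
For example, to collect a triage bundle and attach it to a hypothetical service request 3-1234567890:

PCA-ADMIN> asrInitiateBundle mode=triage sr=3-1234567890 bundleType=auto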

If the ASR service is enabled, bundleType=auto uploads the bundle to Oracle Support using the Phone Home service. For information about the Phone Home service, see Registering Private Cloud Appliance for Oracle Auto Service Request. The bundle is saved on the management node for two days after successful upload. See Using the support-bundles Command.

If you specify mode=native and do not specify a value for nativeType, a ZFS bundle (zfs_bundle) is uploaded.

Using the support-bundles Command

The support-bundles command collects various types of bundles, or modes, of diagnostic data, such as health check status, command outputs, and logs. All modes collect files into a bundle directory.

Only one support bundle process can run at a time. A support bundle lock file is created at the beginning of bundle collection and removed when bundle collection is complete.

Because bundle collection might take a long time, perhaps hours, all support-bundles commands return immediately, and the collection runs in the background.
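
Because collection runs in the background, you can follow its progress by tailing the collection log in the bundle directory (directory and file names are described later in this section; the path shown here is a placeholder):

[root@pcamn01 ~]# tail -f /nfs/shared_storage/support_bundles/SR_number_triage-bundle_timestamp/triage_collection.log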

Bundles are stored for two days, then automatically deleted.
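
If you need to keep a bundle longer than two days, copy the archive file out of the bundle directory before it is deleted. For example, with placeholder names:

[root@pcamn01 ~]# cp /nfs/shared_storage/support_bundles/SR_number_triage-bundle_timestamp/SR_number_triage-bundle_timestamp.tar.gz /root/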

The following types of bundles are supported:

  • Triage Mode. Collects data about the current status of the Private Cloud Appliance.

  • Time Slice Mode. Collects data by time slots. These results can be further narrowed by specifying pod name, job, and k8s_app label.

  • Combo Mode. Collects a combination of triage and time slice data.

  • Native Mode. Collects data from management, compute, and ZFS nodes and from ILOM and Cisco hosts.

A good way to start investigating an issue is to collect a combo bundle. Look for NOT_HEALTHY in the triage results and compare that with the corresponding time_slice results.
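
For example, after extracting a combo bundle, a quick way to find unhealthy checks is to search the extracted files (the directory name is a placeholder):

[root@pcamn01 ~]# grep -r NOT_HEALTHY bundle_directory/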

The support-bundles command requires a mode option. All modes accept the service request number option, as shown in the following table. Time slice and native modes have additional options.

Option                   Description                   Required
------                   -----------                   --------
-m mode                  The type of bundle.           yes
-sr SR_number            The service request number.   no
--sr_number SR_number

The support-bundles command output is stored in the following directory on the management node, where bundle-type is the mode: triage, time_slice, combo, or native:

/nfs/shared_storage/support_bundles/SR_number_bundle-type-bundle_timestamp/

The SR_number prefix is used if you provided the -sr option. If you are creating the support bundle for a service request, specify the SR_number.

This directory contains a bundle collection progress file and an archive file. The bundle collection progress file has the following name:

bundle-type_collection.log

The output archive file has the following name:

SR_number_bundle-type-bundle_timestamp.tar.gz
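
For example, a triage bundle created with -sr 3-xxxxxxxxxxx produces names of the following form, where timestamp is filled in at collection time:

/nfs/shared_storage/support_bundles/3-xxxxxxxxxxx_triage-bundle_timestamp/
    triage_collection.log
    3-xxxxxxxxxxx_triage-bundle_timestamp.tar.gz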

The archive file contains a header.json file with the following default components:

  • current-time - the timestamp

  • create-support-bundle - the command line that was used

  • sr-number - the SR number associated with the archive file
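
A header.json might therefore look like the following sketch (the values shown are illustrative):

{
  "current-time": "2022-01-11T00:00:00",
  "create-support-bundle": "support-bundles -m triage -sr 3-xxxxxxxxxxx",
  "sr-number": "3-xxxxxxxxxxx"
}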

Log in to the Management Node

To use the support-bundles command, log in as root to the management node that is running Pacemaker resources. Collect data first from the management node that is running Pacemaker resources, then from other management nodes as needed.

If you do not know which management node is running the Pacemaker resources, log in to any management node and check the Pacemaker cluster status. The following output shows that the Pacemaker cluster resources are running on pcamn01.

[root@pcamn01 ~]# pcs status
Cluster name: mncluster
Stack: corosync
Current DC: pcamn01
...
Full list of resources:

scsi_fencing (stonith:fence_scsi): Stopped (disabled)
Resource Group: mgmt-rg
vip-mgmt-int (ocf::heartbeat:IPaddr2): Started pcamn01
vip-mgmt-host (ocf::heartbeat:IPaddr2): Started pcamn01
vip-mgmt-ilom (ocf::heartbeat:IPaddr2): Started pcamn01
vip-mgmt-lb (ocf::heartbeat:IPaddr2): Started pcamn01
vip-mgmt-ext (ocf::heartbeat:IPaddr2): Started pcamn01
l1api (systemd:l1api): Started pcamn01
haproxy (ocf::heartbeat:haproxy): Started pcamn01
pca-node-state (systemd:pca_node_state): Started pcamn01
dhcp (ocf::heartbeat:dhcpd): Started pcamn01
hw-monitor (systemd:hw_monitor): Started pcamn01

Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled

Triage Mode

In triage mode, Prometheus platform_health_check is queried for both HEALTHY and NOT_HEALTHY status. If NOT_HEALTHY is found, use time_slice mode to get more detail.

[root@pcamn01 ~]# support-bundles -m triage

The following files are in the output archive file.

File                        Description
----                        -----------
header.json                 Time stamp and the command line used to generate this bundle.
compute_node_info.json      Pods running on the compute nodes.
hardware_info.json          Hardware component list retrieved from hms, ipmitool fru output
                            from all management and compute nodes in the ready state, and
                            information about the ZFSSA heads.
management_node_info.json   Pods running on the management nodes.
rack_info.json              Rack installation time and build version.
loki_search_results.log.n   Chunk files in JSON format.
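
To inspect a triage bundle without extracting it, you can list the archive contents (the file name is a placeholder):

[root@pcamn01 ~]# tar -tzf 3-xxxxxxxxxxx_triage-bundle_timestamp.tar.gz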

Time Slice Mode

In time slice mode, data is collected by specifying start and end timestamps. Both of the following options are required:

  • -s start_date

  • -e end_date

Time slice mode has the following options in addition to the mode and service request number options. These options help narrow the data collection. If you do not specify either the -j or --all option, then data is collected from all health checker jobs.

  • Only one of --job_name, --all, and --k8s_app can be specified.

  • If none of --job_name, --all, or --k8s_app is specified, pod filtering uses the default job pattern (.+checker).

  • The --all option can collect a very large amount of data. Consider limiting your time slice to 48 hours.

Example:

[root@pcamn01 ~]# support-bundles -m time_slice -j flannel-checker -s 2021-05-29T22:40:00.000Z \
-e 2021-06-29T22:40:00.000Z -l INFO

See more examples below.

Option                   Description                                          Required
------                   -----------                                          --------
-s timestamp             Start date in the format yyyy-mm-ddTHH:mm:ss.        yes
--start_date timestamp   The minimum argument is yyyy-mm-dd.

-e timestamp             End date in the format yyyy-mm-ddTHH:mm:ss.          yes
--end_date timestamp     The minimum argument is yyyy-mm-dd.

-j job_name              Loki job name. Default value: .+checker              no
--job_name job_name      See Label List Query below.

--k8s_app label          The k8s_app label value to query within the          no
                         k8s-stdout-logs job. See Label List Query below.

--all                    Queries all job names except jobs known to produce   no
                         excessive logging, such as audit, kubernetes-audit,
                         and vault-audit, and the k8s_app label pcacoredns.

-l level                 Message level.                                       no
--levelname level

--pod_name pod_name      The pod name (such as kube or network-checker)       no
                         used to filter output. Only the leading characters
                         of the name are required.

-t timeout               Timeout in seconds for a single Loki query.          no
--timeout timeout        Default: 180 seconds.
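
The following sketch combines several of these options: it queries the kube-proxy k8s_app label, narrows to pods whose names start with kube, and raises the query timeout. The SR number and dates are placeholders:

[root@pcamn01 ~]# support-bundles -m time_slice -sr 3-xxxxxxxxxxx --k8s_app kube-proxy --pod_name kube -s "2022-01-11T00:00:00" -e "2022-01-12T23:59:59" -t 300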

Label List Query

Use the label list query to list the available job names and k8s_app label values.

[root@pcamn01 ~]# support-bundles -m label_list
2021-10-14T23:19:18.265 - support_bundles - INFO - Starting Support Bundles
2021-10-14T23:19:18.317 - support_bundles - INFO - Locating filter-logs Pod
2021-10-14T23:19:18.344 - support_bundles - INFO - Executing command - ['python3', 
'/usr/lib/python3.6/site-packages/filter_logs/label_list.py']
2021-10-14T23:19:18.666 - support_bundles - INFO -
Label:  job
Values: ['admin', 'api-server', 'asr-client', 'asrclient-checker', 'audit', 'cert-checker', 'ceui', 
'compute', 'corosync', 'etcd', 'etcd-checker', 'filesystem', 'filter-logs', 'flannel-checker', 
'his', 'hms', 'iam', 'k8s-stdout-logs', 'kubelet', 'kubernetes-audit', 'kubernetes-checker', 
'l0-cluster-services-checker', 'messages', 'mysql-cluster-checker', 'network-checker', 'ovm-agent', 
'ovn-controller', 'ovs-vswitchd', 'ovsdb-server', 'pca-healthchecker', 'pca-nwctl', 'pca-platform-l0', 
'pca-platform-l1api', 'pca-upgrader', 'pcsd', 'registry-checker', 'sauron-checker', 'secure', 
'storagectl', 'uws', 'vault', 'vault-audit', 'vault-checker', 'zfssa-checker', 'zfssa-log-exporter']
 
Label:  k8s_app
Values: ['admin', 'api', 'asr-client', 'asrclient-checker', 'brs', 'cert-checker', 'compute', 
'default-http-backend', 'dr-admin', 'etcd', 'etcd-checker', 'filesystem', 'filter-logs', 
'flannel-checker', 'fluentd', 'ha-cluster-exporter', 'has', 'his', 'hms', 'iam', 'ilom', 
'kube-apiserver', 'kube-controller-manager', 'kube-proxy', 'kubernetes-checker',
'l0-cluster-services-checker', 'loki', 'loki-bnr', 'mysql-cluster-checker', 'mysqld-exporter', 
'network-checker', 'pcacoredns', 'pcadnsmgr', 'pcanetwork', 'pcaswitchmgr', 'prometheus', 'rabbitmq', 
'registry-checker', 'sauron-api', 'sauron-checker', 'sauron-grafana', 'sauron-ingress-controller', 
'sauron-mandos', 'sauron-operator', 'sauron-prometheus', 'sauron-prometheus-gw', 
'sauron-sauron-exporter', 'sauron.oracledx.com', 'storagectl', 'switch-metric', 'uws', 'vault-checker', 
'vmconsole', 'zfssa-analytics-exporter', 'zfssa-csi-nodeplugin', 'zfssa-csi-provisioner', 'zfssa-log-exporter']

Examples:

No job or k8s_app label specified; logs are collected from all health checkers.

[root@pcamn01 ~]# support-bundles -m time_slice -sr 3-xxxxxxxxxxx -s "2022-01-11T00:00:00" -e "2022-01-12T23:59:59"

A single job, ceui.

[root@pcamn01 ~]# support-bundles -m time_slice -sr 3-xxxxxxxxxxx -j ceui -s "2022-01-11T00:00:00" -e "2022-01-12T23:59:59"

A single k8s_app label, network-checker.

[root@pcamn01 ~]# support-bundles -m time_slice -sr 3-xxxxxxxxxxx --k8s_app network-checker -s "2022-01-11T00:00:00" -e "2022-01-12T23:59:59"

Default health checker jobs, with the date range computed by the date utility (the last two days).

[root@pcamn01 ~]# support-bundles -m time_slice -sr 3-xxxxxxxxxxx -s `date -d "2 days ago" -u +"%Y-%m-%dT%H:%M:%S.000Z"` -e `date -u +"%Y-%m-%dT%H:%M:%S.000Z"`

All jobs.

[root@pcamn01 ~]# support-bundles -m time_slice -sr 3-xxxxxxxxxxx --all -s "2022-01-11T00:00:00" -e "2022-01-12T23:59:59"

The following files are in the output archive file.

File                        Description
----                        -----------
header.json                 Time stamp and the command line used to generate this bundle.
loki_search_results.log.n   Chunk files in JSON format. Time slice bundles are limited to
                            500,000 log entries per query, counted from the start time.
rack_info.json              Rack installation time and build version.

Combo Mode

The combo mode is a combination of a triage bundle and a time slice bundle. The output includes an archive file and two collection log files: triage_collection.log and time_slice_collection.log.
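
A minimal invocation, with a placeholder SR number:

[root@pcamn01 ~]# support-bundles -m combo -sr 3-xxxxxxxxxxx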

The following files are in the output archive file.

File                                Description
----                                -----------
triage-bundle_timestamp.tar.gz      The triage bundle archive file.
time_slice-bundle_timestamp.tar.gz  The time slice bundle archive file.

The time slice data collected is for --all jobs from one hour preceding the current time to the current time.

Native Mode

The native_collection.log file in the bundle directory provides collection progress information. Native bundles can take hours to collect.

Native mode has the following parameters in addition to the mode and service request number options.

Parameter                Description                                          Required
---------                -----------                                          --------
-t nativetype            The type of native bundle: zfs_bundle, sosreport,    no
--type nativetype        ilom_snapshot, or cisco_bundle.
                         Default value: zfs_bundle

-c component             Component name, such as the name of a management,    no
--component component    compute, or ZFS node, or an ILOM or Cisco host.

The following files are in the output archive file.

File                  Description
----                  -----------
header.json           Time stamp and the command line used to generate this bundle.
Native bundle files   Files specific to the nativetype specified.
rack_info.json        Rack installation time and build version.

ZFS Bundle

When nativetype is zfs_bundle, support bundle collection starts on both ZFS storage nodes, and the new ZFS support bundles are downloaded into the bundle directory. When nativetype is not specified, zfs_bundle is created by default.

[root@pcamn01 ~]# support-bundles -m native -t zfs_bundle

SOS Report Bundle

When nativetype is an SOS report bundle, the report is collected from the management node or compute node specified by the --component parameter. If --component is not specified, the report is collected from all management and compute nodes.

[root@pcamn01 ~]# support-bundles -m native -t sosreport -c pcamn01

ILOM Snapshot

When nativetype is ilom_snapshot, the value of the --component parameter is the ILOM host name of a management node or compute node. If the --component parameter is not specified, the report is collected from all ILOM hosts.

[root@pcamn01 ~]# support-bundles -m native -t ilom_snapshot -c ilom-pcacn007

Cisco Bundle

When nativetype is cisco_bundle, the value of the --component parameter is the management host name of an internal Cisco management, aggregation, or access switch.

[root@pcamn01 ~]# support-bundles -m native -t cisco_bundle -c accsn01

To create a cisco_bundle type of collection, the following conditions must be met:

  • The Cisco OBFL (On-Board Failure Logging) module must be enabled on all Private Cloud Appliance Cisco switches. The OBFL module is enabled by default on all Private Cloud Appliance Cisco switches.

  • The Cisco EEM (Embedded Event Manager) module, with its associated EEM policy, must be enabled on all Private Cloud Appliance Cisco switches. The EEM module is enabled by default on all Private Cloud Appliance Cisco switches.

Uploading Support Bundles to Oracle Support

After you create a support bundle using the support-bundles command as described in Using the support-bundles Command, you can use the methods described in this topic to upload the support bundle to Oracle Support.

To use these methods, you must satisfy the following requirements:

  • You must have a My Oracle Support user ID with Create and Update SR permissions granted by the appropriate Customer User Administrator (CUA) for each Support Identifier (SI) being used to upload files.

  • For file uploads to existing service requests, the Support Identifier associated with the service request must be in your profile.

  • To upload files larger than 2 GB, sending machines must have network access to connect to the My Oracle Support servers at transport.oracle.com to use FTPS and HTTPS.

    The Oracle FTPS service is a "passive" implementation. With an implicit configuration, the initial connection is from the client to the service on control port 990, and the connection is then switched to a high port to exchange data. Oracle defines a possible data port range of 32000-42000; depending on your network configuration, you may need to enable outbound connections on both port 990 and ports 32000-42000. TLSv1.2 with ECDHE-RSA-AES128-GCM-SHA256 is the only encryption method enabled.

    The Oracle HTTPS diagnostic upload service uses the standard HTTPS port of 443 and does not require any additional ports to be opened.

    When using command line protocols, do not include your password in the command. Enter your password only when prompted.

  • Oracle requires the use of TLS 1.2 or later for all file transfers. (A quick connectivity check appears after this list.)

  • Do not upload encrypted or password-protected files, standalone or within an archive. A Service Request update will note this as a corrupted file, or the upload will be rejected because disallowed file types were found. Files are already encrypted in transit when you use FTPS or HTTPS; additional protections are not required.

  • Do not upload files with file type extensions exe, bat, asp, or com, either standalone or within an archive. A Service Request update will note that a disallowed file type was found.
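
Before uploading, you can confirm that your network allows TLS 1.2 connections to the transport server. The following openssl check is a quick sketch, not an Oracle-documented procedure:

$ openssl s_client -connect transport.oracle.com:443 -tls1_2 < /dev/null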

Uploading Files 2 GB or Smaller

Use the SR file upload utility on the My Oracle Support Portal.

  1. Log in to My Oracle Support with your My Oracle Support user name and password.

  2. Do one of the following:

    • Create a new service request and in the next step, select the Upload button.

    • Select and open an existing service request.

  3. Click the Add Attachment button located at the top of the page.

  4. Click the Choose File button.

  5. Navigate and select the file to upload.

  6. Click the Attach File button.

You can also use the methods described in the next section for larger files.

Uploading Files Larger Than 2 GB

Files larger than 200 GB cannot be uploaded. To upload a very large file in parts, see Splitting Files.

The curl commands in this section show required options and arguments. You might want to add options such as --verbose and --progress-bar to get more information about your upload. The --progress-meter option (which shows more information than --progress-bar) is on by default, but it is disabled when curl is writing other information to stdout. Note that some options might not be available, or might behave differently, on some operating systems or versions of curl.

The following are the most common messages from uploading bundles to Oracle Support if you use the --verbose option with the curl command:

  • UPLOAD SUCCESSFUL. The bundle is successfully uploaded to Oracle Support.

  • LOGIN FAILED. The user has an authentication issue.

  • INVALID SR NUMBER. The user does not have attach privilege to this Service Request.

FTPS

Syntax:

Be sure to include the / character after the service request number.

$ curl -T path_and_filename -u MOS_user_ID ftps://transport.oracle.com/issue/SR_number/

Example:

$ curl -T /u02/files/bigfile.tar -u MOSuserID@example.com ftps://transport.oracle.com/issue/3-1234567890/

HTTPS

Syntax:

Be sure to include the / character after the service request number.

$ curl -T path_and_filename -u MOS_user_ID https://transport.oracle.com/upload/issue/SR_number/

Example:

$ curl -T D:\data\bigfile.tar -u MOSuserID@example.com https://transport.oracle.com/upload/issue/3-1234567890/

Renaming the file during send

$ curl -T D:\data\bigfile.tar -u MOSuserID@example.com https://transport.oracle.com/upload/issue/3-1234567890/NotSoBig.tar

Using a proxy

$ curl -k -T D:\data\bigfile.tar -x proxy.example.com:80 -u MOSuserID@example.com https://transport.oracle.com/upload/issue/3-1234567890/

Splitting Files

You can split a large file into multiple parts and upload the parts. Oracle Transport will concatenate the segments when you complete uploading all the parts.

Only the HTTPS protocol can be used. Use only the UNIX split utility; the Microsoft Windows split utility produces an incompatible format.

To reduce upload times, compress the original file prior to splitting.
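
For example, assuming the file is a tar archive named file1.tar, one common choice is gzip:

$ gzip file1.tar

This produces file1.tar.gz, which you would then split and upload in place of the uncompressed file.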

  1. Split the file.

    The following command splits the file file1.tar into 2 GB parts named file1.tar.partaa and file1.tar.partab.

    Important:

    Specify the .part extension exactly as shown below.

    $ split -b 2048m file1.tar file1.tar.part
  2. Upload the resulting file1.tar.partaa and file1.tar.partab files.

    Important:

    Do not rename these output part files.

    $ curl -T file1.tar.partaa -u MOSuserID@example.com https://transport.oracle.com/upload/issue/SR_number/
    $ curl -T file1.tar.partab -u MOSuserID@example.com https://transport.oracle.com/upload/issue/SR_number/
  3. Send the command to put the parts back together.

    The split files will not be attached to the service request. Only the final concatenated file will be attached to the service request.

    $ curl -X PUT -H X-multipart-total-size:original_size -u MOSuserID@example.com https://transport.oracle.com/upload/issue/SR_number/file1.tar?multiPartComplete=true

    In the preceding command, original_size is the size of the original unsplit file as shown by a file listing.
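
    For example, one way to determine original_size before splitting is the stat utility (the file name is a placeholder):

    $ stat -c %s file1.tar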

  4. Verify the size of the newly-attached file.

    Note:

    This verification command must be executed immediately after the concatenation command in Step 3. Otherwise, the file will have begun processing and will no longer be available for this command.

    $ curl -I -u MOSuserID@example.com https://transport.oracle.com/upload/issue/SR_number/file1.tar
        X-existing-file-size: original_size

Resuming an Interrupted HTTPS Upload

You can resume a file upload that terminated abnormally. Resuming can be done only with HTTPS; resuming does not work with FTPS. When an upload is interrupted, start by retrieving the size of the portion that was already uploaded, as shown in the following steps.

  1. Determine how much of the file has already been uploaded.

    $ curl -I -u MOSuserID@example.com https://transport.oracle.com/upload/issue/SR_number/myinfo.tar
    HTTP/1.1 204 No Content
    Date: Tue, 15 Nov 2022 22:53:54 GMT
    Content-Type: text/plain
    X-existing-file-size: already_uploaded_size
    X-Powered-By: Servlet/3.0 JSP/2.2
  2. Resume the file upload.

    Note the file size returned in "X-existing-file-size" in Step 1. Use that file size after the -C switch and in the -H "X-resume-offset:" switch.

    $ curl -Calready_uploaded_size -H "X-resume-offset: already_uploaded_size" -T myinfo.tar -u MOSuserID@example.com https://transport.oracle.com/upload/issue/SR_number/myinfo.tar
  3. Verify the final file size.

    $ curl -I -u MOSuserID@example.com https://transport.oracle.com/upload/issue/SR_number/myinfo.tar
        X-existing-file-size: original_size

    In the preceding command, original_size is the size of the original file as shown by a file listing.