Using the support-bundles Tool
The support-bundles tool collects several types, or modes, of Private Cloud Appliance diagnostic data, such as health check status, command outputs, and logs.
Depending on the command options provided, these bundles might contain logs, status information, or both. All modes collect files into a bundle directory. Only one support bundle process can run at a time: a lock file is created when bundle collection begins and removed when collection completes.
All support-bundles commands return immediately because bundle collection, which can take hours to complete, runs in the background. Bundles are stored for two days, then automatically deleted.
The following types of bundles are supported:
- Triage mode. Collects data about the current status of the Private Cloud Appliance.
- Time slice mode. Collects data by time slots. These results can be further narrowed by specifying pod name, job, and k8s_app label.
- Combo mode. Collects a combination of triage and time slice data.
- Native mode. Collects data from management, compute, and ZFS nodes and from ILOM and Cisco hosts.
A good way to start investigating an issue is to collect a combo bundle. Look for NOT_HEALTHY in the triage results and compare those findings with the corresponding time_slice results.
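As an illustration, the NOT_HEALTHY search is a plain grep over the extracted bundle contents. The sample check results below are made up stand-ins for real triage data:

```shell
# Mock triage results standing in for an extracted combo bundle (contents are made up).
printf 'flannel-checker HEALTHY\nnetwork-checker NOT_HEALTHY\n' > triage_sample.txt
# Surface the failing checks, then examine the same components in the time_slice data.
grep NOT_HEALTHY triage_sample.txt
```

On a real bundle, the same search can be run recursively over the extracted archive directory, for example grep -r NOT_HEALTHY .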
The support-bundles command requires a mode option. All modes accept the service request number option. See the following table. Time slice and native modes have additional options.
| Option | Description | Required |
|---|---|---|
| -m mode | The type of bundle: triage, time_slice, combo, or native. | yes |
| -sr SR_number | The service request number. | no |
The support-bundles command output is stored in the following directory on the management node, where bundle-type is the mode: triage, time_slice, combo, or native:
/nfs/shared_storage/support_bundles/SR_number_bundle-type-bundle_timestamp/
The SR_number prefix is used only if you provided the -sr option. If you are creating the support bundle for a service request, specify the SR_number.
This directory contains a bundle collection progress file and an archive file, which are named as follows:
bundle-type_collection.log
SR_number_bundle-type-bundle_timestamp.tar.gz
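The naming scheme above can be sketched in shell. The SR number is hypothetical, and the timestamp format is illustrative only; the tool assigns its own timestamp at collection start:

```shell
SR=3-xxxxxxxxxxx                  # hypothetical service request number
MODE=time_slice                   # triage, time_slice, combo, or native
TS=$(date -u +"%Y%m%d%H%M%S")     # illustrative timestamp; the tool picks its own format
# The bundle directory on the management node's shared storage:
DIR=/nfs/shared_storage/support_bundles/${SR}_${MODE}-bundle_${TS}
echo "$DIR/${MODE}_collection.log"               # collection progress file
echo "$DIR/${SR}_${MODE}-bundle_${TS}.tar.gz"    # archive file
```

Tailing the collection log in that directory is a convenient way to follow progress while the background collection runs.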
The archive file contains a header.json file with the following default components:
- current-time - the timestamp
- create-support-bundle - the command line that was used
- sr-number - the SR number associated with the archive file
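A minimal sketch of reading these header fields back out of an extracted archive; the mock header.json below uses made-up values:

```shell
# Mock header.json with the default components listed above (values are made up).
cat > header.json <<'EOF'
{"current-time": "2022-01-11T00:00:00Z",
 "create-support-bundle": "support-bundles -m triage -sr 3-xxxxxxxxxxx",
 "sr-number": "3-xxxxxxxxxxx"}
EOF
# Read the SR number from the header.
python3 -c 'import json; print(json.load(open("header.json"))["sr-number"])'
```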
Logging in to the Management Node
To use the support-bundles command, log in as root to the management node that is running Pacemaker resources. Collect data first from the management node that is running Pacemaker resources, then from other management nodes as needed.
If you do not know which management node is running the Pacemaker resources, log in to any management node and check the Pacemaker cluster status. The following output shows that the Pacemaker cluster resources are running on pcamn01.
[root@pcamn01 ~]# pcs status
Cluster name: mncluster
Stack: corosync
Current DC: pcamn01
...
Full list of resources:
scsi_fencing (stonith:fence_scsi): Stopped (disabled)
Resource Group: mgmt-rg
vip-mgmt-int (ocf::heartbeat:IPaddr2): Started pcamn01
vip-mgmt-host (ocf::heartbeat:IPaddr2): Started pcamn01
vip-mgmt-ilom (ocf::heartbeat:IPaddr2): Started pcamn01
vip-mgmt-lb (ocf::heartbeat:IPaddr2): Started pcamn01
vip-mgmt-ext (ocf::heartbeat:IPaddr2): Started pcamn01
l1api (systemd:l1api): Started pcamn01
haproxy (ocf::heartbeat:haproxy): Started pcamn01
pca-node-state (systemd:pca_node_state): Started pcamn01
dhcp (ocf::heartbeat:dhcpd): Started pcamn01
hw-monitor (systemd:hw_monitor): Started pcamn01
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
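As a sketch, the active node name can be pulled out of pcs output with a short awk filter; here two sample lines from the output above stand in for the live command:

```shell
# Two lines from the sample pcs status output above, used as stand-in input.
PCS_SAMPLE='vip-mgmt-int (ocf::heartbeat:IPaddr2): Started pcamn01
haproxy (ocf::heartbeat:haproxy): Started pcamn01'
# Print the unique node names hosting started resources.
echo "$PCS_SAMPLE" | awk '/Started/ {print $NF}' | sort -u
```

On a live management node, piping the real pcs status output through the same awk filter reports where the mgmt-rg resources are running.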
Triage Mode
In triage mode, Prometheus platform_health_check is queried for both HEALTHY and NOT_HEALTHY status. If NOT_HEALTHY is found, use time_slice mode to get more detail.
# support-bundles -m triage
The following files are in the output archive file.
| File | Description |
|---|---|
| header.json | Timestamp and command line used to generate this bundle. |
|  | Pods running on the compute nodes. |
|  | Hardware component list retrieved from |
|  | Pods running on the management nodes. |
| rack_info.json | Rack installation time and build version. |
| loki_search_results.log.n | Chunk files in JSON format. |
Time Slice Mode
In time slice mode, data is collected by specifying start and end timestamps. Both of the following options are required:
- -s start_date
- -e end_date
Time slice mode has the following options in addition to the mode and service request number options. These options help narrow the data collection.

- Only one of --job_name, --all, and --k8s_app can be specified.
- If none of --job_name, --all, or --k8s_app is specified, pod filtering uses the default job pattern (.+checker), so data is collected from all health checker jobs.
- The --all option can collect a huge amount of data. You might want to limit your time slice to 48 hours.
Example:
# support-bundles -m time_slice -j flannel-checker -s 2021-05-29T22:40:00.000Z \
-e 2021-06-29T22:40:00.000Z -l INFO
See more examples below.
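The 48-hour limit suggested above can be computed with date(1) rather than typed by hand. This is a sketch using GNU date; the SR number is hypothetical:

```shell
# Compute a 48-hour window ending now (GNU date relative-date syntax).
START=$(date -u -d "48 hours ago" +"%Y-%m-%dT%H:%M:%S.000Z")
END=$(date -u +"%Y-%m-%dT%H:%M:%S.000Z")
# Print the resulting command line (remove echo to run it on the management node).
echo support-bundles -m time_slice --all -sr 3-xxxxxxxxxxx -s "$START" -e "$END"
```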
| Option | Description | Required |
|---|---|---|
| -s start_date | Start date. The examples below use the format 2021-05-29T22:40:00.000Z. | yes |
| -e end_date | End date, in the same format as the start date. | yes |
| -j job_name (--job_name) | Loki job name. See Label List Query below. | no |
| --k8s_app label | The k8s_app label value. See Label List Query below. | no |
| --all | Queries all job names except jobs known for excessive logging, such as audit, kubernetes-audit, and vault-audit, and the k8s_app label pcacoredns. | no |
| -l level | Message level, such as INFO. | no |
|  | The pod name (such as | no |
|  | Timeout in seconds for a single Loki query. The default is 180 seconds. | no |
Label List Query

Use the label list query to list the available job names and k8s_app label values.

# support-bundles -m label_list
2021-10-14T23:19:18.265 - support_bundles - INFO - Starting Support Bundles
2021-10-14T23:19:18.317 - support_bundles - INFO - Locating filter-logs Pod
2021-10-14T23:19:18.344 - support_bundles - INFO - Executing command - ['python3', '/usr/lib/python3.6/site-packages/filter_logs/label_list.py']
2021-10-14T23:19:18.666 - support_bundles - INFO - Label: job
Values: ['admin', 'api-server', 'asr-client', 'asrclient-checker', 'audit', 'cert-checker', 'ceui', 'compute', 'corosync', 'etcd', 'etcd-checker', 'filesystem', 'filter-logs', 'flannel-checker', 'his', 'hms', 'iam', 'k8s-stdout-logs', 'kubelet', 'kubernetes-audit', 'kubernetes-checker', 'l0-cluster-services-checker', 'messages', 'mysql-cluster-checker', 'network-checker', 'ovm-agent', 'ovn-controller', 'ovs-vswitchd', 'ovsdb-server', 'pca-healthchecker', 'pca-nwctl', 'pca-platform-l0', 'pca-platform-l1api', 'pca-upgrader', 'pcsd', 'registry-checker', 'sauron-checker', 'secure', 'storagectl', 'uws', 'vault', 'vault-audit', 'vault-checker', 'zfssa-checker', 'zfssa-log-exporter']
Label: k8s_app
Values: ['admin', 'api', 'asr-client', 'asrclient-checker', 'brs', 'cert-checker', 'compute', 'default-http-backend', 'dr-admin', 'etcd', 'etcd-checker', 'filesystem', 'filter-logs', 'flannel-checker', 'fluentd', 'ha-cluster-exporter', 'has', 'his', 'hms', 'iam', 'ilom', 'kube-apiserver', 'kube-controller-manager', 'kube-proxy', 'kubernetes-checker', 'l0-cluster-services-checker', 'loki', 'loki-bnr', 'mysql-cluster-checker', 'mysqld-exporter', 'network-checker', 'pcacoredns', 'pcadnsmgr', 'pcanetwork', 'pcaswitchmgr', 'prometheus', 'rabbitmq', 'registry-checker', 'sauron-api', 'sauron-checker', 'sauron-grafana', 'sauron-ingress-controller', 'sauron-mandos', 'sauron-operator', 'sauron-prometheus', 'sauron-prometheus-gw', 'sauron-sauron-exporter', 'sauron.oracledx.com', 'storagectl', 'switch-metric', 'uws', 'vault-checker', 'vmconsole', 'zfssa-analytics-exporter', 'zfssa-csi-nodeplugin', 'zfssa-csi-provisioner', 'zfssa-log-exporter']

Examples:
- No job label, no k8s_app label; collect logs from all health checkers.
  # support-bundles -m time_slice -sr 3-xxxxxxxxxxx -s "2022-01-11T00:00:00" -e "2022-01-12T23:59:59"
- One job, ceui.
  # support-bundles -m time_slice -sr 3-xxxxxxxxxxx -j ceui -s "2022-01-11T00:00:00" -e "2022-01-12T23:59:59"
- One k8s_app label, network-checker.
  # support-bundles -m time_slice -sr 3-xxxxxxxxxxx --k8s_app network-checker -s "2022-01-11T00:00:00" -e "2022-01-12T23:59:59"
- All jobs and date.
  # support-bundles -m time_slice -sr 3-xxxxxxxxxxx -s `date -d "2 days ago" -u +"%Y-%m-%dT%H:%M:%S.000Z"` -e `date -u +"%Y-%m-%dT%H:%M:%S.000Z"`
- All jobs.
  # support-bundles -m time_slice -sr 3-xxxxxxxxxxx --all -s "2022-01-11T00:00:00" -e "2022-01-12T23:59:59"
The following files are in the output archive file.
| File | Description |
|---|---|
| header.json | Timestamp and command line used to generate this bundle. |
| loki_search_results.log.n | Chunk files in JSON format. Time slice bundles have a limit of 500,000 logs per query, counted from the start time. |
| rack_info.json | Rack installation time and build version. |
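Counting the collected log entries is a quick way to check whether the 500,000-log limit was reached. The mock chunk files below stand in for an extracted time slice bundle:

```shell
# Mock chunk files standing in for loki_search_results.log.0, .1, ... (contents are made up).
printf '{"line":"a"}\n{"line":"b"}\n' > loki_search_results.log.0
printf '{"line":"c"}\n' > loki_search_results.log.1
# Count collected log entries; a count at exactly 500000 suggests the per-query limit was hit.
cat loki_search_results.log.* | wc -l
```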
Combo Mode
The combo mode is a combination of a triage bundle and a time slice bundle. The output includes an archive file and two collection log files: triage_collection.log and time_slice_collection.log.
The following files are in the output archive file.
| File | Description |
|---|---|
|  | The triage bundle archive file. |
|  | The time slice bundle archive file. The time slice data collected is for |
Native Mode
The native_collection.log file in the bundle directory provides collection progress information. Native bundles can take hours to collect.
Native mode has the following parameters in addition to the mode and SR number options.

| Parameter | Description | Required |
|---|---|---|
| -t nativetype | The type of native bundle: zfs_bundle, sosreport, ilom_snapshot, or cisco-bundle. Default value: zfs_bundle. | no |
| -c component (--component) | Component name, such as the name of a management, compute, or ZFS node, or an ILOM or Cisco host. | no |
The following files are in the output archive file.
| File | Description |
|---|---|
| header.json | Timestamp and command line used to generate this bundle. |
| Native bundle files | These files are specific to the nativetype collected. |
| rack_info.json | Rack installation time and build version. |
ZFS Bundle

When nativetype is a ZFS support bundle (zfs_bundle), collection starts on both ZFS nodes, and the new ZFS support bundles are downloaded into the bundle directory. When nativetype is not specified, zfs_bundle is created by default.

# support-bundles -m native -t zfs_bundle

SOS Report Bundle

When nativetype is an SOS report bundle (sosreport), the report is collected from the management node or compute node specified by the --component parameter. If --component is not specified, the report is collected from all management and compute nodes.

# support-bundles -m native -t sosreport -c pcamn01

ILOM Snapshot

When nativetype is ilom_snapshot, the value of the --component parameter is the ILOM host name of a management node or compute node. If --component is not specified, the report is collected from all ILOM hosts.

# support-bundles -m native -t ilom_snapshot -c ilom-pcacn007

Cisco Bundle

When nativetype is cisco-bundle, the value of the --component parameter is the host name of an internal Cisco management, aggregation, or access switch.

# support-bundles -m native -t cisco-bundle -c accsn01

To create a cisco-bundle type of collection, the following conditions must be met:

- The Cisco OBFL module must be enabled on all Private Cloud Appliance Cisco switches. It is enabled by default.
- The Cisco EEM module must be enabled on all Private Cloud Appliance Cisco switches. It is enabled by default.
- EEM (Embedded Event Manager) policy