Troubleshooting with Mesh Doctor

Mesh Doctor is a tool for troubleshooting and debugging issues in a Service Mesh setup. Mesh Doctor collects the required configurations and logs, runs an analysis, and generates reports. The reports summarize the state of your Service Mesh and suggest troubleshooting steps to fix any issues.

Using Mesh Doctor

You can use the Mesh Doctor tool in two ways.

  • OCI CLI: Run Mesh Doctor from the command line with the oci service-mesh debug command.
    Note: To install the OCI CLI, see Install OCI CLI.
  • Console: Run Mesh Doctor from the OCI Console.

    The OCI Console opens an OCI Cloud Shell window and runs the Mesh Doctor command in it. When the command completes, Mesh Doctor provides the path to the zipped file that contains the generated reports. To view the reports, unzip the file in Cloud Shell or download the zipped file from Cloud Shell.

Important

Cloud Shell powers the Mesh Doctor user interface in the Console. As a result, running Mesh Doctor from the Console works only with public clusters. With private clusters, the command times out.
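If you download the zipped file from Cloud Shell, the reports can be unpacked locally with any archive tool. The following is a minimal Python sketch of that step. The function name and the file paths are hypothetical and are not part of Mesh Doctor.

```python
import zipfile
from pathlib import Path


def extract_bundle(bundle_zip, dest_dir):
    """Unzip a downloaded Mesh Doctor bundle and return the extracted file names, sorted."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(bundle_zip) as bundle:
        bundle.extractall(dest)
        # Directory entries end with "/"; keep only regular files.
        return sorted(name for name in bundle.namelist() if not name.endswith("/"))


# Usage (hypothetical paths):
# for name in extract_bundle("mesh-doctor-bundle.zip", "mesh-doctor-reports"):
#     print(name)
```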

Mesh Doctor Command-Line Options

The following table lists the Mesh Doctor command-line options for the oci service-mesh debug base command.

| Parameter | Optional | Value Type | Default | Example | Notes |
|---|---|---|---|---|---|
| kubeconfig | True | FilePath (String) | kubeconfig present in ~/.kube/config | ~/config | Config of the Kubernetes cluster. If the config isn't provided, the command uses the default config. |
| resource-id | True | OCID | Null | ocid1.mesh.oc1.iad.id | Resource to be diagnosed. If the resource isn't provided, the command diagnoses the installation. |
| context | True | String | current-context in kube-config | context-aaa | The context of the Kubernetes cluster. |
| thread-pool-size | True | Int | 25 | 10 | Number of threads used to parallelize the processing. |
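To show how these options combine, here is a minimal Python sketch that assembles the command line from the parameters in the table. Only --resource-id appears with its flag spelling in this document. The other flags are assumed to follow the same --name form, and the helper name build_debug_command is hypothetical.

```python
import shlex


def build_debug_command(resource_id=None, kubeconfig=None, context=None, thread_pool_size=None):
    """Return the argument list for `oci service-mesh debug report`.

    Options left as None are omitted so that Mesh Doctor uses its defaults.
    """
    argv = ["oci", "service-mesh", "debug", "report"]
    options = [
        ("--kubeconfig", kubeconfig),              # assumed flag spelling
        ("--resource-id", resource_id),            # spelling used in this document
        ("--context", context),                    # assumed flag spelling
        ("--thread-pool-size", thread_pool_size),  # assumed flag spelling
    ]
    for flag, value in options:
        if value is not None:
            argv += [flag, str(value)]
    return argv


cmd = build_debug_command(
    resource_id="ocid1.mesh.oc1.iad.aaa...",
    context="context-aaa",
    thread_pool_size=10,
)
print(shlex.join(cmd))
```

Building a list rather than a single string means the result can be passed to subprocess.run without shell quoting concerns.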

Using Mesh Doctor CLI to Troubleshoot Setup

To troubleshoot an entire service mesh setup in the Kubernetes cluster, run the following command.

oci service-mesh debug report

Using Mesh Doctor CLI to Troubleshoot Mesh Resources

The following Mesh Doctor CLI commands provide example use cases.

Troubleshoot a Mesh:
oci service-mesh debug report --resource-id ocid1.mesh.oc1.iad.aaa...
Running the command produces output similar to the following:
Bundle file path: /my-home/service-mesh-debug-report_07-01-2022_20-00-00
=============================== Mesh Report Analysis ===============================

OLM version: v0.20.0

|    Sidecar Image Versions    |
|    Version     |      Count     |
|    0.1.520     |       13       |
All sidecars are using same version

|  Config Versions   |
| Version   |   Count   |
|    5      |     13    |
All  configs are of the same version

All Operator Services are installed

All Mesh Webhooks are installed

All Mesh Custom Resources are installed

Troubleshoot a Virtual Service:
oci service-mesh debug report --resource-id ocid1.meshvirtualservice.oc1.iad.aaa...
Troubleshoot a Virtual Deployment:
oci service-mesh debug report --resource-id ocid1.meshvirtualdeployment.oc1.iad.aaa...
Troubleshoot an Ingress Gateway:
oci service-mesh debug report --resource-id ocid1.meshingressgateway.oc1.iad.aaa...

Sample Mesh Report

The following is a sample Mesh Doctor report run on a mesh.

report-mesh.json

{
    "metrics_server": [
        {
            "labels": {},
            "name": "Unavailable",
            "namespace": "Unavailable",
            "status": "Unavailable",
            "version": "Unavailable"
        }
    ],
    "oci_cli_version": [
        "X.X.X"
    ],
    "oci_service_operator_for_kubernetes": [
        {
            "labels": {
                "control-plane": "controller-manager",
                "pod-template-hash": "aaa"
            },
            "name": "oci-service-operator-controller-manager-aaa-tm52n",
            "namespace": "oci-service-operator-system",
            "status": {
                "conditions": [
                    {
                        "lastProbeTime": null,
                        "lastTransitionTime": "2022-04-13T00:06:20Z",
                        "status": "True",
                        "type": "Initialized"
                    },
                    {
                        "lastProbeTime": null,
                        "lastTransitionTime": "2022-04-13T00:06:30Z",
                        "status": "True",
                        "type": "Ready"
                    },
                    {
                        "lastProbeTime": null,
                        "lastTransitionTime": "2022-04-13T00:06:30Z",
                        "status": "True",
                        "type": "ContainersReady"
                    },
                    {
                        "lastProbeTime": null,
                        "lastTransitionTime": "2022-04-13T00:06:20Z",
                        "status": "True",
                        "type": "PodScheduled"
                    }
                ],
                "containerStatuses": [
                    {
                        "containerID": "cri-o://aaa...",
                        "image": "iad.ocir.io/aaa/oci-service-operator:1.0.X",
                        "imageID": "iad.ocir.io/aaa/oci-service-operator@sha256:aaa",
                        "lastState": {},
                        "name": "manager",
                        "ready": true,
                        "restartCount": 0,
                        "started": true,
                        "state": {
                            "running": {
                                "startedAt": "2022-04-13T00:06:24Z"
                            }
                        }
                    }
                ],
                "hostIP": "10.0.10.X",
                "phase": "Running",
                "podIP": "10.244.2.X",
                "podIPs": [
                    {
                        "ip": "10.244.2.X"
                    }
                ],
                "qosClass": "Burstable",
                "startTime": "2022-04-13T00:06:20Z"
            },
            "version": "1.0.X"
        }
    ],
    "olm": [
        {
            "labels": {
                "app": "olm-operator",
                "pod-template-hash": "aaa"
            },
            "name": "olm-operator-aaa-k42xw",
            "namespace": "olm",
            "status": {
                "running": {
                    "startedAt": "2022-04-13T00:05:37Z"
                }
            },
            "version": "v0.20.0"
        }
    ],
    "pod_summary": [
        {
            "labels": {
                "app": "productpage",
                "pod-template-hash": "aaa",
                "version": "v1"
            },
            "mesh_id": "ocid1.mesh.oc1.iad.aaa...",
            "name": "productpage-v1-aaa-f5ptd",
            "namespace": "my-namespace",
            "proxy_status": {
                "running": {
                    "startedAt": "2022-04-13T05:37:57Z"
                }
            },
            "proxy_version": "0.1.X",
            "vd_id": "ocid1.mesh.oc1.iad.aaa...",
            "vdb_key": "my-namespace/productpage-v1-binding",
            "vs_id": "ocid1.meshvirtualservice.oc1.iad.aaa..."
        },
        {
            "labels": {
                "app": "reviews",
                "pod-template-hash": "aaa",
                "version": "v3"
            },
            "mesh_id": "ocid1.mesh.oc1.iad.aaa...",
            "name": "reviews-v3-aaa-q9z6k",
            "namespace": "my-namespace",
            "proxy_status": {
                "running": {
                    "startedAt": "2022-04-13T05:37:46Z"
                }
            },
            "proxy_version": "0.1.X",
            "vd_id": "ocid1.mesh.oc1.iad.aaa...",
            "vdb_key": "my-namespace/reviews-v3-binding",
            "vs_id": "ocid1.meshvirtualservice.oc1.iad.aaa..."
        },
        {
            "labels": {
                "app": "reviews",
                "pod-template-hash": "bbb",
                "version": "v2"
            },
            "mesh_id": "ocid1.mesh.oc1.iad.aaa...",
            "name": "reviews-v2-bbb-9rdpw",
            "namespace": "my-namespace",
            "proxy_status": {
                "running": {
                    "startedAt": "2022-04-13T05:37:40Z"
                }
            },
            "proxy_version": "0.1.X",
            "vd_id": "ocid1.mesh.oc1.iad.aaa...",
            "vdb_key": "my-namespace/reviews-v2-binding",
            "vs_id": "ocid1.meshvirtualservice.oc1.iad.aaa..."
        },
        {
            "labels": {
                "app": "reviews",
                "pod-template-hash": "ddd",
                "version": "v1"
            },
            "mesh_id": "ocid1.mesh.oc1.iad.aaa...",
            "name": "reviews-v1-ddd-kq6qr",
            "namespace": "my-namespace",
            "proxy_status": {
                "running": {
                    "startedAt": "2022-04-13T05:37:27Z"
                }
            },
            "proxy_version": "0.1.X",
            "vd_id": "ocid1.mesh.oc1.iad.aaa...",
            "vdb_key": "my-namespace/reviews-v1-binding",
            "vs_id": "ocid1.meshvirtualservice.oc1.iad.aaa..."
        },
        {
            "ig_id": "ocid1.meshingressgateway.oc1.iad.aaa...",
            "igd_key": "my-namespace/bookinfo-ig-deployment",
            "labels": {
                "pod-template-hash": "eee",
                "servicemesh.oci.oracle.com/ingress-gateway-deployment": "bookinfo-ig-deployment"
            },
            "mesh_id": "ocid1.mesh.oc1.iad.aaa...",
            "name": "bookinfo-ig-deployment-deployment-eee-dj9b5",
            "namespace": "my-namespace",
            "proxy_status": {
                "running": {
                    "startedAt": "2022-04-13T00:12:15Z"
                }
            },
            "proxy_version": "0.1.X"
        },
        {
            "labels": {
                "app": "ratings",
                "pod-template-hash": "fff",
                "version": "v1"
            },
            "mesh_id": "ocid1.mesh.oc1.iad.aaa...",
            "name": "ratings-v1-fff-67txf",
            "namespace": "my-namespace",
            "proxy_status": {
                "running": {
                    "startedAt": "2022-04-13T05:35:36Z"
                }
            },
            "proxy_version": "0.1.X",
            "vd_id": "ocid1.mesh.oc1.iad.aaa...",
            "vdb_key": "my-namespace/ratings-v1-binding",
            "vs_id": "ocid1.meshvirtualservice.oc1.iad.aaa..."
        },
        {
            "labels": {
                "app": "details",
                "pod-template-hash": "aaa",
                "version": "v1"
            },
            "mesh_id": "ocid1.mesh.oc1.iad.aaa...",
            "name": "details-v1-aaa-xsmkq",
            "namespace": "my-namespace",
            "proxy_status": {
                "running": {
                    "startedAt": "2022-04-13T05:38:03Z"
                }
            },
            "proxy_version": "0.1.X",
            "vd_id": "ocid1.mesh.oc1.iad.aaa...",
            "vdb_key": "my-namespace/details-v1-binding",
            "vs_id": "ocid1.meshvirtualservice.oc1.iad.aaa..."
        }
    ],
    "sidecar_injection_enabled_namespaces": [
        [
            "host-mesh-cp-aaa",
            "my-namespace"
        ]
    ]
}
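
A report like the sample above can also be inspected programmatically. The following minimal Python sketch summarizes the pod_summary section, using only field names that appear in the sample (proxy_version, proxy_status, name, namespace). It assumes a proxy is healthy when its proxy_status contains a running entry. The function name summarize_pods is hypothetical.

```python
import json
from collections import Counter


def is_running(pod):
    """Treat a proxy as running only if proxy_status is an object with a "running" key."""
    status = pod.get("proxy_status")
    return isinstance(status, dict) and "running" in status


def summarize_pods(report):
    """Summarize the pod_summary section of a parsed Mesh Doctor mesh report."""
    pods = report.get("pod_summary", [])
    # More than one key in this counter means the proxies run mixed versions.
    versions = Counter(pod.get("proxy_version", "unknown") for pod in pods)
    not_running = [
        f'{pod.get("namespace")}/{pod.get("name")}' for pod in pods if not is_running(pod)
    ]
    return {
        "pods": len(pods),
        "proxy_versions": dict(versions),
        "not_running": not_running,
    }


# Usage: load report-mesh.json from the extracted bundle.
# with open("report-mesh.json") as f:
#     print(json.dumps(summarize_pods(json.load(f)), indent=2))
```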

Required Kubernetes Authorizations

Mesh Doctor runs kubectl commands on behalf of the user using the user's existing Kubernetes authorizations. If the required permissions aren't present, the command fails to collect data.

To collect all the required data, users need the following access permissions:

  • list, get, exec - for pods in the service mesh.
  • list, get - for all mesh resources (CRDs).
  • list, get, exec - for pods in the OLM namespace.
  • list - for services.

For more information on Kubernetes role-based access control, see Using RBAC Authorization.
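
One way to check these permissions before running Mesh Doctor is kubectl auth can-i. The following minimal Python sketch only builds the check commands for the permissions listed above. It assumes that exec is granted as the create verb on the pods exec subresource, which is the usual convention, and that the OLM namespace is olm as in the sample report. The mesh resource names are left as a parameter because they are not listed in this document, and the function name can_i_commands is hypothetical.

```python
def can_i_commands(mesh_namespace, olm_namespace="olm", mesh_resources=()):
    """Build `kubectl auth can-i` commands for the permissions Mesh Doctor needs."""
    commands = []

    def add(verb, resource, namespace, subresource=None):
        cmd = ["kubectl", "auth", "can-i", verb, resource]
        if subresource:
            cmd.append(f"--subresource={subresource}")
        commands.append(cmd + ["--namespace", namespace])

    # Pods in the service mesh namespace and in the OLM namespace.
    for namespace in (mesh_namespace, olm_namespace):
        add("list", "pods", namespace)
        add("get", "pods", namespace)
        # Assumption: `kubectl exec` is authorized as "create" on pods/exec.
        add("create", "pods", namespace, subresource="exec")

    # Mesh custom resources, supplied by the caller.
    for resource in mesh_resources:
        add("list", resource, mesh_namespace)
        add("get", resource, mesh_namespace)

    add("list", "services", mesh_namespace)
    return commands


# Usage (resource name is a placeholder):
# import subprocess
# for cmd in can_i_commands("my-namespace", mesh_resources=["<mesh-resource>"]):
#     answer = subprocess.run(cmd, capture_output=True, text=True).stdout.strip()
#     print(" ".join(cmd), "->", answer)
```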

Mesh Doctor Reports Structure

When Mesh Doctor runs, the tool organizes the collected data into a reporting hierarchy. When Mesh Doctor runs on a specific resource, the report includes only the data for that resource and its child resources. Mesh Doctor uses the following reporting structure.

Mesh <directory>

  • Mesh report
  • OCI Service Operator for Kubernetes logs
  • Dump of cluster service version
  • Custom resource definition (CRD) of mesh if present
  • Ingress gateway <directory>
    • Ingress gateway report
    • CRD of ingress gateway if present
    • Ingress gateway deployment <directory>
      • CRD of ingress gateway deployment
      • configdump_<podName>_<podNamespace>.json
      • proxylogs_<podName>_<podNamespace>.log
  • Virtual service <directory>
    • Virtual service report
    • CRD of virtual service if present
    • Virtual deployment <directory>
      • Virtual deployment report
      • CRD of virtual deployment if present
      • Virtual deployment binding <directory>
        • CRD of virtual deployment binding
        • configdump_<podName>_<podNamespace>.json
        • proxylogs_<podName>_<podNamespace>.log
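
Given the file naming shown in the structure above, the per-pod files in an extracted bundle can be located with a short script. The following minimal Python sketch indexes configdump and proxylogs files by namespace and pod. It relies on the fact that Kubernetes pod and namespace names cannot contain underscores, so the underscore is a safe separator. The function name index_proxy_artifacts is hypothetical.

```python
import re
from pathlib import Path

# Matches configdump_<podName>_<podNamespace>.json and proxylogs_<podName>_<podNamespace>.log
ARTIFACT = re.compile(
    r"^(?P<kind>configdump|proxylogs)_(?P<pod>[^_]+)_(?P<namespace>[^_]+)\.(?:json|log)$"
)


def index_proxy_artifacts(bundle_dir):
    """Map (namespace, pod) to the config dump and proxy log paths found under bundle_dir."""
    index = {}
    for path in sorted(Path(bundle_dir).rglob("*")):
        match = ARTIFACT.match(path.name)
        if path.is_file() and match:
            key = (match["namespace"], match["pod"])
            index.setdefault(key, {})[match["kind"]] = path
    return index


# Usage (hypothetical directory name):
# for (namespace, pod), files in index_proxy_artifacts("mesh-doctor-reports").items():
#     print(namespace, pod, sorted(files))
```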