Identify the causes and fixes for general problems with Service Mesh. The following
general troubleshooting solutions are available.
Changes Made to Mesh Resources in the OCI Console or OCI CLI Revert to Their Previous State
Issue
Any changes made to mesh resources (for example, an ingress gateway, virtual
service, or virtual deployment) from the OCI Console or the OCI CLI revert to their
previous state at the update interval set for the operator.
Explanation
Currently, after a mesh resource is first created, changes to it can only be
made through OCI Service Operator for Kubernetes, that is, with kubectl.
At each operator update interval (for example, every hour), OCI Service
Operator for Kubernetes runs a reconciliation process. Any resources in the
service mesh control plane whose settings differ are reverted to match the
settings held by OCI Service Operator for Kubernetes.
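The reconciliation behavior described above can be illustrated with a small simulation. This is only a sketch of the idea, not the operator's actual implementation: the `reconcile` function, the resource name, and the `default_routing_policy` key are hypothetical and chosen for illustration.

```python
from copy import deepcopy


def reconcile(desired: dict, control_plane: dict) -> list:
    """Overwrite control-plane settings that differ from the desired settings.

    Returns the names of the resources that were reverted.
    """
    reverted = []
    for name, spec in desired.items():
        if control_plane.get(name) != spec:
            control_plane[name] = deepcopy(spec)
            reverted.append(name)
    return reverted


# Desired state as declared through kubectl (hypothetical values).
desired = {"my-virtual-service": {"default_routing_policy": "DENY"}}

# The control plane starts in sync, then is edited from the Console or CLI.
control_plane = deepcopy(desired)
control_plane["my-virtual-service"]["default_routing_policy"] = "UNIFORM"

# The next reconciliation pass reverts the out-of-band edit.
reverted = reconcile(desired, control_plane)
print(reverted)
print(control_plane)
```

In this model, the only durable way to change a setting is to change `desired`, which corresponds to applying the change with kubectl.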
Having General Traffic Issues with Service Mesh
Issue
Missing or unexpected service traffic is usually a sign of improper routing settings.
The following are the most common reasons why service communication does not work
as expected.
Common Reasons
Resources are not in correct state
One reason for unexpected traffic to a virtual deployment is that a
virtual service route table update caused a virtual deployment to go
into a Failed state while the default routing policy is set to
UNIFORM. Ensure that all your resources are in an
Active state. Any required resources that are in a
Failed or Deleted state are not used to build the
routing configurations.
Protocol or Port Mismatch
The protocol or port between ingress gateway host listeners and
virtual deployments doesn't match what is specified in an ingress
gateway route table and virtual service route table.
DNS Host Mismatch
The DNS hostname of the virtual deployment must match that of the
Kubernetes service.
Host Header Mismatch
Your internal or external service caller is not using the host headers
specified by the Kubernetes service. Remember that if you are not
using a standard port for the protocol, the host header takes the form
<host>:<port>, and the same value must also be specified in the hosts
of a virtual service or ingress gateway host listener.
This rule also extends to direct use of IP addresses. If you want
to use an IP address, then the IP address must be specified as the
host of the ingress gateway host listener or virtual service.
Missing Access Policy
No access policy exists that allows traffic to or from the
service.
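The host header rule in the list above can be sketched as a small check. This is an illustration under stated assumptions, not the mesh's actual matching logic: the function names, the `STANDARD_PORTS` table, and the exact, case-insensitive comparison (no wildcard handling) are all assumptions made for the example.

```python
# Assumed standard ports per protocol, for illustration only.
STANDARD_PORTS = {"http": 80, "https": 443}


def expected_host_header(host: str, port: int, protocol: str) -> str:
    """Return the host value a caller sends for the given host and port."""
    if STANDARD_PORTS.get(protocol) == port:
        return host
    # A non-standard port becomes part of the host value.
    return f"{host}:{port}"


def host_is_configured(host_header: str, configured_hosts: list) -> bool:
    """Check whether the host header appears in the configured hosts."""
    return host_header.lower() in {h.lower() for h in configured_hosts}


# Hypothetical hosts of a virtual service or ingress gateway host listener.
hosts = ["pets.example.com", "pets.example.com:8080"]

header = expected_host_header("pets.example.com", 8080, "http")
print(header, host_is_configured(header, hosts))

# An IP address is only accepted if it is listed as a host itself.
print(host_is_configured("10.0.0.5", hosts))
```

Comparing the value produced by `expected_host_header` with the hosts configured on the mesh resource is a quick way to spot a missing `<host>:<port>` or IP address entry.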
SSL Related Reasons
Server name mismatch
To initiate the TLS handshake, internal and external service
communication must use the hostnames specified in a virtual
service or an ingress gateway.
Using Expired Certificate Authority
Service Mesh checks that you do not provide already expired
certificates or certificate authorities when creating a Service Mesh
resource. However, you are responsible for rotating the certificate
authority before it expires so that certificates are
renewed.
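Because rotating the certificate authority before expiration is left to you, a simple expiry check can be useful. The following is a minimal sketch: the `needs_rotation` helper and the 30-day lead time are assumptions for illustration, and the expiry timestamp would come from your own certificate tooling.

```python
from datetime import datetime, timedelta, timezone


def needs_rotation(
    not_after: datetime,
    now: datetime,
    lead_time: timedelta = timedelta(days=30),  # assumed rotation window
) -> bool:
    """Return True if the CA has expired or expires within lead_time."""
    return not_after - now <= lead_time


# Fixed, hypothetical timestamps so the example is reproducible.
now = datetime(2024, 1, 1, tzinfo=timezone.utc)
expires_soon = datetime(2024, 1, 20, tzinfo=timezone.utc)
expires_later = datetime(2024, 6, 1, tzinfo=timezone.utc)

print(needs_rotation(expires_soon, now))
print(needs_rotation(expires_later, now))
```

In practice, `now` would be `datetime.now(timezone.utc)` and the check would run on a schedule.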
Troubleshooting Ingress Gateway Deployments
Issue
The IngressGatewayDeployment resource creates dependent resources like Deployment, Service, and Horizontal Pod Autoscaler. The Service created by IngressGatewayDeployment can in turn create a LoadBalancer resource. If any of these dependent resources fail to create, the IngressGatewayDeployment resource doesn't become active. To remediate some common issues, review the following:
Solution
If the deployment produces an error similar to the following, the Service
of type LoadBalancer created by IngressGatewayDeployment could not create
a public load balancer in a private subnet.
Warning SyncLoadBalancerFailed 3m2s (x10 over 48m) service-controller (combined from similar events): Error syncing load balancer: failed to ensure load balancer: creating load balancer: Service error:InvalidParameter. Private subnet with id <subnet-ocid> is not allowed in a public loadbalancer.. http status code: 400. Opc request id: <opc-request-id>
To use a private or internal load balancer, do the following.
Remove the service section from the IngressGatewayDeployment
resource.
Create a Service with the correct annotations that points to the ingress
gateway pods.
Your updated resources look similar to the following examples.
Horizontal Pod Autoscaler (HPA) Does Not Scrape Metrics
Issue
The Horizontal Pod Autoscaler (HPA) does not scrape metrics.
Solution
When an application pod is set up with Service Mesh, the Service Mesh proxy
container is injected into the pod. Along with the proxy container, an init
container is also injected, which performs a one-time initialization required
for enabling the proxy.
Because of the init container in the pod, the metrics-server is unable to
scrape metrics from the pod in some scenarios. Refer to the following table.
metrics-server Version   HPA API Version       Able to Scrape Metrics
v0.6.x                   autoscaling/v2beta2   No
v0.6.x                   autoscaling/v1        Yes
v0.4.x                   Any                   No
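The compatibility table above can be encoded as a lookup, for example in a preflight script. This is only a sketch: the `can_scrape` helper is hypothetical, and it returns `None` for any combination the table does not cover rather than guessing.

```python
# Rows of the table: (version prefix, HPA API version, able to scrape).
# An HPA API version of None stands for "Any".
SCRAPE_TABLE = [
    ("v0.6", "autoscaling/v2beta2", False),
    ("v0.6", "autoscaling/v1", True),
    ("v0.4", None, False),
]


def can_scrape(metrics_server_version: str, hpa_api_version: str):
    """Return True or False per the table, or None if no row applies."""
    for prefix, api, result in SCRAPE_TABLE:
        same_series = metrics_server_version.startswith(prefix + ".")
        if same_series and api in (None, hpa_api_version):
            return result
    return None


print(can_scrape("v0.6.1", "autoscaling/v1"))
print(can_scrape("v0.6.1", "autoscaling/v2beta2"))
print(can_scrape("v0.4.5", "autoscaling/v1"))
print(can_scrape("v0.5.0", "autoscaling/v1"))
```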
Virtual Deployment Pods Receive No Traffic
Issue
The virtual deployment pods receive no traffic.
Solution
By default, the routing policy for a virtual service is
DENY. Therefore, do one of the following:
Change the routing policy to UNIFORM.
Create a virtual service route table to route traffic to your virtual
deployment.
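The effect of the two options above can be modeled roughly as follows. This is a simplified sketch, not the mesh's routing implementation: the `traffic_split` function is hypothetical, and it assumes that UNIFORM spreads traffic equally across the virtual deployments and that a route table, when present, takes precedence over the default policy.

```python
def traffic_split(default_policy: str, route_table: dict, deployments: list) -> dict:
    """Return the percentage of traffic each virtual deployment receives."""
    if route_table:
        # Explicit route table weights win over the default policy.
        return dict(route_table)
    if default_policy == "UNIFORM" and deployments:
        share = 100 / len(deployments)
        return {name: share for name in deployments}
    # DENY with no route table: nothing is routed.
    return {}


deployments = ["v1", "v2"]  # hypothetical virtual deployment names

print(traffic_split("DENY", {}, deployments))
print(traffic_split("UNIFORM", {}, deployments))
print(traffic_split("DENY", {"v1": 80, "v2": 20}, deployments))
```

The first call shows the default situation described in this section: with DENY and no route table, no virtual deployment receives traffic.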
Troubleshoot Traffic Issues with Proxy config_dump
Issue
You're experiencing one of the following traffic issues.
A service isn't receiving any traffic.
Secure communication isn't happening between services.
Traffic splitting isn't happening across versions.
A/B deployment testing or a canary deployment fails.
Solution
To troubleshoot the issue, get the config_dump file for the
pod with the issue. You can infer more information by looking at the source
and destination pod config_dump files. To get the file,
perform the following steps.