Troubleshoot a Stack
Identify common problems in an Oracle WebLogic Server for OKE stack and learn how to diagnose and solve them.
Stack Creation Failed
Troubleshoot a failed Oracle WebLogic Server domain that you created using Oracle WebLogic Server for OKE.
Failed to install WebLogic Operator
Stack provisioning might fail when you create a domain with Oracle WebLogic Server for OKE in a new subnet of an existing VCN, due to an error installing the WebLogic Server Kubernetes Operator.
module.provisioner.null_resource.check_provisioning_status_1 (remote-exec):
<Aug 27, 2020 07:01:31 PM GMT> <INFO> <install_wls_operator.sh> <(host:sample-admin.admin1.existingnetwork.oraclevcn.com) - <WLSOKE-VM-INFO-0020> : Installing weblogic operator in namespace [wrjrf8-operator-ns]>
module.provisioner.null_resource.check_provisioning_status_1 (remote-exec):
<Aug 27, 2020 07:02:12 PM GMT> <ERROR> <install_wls_operator.sh> <(host:sample-admin.admin1.existingnetwork.oraclevcn.com) - <WLSOKE-VM-ERROR-0013> : Error installing weblogic operator. Exit code[1]>
Run a Destroy job on the stack and apply the job again to recreate the resources using the same database.
Failed to create service account
Stack provisioning might fail with an HTTP 409 conflict error if the service account creation fails.
module.provisioner.null_resource.check_provisioning_status_1 (remote-exec):
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Operation cannot be fulfilled on serviceaccounts \"default\": the object has been modified; please apply your changes to the latest version and try again","reason":"Conflict","details":{"name":"default","kind":"serviceaccounts"},"code":409}
Run a Destroy job on the stack and apply the job again to recreate the resources using the same database.
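If you want to confirm what conflicted before rerunning the job, the captured response body can be parsed directly. A minimal sketch, where BODY is abbreviated from the response shown above:

```shell
# Sketch: extract the failure reason and HTTP code from the Kubernetes API
# response body (abbreviated from the log excerpt above).
BODY='{"kind":"Status","status":"Failure","reason":"Conflict","details":{"name":"default","kind":"serviceaccounts"},"code":409}'
echo "$BODY" | grep -o '"reason":"[^"]*"'   # "reason":"Conflict"
echo "$BODY" | grep -o '"code":[0-9]*'      # "code":409
```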
Failed to login to OCIR
Stack provisioning might fail if the Docker login to the OCI Registry (OCIR) is not successful.
module.provisioner.null_resource.check_provisioning_status_1 (remote-exec):
<Sep 22, 2020 02:33:46 PM GMT> <ERROR> <docker_init.sh> <(host:sample-admin.admin.existingnetwork.oraclevcn.com) - <WLSOKE-VM-ERROR-0003> : Unable to login to custom OCIR [phx.ocir.io]>
module.provisioner.null_resource.check_provisioning_status_1 (remote-exec):
<Sep 22, 2020 02:33:46 PM GMT> <ERROR> <docker_init.py> <(host:sample-admin.admin.existingnetwork.oraclevcn.com) - <WLSOKE-VM-ERROR-0020> : Error executing sh /u01/scripts/bootstrap/docker_init.sh. Exit code [1]>
Run a Destroy job on the stack and apply the job again to recreate the resources using the same database.
Failed to verify OKE cluster node status
Stack provisioning fails if the OKE cluster worker nodes are inactive when you create the WebLogic domain with Oracle WebLogic Server for OKE.
<INFO> <oke_worker_status.py> <(host:sample-admin.nokeadmin.okevcn.oraclevcn.com) - <WLSOKE-VM-INFO-0011> : Waiting for the workers nodes to be Active. Retrying...>
<Dec 17, 2020 04:47:56 PM GMT> <ERROR> <markers.py> <(host:sample-admin.okeadmin.okevcn.oraclevcn.com) - <Dec 17, 2020 16:47:56> - <WLS-OKE-ERROR-003> - Failed to verify oke cluster nodes status. [Exit code : Status check timed out]>
Run a Destroy job on the stack and apply the job again to recreate the resources using the same database.
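As a quick diagnosis before rerunning the job, you can check which worker nodes are not yet Active. A minimal sketch that filters non-Ready nodes; the NODES value is illustrative sample data standing in for real `kubectl get nodes` output:

```shell
# Sketch: flag worker nodes that are not Ready. NODES is illustrative sample
# data; in practice pipe 'kubectl get nodes' output instead.
NODES='NAME        STATUS     ROLES   AGE   VERSION
10.0.10.2   Ready      node    40d   v1.19.7
10.0.10.3   NotReady   node    40d   v1.19.7'
echo "$NODES" | awk 'NR > 1 && $2 != "Ready" {print $1, $2}'   # 10.0.10.3 NotReady
```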
Nodepools are not Recreated with the Latest Kubernetes Version
Issue: If you upgrade an existing Kubernetes cluster to version 1.20 or later and then scale out a nodepool, the new nodes are not created with the upgraded Kubernetes version.
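One way to confirm the issue is to compare the kubelet versions the nodes report after the scale-out. A minimal sketch; the NODES value is illustrative sample data standing in for `kubectl get nodes` output:

```shell
# Sketch: list the distinct Kubernetes versions across nodes. A freshly added
# node still reporting the pre-upgrade version confirms the issue.
# NODES is illustrative; in practice use 'kubectl get nodes' output.
NODES='NAME        STATUS   ROLES   AGE   VERSION
10.0.10.2   Ready    node    40d   v1.19.7
10.0.10.3   Ready    node    6m    v1.19.7'
echo "$NODES" | awk 'NR > 1 {print $5}' | sort -u   # v1.19.7
```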
Note:
This topic is applicable for instances provisioned prior to release 22.1.2.
Workaround:
Load Balancer Creation Failed
After creating a stack, you might encounter an issue where the internal load balancer (LB) is missing: the <domain-name>-internal service reports <pending> in the output of kubectl get svc. Possible causes include:
- Lack of quota for the selected LB shapes.
- Lack of available private IPs in the VCN or subnets selected during provisioning.
Check the Status of the Load Balancers
You can view the status of the load balancers by checking the load balancer services and the provisioning logs.
Load Balancer Services:
kubectl get svc -n wlsoke-ingress-nginx
If the output lists any of the load balancer services as <pending> under the EXTERNAL-IP column, then the load balancers are not created.
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
okename-internal LoadBalancer 10.96.185.81 <pending> 443:30618/TCP 11m
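Pending services can also be picked out mechanically; a minimal sketch over the sample output above:

```shell
# Sketch: print load balancer services whose EXTERNAL-IP is still <pending>.
# SVC holds the sample 'kubectl get svc -n wlsoke-ingress-nginx' output above.
SVC='NAME               TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)         AGE
okename-internal   LoadBalancer   10.96.185.81   <pending>     443:30618/TCP   11m'
echo "$SVC" | awk 'NR > 1 && $4 == "<pending>" {print $1}'   # okename-internal
```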
Provisioning logs:
If the internal load balancer is not created successfully, the /u01/logs/provisioning.log file includes an error message similar to the following:
<WLSOKE-VM-INFO-0058> : Installing ingress controller charts for jenkins [ ingress-controller ]>
<WLSOKE-VM-ERROR-0058> : Error installing ingress controller with Helm. Exit code [1]>
Similarly, in the /u01/logs/provisioning_cmd.out file, you would see the following error message:
<install_ingress_controller.sh> - Error: timed out waiting for the condition
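To find the relevant entries quickly, you can scan the log for WLSOKE-VM-ERROR markers. A minimal sketch; LOG holds the sample entries above, but in practice you would grep /u01/logs/provisioning.log directly:

```shell
# Sketch: list the distinct WLSOKE-VM-ERROR codes in a provisioning log.
# LOG is the sample from above; on the admin host you would instead run:
#   grep -o 'WLSOKE-VM-ERROR-[0-9]*' /u01/logs/provisioning.log | sort -u
LOG='<WLSOKE-VM-INFO-0058> : Installing ingress controller charts for jenkins [ ingress-controller ]>
<WLSOKE-VM-ERROR-0058> : Error installing ingress controller with Helm. Exit code [1]>'
echo "$LOG" | grep -o 'WLSOKE-VM-ERROR-[0-9]*' | sort -u   # WLSOKE-VM-ERROR-0058
```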
Reinstall the Load Balancer
After identifying and fixing the cause of the failure, such as increasing the quota for the selected LB shape, you can reinstall the private load balancer in the stack.
- Run the following command to delete the ingress controller deployment (which serves the Jenkins ingress):
kubectl delete deployment.apps/nginx-ingress-controller -n wlsoke-ingress-nginx
- Run the following command to delete the load balancer that has an issue:
kubectl delete service/<service-prefix>-internal -n wlsoke-ingress-nginx
- Run the following command to remove the existing helm release:
helm uninstall ingress-controller
- Copy the YAML file to the temporary folder:
cp /u01/provisioning-data/*.yaml /tmp
- Run the following command to install the load balancer:
/u01/scripts/bootstrap/install_ingress_controller.sh /tmp/ingress-controller-input-values.yaml
- Run the following command to verify that the load balancer services are created and have IP addresses:
kubectl get svc -n wlsoke-ingress-nginx
Sample output:
NAME                   TYPE           CLUSTER-IP   EXTERNAL-IP   PORT(S)        AGE
domain_name-internal   LoadBalancer   10.0.0.1     100.0.0.1     80:30605/TCP   12m