Troubleshoot and Known Issues in Domain in Image
Following are the common problems in Oracle WebLogic Server for OKE Domain In Image. Learn how to diagnose and solve them.
Topics
- Free-Tier Autonomous Database
- RCU Datasources have Targets only to Administration Server
- Handling NFS Locking Errors
- Unable to Access the Console or the Application
- Stack Creation Failed
- Load Balancer Creation Failed
- Previous Domain Image in Sample Application
- Reinstall Load Balancers for Jenkins
- Install Jenkins Manually
- Security Checkup Tool Warnings
- Get Additional Help
RCU Datasources have Targets only to Administration Server
If you are using Domain In Image with a RAC database, the data sources that you create with the Enterprise Edition are targeted only to the administration server. Some of the data sources, such as mds-owsm, opss-audit-DBDS, opss-audit-viewDS, and opss-data-source, need to be targeted to the WLS cluster. You need to update the targets after provisioning by using the update-domain pipeline job.
Issue: Some of the data sources are not targeted to the WLS cluster.
Workaround:
- Create a model yaml file with the name ee_datasource.yaml and save it to a preferred location.
- Open the ee_datasource.yaml file and copy-paste the following resources information:
  Note: Replace the target placeholders <adminserver-name> and <cluster-name> with the administration server name and the cluster name, respectively.
  resources:
      JDBCSystemResource:
          'db1-mds-owsm':
              Target: '<adminserver-name>, <cluster-name>'
          'db2-mds-owsm':
              Target: '<adminserver-name>, <cluster-name>'
          'db1-opss-audit-DBDS':
              Target: '<adminserver-name>, <cluster-name>'
          'db2-opss-audit-DBDS':
              Target: '<adminserver-name>, <cluster-name>'
          'db1-opss-audit-viewDS':
              Target: '<adminserver-name>, <cluster-name>'
          'db2-opss-audit-viewDS':
              Target: '<adminserver-name>, <cluster-name>'
          'db1-opss-data-source':
              Target: '<adminserver-name>, <cluster-name>'
          'db2-opss-data-source':
              Target: '<adminserver-name>, <cluster-name>'
          'mds-owsm':
              Target: '<adminserver-name>, <cluster-name>'
          'opss-audit-DBDS':
              Target: '<adminserver-name>, <cluster-name>'
          'opss-audit-viewDS':
              Target: '<adminserver-name>, <cluster-name>'
          'opss-data-source':
              Target: '<adminserver-name>, <cluster-name>'
- Run the update-domain pipeline job, with the location of the model yaml file.
- After the job succeeds, verify from the administration console that the following data sources are targeted to the administration server and the WLS cluster:
mds-owsm
opss-audit-DBDS
opss-audit-viewDS
opss-data-source
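The model file in the workaround above repeats the same Target value for twelve data sources, so it can be generated rather than typed by hand. The following is a minimal sketch, not part of the product tooling: the function name build_datasource_model and the example server and cluster names are hypothetical, and the data source names are the ones listed in the workaround.

```python
# Data source names come from the workaround above. The function name and the
# example server/cluster names used at the bottom are hypothetical.
BASE_NAMES = ["mds-owsm", "opss-audit-DBDS", "opss-audit-viewDS", "opss-data-source"]


def build_datasource_model(admin_server: str, cluster: str) -> str:
    """Return model text that targets each RCU data source to both names."""
    # db1-/db2- prefixed entries first, then the unprefixed ones.
    names = [f"{prefix}{base}" for base in BASE_NAMES for prefix in ("db1-", "db2-")]
    names += BASE_NAMES
    target = f"{admin_server}, {cluster}"
    lines = ["resources:", "    JDBCSystemResource:"]
    for name in names:
        lines.append(f"        '{name}':")
        lines.append(f"            Target: '{target}'")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the model; redirect the output to ee_datasource.yaml if it looks right.
    print(build_datasource_model("myinstance-adminserver", "myinstance-cluster"), end="")
```

Review the generated text against your actual administration server and cluster names before you pass the file to the update-domain pipeline job.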
Handling NFS Locking Errors
By default, the WebLogic stores are mounted on the shared file system, which uses Network File System (NFS) version 3 and is disabled. Therefore, the file locks on the different WebLogic stores may not be released if a VM in the WebLogic node pool is abruptly shut down. This can occur in different scenarios, for example when a VM is stopped, restarted, or terminated while WebLogic pods are assigned to the affected worker node. In these cases, you may see an error such as the following:
[Store:280105]The persistent file store "_WLS_myinstance-admin-server" cannot open file _WLS_<instanceName>-<ServerName>000000.DAT.
Workaround:
Note: Even if you are using an earlier version of WebLogic Server, you need to complete these steps.
- Apply patch 32471832 by using the opatch update job, which is available in the July 2021 PSUs.
- For the administration and managed server pods in the cluster, update the domain.yaml file by adding the -Dweblogic.store.file.LockEnabled=false parameter.
  Following is an example, where the -Dweblogic.store.file.LockEnabled=false parameter is added:
  serverPod:
    env:
    - name: USER_MEM_ARGS #Default to G1GC algo
      value: "-XX:+UseContainerSupport -XX:+UseG1GC -Djava.security.egd=file:/dev/./urandom"
    - name: JAVA_OPTIONS
      value: "-Dweblogic.store.file.LockEnabled=false -Dweblogic.rjvm.allowUnknownHost=true -Dweblogic.security.SSL.ignoreHostnameVerification=true -Dweblogic.security.remoteAnonymousRMIT3Enabled=false -Dweblogic.security.remoteAnonymousRMIIIOPEnabled=false"
- Run the following command to apply domain.yaml:
  kubectl apply -f <domain.yaml-file-path>
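When you edit JAVA_OPTIONS by hand, it is easy to add the lock parameter twice or to leave an older value in place. The following is a small sketch, under the assumption that the options are separated by whitespace and contain no quoted spaces; the function name with_lock_disabled is hypothetical. It returns the option string with the parameter set to false exactly once.

```python
LOCK_FLAG = "-Dweblogic.store.file.LockEnabled=false"
LOCK_PREFIX = "-Dweblogic.store.file.LockEnabled="


def with_lock_disabled(java_options: str) -> str:
    """Return java_options with the file store lock parameter set to false once.

    Assumes whitespace-separated options without quoted spaces.
    """
    # Drop any existing LockEnabled setting, whatever its value.
    kept = [opt for opt in java_options.split() if not opt.startswith(LOCK_PREFIX)]
    # Put the parameter first, as in the example above.
    return " ".join([LOCK_FLAG] + kept)


if __name__ == "__main__":
    print(with_lock_disabled("-Dweblogic.rjvm.allowUnknownHost=true"))
```

Paste the resulting string into the value of JAVA_OPTIONS in domain.yaml; the function only manipulates text and does not modify any file.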
Note: If you have Oracle WebLogic Server for OKE instances created after July 20, 2021, or instances on which the July 2021 PSUs are applied, a few security warnings are displayed. See About the Security Checkup Tool.
Unable to Access the Console or the Application
Troubleshoot problems accessing the console or the application after the Oracle WebLogic Server for OKE domain is successfully created.
Error accessing the console or the application
If you receive a 502 Bad Gateway error when accessing the Jenkins console and WebLogic Server console, or the application using the load balancer, use the kubectl command to get the node ports that are used by the system, and ensure that these node ports are open for access via the load balancer subnet.
kubectl describe service --all-namespaces | grep -i nodeport
NodePort: http 32062/TCP
NodePort: https 30305/TCP
To check port access:
- Access the Oracle Cloud Infrastructure console.
- From the navigation menu, select Networking, and then click Virtual Cloud Networks.
- Select the compartment in which you created the domain.
- Select the virtual cloud network in which the domain was created.
- Select the subnet where the WebLogic Server compute instance is provisioned.
- Select the security list assigned to this subnet.
- For an Oracle WebLogic Server for OKE cluster using a private and public subnet, make sure the following ingress rules exist:
Source: <LB Subnet CIDR>
IP Protocol: TCP
Source Port Range: All
Destination Port Range: 32062
Source: <LB Subnet CIDR>
IP Protocol: TCP
Source Port Range: All
Destination Port Range: 30305
For a domain on a private and public subnet, set the Source to the CIDR of the load balancer subnet.
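Comparing the node ports against the ingress rules by eye is error prone when several services are listed. The following sketch parses the output of the kubectl describe command shown earlier and reports node ports that are not in a set of open destination ports. The function names are hypothetical, and the set of open ports is something you collect yourself from the security list; the sketch does not query Oracle Cloud Infrastructure.

```python
import re

# Matches lines such as "NodePort: http 32062/TCP"; lines such as
# "Type: NodePort" do not match because they do not start with "NodePort:".
NODEPORT_RE = re.compile(r"^\s*NodePort:\s+(\S+)\s+(\d+)/(\w+)\s*$", re.IGNORECASE)


def parse_node_ports(describe_output: str) -> list:
    """Return (port_name, port_number) pairs for every NodePort line."""
    ports = []
    for line in describe_output.splitlines():
        match = NODEPORT_RE.match(line)
        if match:
            ports.append((match.group(1), int(match.group(2))))
    return ports


def missing_ingress_ports(node_ports: list, open_ports: set) -> list:
    """Return the node port numbers that are not covered by open_ports."""
    return sorted({port for _, port in node_ports if port not in open_ports})


if __name__ == "__main__":
    sample = "NodePort: http 32062/TCP\nNodePort: https 30305/TCP\n"
    # Assume only 32062 has an ingress rule so far.
    print(missing_ingress_ports(parse_node_ports(sample), {32062}))
```

Any port that the sketch reports needs an ingress rule with the load balancer subnet CIDR as the source.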
Stack Creation Failed
Troubleshoot a failed Oracle WebLogic Server domain that you created using Oracle WebLogic Server for OKE.
Failed to install WebLogic Operator
Stack provisioning might fail when you create a domain with Oracle WebLogic Server for OKE in a new subnet for an existing VCN, due to an error in the installation of the WebLogic Server Kubernetes Operator.
module.provisioner.null_resource.check_provisioning_status_1 (remote-exec):
<Aug 27, 2020 07:01:31 PM GMT> <INFO> <install_wls_operator.sh>
<(host:wrjrf8-admin.wrjrf8admin.existingnetwork.oraclevcn.com) - <WLSOKE-VM-INFO-0020> :
Installing weblogic operator in namespace [wrjrf8-operator-ns]>
module.provisioner.null_resource.check_provisioning_status_1 (remote-exec): <Aug 27, 2020
07:02:12 PM GMT> <ERROR> <install_wls_operator.sh>
<(host:wrjrf8-admin.wrjrf8admin.existingnetwork.oraclevcn.com) - <WLSOKE-VM-ERROR-0013> : Error
installing weblogic operator. Exit code[1]>
Run a Destroy job on the stack and apply the job again to recreate the resources using the same database.
Failed to create service account
Stack provisioning might fail with an HTTP 409 conflict error if the service account creation fails.
module.provisioner.null_resource.check_provisioning_status_1 (remote-exec):
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":
"Operation cannot be fulfilled on serviceaccounts \"default\": the object has been modified;
please apply your changes to the latest version and try again","reason":"Conflict","details":
{"name":"default","kind":"serviceaccounts"}
,"code":409}
Run a Destroy job on the stack and apply the job again to recreate the resources using the same database.
Failed to login to OCIR
Stack provisioning might fail if the docker login to the OCI registry is not successful.
[phx.ocir.io]>module.provisioner.null_resource.check_provisioning_status_1 (remote-exec):
<Sep 22, 2020 02:33:46 PM GMT> <ERROR> <docker_init.sh> <(host:wrfinal2-admin.admin.existingnetwork.oraclevcn.com)
- <WLSOKE-VM-ERROR-0003> : Unable to login to custom OCIR
[phx.ocir.io]>module.provisioner.null_resource.check_provisioning_status_1 (remote-exec):
]>module.provisioner.null_resource.check_provisioning_status_1 (remote-exec):
<Sep 22, 2020 02:33:46 PM GMT> <ERROR> <docker_init.py> <(host:wrfinal2-admin.admin.existingnetwork.oraclevcn.com)
- <WLSOKE-VM-ERROR-0020> : Error executing sh /u01/scripts/bootstrap/docker_init.sh. Exit code [1]>
Run a Destroy job on the stack and apply the job again to recreate the resources using the same database.
Failed to download ATP wallet
Stack provisioning might fail if you create a JRF-enabled domain running WebLogic Server 12c and using an Oracle Autonomous Database.
module.provisioner.null_resource.check_provisioning_status_1 (remote-exec):
<Sep 22, 2020 12:31:11 PM GMT> <ERROR> <markers.py> <(host:wrfinal2-admin.admin.existingnetwork.oraclevcn.com
- <Sep 22, 2020 12:31:11> - <WLS-OKE-ERROR-003> - Failed to verify oke cluster nodes status.
[Exit code : {'status': 500, 'message': u'An internal server error has occurred.',
'code': u'InternalServerError', 'opc-request-id':
'768603269A9D460D9B979632FC04C181/37A72EDA76A2687A5E24499AA6A70F9B/7823A7DF9CDD435D869F3CB42C46B39E'}]>
Run a Destroy job on the stack and apply the job again to recreate the resources using the same database.
Failed to verify OKE cluster node status
Stack provisioning fails if the OKE cluster worker nodes are inactive when you create the WebLogic domain with Oracle WebLogic Server for OKE.
<INFO> <oke_worker_status.py>
<(host:wlsatpte-admin.nevcnokeadmin.nevcnokevcn.oraclevcn.com) - <WLSOKE-VM-INFO-0011> : Waiting
for the workers nodes to be Active. Retrying...><Dec 17, 2020 04:47:56 PM GMT> <ERROR>
<markers.py> <(host:wlsatpte-admin.nevcnokeadmin.nevcnokevcn.oraclevcn.com) - <Dec 17, 2020
16:47:56> - <WLS-OKE-ERROR-003> - Failed to verify oke cluster nodes status. [Exit code : Status
check timed out]>
Run a Destroy job on the stack and apply the job again to recreate the resources using the same database.
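The failures in this section are all identified by an error code in the job log, such as WLSOKE-VM-ERROR-0013 or WLS-OKE-ERROR-003. The following sketch extracts those codes and the text that follows them from log text that you have saved locally. The function name find_provisioning_errors is hypothetical, and the pattern is inferred only from the log excerpts shown in this section, so other code formats may not be matched.

```python
import re

# Inferred from the excerpts above: "<WLSOKE-VM-ERROR-0013> : message>" and
# "<WLS-OKE-ERROR-003> - message>". The message runs to the closing ">".
ERROR_RE = re.compile(r"<(WLS-?OKE(?:-VM)?-ERROR-\d+)>\s*[:-]\s*([^>]*)")


def find_provisioning_errors(log_text: str) -> list:
    """Return (error_code, message) pairs found in log_text, in order."""
    found = []
    for line in log_text.splitlines():
        for code, message in ERROR_RE.findall(line):
            found.append((code, message.strip()))
    return found


if __name__ == "__main__":
    line = ("<ERROR> <install_wls_operator.sh> <(host:example) - "
            "<WLSOKE-VM-ERROR-0013> : Error installing weblogic operator. Exit code[1]>")
    for code, message in find_provisioning_errors(line):
        print(code, "|", message)
```

The sketch reads each log entry as a single line. Entries that are wrapped across lines, as in the excerpts above, yield only the part of the message that is on the same line as the code.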
Load Balancer Creation Failed
After provisioning a stack, you might encounter an issue where the internal Load Balancer (LB) is missing. The load balancer service is listed as <pending> when you run the following command:
kubectl get svc -n ingress-nginx
In this case, the IP allocation for the LB fails and the LB instance is not created. This is because the quota for the selected LB shape is not available.
Previous Domain Image in Sample Application
You may encounter this issue in the WebLogic Server Console for a sample application Jenkins job.
After the sample application is successfully created in Jenkins using the sample-app job, on the WebLogic Server console, the sample application is not deployed with the new domain image, and still shows the previous domain image.
10:15:10 + echo 'Publishing image [iad.ocir.io/ax8cfrmecktw/rsht1/rsht1_domain/wls-domain-base: \
12.2.1.4.200714-200819-20-09-11_16-59-12] to domain...'
10:15:10 Publishing image [iad.ocir.io/ax8cfrmecktw/rsht1/rsht1_domain/wls-domain-base: \
12.2.1.4.200714-200819-20-09-11_16-59-12] to domain...
10:15:10 + local running_domain_yaml=/tmp/running-domain-20-09-11_16-59-12.yaml
10:15:10 + kubectl get domain rsht1domain -n rsht1-domain-ns -o yaml
10:15:10 + mkdir -p /u01/shared/weblogic-domains/rsht1domain/backups/20-09-11_16-59-12
10:15:10 + cp /tmp/running-domain-20-09-11_16-59-12.yaml \
/u01/shared/weblogic-domains/rsht1domain/backups/20-09-11_16-59-12/prev-domain.yaml
10:15:10 + sed -i -e 's|\(image: \).*|\1 "iad.ocir.io/ax8cfrmecktw/rsht1/rsht1_domain/wls-domain-base: \
12.2.1.4.200714-200819-20-09-11_16-59-12"|g' /tmp/running-domain-20-09-11_16-59-12.yaml
10:15:10 + kubectl apply -f /tmp/running-domain-20-09-11_16-59-12.yaml
10:15:21 Error from server (Conflict): error when applying patch:
10:15:21 {"metadata":{"annotations":{"kubectl.kubernetes.io/last-applied-configuration": \
"{\"apiVersion\":\"weblogic.oracle/v8\",\"kind\":\"Domain\"
As a workaround, run the sample application again using the sample-app pipeline job.
Reinstall Load Balancers for Jenkins
During provisioning, the creation of a load balancer can fail for different reasons. However, provisioning does not stop, because the load balancers can be created later. Follow the steps in this section to recreate the load balancers.
When you create an Oracle WebLogic Server for OKE instance, two load balancers are created: one with a public IP, which provides access to the applications installed in the WebLogic cluster, and another with a private IP, which provides access to the WebLogic console and the Jenkins console.
Load balancer creation can fail for reasons such as the following:
- Lack of quota for the selected LB shapes.
- Lack of available public IPs (for external load balancer) or private IPs (for internal load balancer) in the VCN or subnets selected during provisioning.
Check the Status of the Load Balancers
You can view the status of the load balancers by checking the Resource Manager job log, the load balancer services, and the provisioning logs.
Resource Manager job log:
module.provisioner.null_resource.check_provisioning_status_3 (remote-exec): {
module.provisioner.null_resource.check_provisioning_status_3 (remote-exec): "weblogic_console_url": "http://<IP_address>/console",
module.provisioner.null_resource.check_provisioning_status_3 (remote-exec): "jenkins_console_url": "http://<IP_address>/jenkins",
module.provisioner.null_resource.check_provisioning_status_3 (remote-exec): "weblogic_cluster_lb_url": "https://<IP_address>/<application context>"
module.provisioner.null_resource.check_provisioning_status_3 (remote-exec): }
The first two lines are for the private load balancer. The third line is for the public load balancer. If any of the load balancers are not created, you will not see any of the above lines in the Resource Manager job log.
Load Balancer Services:
kubectl get svc -n ingress-nginx
If the output lists any of the load balancer services as <pending> under the EXTERNAL-IP column, then the load balancers are not created.
NAME               TYPE           CLUSTER-IP   EXTERNAL-IP   PORT(S)         AGE
rsh3oke-external   LoadBalancer   10.1.1.1     <pending>     443:30618/TCP   11m
rsh3oke-internal   LoadBalancer   10.1.1.2     100.1.1.1     80:30790/TCP    11m
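The check on the EXTERNAL-IP column can be scripted. The following is a minimal sketch that takes the text printed by kubectl get svc -n ingress-nginx and returns the names of the LoadBalancer services whose external IP is still <pending>. The function name pending_load_balancers is hypothetical, and the sketch assumes the default column layout shown above.

```python
def pending_load_balancers(svc_output: str) -> list:
    """Return names of LoadBalancer services whose EXTERNAL-IP is <pending>."""
    lines = [line for line in svc_output.strip().splitlines() if line.strip()]
    if not lines:
        return []
    # Locate the columns by header name instead of a fixed position.
    header = lines[0].split()
    name_idx = header.index("NAME")
    type_idx = header.index("TYPE")
    ip_idx = header.index("EXTERNAL-IP")
    pending = []
    for row in lines[1:]:
        cols = row.split()
        if len(cols) <= ip_idx:
            continue  # skip rows that do not have the expected columns
        if cols[type_idx] == "LoadBalancer" and cols[ip_idx] == "<pending>":
            pending.append(cols[name_idx])
    return pending


if __name__ == "__main__":
    sample = """NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
rsh3oke-external LoadBalancer 10.1.1.1 <pending> 443:30618/TCP 11m
rsh3oke-internal LoadBalancer 10.1.1.2 100.1.1.1 80:30790/TCP 11m"""
    print(pending_load_balancers(sample))
```

A non-empty result means that at least one load balancer was not created and that the reinstall procedure in this section applies.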
Provisioning logs:
If the internal or external load balancer is not created successfully, the /u01/logs/provisioning.log file would include an error message.
<WLSOKE-VM-INFO-0058> : Installing ingress controller charts for jenkins [ ingress-controller ]>
<WLSOKE-VM-ERROR-0058> : Error installing ingress controller with Helm. Exit code [1]>
In the /u01/logs/provisioning_cmd.out file, you would see the following error message:
<install_ingress_controller.sh> - Error: timed out waiting for the condition
Reinstall the Load Balancers
After identifying and fixing the cause of the failure, for example by increasing the quota for the selected LB shape, you can reinstall the load balancers in the instance.
- Run the following command to get the values required to install the ingress-controller:
  helm get values ingress-controller -o yaml > ingress_values.yaml
- Run the following command to remove the existing helm release:
  helm uninstall ingress-controller
  Note: This command will delete both the external and internal load balancers.
- Run the following command to install both the external and internal load balancers:
  /u01/scripts/bootstrap/install_ingress_controller.sh ingress_values.yaml
  Sample output:
  <Nov 20, 2020 08:01:01 PM GMT> <INFO> <install_ingress_controller.sh> <(host:host_name) - <WLSOKE-VM-INFO-0058> : Installing ingress controller charts for jenkins [ ingress-controller ]>
  <Nov 20, 2020 08:03:27 PM GMT> <INFO> <install_ingress_controller.sh> <(host:host_name) - <WLSOKE-VM-INFO-0059> : Successfully installed ingress controller>
- Run the following command to verify that the load balancer services are created and have external IP addresses:
  kubectl get svc -n ingress-nginx
  Sample output:
  NAME               TYPE           CLUSTER-IP      EXTERNAL-IP       PORT(S)         AGE
  rsh3oke-external   LoadBalancer   10.96.115.139   144.25.19.38      443:31162/TCP   12m
  rsh3oke-internal   LoadBalancer   10.96.249.188   100.111.191.133   80:30605/TCP    12m
Install Jenkins Manually
When you create an Oracle WebLogic Server for OKE instance, Jenkins is installed through a Helm release called jenkins-oke. During provisioning, the Jenkins installation may fail, but provisioning is not stopped, because Jenkins can be installed after provisioning. This section explains how to install Jenkins manually if the Jenkins installation failed during provisioning.
Check if Jenkins Install Failed during Provisioning
You can determine whether the Jenkins install failed by trying to access the Jenkins console, checking the provisioning logs, and checking the Kubernetes resources (pods, services, and so on) under the jenkins-ns namespace.
Access the Jenkins console:
Try accessing the Jenkins console, as described in Access the Jenkins Console.
If you are not able to access the console, then continue to the next section to check the logs.
Provisioning logs:
If Jenkins is not installed successfully, then the /u01/logs/provisioning.log file would include an error message.
<WLSOKE-VM-INFO-0056> : Installing jenkins jenkins-ns>
<WLSOKE-VM-ERROR-0052> : Error installing jenkins charts. Exit code[1]>
You would also see the details of the failure in the /u01/logs/provisioning_cmd.out file.
Kubernetes resources:
To list the resources under the jenkins-ns namespace, run the following command:
kubectl get all -n jenkins-ns
NAME READY STATUS RESTARTS AGE
pod/jenkins-deployment-5bb55586b9-vn8sk 1/1 Running 0 26m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/jenkins-service ClusterIP 10.96.149.6 <none> 8080/TCP,50000/TCP 26m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/jenkins-deployment 1/1 1 1 26m
NAME DESIRED CURRENT READY AGE
replicaset.apps/jenkins-deployment-5bb55586b9 1 1 1 26m
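The pod rows of this output show whether Jenkins is actually running. The following sketch reads the text printed by kubectl get all -n jenkins-ns and reports how many pods were listed and which of them are not fully ready. The function name jenkins_pod_report is hypothetical, and the sketch assumes the row layout shown above (name, READY, STATUS).

```python
def jenkins_pod_report(get_all_output: str) -> tuple:
    """Return (pods_seen, unhealthy_pod_names) for rows that start with 'pod/'."""
    seen = 0
    unhealthy = []
    for line in get_all_output.splitlines():
        cols = line.split()
        if len(cols) < 3 or not cols[0].startswith("pod/"):
            continue  # header, service, deployment, or replicaset row
        seen += 1
        ready, _, total = cols[1].partition("/")
        # Healthy means STATUS is Running and READY is n/n.
        if cols[2] != "Running" or not total or ready != total:
            unhealthy.append(cols[0])
    return seen, unhealthy


if __name__ == "__main__":
    sample = """NAME READY STATUS RESTARTS AGE
pod/jenkins-deployment-5bb55586b9-vn8sk 1/1 Running 0 26m"""
    print(jenkins_pod_report(sample))
```

A result with zero pods seen, or with any unhealthy pod, suggests that the Jenkins installation did not complete.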
Install Jenkins Manually
After identifying and fixing the cause of the failure, install Jenkins in your instance.
- Check if the provisioning_metadata.properties file exists in the /u01/shared/weblogic-domains/<domain> directory.
  Does the provisioning_metadata.properties file exist?
  - Yes: Continue with the next step.
  - No: Run the following command:
    python /u01/scripts/metadata/provisioning_metadata.py
    Continue with the next step.
- Run the following command to remove the existing helm release:
  helm uninstall jenkins-oke
- Run the following command to install Jenkins:
  /u01/scripts/bootstrap/install_jenkins.sh /u01/provisioning-data/jenkins-inputs.yaml
  Where the jenkins-inputs.yaml file contains the required variables.
  Sample output:
  <Nov 23, 2020 05:10:07 PM GMT> <INFO> <install_jenkins.py> <(host:host_name) - updated /u01/provisioning-data/jenkins-inputs.yaml>
  <Nov 23, 2020 05:10:07 PM GMT> <INFO> <install_jenkins.sh> <(host:host_name) - <WLSOKE-VM-INFO-0098> : Creating configmap [wlsoke-metadata-configmap]>
  <Nov 23, 2020 05:10:09 PM GMT> <INFO> <install_jenkins.sh> <(host:host_name) - <WLSOKE-VM-INFO-0056> : Installing jenkins jenkins-ns>
  <Nov 23, 2020 05:10:22 PM GMT> <INFO> <install_jenkins.sh> <(host:host_name) - <WLSOKE-VM-INFO-0057> : Successfully installed jenkins in namespace [ jenkins-ns ]>
You have successfully installed the Jenkins console. Try accessing the Jenkins console, as described in Access the Jenkins Console.
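The only branch in the manual install procedure is whether provisioning_metadata.properties already exists. The following sketch expresses that decision as a function that returns the commands to run, in order, for a given domain directory. The function name jenkins_install_commands is hypothetical; the commands and paths are the ones from the steps in this section, and the sketch only builds the list, it does not run anything.

```python
from pathlib import Path


def jenkins_install_commands(domain_dir: str) -> list:
    """Return the commands for a manual Jenkins install, in order."""
    commands = []
    metadata = Path(domain_dir) / "provisioning_metadata.properties"
    if not metadata.is_file():
        # The metadata file is missing, so regenerate it first.
        commands.append(["python", "/u01/scripts/metadata/provisioning_metadata.py"])
    commands.append(["helm", "uninstall", "jenkins-oke"])
    commands.append([
        "/u01/scripts/bootstrap/install_jenkins.sh",
        "/u01/provisioning-data/jenkins-inputs.yaml",
    ])
    return commands


if __name__ == "__main__":
    # "<domain>" is a placeholder; substitute your own domain directory name.
    for command in jenkins_install_commands("/u01/shared/weblogic-domains/<domain>"):
        print(" ".join(command))
```

Printing the plan before running the commands by hand makes it easy to confirm that the metadata step is included only when it is needed.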
Security Checkup Tool Warnings
Learn about the security check warnings that are displayed in the Oracle WebLogic Server Administration console and how to troubleshoot them.
At the top of the WebLogic Server Administration console, the message "Security warnings detected. Click here to view the report and recommended remedies" is displayed for Oracle WebLogic Server for OKE instances created after July 20, 2021, or the instances on which the July 2021 PSUs are applied.
When you click the message, a list of security warnings is displayed, as shown in the following table.
The warning messages listed in the table are examples.
Security Warnings
Warning Message | Resolution |
---|---|
| Review your applications before you make any changes to address these SSL host name security warnings. For applications that connect to SSL endpoints with a host name in the certificate, which does not match the local machine's host name, the connection fails if you configure the BEA host name verifier in Oracle WebLogic Server. For applications that connect to Oracle provided endpoints such as Oracle Identity Cloud Service (for example, You see the SSL host name verification warnings in case of existing Oracle WebLogic Server for OKE instances (created before July 20, 2021). To address this warning, you must configure SSL with host name verifier. See Configure SSL with host name verifier. |
| Run the following command in the administration server as |
| Set the java properties for anonymous RMI T3 and IIOP requests during server start up. See Set the Java Properties. |
After you address the warnings, you must click Refresh Warnings to see the warnings removed in the console.
For Oracle WebLogic Server for OKE instances created after July 20, 2021, the warnings still appear even though the Java properties that disable anonymous RMI requests are configured. This is a known issue in Oracle WebLogic Server.
Set the Java Properties
- Edit the domain.yaml located in /u01/shared/weblogic-domains/<domain_name>/domain.yaml for all instances of serverPod definitions as follows:
  serverPod:
    env:
    - name: USER_MEM_ARGS #admin server memory is explicitly set to min of 256m and max of 512m and GC algo is G1GC
      value: "-Xms256m -Xmx512m -XX:+UseG1GC -Djava.security.egd=file:/dev/./urandom"
    - name: JAVA_OPTIONS
      value: "-Dweblogic.store.file.LockEnabled=false -Dweblogic.rjvm.allowUnknownHost=true -Dweblogic.security.remoteAnonymousRMIT3Enabled=false -Dweblogic.security.remoteAnonymousRMIIIOPEnabled=false"
- Apply the domain.yaml using the kubectl command:
  kubectl apply -f <path_to_domain.yaml>
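Because every serverPod definition needs both properties, it helps to check each JAVA_OPTIONS value before applying the file. The following sketch reports which of the two anonymous RMI properties are missing from an option string. The function name missing_security_flags is hypothetical, and the sketch assumes whitespace-separated options; it checks text only and does not read domain.yaml.

```python
# The two properties from the JAVA_OPTIONS example in this section.
REQUIRED_FLAGS = (
    "-Dweblogic.security.remoteAnonymousRMIT3Enabled=false",
    "-Dweblogic.security.remoteAnonymousRMIIIOPEnabled=false",
)


def missing_security_flags(java_options: str) -> list:
    """Return the required flags that are absent from java_options."""
    present = set(java_options.split())
    return [flag for flag in REQUIRED_FLAGS if flag not in present]


if __name__ == "__main__":
    options = ("-Dweblogic.store.file.LockEnabled=false "
               "-Dweblogic.security.remoteAnonymousRMIT3Enabled=false")
    for flag in missing_security_flags(options):
        print("missing:", flag)
```

An empty result for every serverPod entry means that the file is ready to be applied.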
Get Additional Help
Use online help, email, customer support, and other tools if you have questions or problems with Oracle WebLogic Server for OKE.
For general help with Oracle Cloud Marketplace, see How Do I Get Support in Using Oracle Cloud Marketplace.