Troubleshoot a WebLogic Domain
Learn about common issues that occur when you create and manage a domain, and how to diagnose and solve them.
Topics
- Patching Job Fails
- Provisioning Fails at a Specific Stage
- Unable to View Jenkins UI Input Parameters
- Cleanup Resources Manually for a Failed Domain
- Terminate Domain Job Is Stuck at Finish_cleanup Phase
- Introspection Failed when Running Pipeline Jobs
- New Data Source Incorrectly Deployed
- WebLogic Server Pod Fails to Start
- Unable to Access the Console or the Application
- Load Balancer Creation Failed
- Jenkins Installation Fails
- T3 RMI Communication Between Domains Fails
- Unrecognized Arguments When Using the Patching Utility Tool
- Security Checkup Tool Warnings
Patching Job Fails
The job fails when you run an opatch update, create base image, or automatic patching job.
Note:
This is applicable only if you use WebLogic Server version 21.3.3 (September 2021). To solve this issue, you must update the pipeline_common.sh and apply_latest_psu.sh scripts.
Provisioning Fails at a Specific Stage
When you create a domain, the provisioning might fail at a specific stage. After you fix the issue, you must continue creating the domain from the previously failed stage only.
To restart provisioning from the previously failed stage:
Unable to View Jenkins UI Input Parameters
You need to approve Groovy scripts to view all the parameters in a list.
Issue: At times, the Jenkins UI does not render the input parameters in a list because the Groovy scripts are pending approval.
Workaround:
- Sign in to the Jenkins console for your stack. See Access the Jenkins Console.
- Go to Dashboard > Manage Jenkins.
- Under Security, click In-process Script Approval.
- Click Approve against all the Groovy scripts.
All the parameters are now listed in the pipeline jobs.
Cleanup Resources Manually for a Failed Domain
When the domain creation for a WebLogic domain with domain name, domain_1, fails and you create another WebLogic domain with the same name, domain_1, this domain creation also fails.
As the resources created for the domain, domain_1, cannot be deleted using the terminate domain job, you must clean up the resources manually for domain_1 using the following command:
/u01/shared/scripts/pipeline/helper-scripts/domain_resource_cleanup.sh <domain_name>
For example, to clean up the resources for domain_1, run the following command:
/u01/shared/scripts/pipeline/helper-scripts/domain_resource_cleanup.sh domain_1
You can now create the WebLogic domain with the domain name, domain_1.
Terminate Domain Job Is Stuck at Finish_cleanup Phase
Troubleshoot a problem that occurs when you terminate a domain created for Oracle WebLogic Server for OKE.
Note:
This topic is applicable for Oracle WebLogic Server for OKE domains created in release 22.2.2 (May 2022).
Issue:
The terminate domain job gets stuck at the Finish_cleanup phase waiting for the pods to come up, and does not terminate the domain.
Workaround:
You must update the Jenkins template details before you terminate a domain.
- Sign in to the Jenkins console for your stack. See Access the Jenkins Console.
- On the Dashboard page, click Manage Jenkins.
- Under System Configuration, click Manage Nodes and Clouds, and then click Configure Clouds.
- On the Configure Clouds page, click Pod Templates.
- For pod-template-jenkins, click Pod Template details.
- Locate the Node Selector field and update the node label to usage-jenkins=jenkins.
Introspection Failed when Running Pipeline Jobs
In some instances, the Kubernetes job (DOMAIN_UID-introspector) created for the introspection fails. When the initial introspection fails, the operator does not start any WebLogic Server instances. If WebLogic Server instances are already running, then a failed introspection leaves them running without making any changes to the operational state of the domain. The introspection is retried periodically and eventually times out, with the Domain status indicating that the processing failed. To recover from a failed state, you need to correct the underlying problem and update the introspectVersion or restartVersion.
Check the introspector job
If your introspector job failed, then examine the kubectl describe output of the job and its pod. Also, examine its log, located at /u01/shared/weblogic-domains/<domain-name>/logs/introspector_script.out.
For example, assuming your domain UID is sample-domain1 and your domain namespace is sample-domain1-ns, the following output shows a failed introspector job pod among the domain's pods:
$ kubectl -n sample-domain1-ns get pods -l weblogic.domainUID=sample-domain1
NAME READY STATUS RESTARTS AGE
sample-domain1-admin-server 1/1 Running 0 19h
sample-domain1-introspector-v2l7k 0/1 Error 0 75m
sample-domain1-managed-server1 1/1 Running 0 19h
sample-domain1-managed-server2 1/1 Running 0 19h
Describe the job:
$ kubectl -n sample-domain1-ns describe job/sample-domain1-introspector
Next, describe the job's pod and, in particular, review its events:
$ kubectl -n sample-domain1-ns describe pod/sample-domain1-introspector-v2l7k
Finally, view the log of the job's pod:
$ kubectl -n sample-domain1-ns logs job/sample-domain1-introspector
Alternative log command (produces the same output as the previous command):
$ kubectl -n sample-domain1-ns logs pod/sample-domain1-introspector-v2l7k
A common reason for the introspector job to fail is an error in a model file. The following sample log output from an introspector job shows such a failure:
...
SEVERE Messages:
1. WLSDPLY-05007: Model file /u01/wdt/models/model1.yaml,/weblogic-operator/wdt-config-map/..2020_03_19_15_43_05.993607882/datasource.yaml contains an
unrecognized section: TYPOresources. The recognized sections are domainInfo, topology, resources, appDeployments, kubernetes
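The pod check shown earlier in this topic can be scripted. The following is a minimal sketch, assuming the default kubectl get pods column layout (NAME, READY, STATUS, RESTARTS, AGE). The function name failed_introspector_pods is hypothetical and is not part of the product tooling. It reads a pod listing from standard input and prints the introspector pods whose status is neither Running nor Completed.

```shell
# Hypothetical helper: print introspector pods that are not Running or Completed.
# Input: the text output of "kubectl get pods" on standard input.
failed_introspector_pods() {
  awk '$1 ~ /-introspector-/ && $3 != "Running" && $3 != "Completed" { print $1 }'
}

# Example usage (requires cluster access):
# kubectl -n sample-domain1-ns get pods -l weblogic.domainUID=sample-domain1 | failed_introspector_pods
```

Each pod name that it prints can then be passed to kubectl describe pod and kubectl logs, as described in the preceding steps.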
Initiating Rolling Restart
If a model file error references a model file in your spec.configuration.model.configMap file, then you can correct the error by redeploying the ConfigMap with a corrected model file and then initiating a domain restart or roll. Similarly, if a model file error references a model file in your model image, then you can correct the error by deploying a corrected image, modifying your Domain YAML file to reference the new image under spec.image, and then initiating a domain restart or roll.
To continue to use the pipeline jobs to update the running domain, you need to ensure that the introspector is in Success status, which can be achieved by rolling the domain to the previous successful image.
To roll back to the previous successful image, run the following command:
/u01/shared/scripts/pipeline/common/pipeline_common.sh -i <image_name>
<image_name> is the image ID in the prev-domain.yaml file, located in the backup directory at /u01/shared/weblogic-domains/<domain_name>/backups.
Note:
prev-domain.yaml is the previous domain that was running before the current job completed.
Because the introspector was in the failure status, the domain pods did not restart and are still running the previous image. After you run the above command, the introspector succeeds and the pipeline jobs can be reused.
If the error is in the configmap, initiate the rolling restart by completing the following steps:
- Rectify the error in the yaml file.
- Increment the value of spec.restartVersion.
  - Perform a kubectl edit domain -n <domain_ns> -o yaml. This opens the yaml file in the VI editor.
  - Under spec, search for the restartVersion flag and increment the value.
  - Save the yaml file.
kubectl get pods -A
The age for the pod must not correspond to the time when the update-domain job completed.
NAME READY STATUS RESTARTS AGE
sample-domain1-admin-server 1/1 Running 0 19h
sample-domain1-managed-server1 1/1 Running 0 19h
sample-domain1-managed-server2 1/1 Running 0 19h
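As an alternative to editing the Domain resource interactively, the restartVersion increment can be applied with kubectl patch. The following is a minimal sketch, assuming that restartVersion holds an integer string. The function name restart_version_patch and the DOMAIN_UID and DOMAIN_NS variables are hypothetical placeholders, not part of the product tooling.

```shell
# Hypothetical helper: print a JSON merge patch that increments spec.restartVersion by one.
# Argument 1: the current restartVersion (an empty or missing value is treated as 0).
restart_version_patch() {
  current="${1:-0}"
  next=$((current + 1))
  printf '{"spec":{"restartVersion":"%s"}}\n' "$next"
}

# Example usage (requires cluster access; DOMAIN_UID and DOMAIN_NS are placeholders):
# current=$(kubectl get domain "$DOMAIN_UID" -n "$DOMAIN_NS" -o jsonpath='{.spec.restartVersion}')
# kubectl patch domain "$DOMAIN_UID" -n "$DOMAIN_NS" --type=merge -p "$(restart_version_patch "$current")"
```

Keeping the patch generation separate from the kubectl call makes it possible to review the change before it is applied.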
New Data Source Incorrectly Deployed
This section covers the known issue when you create data sources in your Oracle WebLogic Server for OKE domain.
If you add a new data source and deploy it to a cluster only, by default, the data source is deployed to both the managed servers in the cluster and the administration server.
WebLogic Server Pod Fails to Start
By default, the WebLogic stores are mounted to the shared file system, which uses Network File System (NFS) version 3 and is disabled. Therefore, the file locks on the different WebLogic stores might not be released if the VM of any node in the WebLogic node pool is abruptly shut down. This can occur in different scenarios, such as when a VM is stopped, restarted, or terminated, and there are WebLogic pods assigned to the worker node that is being terminated.
[Store:280105]The persistent file store "_WLS_myinstance-admin-server" cannot open file _WLS_<instanceName>-<ServerName>000000.DAT.
Workaround:
Note:
Even if you are using an earlier version of WebLogic Server, you need to complete these steps.
- Apply patch 32471832 by using the apply patch job, which is available in July 2021 PSUs.
- For administration and managed server pods in the cluster, update the domain.yaml file by adding the -Dweblogic.store.file.LockEnabled=false parameter.
  Following is an example, where the -Dweblogic.store.file.LockEnabled=false parameter is added:
  serverPod:
    env:
    - name: USER_MEM_ARGS
      #Default to G1GC algo
      value: "-XX:+UseContainerSupport -XX:+UseG1GC -Djava.security.egd=file:/dev/./urandom"
    - name: JAVA_OPTIONS
      value: "-Dweblogic.store.file.LockEnabled=false -Dweblogic.rjvm.allowUnknownHost=true -Dweblogic.security.SSL.ignoreHostnameVerification=true -Dweblogic.security.remoteAnonymousRMIT3Enabled=false -Dweblogic.security.remoteAnonymousRMIIIOPEnabled=false"
- Run the following command to apply domain.yaml.
  kubectl apply -f <domain.yaml-file-path>
Note:
For Oracle WebLogic Server for OKE instances created after July 20, 2021, or instances on which the July 2021 PSUs are applied, a few security warnings are displayed. See About the Security Checkup Tool.
Unable to Access the Console or the Application
Troubleshoot problems accessing the console or the application after the Oracle WebLogic Server for OKE domain is successfully created.
Error accessing the console or the application
If you receive a 502 Bad Gateway error when accessing the Jenkins console and WebLogic Server console, or the application using the load balancer, use the kubectl command to get the node ports that are used by the system and ensure that these node ports are open for access via the load balancer subnet.
kubectl describe service --all-namespaces | grep -i nodeport
NodePort: http 32062/TCP
NodePort: https 30305/TCP
To check port access:
- Access the Oracle Cloud Infrastructure console.
- From the navigation menu, select Networking, and then click Virtual Cloud Networks.
- Select the compartment in which you created the domain.
- Select the virtual cloud network in which the domain was created.
- Select the subnet where the WebLogic Server compute instance is provisioned.
- Select the security list assigned to this subnet.
- For an Oracle WebLogic Server for OKE cluster using a private and public subnet, make sure the following ingress rules exist:
  Source: <LB Subnet CIDR>
  IP Protocol: TCP
  Source Port Range: All
  Destination Port Range: 32062

  Source: <LB Subnet CIDR>
  IP Protocol: TCP
  Source Port Range: All
  Destination Port Range: 30305

  For a domain on a private and public subnet, set the Source to the CIDR of the load balancer subnet.
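To compare the node ports against the ingress rules, it can help to reduce the kubectl describe output to a plain list of port numbers. The following is a minimal sketch. The function name extract_node_ports is hypothetical, and the sketch assumes that each relevant line has the form NodePort: <name> <port>/TCP, as in the sample output in this section.

```shell
# Hypothetical helper: print the unique node port numbers, one per line, in ascending order.
# Input: the text output of "kubectl describe service" on standard input.
extract_node_ports() {
  grep -i '^ *NodePort:' | awk '{ print $NF }' | cut -d/ -f1 | sort -un
}

# Example usage (requires cluster access):
# kubectl describe service --all-namespaces | extract_node_ports
```

Each port that it prints should appear as a Destination Port Range in an ingress rule whose source is the load balancer subnet CIDR.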
Load Balancer Creation Failed
After creating a domain, you might encounter an issue where the external Load Balancer (LB) is missing.
When you run the following command, the EXTERNAL-IP of the load balancer service is displayed as <pending>:
kubectl get svc -n <domain-name>-lb-external
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
wlsoke-ingress-nginx mydomain-lb-external LoadBalancer 10.0.0.1 <pending> 80:32148/TCP,443:31808/TCP 27h
The load balancer creation fails because there is a lack of available private IPs in the VCN or subnets selected during provisioning.
Workaround: Clean up any unwanted resources to release the IPs.
Jenkins Installation Fails
When you create an Oracle WebLogic Server for OKE instance, Jenkins is installed through a Helm release called jenkins-oke. During provisioning, the Jenkins installation might fail, but provisioning is not stopped because Jenkins can be installed after provisioning. This section explains how to install Jenkins manually if the Jenkins installation failed during provisioning.
Check if Jenkins Install Failed during Provisioning
To determine whether the Jenkins installation failed, try to access the Jenkins console, check the provisioning logs, and check the Kubernetes resources (pods, services, and so on) under the jenkins-ns namespace.
Access the Jenkins console:
Try accessing the Jenkins console, as described in Access the Jenkins Console.
If you are not able to access the console, then continue to the next section to check the logs.
Provisioning logs:
If Jenkins is not installed successfully, then the /u01/logs/provisioning.log file includes an error message.
<WLSOKE-VM-INFO-0056> : Installing jenkins jenkins-ns>
<WLSOKE-VM-ERROR-0052> : Error installing jenkins charts. Exit code[1]>
You can see the details of the failure in the /u01/logs/provisioning_cmd.out file.
Kubernetes resources:
To check the resources in the jenkins-ns namespace, run the following command:
kubectl get all -n jenkins-ns
NAME READY STATUS RESTARTS AGE
pod/jenkins-deployment-5bb8596b9-abcd 1/1 Running 0 26m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/jenkins-service ClusterIP 10.0.0.1 <none> 8080/TCP,50000/TCP 26m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/jenkins-deployment 1/1 1 1 26m
NAME DESIRED CURRENT READY AGE
replicaset.apps/jenkins-deployment-5bb55586b9 1 1 1 26m
Install Jenkins Manually
After identifying and fixing the cause of the failure, install Jenkins in your instance.
- Check if the provisioning_metadata.properties file exists in the /u01/shared/weblogic-domains/<domain> directory.
  Does the provisioning_metadata.properties file exist?
  - Yes: Continue with the next step.
  - No: Run the following command:
    python /u01/scripts/metadata/provisioning_metadata.py
    Continue with the next step.
- Run the following command to remove the existing helm release:
  helm uninstall jenkins-oke
- Run the following command to install Jenkins:
  /u01/scripts/bootstrap/install_jenkins.sh /u01/provisioning-data/jenkins-inputs.yaml
  Where the jenkins-inputs.yaml file contains the required variables.
  Sample Output:
  <Nov 23, 2020 05:10:07 PM GMT> <INFO> <install_jenkins.py> <(host:host_name) - updated /u01/provisioning-data/jenkins-inputs.yaml>
  <Nov 23, 2020 05:10:07 PM GMT> <INFO> <install_jenkins.sh> <(host:host_name) - <WLSOKE-VM-INFO-0098> : Creating configmap [wlsoke-metadata-configmap]>
  <Nov 23, 2020 05:10:09 PM GMT> <INFO> <install_jenkins.sh> <(host:host_name) - <WLSOKE-VM-INFO-0056> : Installing jenkins jenkins-ns>
  <Nov 23, 2020 05:10:22 PM GMT> <INFO> <install_jenkins.sh> <(host:host_name) - <WLSOKE-VM-INFO-0057> : Successfully installed jenkins in namespace [ jenkins-ns ]>
You have successfully installed the Jenkins console. Try accessing the Jenkins console, as described in Access the Jenkins Console.
T3 RMI Communication Between Domains Fails
You might encounter a T3 communication error between domains in Oracle WebLogic Server for OKE.
Issue:
When you try to establish an RMI communication between two domains, Domain A and Domain B, in different namespaces within the same cluster, using the T3 protocol, the connection fails.
Workaround:
You must set up the WebLogic custom channel on Domain B. To configure the WebLogic custom channel, see Configuring a WebLogic custom channel in WebLogic Kubernetes Operator documentation.
- Run the following command to obtain the cluster service names:
  kubectl get svc -n <domain_namespace>
  Sample output:
  NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
  domainB-ns domainB-cluster-domainB-cluster ClusterIP 10.96.37.63 <none> 7999/TCP,8001/TCP 3d1h
  domainB-ns domainB-domainB-adminserver ClusterIP None <none> 30012/TCP,7001/TCP 10d
  domainB-ns domainB-domainB-managed-server1 ClusterIP None <none> 7999/TCP,8001/TCP 3d1h
  domainB-ns domainB-domainB-managed-server2 ClusterIP None <none> 7999/TCP,8001/TCP 3d1h
- Use the cluster service name and domain namespace from step 1 to obtain the cluster address.
  The cluster address format is:
  t3://<name of the cluster service>.<domain namespace>:ListenPort
  For example:
  t3://domainB-cluster-domainB-cluster.domainB-ns:7999
- Run the update-domain job on Domain B using the following model yaml file. See Update a Domain Configuration.
  In the model yaml file, under the NetworkAccessPoint section, specify the cluster address from step 2.
  Example of a model yaml file:
  topology:
    Cluster:
      '@@ENV:OKE_DOMAIN_NAME@@-cluster':
        WeblogicPluginEnabled: true
        DynamicServers:
          ServerNamePrefix: '@@ENV:OKE_DOMAIN_NAME@@-managed-server'
          MaxDynamicClusterSize: 9
          CalculatedListenPorts: false
          ServerTemplate: '@@ENV:OKE_DOMAIN_NAME@@-cluster-template'
          DynamicClusterSize: 9
    Server:
      '@@ENV:OKE_DOMAIN_NAME@@-adminserver':
        RestartDelaySeconds: 10
        GracefulShutdownTimeout: 120
        RestartMax: 20
        NetworkAccessPoint:
          T3Channel:
            PublicPort: 30012
            ListenPort: 30012
        SSL:
          OutboundCertificateValidation: BuiltinSSLValidationOnly
          HostnameVerifier: weblogic.security.utils.SSLWLSWildcardHostnameVerifier
          InboundCertificateValidation: BuiltinSSLValidationOnly
        WebServer:
          HttpsKeepAliveSecs: 310
          KeepAliveSecs: 310
        ListenPort: 7001
    ServerTemplate:
      '@@ENV:OKE_DOMAIN_NAME@@-cluster-template':
        ListenPort: 8001
        Cluster: '@@ENV:OKE_DOMAIN_NAME@@-cluster'
        SSL:
          ListenPort: 8100
          OutboundCertificateValidation: BuiltinSSLValidationOnly
          HostnameVerifier: weblogic.security.utils.SSLWLSWildcardHostnameVerifier
          InboundCertificateValidation: BuiltinSSLValidationOnly
        WebServer:
          HttpsKeepAliveSecs: 310
          KeepAliveSecs: 310
        NetworkAccessPoint:
          MyT3Channel:
            Protocol: 't3'
            ListenPort: 7999
            PublicAddress: '@@ENV:DOMAIN_UID@@-@@ENV:DOMAIN_UID@@-managed-server${id}.@@ENV:NAMESPACE@@'
            HttpEnabledForThisProtocol: true
            TunnelingEnabled: true
            OutboundEnabled: false
            Enabled: true
            ClusterAddress: t3://<name of the cluster service>.<domain namespace>:ListenPort
            TwoWaySSLEnabled: false
            ClientCertificateEnforced: false
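The cluster address is assembled from three values: the cluster service name, the domain namespace, and the listen port of the custom channel. The following is a minimal sketch of that composition. The function name t3_cluster_address is hypothetical and is not part of the product tooling.

```shell
# Hypothetical helper: build a T3 cluster address in the form
# t3://<name of the cluster service>.<domain namespace>:<listen port>
t3_cluster_address() {
  service="$1"
  namespace="$2"
  port="$3"
  # Refuse to build an address when any part is missing.
  if [ -z "$service" ] || [ -z "$namespace" ] || [ -z "$port" ]; then
    echo "usage: t3_cluster_address <service> <namespace> <port>" >&2
    return 1
  fi
  printf 't3://%s.%s:%s\n' "$service" "$namespace" "$port"
}

# Example with the sample values from this section:
# t3_cluster_address domainB-cluster-domainB-cluster domainB-ns 7999
```

The resulting string is the value to specify for ClusterAddress under the NetworkAccessPoint section of the model file.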
Unrecognized Arguments When Using the Patching Utility Tool
When you run the patching utility tool with some of the documented arguments, you see the unrecognized arguments message.
Issue:
When you run patch-utils with the following arguments:
#List patches
patch-utils list -L
#Download latest patches
patch-utils download -L -p /tmp/<Location to download>
The following message is displayed:
usage: patch-utils <action> [options]
patch-utils: error: unrecognized arguments:
The listed arguments correspond to the latest features added to the patching utility tool for Oracle WebLogic Server for OKE instances created after December 14, 2022 (22.4.3). So, if you are using Oracle WebLogic Server for OKE instances created before December 14, 2022, you see the unrecognized arguments message.
Workaround:
Run patch-utils upgrade to upgrade the patching tool, if you are using the latest features of the patching utility tool for your existing instances (created before December 14, 2022). See Upgrade Patching Tool.
Security Checkup Tool Warnings
Learn about the security check warnings that are displayed in the Oracle WebLogic Server Administration console and how to troubleshoot them.
At the top of the WebLogic Server Administration console, the message Security warnings detected. Click here to view the report and recommended remedies is displayed for Oracle WebLogic Server for OKE instances created after July 20, 2021, or the instances on which the July 2021 PSUs are applied.
When you click the message, a list of security warnings is displayed, as listed in the following table.
The warning messages listed in the table are examples.
Security Warnings
Warning Message | Resolution |
---|---|
| Configure the identity and trust keystores for each server and the name of the certificate in the identity keystore that the server uses for SSL communication. See Configure Keystore Attributes for Identity and Trust. Note: This warning is displayed for Oracle WebLogic Server for OKE instances created after October 20, 2021, or the instances on which the October PSUs are applied. |
| Review your applications before you make any changes to address these SSL host name security warnings. For applications that connect to SSL endpoints with a host name in the certificate, which does not match the local machine's host name, the connection fails if you configure the BEA host name verifier in Oracle WebLogic Server. For applications that connect to Oracle provided endpoints such as Oracle Identity Cloud Service (for example, You see the SSL host name verification warnings in case of existing Oracle WebLogic Server for OKE instances (created before July 20, 2021). To address this warning, you must configure SSL with host name verifier. See Configure SSL with host name verifier. |
| Run the following command in the administration server as |
| Set the java properties for anonymous RMI T3 and IIOP requests during server start up. See Set the Java Properties. |
After you address the warnings, you must click Refresh Warnings to see the warnings removed in the console.
For Oracle WebLogic Server for OKE instances created after July 20, 2021, the warnings still appear even though the Java properties that disable anonymous requests, which prevent anonymous RMI access, are configured. This is a known issue in Oracle WebLogic Server.
Set the Java Properties
- Edit the domain.yaml located in /u01/shared/weblogic-domains/<domain_name>/domain.yaml for all instances of serverPod definitions as follows:
  serverPod:
    env:
    - name: USER_MEM_ARGS
      #admin server memory is explicitly set to min of 256m and max of 512m and GC algo is G1GC
      value: "-Xms256m -Xmx512m -XX:+UseG1GC -Djava.security.egd=file:/dev/./urandom"
    - name: JAVA_OPTIONS
      value: "-Dweblogic.store.file.LockEnabled=false -Dweblogic.rjvm.allowUnknownHost=true -Dweblogic.security.remoteAnonymousRMIT3Enabled=false -Dweblogic.security.remoteAnonymousRMIIIOPEnabled=false"
- Apply the domain.yaml using the kubectl command:
  kubectl apply -f <path_to_domain.yaml>
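Before applying the file, it can be useful to confirm that a JAVA_OPTIONS value contains both properties that disable anonymous RMI requests. The following is a minimal sketch. The function name missing_rmi_flags is hypothetical, and the sketch assumes that the options are separated by single spaces, as in the example in this section.

```shell
# Hypothetical helper: print each anonymous-RMI property that is absent
# from the JAVA_OPTIONS string passed as the first argument.
missing_rmi_flags() {
  for flag in \
    "-Dweblogic.security.remoteAnonymousRMIT3Enabled=false" \
    "-Dweblogic.security.remoteAnonymousRMIIIOPEnabled=false"; do
    # Pad with spaces so that only whole options match.
    case " $1 " in
      *" $flag "*) ;;
      *) printf '%s\n' "$flag" ;;
    esac
  done
}

# Example usage: no output means that both properties are present.
# missing_rmi_flags "-Dweblogic.rjvm.allowUnknownHost=true -Dweblogic.security.remoteAnonymousRMIT3Enabled=false"
```

An empty result indicates that the value already carries both properties. Any property that is printed still needs to be added to JAVA_OPTIONS.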
Configure Keystore Attributes for Identity and Trust
To configure the identity and trust keystore files and the name of the certificate in the identity keystore in the WebLogic Server Administration console:
- Locate the Change Center and click Lock & Edit to lock the editable configuration hierarchy for the domain.
- Under Domain structure, select Environment and then select Servers.
- In the Servers table, select the server you want to configure.
- On the Configuration tab, click Keystores, and then click Change.
- Select Custom Identity and Custom Trust, and then click Save.
- Under Identity, provide the following details:
  - Enter the full path of your identity keystore.
    For example:
    /u01/data/keystores/identity.jks
  - For Custom Identity Keystore Type, enter JKS.
  - For Custom Identity Keystore Passphrase, enter your keystore password. Enter the same value for Confirm Custom Identity Keystore Passphrase.
- Under Trust, provide the following details:
  - Enter the full path of your trust keystore.
    For example:
    /u01/data/keystores/trust.jks
  - For Custom Trust Keystore Type, enter JKS.
  - For Custom Trust Keystore Passphrase, enter your keystore password. Enter the same value for Confirm Custom Trust Keystore Passphrase.
- Click Save.
- Click the SSL tab.
- Under Identity, provide the following details:
  - For Private Key Alias, enter the name of the certificate (private key) in the identity keystore, server_cert.
  - For Private Key Passphrase, enter the password for this certificate in the keystore. Enter the same value for Confirm Private Key Passphrase.
    By default, the password for the certificate is the same as the identity keystore password.
- Click Save.
  After saving the changes, return to Change Center and click Activate Changes.
- Repeat steps 3 to 9 to configure each server in the domain.