Troubleshooting and Known Issues in Domain in Image

The following are common problems in Oracle WebLogic Server for OKE Domain in Image. Learn how to diagnose and solve them.

Free-Tier Autonomous Database

Free-Tier autonomous database is not supported.

RCU Datasources have Targets only to Administration Server

If you are using Domain in Image with a RAC database, the data sources that you create with the Enterprise Edition are targeted only to the administration server. Some of the data sources, such as mds-owsm, opss-audit-DBDS, opss-audit-viewDS, and opss-data-source, need to be targeted to the WLS cluster. You need to update the targets after provisioning by using the update-domain pipeline job.

Issue: Some of the data sources are not targeted to the WLS cluster.

Workaround:

Complete the following steps:
  1. Create a model YAML file named ee_datasource.yaml and save it to a preferred location (one way to create the file from a shell is shown in the sketch after these steps).

  2. Open the ee_datasource.yaml file and copy the following resources section into it:

    Note:

    Replace the target placeholders <adminserver-name> and <cluster-name> with your administration server and cluster names, respectively.
    resources:
      JDBCSystemResource:
        'db1-mds-owsm':
          Target: '<adminserver-name>, <cluster-name>'
        'db2-mds-owsm':
          Target: '<adminserver-name>, <cluster-name>'
        'db1-opss-audit-DBDS':
          Target: '<adminserver-name>, <cluster-name>'
        'db2-opss-audit-DBDS':
          Target: '<adminserver-name>, <cluster-name>'
        'db1-opss-audit-viewDS':
          Target: '<adminserver-name>, <cluster-name>'
        'db2-opss-audit-viewDS':
          Target: '<adminserver-name>, <cluster-name>'
        'db1-opss-data-source':
          Target: '<adminserver-name>, <cluster-name>'
        'db2-opss-data-source':
          Target: '<adminserver-name>, <cluster-name>'
        'mds-owsm':
          Target: '<adminserver-name>, <cluster-name>'
        'opss-audit-DBDS':
          Target: '<adminserver-name>, <cluster-name>'
        'opss-audit-viewDS':
          Target: '<adminserver-name>, <cluster-name>'
        'opss-data-source':
          Target: '<adminserver-name>, <cluster-name>'
  3. Run the update-domain pipeline job, specifying the location of the model YAML file.
  4. After the job succeeds, verify from the administration console that the following data sources are targeted to the administration server and the WLS cluster.
    • mds-owsm
    • opss-audit-DBDS
    • opss-audit-viewDS
    • opss-data-source
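
If you are working from a shell, the following is a minimal sketch of one way to create the model file. The /u01/shared/models directory and the target names are assumptions; use any location that the update-domain pipeline job can read, and include all of the data source entries shown in step 2.
# Hypothetical location and truncated content; add the remaining data source
# entries from step 2 and replace the placeholder target names with your own.
cat > /u01/shared/models/ee_datasource.yaml <<'EOF'
resources:
  JDBCSystemResource:
    'mds-owsm':
      Target: '<adminserver-name>, <cluster-name>'
EOF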

Handling NFS Locking Errors

By default, the WebLogic stores are mounted on the shared file system, which uses Network File System (NFS) version 3. As a result, the file locks on the different WebLogic stores may not be released if the VM of any node in the WebLogic node pool is abruptly shut down. This can happen in different scenarios, such as when a VM is stopped, restarted, or terminated while WebLogic pods are assigned to the worker node that is being terminated.

Issue: The WebLogic Server pod (administration server or any managed server) fails to start and displays the following error in the WebLogic logs:
[Store:280105]The persistent file store "_WLS_myinstance-admin-server" cannot open file _WLS_<instanceName>-<ServerName>000000.DAT.
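
If you are not sure which server pod reports this error, one quick check is to scan the pod logs for the store error code. The namespace and pod names below are placeholders for your domain namespace and server pod.
# Placeholder namespace and pod name; list the pods first, then search the logs.
kubectl get pods -n <domain-namespace>
kubectl logs <server-pod-name> -n <domain-namespace> | grep -i "280105"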

Workaround:

To solve this issue, complete the following steps:

Note:

Even if you are using an earlier version of WebLogic Server, you need to complete these steps.
  1. Apply patch 32471832, which is available in the July 2021 PSUs, by using the opatch update job.
  2. For the administration and managed server pods in the cluster, update the domain.yaml file by adding the -Dweblogic.store.file.LockEnabled=false parameter.
    Following is an example, where the -Dweblogic.store.file.LockEnabled=false parameter is added to JAVA_OPTIONS:
    serverPod:
      env:
      - name: USER_MEM_ARGS
        #Default to G1GC algo
        value: "-XX:+UseContainerSupport -XX:+UseG1GC -Djava.security.egd=file:/dev/./urandom"
      - name: JAVA_OPTIONS
        value: "-Dweblogic.store.file.LockEnabled=false -Dweblogic.rjvm.allowUnknownHost=true -Dweblogic.security.SSL.ignoreHostnameVerification=true -Dweblogic.security.remoteAnonymousRMIT3Enabled=false -Dweblogic.security.remoteAnonymousRMIIIOPEnabled=false"
    
  3. Run the following command to apply the domain.yaml file:
    kubectl apply -f <domain.yaml-file-path>
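
After the apply completes, you can optionally confirm that the new parameter is present in the domain resource. The domain name and namespace below are placeholders for your instance.
# Placeholder domain name and namespace; the output should include the updated JAVA_OPTIONS value.
kubectl get domain <domain-name> -n <domain-namespace> -o yaml | grep -i "LockEnabled"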

Note:

For Oracle WebLogic Server for OKE instances created after July 20, 2021, or instances on which the July 2021 PSUs are applied, a few security warnings are displayed. See About the Security Checkup Tool.

Unable to Access the Console or the Application

Troubleshoot problems accessing the console or the application after the Oracle WebLogic Server for OKE domain is successfully created.

Error accessing the console or the application

If you receive a 502 Bad Gateway error when you access the Jenkins console, the WebLogic Server console, or an application through the load balancer, use the kubectl command to get the node ports used by the system, and ensure that these node ports are open for access from the load balancer subnet.

For example:
kubectl describe service --all-namespaces | grep -i nodeport
NodePort: http 32062/TCP
NodePort: https 30305/TCP

To check port access:

  1. Access the Oracle Cloud Infrastructure console.

  2. From the navigation menu, select Networking, and then click Virtual Cloud Networks.

  3. Select the compartment in which you created the domain.

  4. Select the virtual cloud network in which the domain was created.

  5. Select the subnet where the WebLogic Server compute instance is provisioned.

  6. Select the security list assigned to this subnet.

  7. For an Oracle WebLogic Server for OKE cluster using a private and public subnet, make sure the following ingress rules exist:

Rule 1:
Source: <LB Subnet CIDR>
IP Protocol: TCP
Source Port Range: All
Destination Port Range: 32062

Rule 2:
Source: <LB Subnet CIDR>
IP Protocol: TCP
Source Port Range: All
Destination Port Range: 30305

For a domain on a private and public subnet, set the Source to the CIDR of the load balancer subnet.
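
If you prefer to update the security list from a shell instead of the console, the following is a sketch using the OCI CLI. The security list OCID and the rules file content are assumptions based on the example ports above; note that the update replaces the entire ingress rule list, so the file must contain your existing rules plus the new entries.
# rules.json must contain the FULL ingress rule list for the security list,
# including the existing rules plus entries such as:
# { "protocol": "6", "source": "<LB Subnet CIDR>",
#   "tcpOptions": { "destinationPortRange": { "min": 32062, "max": 32062 } } }
oci network security-list update \
  --security-list-id <security-list-ocid> \
  --ingress-security-rules file://rules.json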

Stack Creation Failed

Troubleshoot a failed Oracle WebLogic Server domain that you created using Oracle WebLogic Server for OKE.

Failed to install WebLogic Operator

Stack provisioning might fail when you create a domain with Oracle WebLogic Server for OKE in a new subnet for an existing VCN, due to an error installing the WebLogic Server Kubernetes Operator.

Example message:
module.provisioner.null_resource.check_provisioning_status_1  (remote-exec):
<Aug 27, 2020 07:01:31 PM GMT> <INFO>  <install_wls_operator.sh>
<(host:wrjrf8-admin.wrjrf8admin.existingnetwork.oraclevcn.com) -  <WLSOKE-VM-INFO-0020> :
Installing weblogic operator in namespace [wrjrf8-operator-ns]>
module.provisioner.null_resource.check_provisioning_status_1  (remote-exec): <Aug 27, 2020
07:02:12 PM GMT> <ERROR>  <install_wls_operator.sh>
<(host:wrjrf8-admin.wrjrf8admin.existingnetwork.oraclevcn.com) -  <WLSOKE-VM-ERROR-0013> : Error
installing weblogic operator. Exit  code[1]>

Run a Destroy job on the stack, and then run an Apply job again to recreate the resources using the same database.

Failed to create service account

Stack provisioning might fail with an HTTP 409 Conflict error if the service account creation fails.

Example message:
module.provisioner.null_resource.check_provisioning_status_1 (remote-exec):
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":
"Operation cannot be fulfilled on serviceaccounts \"default\": the object has been modified;
please apply your changes to the latest version and try again","reason":"Conflict","details":
{"name":"default","kind":"serviceaccounts"}

,"code":409}

Run a Destroy job on the stack, and then run an Apply job again to recreate the resources using the same database.

Failed to login to OCIR

Stack provisioning might fail if the Docker login to the OCI registry (OCIR) is not successful.

Example message:
[phx.ocir.io]>module.provisioner.null_resource.check_provisioning_status_1 (remote-exec):
<Sep 22, 2020 02:33:46 PM GMT> <ERROR> <docker_init.sh> <(host:wrfinal2-admin.admin.existingnetwork.oraclevcn.com)
- <WLSOKE-VM-ERROR-0003> : Unable to login to custom OCIR
[phx.ocir.io]>module.provisioner.null_resource.check_provisioning_status_1 (remote-exec):
]>module.provisioner.null_resource.check_provisioning_status_1 (remote-exec):
<Sep 22, 2020 02:33:46 PM GMT> <ERROR> <docker_init.py> <(host:wrfinal2-admin.admin.existingnetwork.oraclevcn.com)
- <WLSOKE-VM-ERROR-0020> : Error executing sh /u01/scripts/bootstrap/docker_init.sh. Exit code [1]>

Run a Destroy job on the stack, and then run an Apply job again to recreate the resources using the same database.

Failed to download ATP wallet

Stack provisioning might fail if you create a JRF-enabled domain that runs WebLogic Server 12c and uses an Oracle Autonomous Database.

Example message in apply log:
module.provisioner.null_resource.check_provisioning_status_1 (remote-exec):
<Sep 22, 2020 12:31:11 PM GMT> <ERROR> <markers.py> <(host:wrfinal2-admin.admin.existingnetwork.oraclevcn.com
- <Sep 22, 2020 12:31:11> - <WLS-OKE-ERROR-003> - Failed to verify oke cluster nodes status.
[Exit code : {'status': 500, 'message': u'An internal server error has occurred.',
'code': u'InternalServerError', 'opc-request-id':
'768603269A9D460D9B979632FC04C181/37A72EDA76A2687A5E24499AA6A70F9B/7823A7DF9CDD435D869F3CB42C46B39E'}]>

Run a Destroy job on the stack, and then run an Apply job again to recreate the resources using the same database.

Failed to verify OKE cluster node status

Stack provisioning fails if the OKE cluster worker nodes are inactive when you create the WebLogic domain with Oracle WebLogic Server for OKE.

Example message:
<INFO> <oke_worker_status.py>
<(host:wlsatpte-admin.nevcnokeadmin.nevcnokevcn.oraclevcn.com) - <WLSOKE-VM-INFO-0011> : Waiting
for the workers nodes to be Active. Retrying...><Dec 17, 2020 04:47:56 PM GMT> <ERROR>
<markers.py> <(host:wlsatpte-admin.nevcnokeadmin.nevcnokevcn.oraclevcn.com) - <Dec 17, 2020
16:47:56> - <WLS-OKE-ERROR-003> - Failed to verify oke cluster nodes status. [Exit code : Status
check timed out]>

Run a Destroy job on the stack, and then run an Apply job again to recreate the resources using the same database.

Load Balancer Creation Failed

After provisioning a stack, you might encounter an issue where the internal Load Balancer (LB) is missing.

When you run the following command, the external IP for the LB is displayed as <pending>:
kubectl get svc -n ingress-nginx

In this case, the IP allocation for the LB failed and the LB instance was not created because the quota for the selected LB shape is not available. After you resolve the quota issue, you can reinstall the load balancers as described in Reinstall Load Balancers for Jenkins.
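
To confirm whether the quota is the cause, one approach is to check the load balancer service limits in the compartment with the OCI CLI. The service name and compartment OCID below are assumptions; adjust them for your tenancy.
# Lists the load balancer limits available in the compartment (placeholder OCID).
oci limits value list --service-name load-balancer --compartment-id <compartment-ocid>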

Previous Domain Image in Sample Application

You might encounter this issue in the WebLogic Server console after you run the sample-app Jenkins job.

After the sample application is successfully created in Jenkins by using the sample-app job, the WebLogic Server console shows that the sample application is not deployed with the new domain image, but still uses the previous domain image.

Example message:
10:15:10 + echo 'Publishing image [iad.ocir.io/ax8cfrmecktw/rsht1/rsht1_domain/wls-domain-base: \
12.2.1.4.200714-200819-20-09-11_16-59-12] to domain...'
10:15:10 Publishing image [iad.ocir.io/ax8cfrmecktw/rsht1/rsht1_domain/wls-domain-base: \
12.2.1.4.200714-200819-20-09-11_16-59-12] to domain...
10:15:10 + local running_domain_yaml=/tmp/running-domain-20-09-11_16-59-12.yaml
10:15:10 + kubectl get domain rsht1domain -n rsht1-domain-ns -o yaml
10:15:10 + mkdir -p /u01/shared/weblogic-domains/rsht1domain/backups/20-09-11_16-59-12
10:15:10 + cp /tmp/running-domain-20-09-11_16-59-12.yaml \
/u01/shared/weblogic-domains/rsht1domain/backups/20-09-11_16-59-12/prev-domain.yaml
10:15:10 + sed -i -e 's|\(image: \).*|\1 "iad.ocir.io/ax8cfrmecktw/rsht1/rsht1_domain/wls-domain-base: \
12.2.1.4.200714-200819-20-09-11_16-59-12"|g' /tmp/running-domain-20-09-11_16-59-12.yaml
10:15:10 + kubectl apply -f /tmp/running-domain-20-09-11_16-59-12.yaml
10:15:21 Error from server (Conflict): error when applying patch:
10:15:21 {"metadata":{"annotations":{"kubectl.kubernetes.io/last-applied-configuration": \
"{\"apiVersion\":\"weblogic.oracle/v8\",\"kind\":\"Domain\"

As a workaround, run the sample application again using the sample-app Pipeline job.

Reinstall Load Balancers for Jenkins

During provisioning, load balancer creation can fail for different reasons. However, provisioning does not stop, because the load balancers can be created later. Follow the steps in this section to recreate the load balancers.

When you create an Oracle WebLogic Server for OKE instance, two load balancers are created: one with a public IP, which provides access to the applications installed in the WebLogic cluster, and another with a private IP, which provides access to the WebLogic Server console and the Jenkins console.

The following are the reasons load balancer creation can fail during provisioning:
  1. Lack of quota for the selected LB shapes.
  2. Lack of available public IPs (for external load balancer) or private IPs (for internal load balancer) in the VCN or subnets selected during provisioning.

Check the Status of the Load Balancers

You can view the status of the load balancers by checking the Resource Manager job log, the load balancer services, and the provisioning logs.

Resource Manager job log: When both the load balancers are created successfully, the resource manager job log includes the following:
module.provisioner.null_resource.check_provisioning_status_3 (remote-exec): {
module.provisioner.null_resource.check_provisioning_status_3 (remote-exec): "weblogic_console_url": "http://<IP_address>/console",
module.provisioner.null_resource.check_provisioning_status_3 (remote-exec): "jenkins_console_url": "http://<IP_address>/jenkins",
module.provisioner.null_resource.check_provisioning_status_3 (remote-exec): "weblogic_cluster_lb_url": "https://<IP_address>/<application context>"
module.provisioner.null_resource.check_provisioning_status_3 (remote-exec): }

The first two lines are for the private load balancer, and the third line is for the public load balancer. If a load balancer is not created, the corresponding lines do not appear in the Resource Manager job log.

Load Balancer Services:

To check the load balancer services, run the following command:
kubectl get svc -n ingress-nginx

If the output lists any of the load balancer services with <pending> under the EXTERNAL-IP column, then that load balancer was not created.

Sample output:
NAME               TYPE           CLUSTER-IP      EXTERNAL-IP       PORT(S)         AGE
rsh3oke-external   LoadBalancer   10.1.1.1        <pending>         443:30618/TCP   11m
rsh3oke-internal   LoadBalancer   10.1.1.2        100.1.1.1         80:30790/TCP    11m
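
To see why the external IP is still pending, you can describe the pending service and review its events. The service name below follows the sample output and will differ in your instance.
# The Events section at the end of the output typically shows why the cloud
# load balancer could not be created (for example, a quota or shape issue).
kubectl describe svc rsh3oke-external -n ingress-nginx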

Provisioning logs:

If the internal or external load balancer is not created successfully, the /u01/logs/provisioning.log file includes an error message.

Sample of the error message:
<WLSOKE-VM-INFO-0058> : Installing ingress controller charts for jenkins [ ingress-controller ]>
<WLSOKE-VM-ERROR-0058> : Error installing ingress controller with Helm. Exit code [1]>
In addition, the /u01/logs/provisioning_cmd.out file includes the following error message:
<install_ingress_controller.sh>  -  Error: timed out waiting for the condition
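
Before reinstalling, you can also check whether the Helm release exists and what state it is in. This assumes the release was installed in the default namespace, as in the helm commands later in this section.
# Shows all Helm releases and the detailed status of the ingress controller release.
helm list --all-namespaces
helm status ingress-controller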

Reinstall the Load Balancers

After you identify and fix the cause of the failure, for example by increasing the quota for the selected LB shape, you can reinstall the load balancers in the instance.

  1. Run the following command to get the values required to install the ingress-controller:
    helm get values ingress-controller -o yaml > ingress_values.yaml
  2. Run the following command to remove the existing helm release:
    helm uninstall ingress-controller

    Note:

    This command will delete both the external and internal load balancers.
  3. Run the following command to install both the external and internal load balancers:
    /u01/scripts/bootstrap/install_ingress_controller.sh ingress_values.yaml
    Sample Output:
    <Nov 20, 2020 08:01:01 PM GMT> <INFO> <install_ingress_controller.sh> <(host:host_name) - <WLSOKE-VM-INFO-0058> : Installing ingress controller charts for jenkins [ ingress-controller ]>
    <Nov 20, 2020 08:03:27 PM GMT> <INFO> <install_ingress_controller.sh> <(host:host_name) - <WLSOKE-VM-INFO-0059> : Successfully installed ingress controller>
    
  4. Run the following command to verify if load balancer services are created and have external IP addresses:
    kubectl get svc -n ingress-nginx
    Sample output:
    NAME               TYPE           CLUSTER-IP      EXTERNAL-IP       PORT(S)         AGE
    rsh3oke-external   LoadBalancer   10.96.115.139   144.25.19.38      443:31162/TCP   12m
    rsh3oke-internal   LoadBalancer   10.96.249.188   100.111.191.133   80:30605/TCP    12m
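
Once the services show external IP addresses, you can also capture the public address directly, for example to test the application URL. The service name below follows the sample output and is a placeholder for your instance.
# Prints only the external IP of the public load balancer service.
kubectl get svc rsh3oke-external -n ingress-nginx \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}'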

Install Jenkins Manually

When you create an Oracle WebLogic Server for OKE instance, Jenkins is installed as a Helm release called jenkins-oke. The Jenkins installation may fail during provisioning, but provisioning is not stopped, because Jenkins can be installed after provisioning. This section explains how to install Jenkins manually if the Jenkins installation failed during provisioning.

Check if Jenkins Install Failed during Provisioning

You can determine whether the Jenkins installation failed by trying to access the Jenkins console, checking the provisioning logs, and checking the Kubernetes resources (pods, services, and so on) in the jenkins-ns namespace.

Access the Jenkins console:

Try accessing the Jenkins console, as described in Access the Jenkins Console.

If you are not able to access the console, then continue to the next section to check the logs.

Provisioning logs:

If Jenkins is not installed successfully, the /u01/logs/provisioning.log file includes an error message.

Sample of the error:
<WLSOKE-VM-INFO-0056> : Installing jenkins jenkins-ns>
<WLSOKE-VM-ERROR-0052> : Error installing jenkins charts. Exit code[1]>

The details of the failure are in the /u01/logs/provisioning_cmd.out file.
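
A quick way to pull the failure details from that file is to search it for errors. This is only a convenience sketch; the exact messages vary.
# Show the most recent output and any error lines from the Jenkins install attempt.
tail -n 100 /u01/logs/provisioning_cmd.out
grep -i error /u01/logs/provisioning_cmd.out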

Kubernetes resources:

To check the Kubernetes resources in the jenkins-ns namespace, run the following command:
kubectl get all -n jenkins-ns
Following is a sample output, where Jenkins was installed correctly:
NAME                                      READY   STATUS    RESTARTS   AGE
pod/jenkins-deployment-5bb55586b9-vn8sk   1/1     Running   0          26m
 
NAME                      TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)              AGE
service/jenkins-service   ClusterIP   10.96.149.6   <none>        8080/TCP,50000/TCP   26m
 
NAME                                 READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/jenkins-deployment   1/1     1            1           26m
 
NAME                                            DESIRED   CURRENT   READY   AGE
replicaset.apps/jenkins-deployment-5bb55586b9   1         1         1       26m
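
If the Jenkins pod is not in the Running state, you can inspect it further. The deployment name below follows the sample output and may differ in your instance.
# Describe the deployment for rollout or image-pull events, then check the container logs.
kubectl describe deployment jenkins-deployment -n jenkins-ns
kubectl logs deployment/jenkins-deployment -n jenkins-ns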

Install Jenkins Manually

After identifying and fixing the cause of the failure, install Jenkins in your instance.

  1. Check if the provisioning_metadata.properties file exists in the /u01/shared/weblogic-domains/<domain> directory.
    Does the provisioning_metadata.properties file exist?
    • Yes: Continue with the next step.
    • No: Run the following command:
      python /u01/scripts/metadata/provisioning_metadata.py

      Continue with the next step.

  2. Run the following command to remove the existing helm release:
    helm uninstall jenkins-oke
  3. Run the following command to install Jenkins:
    /u01/scripts/bootstrap/install_jenkins.sh  /u01/provisioning-data/jenkins-inputs.yaml

    where the jenkins-inputs.yaml file contains the required variables.

    Sample Output:
    <Nov 23, 2020 05:10:07 PM GMT> <INFO> <install_jenkins.py> <(host:host_name) - updated /u01/provisioning-data/jenkins-inputs.yaml>
    <Nov 23, 2020 05:10:07 PM GMT> <INFO> <install_jenkins.sh> <(host:host_name) - <WLSOKE-VM-INFO-0098> : Creating configmap [wlsoke-metadata-configmap]>
    <Nov 23, 2020 05:10:09 PM GMT> <INFO> <install_jenkins.sh> <(host:host_name) - <WLSOKE-VM-INFO-0056> : Installing jenkins jenkins-ns>
    <Nov 23, 2020 05:10:22 PM GMT> <INFO> <install_jenkins.sh> <(host:host_name) - <WLSOKE-VM-INFO-0057> : Successfully installed jenkins in namespace [ jenkins-ns ]>
    

You have successfully installed Jenkins. Try accessing the Jenkins console, as described in Access the Jenkins Console.
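
You can also confirm that the Jenkins pod is running before you open the console:
# The jenkins-ns namespace is the one used by the installation steps above.
kubectl get pods -n jenkins-ns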

Security Checkup Tool Warnings

Learn about the security check warnings that are displayed in the Oracle WebLogic Server Administration console and how to troubleshoot them.

At the top of the WebLogic Server Administration console, the message Security warnings detected. Click here to view the report and recommended remedies is displayed for Oracle WebLogic Server for OKE instances created after July 20, 2021, or the instances on which the July 2021 PSUs are applied.

When you click the message, a list of security warnings is displayed. The warnings and their resolutions are described below; the warning messages shown are examples.

Security Warnings

Warning Message: SSL hostname verification is disabled by the SSL configuration.

Resolution: Review your applications before you make any changes to address these SSL host name security warnings.

For applications that connect to SSL endpoints with a host name in the certificate that does not match the local machine's host name, the connection fails if you configure the BEA host name verifier in Oracle WebLogic Server.

For applications that connect to Oracle-provided endpoints such as Oracle Identity Cloud Service (for example, *.identity.oraclecloud.com), the connection fails if you did not configure the wildcard host name verifier or a custom host name verifier that accepts wildcard host names. If you are not sure which SSL configuration settings to use to address this warning, Oracle recommends that you configure the wildcard host name verifier.

You see the SSL host name verification warnings on existing Oracle WebLogic Server for OKE instances (created before July 20, 2021). To address this warning, you must configure SSL with a host name verifier. See Configure SSL with host name verifier.

Warning Message: Production mode is enabled but the file or directory <directory_name>/startWebLogic.sh is insecure since its permission is not a minimum of umask 027

Resolution: Run the following command on the administration server as the oracle user:

chmod 640 /u01/data/domains/<domain_name>/bin

Warning Message: Remote Anonymous RMI T3 or IIOP requests are enabled. Set the RemoteAnonymousRMIT3Enabled and RemoteAnonymousRMIIIOPEnabled attributes to false.

Resolution: Set the Java properties for anonymous RMI T3 and IIOP requests during server startup. See Set the Java Properties.

After you address the warnings, click Refresh Warnings to verify that the warnings are removed from the console.

For Oracle WebLogic Server for OKE instances created after July 20, 2021, the warnings still appear even though the Java properties that disable anonymous RMI T3 and IIOP requests are configured. This is a known issue in Oracle WebLogic Server.

Set the Java Properties

To set the Java properties for anonymous RMI T3 and IIOP requests:
  1. Edit the domain.yaml file located at /u01/shared/weblogic-domains/<domain_name>/domain.yaml and update all serverPod definitions as follows:

    serverPod:
      env:
      - name: USER_MEM_ARGS
        #admin server memory is explicitly set to min of 256m and max of 512m and GC algo is G1GC
        value: "-Xms256m -Xmx512m -XX:+UseG1GC -Djava.security.egd=file:/dev/./urandom"
      - name: JAVA_OPTIONS
        value: "-Dweblogic.store.file.LockEnabled=false -Dweblogic.rjvm.allowUnknownHost=true -Dweblogic.security.remoteAnonymousRMIT3Enabled=false -Dweblogic.security.remoteAnonymousRMIIIOPEnabled=false"
  2. Apply the domain.yaml file by using the kubectl command:

    kubectl apply -f <path_to_domain.yaml>
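
After the change is rolled out to the server pods, you can optionally confirm that the properties are set in a running server pod. The pod and namespace names below are placeholders for your domain.
# Placeholder pod and namespace names; prints the JAVA_OPTIONS value seen by the container.
kubectl exec <admin-server-pod> -n <domain-namespace> -- env | grep JAVA_OPTIONS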

Get Additional Help

Use online help, email, customer support, and other tools if you have questions or problems with Oracle WebLogic Server for OKE.

For general help with Oracle Cloud Marketplace, see How Do I Get Support in Using Oracle Cloud Marketplace.