D Troubleshooting

This appendix describes techniques for troubleshooting Oracle Virtual Assembly Builder and identifies common troubleshooting items.

D.1 General Issues

This section describes troubleshooting for general issues.

D.1.1 Error Indicating Another Client is Running

If you see this error when attempting to launch abctl or abstudio.sh and you are sure there is no other Oracle Virtual Assembly Builder client running, your machine may have run out of file locks.

You may also see this error when performing concurrent operations through Oracle Virtual Assembly Builder Studio or abctl. To resolve this problem, check whether an environmental issue is causing file locks to be released slowly.

D.1.2 Phone Home Timeouts

To troubleshoot phone home timeouts:

  1. Make sure the firewall on your Oracle Virtual Assembly Builder host is turned off.

    As root, run the following:

    # /sbin/service iptables status
    # /sbin/service iptables stop
    

    The firewall may be configured to start automatically when your machine reboots. To prevent this, disable it permanently by running the following as root:

    # /sbin/chkconfig iptables off
    
  2. The default phone home timeout is 15 minutes. You can increase the timeout in <Deployer WLS domain root>/ab_instance/config/deployer/deployer.properties.

    If the file does not exist, create it and add a line similar to phoneHomeTimeout=<number of seconds>.
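The edit described in step 2 can be scripted. The following is a minimal sketch, not an Oracle-supplied tool: the function name set_phone_home_timeout is hypothetical, and the sketch assumes that deployer.properties is a plain key=value file and that the directory holding it already exists.

```shell
# Set or update phoneHomeTimeout (in seconds) in a deployer.properties file.
# Usage: set_phone_home_timeout <path to deployer.properties> <seconds>
set_phone_home_timeout() {
    props_file=$1
    seconds=$2
    # Create the file if it does not exist yet.
    [ -f "$props_file" ] || : > "$props_file"
    # Copy every line except an existing phoneHomeTimeout entry.
    # grep exits non-zero when it selects no lines, so ignore its status.
    tmp_file="${props_file}.tmp.$$"
    grep -v '^phoneHomeTimeout=' "$props_file" > "$tmp_file" || true
    # Append the new value and replace the original file.
    echo "phoneHomeTimeout=${seconds}" >> "$tmp_file"
    mv "$tmp_file" "$props_file"
}
```

For example, to allow 30 minutes, call the function with the path to deployer.properties under your Deployer domain and the value 1800.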

D.1.3 Existing Assembly Archive (OVA file) Prevents Altering of Assembly

If you have built an assembly archive for your assembly, that assembly is locked: you cannot alter its shape, properties, packages, or templates until you delete the archive. To delete the archive, use abctl delete -archiveOnly -name <assembly name>.

D.2 Introspection and File Set Capture Failures

This section describes troubleshooting for introspection and file set capture issues.

D.2.1 Introspection Fails with Root Cause 'java.net.ConnectException: Connection timed out'

If you see 'java.net.ConnectException: Connection timed out' as the root cause of an introspection failure, you may need to set proxy information in your environment before launching introspection. Enter settings similar to the following, adjusted for your shell type:

setenv SYSPROPS ' -Dhttp.proxyHost=www-proxy.example.com -Dhttp.proxyPort=80 '

After setting this in your environment proceed to launch the Oracle Virtual Assembly Builder Studio graphical user interface, or run your abctl command.
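The setenv line above uses csh syntax. For Bourne-compatible shells such as sh, bash, or ksh, an equivalent setting would look like the following sketch. The proxy host and port are the same placeholder values used above.

```shell
# Bourne-shell equivalent of the csh setenv line.
# Replace the placeholder proxy host and port with your own values.
SYSPROPS=' -Dhttp.proxyHost=www-proxy.example.com -Dhttp.proxyPort=80 '
export SYSPROPS
```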

D.2.2 Introspection of a VM

Introspection of a deployed virtual machine (VM) can fail to complete when the temporary directory used for introspection is a mounted NFS share with NFS file locking. To avoid this problem, remount the NFS file system with locking turned off:

mount -o nolock example:/scratch /net/example/scratch

D.2.3 Remote Operation Failures

The logs for remote operations are copied at the end of a remote operation into the local directory $AB_INSTANCE/logs/remote_<remote machine name>/. For example, if the remote machine is abc12345.example.com, the logs are stored in $AB_INSTANCE/logs/remote_abc12345.example.com/.

The default remote working directory is /tmp/abRemote_<remote username>, but this can be overridden using the -remoteWorkingDir flag.

D.2.3.1 Unable to Connect Errors When Running IPv6 on the Remote Machine

If you are unable to perform remote introspection or packaging and get 'unable to connect' errors, one possibility is that the remote machine is running IPv6 but its sshd_config file is incorrect.

When remote systems are configured to use both IPv6 and IPv4, you must have the following line in the sshd_config file:

AddressFamily any

(and not AddressFamily inet).
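To see which value a given sshd_config file sets, you can extract the AddressFamily line. This is a minimal sketch: the function name sshd_address_family is hypothetical, and /etc/ssh/sshd_config is assumed to be the usual location of the file on the remote machine.

```shell
# Print the AddressFamily value configured in an sshd_config file,
# or "unset" when no uncommented AddressFamily line is present.
# Keywords are matched case-insensitively; the first occurrence is reported.
sshd_address_family() {
    awk 'tolower($1) == "addressfamily" && value == "" { value = $2 }
         END { if (value == "") print "unset"; else print value }' "$1"
}
```

Running sshd_address_family /etc/ssh/sshd_config on the remote machine and getting inet back indicates the misconfiguration described above.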

D.2.3.2 Remote Operation Hangs after Entering Password

If your remote operation hangs after you enter the remote password, the cause may be an orphaned remote process left over from a previous remote operation that was killed. This can happen if the remote working directory was removed while the remote process was still running. Go to the remote machine and kill any orphaned remote processes, then clean up the remote working directory and try your remote operation again.

The remote process appears similar to the following:

aime1    11662     1  0 11:51 ?        00:00:01 /tmp/user/abRemote_aime1/ab_home/jre/jre/bin/java 
  -Doracle.core.ojdl.logging.config.file=/tmp/user/abRemote_aime1/ab_instance/config/logging.xml 
  -Djava.util.logging.config.class=oracle.core.ojdl.logging.LoggingConfiguration 
  -Djava.security.egd=file:/dev/./urandom -Dassemblybuilder.spif.app=apps/remotingapp 
  -jar /tmp/user/abRemote_aime1/ab_home/jlib/oracle.as.assemblybuilder.spif_0.1.0.jar
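One way to find such processes is to filter a process listing. This is a minimal sketch with stated assumptions: the function name list_orphaned_ab_remote_pids is hypothetical, the input is expected in ps -ef column order (UID, PID, PPID, and so on), and an orphaned process is assumed to have been reparented to PID 1, as in the listing above.

```shell
# Read `ps -ef` style output on stdin and print the PID of every process
# whose parent is PID 1 and whose command line refers to an abRemote_
# working directory and the assemblybuilder spif application.
list_orphaned_ab_remote_pids() {
    awk '$3 == 1 && /abRemote_/ && /assemblybuilder\.spif/ { print $2 }'
}
```

On the remote machine, run ps -ef | list_orphaned_ab_remote_pids and review the resulting PIDs before killing anything.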

D.2.3.3 File Permission Problems

Make sure the remote user or sudo user you specify for the remote operation has read permissions for the files in the reference installation.

D.2.3.4 Remote Connection Failure

SSH port forwarding must be enabled on reference systems in order for remote operations (introspection and file set creation) to work properly. Check the ssh config files:

~/.ssh/config
/etc/ssh/ssh_config

The following error can occur if the login shell of the specified remoteUser prints output to stdout or stderr during login.

Error: Error initializing the remote connection.
Caused by: OAB-90061: Unable to create connection to remote server.
Cause: Timed out trying to connect to IPV4 and IPV6 sockets.

Check the profile and rc files for the remote user and remove any logic that prints output during login. Alternatively, you can specify a different remote user and use the sudoUser parameter to specify a user that has permission to examine and capture the reference installation.

D.2.3.5 Remote File Set Capture Failure

The remote working directory must have enough disk space available to store your file sets before they are transferred back to the local machine.
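To check the free space before starting a capture, you can query the file system that holds the remote working directory. This is a minimal sketch using POSIX df; the function name available_kb is hypothetical, and comparing the result against the expected size of your file sets is left to you.

```shell
# Print the available space, in kilobytes, on the file system that
# contains the given directory. -P forces one output line per file system.
available_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}
```

For example, on the remote machine, available_kb /tmp reports the space available to the default remote working directory.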

D.3 Template Creation Failures

When a template creation operation fails, check the following:

  • Verify you ran $ORACLE_HOME/oracleRoot.sh as root during or after installation.

  • Verify that the ova utility is installed.

  • Verify that you have a valid base image (System.img) and vm.cfg file. Verify that file permissions are correct.

  • Verify that you did not run out of disk space.

  • Verify that you have a sufficient number of loop devices for the file sets you are capturing. See Section D.3.1, "Insufficient Number of Loop Devices".

D.3.1 Insufficient Number of Loop Devices

You may run into an issue where the number of Linux loop devices on an Oracle Virtual Assembly Builder host is not sufficient to create templates for a generic product with a large number of file sets.

When creating templates, Oracle Virtual Assembly Builder and modifyjeos require one available Linux loop device for each disk in the template (that is, one each for the System.img and AB.img, and one per product disk). A typical Oracle Linux system has only seven loop devices, meaning templates can be created for an appliance with a maximum of five file sets.

To create templates for an appliance with more file sets, you must create additional loop devices. One way to do this is as follows:

  1. Edit /etc/modprobe.conf. Add a line similar to the following:

    options loop max_loop=<n> 
    

    Where <n> is the number of loop devices you want created.

  2. As root, run the following commands to unload and reload the Linux kernel loop module:

    # /sbin/modprobe -r loop 
    # /sbin/modprobe -v loop 
    
  3. Verify that the new loop devices were created:

    $ ls -l /dev/loop* 
    

    You should see <n> loop devices.
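To choose a value for <n>, apply the rule stated above: one loop device per file set, plus one each for System.img and AB.img. The following sketch works through that arithmetic and counts the loop devices currently present. Both function names are hypothetical, and the count assumes that loop devices appear as /dev/loop<number> block device nodes.

```shell
# Loop devices needed for a template with the given number of file sets:
# one per file set, plus one each for System.img and AB.img.
required_loop_devices() {
    echo $(( $1 + 2 ))
}

# Count the /dev/loop<number> block devices that currently exist.
count_loop_devices() {
    count=0
    for dev in /dev/loop[0-9]*; do
        if [ -b "$dev" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}
```

If required_loop_devices for your appliance is larger than the value printed by count_loop_devices, set max_loop to at least the required value. Loop devices that are already in use by other mounts reduce the number available.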

D.4 Deployer Communication Failures

This section describes troubleshooting of Deployer communication failures.

D.4.1 Invalid Deployer Response Returned

You may run into the following error:

Caused by: Invalid deployer response returned.
  Cause: OAB-113409 - An invalid response was returned by the deployer.
  Action: OAB-113409 - Please check the deployer log for additional details.

Check the Deployments for the WebLogic Server instance hosting the Deployer and make sure the state of the Deployer application is 'Active' and its health is 'OK'.

D.4.2 401/403 Errors from the Deployer

If you run into 401/403 errors when trying to interact with the Deployer, check the following items in the WebLogic Server Administration Console of the server instance hosting the Deployer.

  1. Go to Deployments > Deployer and verify that the Security Model is "CustomRoles."

  2. Go to Deployments > Deployer > [Security] > [Roles] > ApplicationAdmin and make sure that conditions include Group: CloudAdmins or ApplicationAdmins.

  3. Go to Deployments > Deployer > [Security] > [Roles] > CloudAdmin and ensure that the conditions include Group: CloudAdmins.

  4. Go to Security Realms > myrealm > [Users and Groups] > [Groups] and ensure that "ApplicationAdmins" and "CloudAdmins" exist and are handled by DefaultAuthenticator.

  5. Go to Security Realms > myrealm > [Users and Groups] > [Users] and make sure that "applicationAdmin" and "cloudAdmin" exist and are handled by DefaultAuthenticator.

  6. Go to Deployments > Deployer > [Security] > [URL Patterns] and make sure there is no role definition on root url-pattern.

  7. When creating a connection in the client, make sure the username is one of the two listed in step 5 and is specified with the password for that user.

D.5 Registration Failures

If your registration has become unresponsive, check the following:

  • If you have never had a successful registration, try to register a template using the Oracle VM console directly, bypassing Oracle Virtual Assembly Builder. If this is not successful, the problem is with your Oracle VM environment.

  • If you have a very large assembly archive or a slow network, you may need to increase the 'Stuck Thread Max Time' setting for both the Oracle WebLogic Server instance where the Deployer is running and the Oracle WebLogic Server instance running Oracle VM Manager. Access this setting using the WebLogic Server Administration Console, in the Tuning tab for the server instance.

D.6 Deployment Failures

This section describes troubleshooting of deployment failures.

If you cannot determine the cause of the failure from the Studio or Deployer logs, log in to the Oracle VM Manager console and check whether the VMs for your assembly were created and started. Then follow the subsection below that matches what you find.

D.6.1 VM Not Created

Check the Deployer and the Oracle VM and Oracle VM Server logs for an indication of why the VMs were not created.

Note:

It is possible for the Deployer's state cache to get out of sync with the Oracle VM environment, especially if cleanup activity was done in the Oracle VM environment outside of Oracle Virtual Assembly Builder. The Deployer may have recorded that an assembly is still registered or deployed when it has actually been removed from the Oracle VM environment. If this has happened, you must undeploy and unregister your archive through Oracle Virtual Assembly Builder, and then re-register and redeploy it.

D.6.2 VM Created, But Not Running

Perform the following steps if the VM is created, but not running:

  1. Check the Deployer, and Oracle VM and Oracle VM Server logs, for an indication of why the VMs were not started.

  2. Try starting the VM manually and see if any useful output is given.

    1. Log in to the Oracle VM Server machine that created the failed VM (you can see which machine in the pool owns the VM in the Oracle VM Manager console).

    2. Find the vm.cfg file for the VM in question. It is in a location under /OVS/Repositories and has the ID from the Oracle VM console in its path.

    3. Use the xm create -f <vm.cfg file> command to start the VM.

  3. Try mounting the disk images and see if any logs were created. (This can be the case if the VM came up but then went back down.) An Oracle Enterprise Linux image is composed of multiple disks; you must mount the disk you are interested in, such as AB.img, System.img, or Product_001.img.

    1. The Oracle Virtual Assembly Builder logs are on the AB.img disk, but the name of that image changes once it is registered in the Oracle VM environment. To find the location of the renamed image, find the vm.cfg file. It is in a location under /OVS/Repositories and has the VM ID from the Oracle VM console in its path.

    2. Determine the loop device, using the path to the image file you just found.

      # kpartx -a <path to img file>
      # kpartx -l <path to img file>
      
    3. From the listing of the previous command, you can determine which loop device has been mapped to the disk. Mount the disk and specify the loop device.

      # mount /dev/mapper/loop?p? /mnt
      

      The first question mark ('?') above represents the number of the loop device, which is usually zero but may be a different number. The second question mark is the partition number as shown in the kpartx -l listing; kpartx numbers partitions starting at one.

    4. Go to /mnt and look at the files. The AB.img may have logs in /mnt/logs, if reconfiguration got far enough to create them.

    5. When you are finished, unmount the disk and remove the device mappings:

      # umount /dev/mapper/loop?p?
      # kpartx -d <path to img file>
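The device name to mount can be read from the kpartx -l listing. This is a minimal sketch: the function name mapper_devices_from_kpartx is hypothetical, and it assumes that kpartx -l prints one line per partition in the form "loop0p1 : 0 <sectors> /dev/loop0 <offset>".

```shell
# Read `kpartx -l <image>` output on stdin and print the /dev/mapper
# path for each partition mapping it lists.
mapper_devices_from_kpartx() {
    awk '$2 == ":" { print "/dev/mapper/" $1 }'
}
```

As root, kpartx -l <path to img file> | mapper_devices_from_kpartx prints the candidate devices to pass to mount.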
      

D.6.3 VM Created and Running But Cannot be Pinged

The network configuration for the machine did not complete successfully. If you are using DHCP, make sure your Oracle VM environment supports it. If you are using static IP addresses, you must also specify the corresponding hostname in your deployment plan.

You must specify all of the following network related properties in your deployment plan:

  • at the assembly level

    • network_name (needs to match the name of a network in your target (Oracle VM) environment)

  • in the network properties for each appliance

    • hostname (if using static IPs)

    • default-gateway

    • dns-domains (only one is supported)

    • dns-servers (only one is supported)

  • on each network interface (NIC) for each appliance

    • ip_address (if using static IPs)

    • netmask

    • usedhcp (should be false if using static IPs)
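As a quick first check, you can confirm that each property name in the list above is mentioned somewhere in your deployment plan. This is only a rough sketch: the function name missing_network_properties is hypothetical, it searches for the names as plain text without parsing the plan format or validating any values, and it uses the static IP list of properties.

```shell
# Print each required network property name that does not appear
# anywhere in the given deployment plan file (plain-text search only).
missing_network_properties() {
    plan_file=$1
    for name in network_name hostname default-gateway dns-domains \
                dns-servers ip_address netmask usedhcp; do
        if ! grep -qF -- "$name" "$plan_file"; then
            echo "$name"
        fi
    done
}
```

An empty result means every name was found at least once. It does not confirm that the property is set at the correct level or on every appliance and NIC.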

D.6.3.1 How to Access a Running VM that Cannot be Pinged

To access the VM:

  1. Log in to the Oracle VM Server machine that created the failed VM (you can see which machine in the pool owns the VM in the Oracle VM Manager console).

  2. Run the command xm list.

  3. Find your VM in the list returned - look on the Oracle VM Manager console for the VM ID.

  4. Run the command xm console <vm ID>, press Enter, and provide credentials (user: root, password: the password supplied during template creation).

D.6.3.2 Triaging a Network Configuration Failure

To triage the failure:

  1. Check the logs under /assemblybuilder/logs. If there are no logs proceed to the next step.

  2. Check to see if the ab service was installed. The ab service is installed to /etc/init.d/ab. If it is not there, look at the oraclevm-template service log: /var/log/oraclevm-template. The oraclevm-template service installs the ab service.

    If the ab service is missing, make sure the permissions on the ab_service.sh and oraclevm-template.sh files in your ORACLE_HOME are correct. These files should be executable. If they are not, fix the permissions, re-create your assembly archive, then upload, register, and try the deployment again.

  3. If you believe the late bindings are incorrect or were not sent to the VM you can run the command /assemblybuilder/etc/vmapi get +.

    This command will output the late bindings.
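The permission check in step 2 is made on the Oracle Virtual Assembly Builder host, not on the VM. It can be sketched as follows. The function name non_executable_ab_scripts is hypothetical, and because the location of the two scripts within ORACLE_HOME is not stated here, the sketch searches the whole directory tree.

```shell
# Print the path of every ab_service.sh or oraclevm-template.sh file
# under the given directory that lacks the owner execute bit.
non_executable_ab_scripts() {
    find "$1" -type f \( -name ab_service.sh -o -name oraclevm-template.sh \) \
        ! -perm -100 -print
}
```

Running non_executable_ab_scripts "$ORACLE_HOME" and getting no output suggests that both scripts are executable by their owner.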

D.6.3.3 VM is Created, Started and Can be Pinged

If the VM is up and pingable, then the network configuration for the VM completed successfully.

  1. Log in to the failed VM and check the logs under /assemblybuilder/logs. See Section D.7, "Log Locations and Descriptions" for details on what is in the various log files. You should be able to ssh to the machine using the root user and the password you specified when creating the templates.

  2. If you believe the late bindings are incorrect or were not sent to the VM you can run the command /assemblybuilder/etc/vmapi get +.

    This command will output the late bindings.

D.7 Log Locations and Descriptions

This section provides log locations and descriptions. There are several different logs and log locations that are useful to know about when triaging failures.

D.7.1 Studio Logs

The log level for the Studio log can be altered by editing $AB_INSTANCE/config/logging.xml. Change the following line by indicating the desired level:

<logger name="oracle.as.assemblybuilder" level="FINE">
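The level change can also be made from the command line. This is a minimal sketch rather than a supported tool: the function name set_ab_log_level is hypothetical, it assumes the logger element appears on one line exactly as shown above, and it assumes the level name contains no characters that are special to sed (such as / or &). Back up logging.xml before changing it.

```shell
# Rewrite the level attribute of the oracle.as.assemblybuilder logger
# in the given logging.xml file. Other loggers are left untouched.
set_ab_log_level() {
    logging_xml=$1
    level=$2
    tmp_file="${logging_xml}.tmp.$$"
    sed 's/\(<logger name="oracle\.as\.assemblybuilder" level="\)[^"]*"/\1'"$level"'"/' \
        "$logging_xml" > "$tmp_file" && mv "$tmp_file" "$logging_xml"
}
```

For example, set_ab_log_level "$AB_INSTANCE/config/logging.xml" FINE sets the level shown in the line above.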

Local Logs

$AB_INSTANCE/logs/assemblybuilder.log

$AB_INSTANCE/logs/bottler/* - output from the modifyjeos tool used during template creation

Remote Logs

The logs for remote operations are copied at the end of a remote operation into the local directory $AB_INSTANCE/logs/remote_<remote machine name>/. For example, if the remote machine is abc12345.example.com, the logs are stored in $AB_INSTANCE/logs/remote_abc12345.example.com/.

The default remote working directory is /tmp/abRemote_<remote username>, but this can be overridden using the -remoteWorkingDir flag.

D.7.2 Deployer Logs

The Deployer application log messages will be in the server and/or domain logs for the WebLogic Server instance where the Deployer application is deployed. Stdout/stderr for the WebLogic Server instance may also contain relevant information or stack traces.

<Deployer WLS domain root>/servers/<server targeted by Deployer app>/logs/*

D.7.3 Oracle VM Logs

Oracle VM

/u01/app/oracle/ovm-manager-3/machine1/base_adf_domain/servers/AdminServer/logs/AdminServer.log

Oracle VM Server

/var/log/ovs-agent.log

D.7.4 Logs on the VM Instance

/assemblybuilder/logs/ab.out - stdout/stderr and progress messages from Oracle Virtual Assembly Builder infrastructure code and plug-in code

/assemblybuilder/logs/assemblybuilder.log - log messages from Oracle Virtual Assembly Builder infrastructure code and plug-in code

/assemblybuilder/logs/command.out - environment and command details for commands launched via the RehydrateUtils.runCommand() or runCommandAs() methods

/assemblybuilder/logs/proc.<unique>.log - stdout/stderr from processes launched via RehydrateUtils.runCommand() or runCommandAs() where the daemon flag passed in was true