When migrating running virtual machines that use local storage, the virtual disks are migrated first from the source server's local repository to the destination server's local repository. In certain cases, during the storage live migration, a suspend error may occur for the source Oracle VM Server guest virtual machine when there is additional disk activity. For example, additional disk activity may involve a process that is running on the guest virtual machine that is writing to the virtual disk on the source server's local repository while it is being migrated to the destination server's local repository.
The suspend operation requires more time to complete because of the
additional disk activity, and the resulting suspend error causes
the storage live migration to fail. A suspend error similar to the
following is displayed in the /var/log/xen/xend.log file on the
source Oracle VM Server instance:
[timestamp2792] DEBUG (XendDomainInfo:3509) waitForSuspend: domain 1 state:
[timestamp2792] INFO (XendCheckpoint:430) [9836] xc: error: Suspend request failed: Internal error
[timestamp2792] DEBUG (XendDomainInfo:1943) XendDomainInfo.handleShutdownWatch
[timestamp2792] INFO (XendCheckpoint:430) [9836] xc: error: Domain appears not to have suspended: Internal error
A timeout error is also displayed in the kernel log on the source Oracle VM Server guest virtual machine:
timestamp server kernel: [ 1601.227167] Freezing of tasks failed after xx seconds (0 tasks refusing to freeze, wq_busy=1):
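To confirm that a failed migration matches this issue, both logs can be searched for the messages shown above. The following is a minimal sketch, not an Oracle-provided tool: the function names find_suspend_errors and find_freeze_timeouts are hypothetical, and the default log path is the one named in this section.

```shell
#!/bin/sh
# Hypothetical helpers for spotting the errors described above.

# Run on the source Oracle VM Server: print xend suspend errors.
# The log path defaults to the file named in this section.
find_suspend_errors() {
    log_file="${1:-/var/log/xen/xend.log}"
    grep -E 'Suspend request failed|Domain appears not to have suspended' "$log_file"
}

# Run on the guest virtual machine: filter kernel log text read from
# standard input for the freeze timeout message.
find_freeze_timeouts() {
    grep 'Freezing of tasks failed'
}
```

For example, run find_suspend_errors on the source server, and run dmesg | find_freeze_timeouts on the guest virtual machine. Each function returns a non-zero status when no matching message is found.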
Workaround: Update the /sys/power/pm_freeze_timeout parameter, which
is specified in milliseconds, on the source Oracle VM Server guest
virtual machine to ensure that adequate time is available when
additional disk activity occurs during storage live migration:
1. Log in to the guest virtual machine.
2. Update the pm_freeze_timeout parameter. It is strongly recommended to set the parameter to the maximum downtime that you can afford, up to a maximum of 5 minutes, or 300000 milliseconds. For example, to update the /sys/power/pm_freeze_timeout parameter to 5 minutes, run the following command on the guest virtual machine:
   # echo "300000" > /sys/power/pm_freeze_timeout
   Note: The value you set for the pm_freeze_timeout parameter impacts the amount of time the source Oracle VM Server guest virtual machine remains offline during storage live migration.
3. Perform the live migration with storage again. No suspend errors should occur.
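The update can also be wrapped in a small helper that rejects values above the 5-minute maximum and prints the value before and after the change. This is a sketch under stated assumptions rather than a supported procedure: the function name set_freeze_timeout and the variables PM_FREEZE_TIMEOUT_FILE and MAX_TIMEOUT_MS are hypothetical, and the function must be run as root on the guest virtual machine.

```shell
#!/bin/sh
# Hypothetical helper: set pm_freeze_timeout with basic validation.
# PM_FREEZE_TIMEOUT_FILE can be overridden, for example for a dry run
# against a scratch file.
PM_FREEZE_TIMEOUT_FILE="${PM_FREEZE_TIMEOUT_FILE:-/sys/power/pm_freeze_timeout}"
MAX_TIMEOUT_MS=300000   # 5 minutes, the maximum recommended in this section

set_freeze_timeout() {
    new_ms="$1"

    # Accept only a non-empty string of digits (milliseconds).
    case "$new_ms" in
        ''|*[!0-9]*)
            echo "timeout must be a whole number of milliseconds" >&2
            return 1
            ;;
    esac

    # Refuse values above the recommended 5-minute maximum.
    if [ "$new_ms" -gt "$MAX_TIMEOUT_MS" ]; then
        echo "refusing to exceed ${MAX_TIMEOUT_MS} ms" >&2
        return 1
    fi

    echo "previous value: $(cat "$PM_FREEZE_TIMEOUT_FILE") ms"
    echo "$new_ms" > "$PM_FREEZE_TIMEOUT_FILE"
    echo "current value: $(cat "$PM_FREEZE_TIMEOUT_FILE") ms"
}
```

For example, set_freeze_timeout 120000 allows up to 2 minutes for the suspend to complete. Keeping a record of the previous value makes it possible to write it back once the migration has finished.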
If a suspend error still occurs for the source Oracle VM Server guest
virtual machine after the pm_freeze_timeout
parameter is updated, you may need to avoid storage live migration
until the additional disk activity completes, or reconsider the
maximum time you can afford for the guest virtual machine to
remain offline during storage live migration.
Bug 26289880

