When migrating running virtual machines that use local storage, the virtual disks are migrated first from the source server's local repository to the destination server's local repository. In certain cases, a suspend error may occur for the source Oracle VM Server guest virtual machine during the storage live migration when there is additional disk activity. For example, a process running on the guest virtual machine may be writing to the virtual disk on the source server's local repository while that disk is being migrated to the destination server's local repository.
The suspend operation requires more time to complete due to the additional disk activity, resulting in a suspend error that causes the storage live migration to fail. A suspend error similar to the following is displayed in the /var/log/xen/xend.log file on the source Oracle VM Server instance:
[timestamp 2792] DEBUG (XendDomainInfo:3509) waitForSuspend: domain 1 state:
[timestamp 2792] INFO (XendCheckpoint:430) [9836] xc: error: Suspend request failed: Internal error
[timestamp 2792] DEBUG (XendDomainInfo:1943) XendDomainInfo.handleShutdownWatch
[timestamp 2792] INFO (XendCheckpoint:430) [9836] xc: error: Domain appears not to have suspended: Internal error
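To check whether a failed migration matches this issue, the log can be searched for the two error strings shown above. The following is a minimal sketch; find_suspend_errors is a hypothetical helper name, not an Oracle VM tool, and the optional path argument is an assumption added so that the function can be pointed at a copy of the log.

```shell
#!/bin/sh
# Sketch: list suspend-related errors recorded in an xend log.
# find_suspend_errors is a hypothetical helper, not part of Oracle VM.
find_suspend_errors() {
    # Default to the log path named in this section; a different
    # path can be passed as the first argument.
    log_file="${1:-/var/log/xen/xend.log}"
    grep -E 'Suspend request failed|Domain appears not to have suspended' "$log_file"
}

# Example (run on the source Oracle VM Server):
#   find_suspend_errors
```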
A timeout error is also displayed in the kernel log on the source Oracle VM Server guest virtual machine:
timestamp server kernel: [ 1601.227167] Freezing of tasks failed after xx seconds (0 tasks refusing to freeze, wq_busy=1):
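The same kind of check can be made inside the guest by filtering kernel messages for the freeze failure shown above. This is a sketch only; find_freeze_timeouts is a hypothetical helper name, and it assumes that the kernel log is available through the dmesg command on the guest.

```shell
#!/bin/sh
# Sketch: filter kernel messages for the task-freeze timeout.
# find_freeze_timeouts is a hypothetical helper; it reads log
# lines on standard input and prints the matching ones.
find_freeze_timeouts() {
    grep 'Freezing of tasks failed'
}

# Example (run on the guest virtual machine):
#   dmesg | find_freeze_timeouts
```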
Workaround: Update the /sys/power/pm_freeze_timeout parameter, which is specified in milliseconds, on the source Oracle VM Server guest virtual machine to ensure that adequate time is available when additional disk activity occurs during storage live migration:
Log in to the guest virtual machine.
It is strongly recommended to update the pm_freeze_timeout parameter to the maximum downtime that you can afford, up to a maximum of 5 minutes, or 300000 milliseconds. For example, to update the /sys/power/pm_freeze_timeout parameter to 5 minutes, run the following command on the guest virtual machine:
# echo "300000" > /sys/power/pm_freeze_timeout
Note: The value you set for the pm_freeze_timeout parameter impacts the amount of time the source Oracle VM Server guest virtual machine remains offline during storage live migration.
Perform the live migration with storage again and no suspend errors should occur.
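The update step can also be wrapped in a small function that rejects non-numeric input, caps the value at the 300000 millisecond maximum recommended above, and reads the value back to confirm it. This is a minimal sketch under stated assumptions: set_freeze_timeout and MAX_FREEZE_TIMEOUT_MS are hypothetical names, and the optional second argument exists only so that the function can be tried against a scratch file instead of the real parameter.

```shell
#!/bin/sh
# Sketch: set pm_freeze_timeout (milliseconds), capped at the
# 5 minute maximum recommended in this workaround.
# set_freeze_timeout is a hypothetical helper, not an Oracle VM tool.
MAX_FREEZE_TIMEOUT_MS=300000

set_freeze_timeout() {
    requested_ms="$1"
    # Optional second argument: alternative target file for dry runs.
    target="${2:-/sys/power/pm_freeze_timeout}"

    # Accept only a whole number of milliseconds.
    case "$requested_ms" in
        ''|*[!0-9]*)
            echo "timeout must be a whole number of milliseconds" >&2
            return 1
            ;;
    esac

    # Cap the request at the recommended maximum.
    if [ "$requested_ms" -gt "$MAX_FREEZE_TIMEOUT_MS" ]; then
        requested_ms="$MAX_FREEZE_TIMEOUT_MS"
    fi

    echo "$requested_ms" > "$target" || return 1
    # Read the value back to confirm that it was written.
    cat "$target"
}

# Example (run as root on the guest virtual machine):
#   set_freeze_timeout 300000
```

Values written under /sys are generally not preserved across a reboot, so the setting may need to be reapplied if the guest is restarted before the migration.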
If a suspend error still occurs for the source Oracle VM Server guest
virtual machine after the pm_freeze_timeout
parameter is updated, you may need to avoid storage live migration
until the additional disk activity completes, or reconsider the
maximum time you can afford for the guest virtual machine to
remain offline during storage live migration.
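One way to judge whether the additional disk activity has completed is to sample the guest's write counters before retrying. The following sketch reads the sectors-written column from /proc/diskstats for a single device; sectors_written is a hypothetical helper name, and the device name xvda in the example is an assumption that depends on how the virtual disk appears in the guest.

```shell
#!/bin/sh
# Sketch: print the cumulative sectors-written counter for one
# block device, as reported in /proc/diskstats (column 10).
# sectors_written is a hypothetical helper, not an Oracle VM tool.
sectors_written() {
    device="$1"
    # Optional second argument: alternative stats file for testing.
    stats_file="${2:-/proc/diskstats}"
    awk -v dev="$device" '$3 == dev { print $10 }' "$stats_file"
}

# Example (run on the guest; xvda is an assumed device name):
#   before=$(sectors_written xvda)
#   sleep 5
#   after=$(sectors_written xvda)
#   echo "sectors written in 5 seconds: $((after - before))"
```

A difference close to zero over several samples suggests that the guest is no longer writing heavily to that disk.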
Bug 26289880