Live migration of 'Virtual Machine VMNAME' failed.
A quick migration worked, but a live migration did not. I started looking at the security logs on the hosts and noticed some intermittent errors:
An account failed to log on.
Subject:
Security ID: SYSTEM
Account Name: HYPERVHOSTCOMPUTER$
Account Domain: OURDOMAIN
Logon ID: 0x3E7
Logon Type: 8
Account For Which Logon Failed:
Security ID: NULL SID
Account Name: HYPERVHOSTCOMPUTER
Account Domain: OURDOMAIN.com
Failure Information:
Failure Reason: Unknown user name or bad password.
Status: 0xC000006D
Sub Status: 0xC000006A
Process Information:
Caller Process ID: 0xd30
Caller Process Name: C:\Windows\Cluster\rhs.exe
Network Information:
Workstation Name: HYPERVHOSTCOMPUTER
Source Network Address: -
Source Port: -
Detailed Authentication Information:
Logon Process: Advapi
Authentication Package: Negotiate
Transited Services: -
Package Name (NTLM only): -
Key Length: 0
This event is generated when a logon request fails. It is generated on the computer where access was attempted.
The Subject fields indicate the account on the local system which requested the logon. This is most commonly a service such as the Server service, or a local process such as Winlogon.exe or Services.exe.
The Logon Type field indicates the kind of logon that was requested. The most common types are 2 (interactive) and 3 (network).
The Process Information fields indicate which account and process on the system requested the logon.
The Network Information fields indicate where a remote logon request originated. Workstation name is not always available and may be left blank in some cases.
The authentication information fields provide detailed information about this specific logon request.
- Transited services indicate which intermediate services have participated in this logon request.
- Package name indicates which sub-protocol was used among the NTLM protocols.
- Key length indicates the length of the generated session key. This will be 0 if no session key was requested.
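To make the codes in that event easier to read, here is a minimal Python sketch that translates the Logon Type, Status, and Sub Status fields into their documented names. The function name `describe_failure` and the table layout are my own for illustration; the tables cover only a handful of common values, not the full Windows lists. For this event it gives NetworkCleartext, STATUS_LOGON_FAILURE, and STATUS_WRONG_PASSWORD, which suggests the account name was recognized but the password it presented was rejected. That reading is an interpretation of the codes, not something the logs state outright.

```python
# Partial lookup tables for fields in a Windows failed-logon event.
# Only a few common values are listed; anything else is reported as unknown.
LOGON_TYPES = {
    2: "Interactive",
    3: "Network",
    4: "Batch",
    5: "Service",
    7: "Unlock",
    8: "NetworkCleartext",
    9: "NewCredentials",
    10: "RemoteInteractive",
    11: "CachedInteractive",
}

NTSTATUS = {
    0xC0000064: "STATUS_NO_SUCH_USER",
    0xC000006A: "STATUS_WRONG_PASSWORD",
    0xC000006D: "STATUS_LOGON_FAILURE",
}


def describe_failure(logon_type: int, status: str, sub_status: str) -> dict:
    """Translate the raw event fields into readable names.

    `status` and `sub_status` are the hex strings as shown in the event,
    for example "0xC000006D".
    """
    def name_of(code: str) -> str:
        return NTSTATUS.get(int(code, 16), "unknown (" + code + ")")

    return {
        "logon_type": LOGON_TYPES.get(logon_type, "unknown (%d)" % logon_type),
        "status": name_of(status),
        "sub_status": name_of(sub_status),
    }


if __name__ == "__main__":
    # Values copied from the event above.
    for field, meaning in describe_failure(8, "0xC000006D", "0xC000006A").items():
        print(field + ": " + meaning)
```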
Then I noticed errors in the cluster itself, at the same times:
Cluster network name resource 'Cluster Name' failed registration of one or more associated DNS name(s) for the following reason:
The handle is invalid.
.
Ensure that the network adapters associated with dependent IP address resources are configured with at least one accessible DNS server.
How I fixed it:
- Open Failover Cluster Manager
- Navigate to Cluster Core Resources
- Right-click the cluster network name and take it offline
- Right-click the cluster name again and choose More Actions -> Repair
A few seconds later the repair finished. I brought the cluster name back online, and live migrations worked again.
Mystery solved.
HTH!
This is pure magic! I spent 2 days troubleshooting this issue. I even destroyed & re-created the cluster.
Your fix is brilliant, and just "works"!
Hah, I'm happy I was able to help somebody! I fought this problem off and on (mostly off) for about a week before I decided it was a priority to fix.
Broke my head for 24 hours over this.
Big thumbs up to you for posting.
Great post. Fixed in 5 min after reading this.
Hi Andrew,
Were you able to do this while the VMs in the cluster were running?
If I remember correctly... yes.
What happens to the VMs running on the cluster? Will they be inaccessible until the cluster name is online again?
They are available, but we were unable to perform live migrations from one node to another.
It works with the suggestion. Thanks.