After moving from one datacenter to another, we started having problems live migrating virtual machines between the hosts in our two-node failover cluster. The migration would fail instantly, with no error other than:
Live migration of 'Virtual Machine VMNAME' failed.
A quick migration worked, but a live migration did not. I started looking at the Security logs on the hosts and noticed some intermittent errors:
An account failed to log on.
Subject:
Security ID: SYSTEM
Account Name: HYPERVHOSTCOMPUTER$
Account Domain: OURDOMAIN
Logon ID: 0x3E7
Logon Type: 8
Account For Which Logon Failed:
Security ID: NULL SID
Account Name: HYPERVHOSTCOMPUTER
Account Domain: OURDOMAIN.com
Failure Information:
Failure Reason: Unknown user name or bad password.
Status: 0xC000006D
Sub Status: 0xC000006A
Process Information:
Caller Process ID: 0xd30
Caller Process Name: C:\Windows\Cluster\rhs.exe
Network Information:
Workstation Name: HYPERVHOSTCOMPUTER
Source Network Address: -
Source Port: -
Detailed Authentication Information:
Logon Process: Advapi
Authentication Package: Negotiate
Transited Services: -
Package Name (NTLM only): -
Key Length: 0
This event is generated when a logon request fails. It is generated on the computer where access was attempted.
The Subject fields indicate the account on the local system which requested the logon. This is most commonly a service such as the Server service, or a local process such as Winlogon.exe or Services.exe.
The Logon Type field indicates the kind of logon that was requested. The most common types are 2 (interactive) and 3 (network).
The Process Information fields indicate which account and process on the system requested the logon.
The Network Information fields indicate where a remote logon request originated. Workstation name is not always available and may be left blank in some cases.
The authentication information fields provide detailed information about this specific logon request.
- Transited services indicate which intermediate services have participated in this logon request.
- Package name indicates which sub-protocol was used among the NTLM protocols.
- Key length indicates the length of the generated session key. This will be 0 if no session key was requested.
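The fields worth decoding in that event are Logon Type, Status, and Sub Status. Here is a minimal Python sketch that maps the values from the log above to their documented names. The lookup tables only cover a handful of well-known values, and the function name `describe_failure` is made up for illustration:

```python
# Partial lookup tables for fields in a Windows "An account failed to log on" event.
# Only a few well-known values are listed; anything else is reported as unknown.
LOGON_TYPES = {
    2: "Interactive",
    3: "Network",
    4: "Batch",
    5: "Service",
    7: "Unlock",
    8: "NetworkCleartext",
    9: "NewCredentials",
    10: "RemoteInteractive",
    11: "CachedInteractive",
}

NTSTATUS = {
    0xC000006D: "STATUS_LOGON_FAILURE",
    0xC000006A: "STATUS_WRONG_PASSWORD",
    0xC0000064: "STATUS_NO_SUCH_USER",
}


def describe_failure(logon_type, status, sub_status):
    """Translate the numeric fields of a failed-logon event into names."""
    return {
        "logon_type": LOGON_TYPES.get(logon_type, "unknown"),
        "status": NTSTATUS.get(status, "unknown"),
        "sub_status": NTSTATUS.get(sub_status, "unknown"),
    }


if __name__ == "__main__":
    # Values copied from the event above.
    print(describe_failure(8, 0xC000006D, 0xC000006A))
```

For the event above this gives NetworkCleartext, STATUS_LOGON_FAILURE, and STATUS_WRONG_PASSWORD. The sub status is the interesting one: the account name was recognized, but the password being presented did not match.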
Then I noticed errors in the cluster itself, at the same times:
Cluster network name resource 'Cluster Name' failed registration of one or more associated DNS name(s) for the following reason:
The handle is invalid.
.
Ensure that the network adapters associated with dependent IP address resources are configured with at least one accessible DNS server.
I looked at a domain controller and noticed a lot of Audit Failures for that computer object. I opened the computer object in ADSI Edit, and noticed that the last login was 11/23 (the day we moved), and the last password reset was 11/24, which is incredibly odd. The last bad login attempt was a few minutes ago. I'm not sure how, but I think a password reset may have been attempted while the domain controllers were unavailable.
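If you pull those timestamps as raw values rather than reading them in ADSI Edit, Active Directory stores them as Windows FILETIME integers: the number of 100-nanosecond intervals since January 1, 1601 (UTC). Here is a small Python sketch for converting them. I'm assuming the attributes in question are `pwdLastSet`, `lastLogon`, and `badPasswordTime`, and `filetime_to_datetime` is just a name I picked:

```python
from datetime import datetime, timedelta, timezone

# FILETIME counts 100-nanosecond intervals from this instant.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(filetime):
    """Convert a raw FILETIME integer (e.g. from pwdLastSet) to a UTC datetime."""
    # 10 intervals of 100 ns make one microsecond.
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)


if __name__ == "__main__":
    # Sanity check with a well-known value: the Unix epoch expressed as FILETIME.
    print(filetime_to_datetime(116444736000000000))
```

Comparing the converted password-set time against the last successful logon makes it easier to spot the kind of mismatch described above, where the password changed after the last good logon.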
How I fixed it:
- Open Failover Cluster Manager
- Navigate to Cluster Core Resources
- Right-click the cluster network name and take it offline
- Right-click the cluster name again and choose More Actions -> Repair
A few seconds later the cluster name was repaired. I brought it back online, and live migrations worked again.
Mystery solved.
HTH!