The setup
We have a physical server, pA, running VMs under KVM. One of the VMs (vA) acts as an NTP server: pA gets its time from vA, and vA gets its time from the Internet.
It’s not a great idea to run an NTP server in a VM, but in this case there was a need for it.
The problem
The NTP server frequently gets out of sync.
If you use Nagios, you may get errors like this:
SERVICE ALERT: pA;ntpd;CRITICAL;SOFT;4;NTP CRITICAL: Offset unknown
These show up both for the physical server and for other servers that fetch their time from vA.
The reason
There’s some guessing involved here, but this should be pretty accurate:
VM vA needs to correct its clock every now and then by slowing it down or speeding it up (via ntpd/adjtimex). This creates a small discrepancy between vA and pA: the physical server is now out of sync and needs to correct its own time using vA’s reference time.
Once pA attempts to correct its time, again by slowing down or speeding up its clock, this has a direct effect on vA, because vA’s clock is now affected by pA’s ongoing adjustment. This happens because KVM guests by default use kvmclock as their clock source (the source that ticks, not the source that returns the time of day).
This feedback sometimes drives pA’s ntpd even further out of sync, and may even make it consider its peers inaccurate and lose sync completely.
The problem gets even worse if you have two NTP servers (vA and vB) running on two different physical servers (pA and pB), because the amount of desync on each one is mostly random. Assuming that all your servers, including pA and pB, fetch their time from both vA and vB, the discrepancy between vA and vB will make the clients mark at least one of them as wrong (a falseticker), since two servers at the same stratum should not disagree by that much.
You can see this in the condition column of ntpq’s associations output (ntpq -c associations), where one of the servers is marked as falsetick:
ind assid status  conf reach auth condition  last_event cnt
===========================================================
  1 33082  961a   yes   yes  none  sys.peer    sys_peer  1
  2 33083  911a   yes   yes  none falsetick    sys_peer  1
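To watch for this condition without eyeballing ntpq, the associations table can be parsed and the result fed into a monitoring check such as Nagios. The following is a minimal Python sketch, assuming the nine-column layout shown above (column positions may differ between ntpq versions); the function and constant names are hypothetical.

```python
from typing import List, Tuple

# Sample copied from the associations output above.
SAMPLE_ASSOCIATIONS = """\
ind assid status  conf reach auth condition  last_event cnt
===========================================================
  1 33082  961a   yes   yes  none  sys.peer    sys_peer  1
  2 33083  911a   yes   yes  none falsetick    sys_peer  1
"""

# Index of the "condition" column, assuming the layout shown above.
CONDITION_COL = 6


def find_falsetickers(output: str) -> List[Tuple[int, int]]:
    """Return (ind, assid) for every association ntpd marked as a falseticker."""
    result = []
    for line in output.splitlines():
        fields = line.split()
        # Data rows start with the numeric "ind" column; this skips the
        # header, the ==== separator and blank lines.
        if not fields or not fields[0].isdigit():
            continue
        if len(fields) > CONDITION_COL and fields[CONDITION_COL] == "falsetick":
            result.append((int(fields[0]), int(fields[1])))
    return result
```

On a live system you would pass it the text returned by subprocess.run(["ntpq", "-c", "associations"], capture_output=True, text=True).stdout; for the sample above it reports association 33083.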
Overall, the problem is that the physical servers will try to fix their clocks, thus affecting the clocks of the NTP servers running in VMs under them.
The solution
The root of the problem is that the VMs use kvmclock as their clock source. You can see this with dmesg:
$ dmesg | grep clocksource
Switching to clocksource kvm-clock
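dmesg only shows what was logged at boot, and on a long-running system those messages may already have been overwritten in the kernel ring buffer. On Linux the clocksource currently in use is also exposed in sysfs. A small Python sketch that reads it, assuming the standard /sys/devices/system/clocksource/clocksource0 location; the function names are illustrative.

```python
from pathlib import Path
from typing import List

# Standard Linux sysfs directory for the system clocksource.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def current_clocksource(base: Path = CLOCKSOURCE_DIR) -> str:
    """Name of the clocksource the kernel is ticking on, e.g. 'kvm-clock' or 'tsc'."""
    return (base / "current_clocksource").read_text().strip()


def available_clocksources(base: Path = CLOCKSOURCE_DIR) -> List[str]:
    """Clocksources the kernel could switch to."""
    return (base / "available_clocksource").read_text().split()


def uses_kvmclock(base: Path = CLOCKSOURCE_DIR) -> bool:
    """True if the guest is still ticking on kvm-clock."""
    return current_clocksource(base) == "kvm-clock"
```

Inside an affected guest, uses_kvmclock() would return True; the base parameter only exists so the functions can be pointed at another directory, for example in tests.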
The way to disable kvmclock is to pass the “no-kvmclock” parameter to the kernel of your VMs. This will not always work, though: the kernel (at least the CentOS kernels) will panic very early in the boot process, because it still tries to initialize kvmclock even though it is not going to use it, and fails.
The solution is to pass two parameters to your VM kernels: “no-kvmclock no-kvmclock-vsyscall”. The second one is somewhat undocumented, but it does the trick.
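Since a guest booted with only one of the two parameters either may not boot (as described above) or keeps using kvmclock, it can be worth checking that both actually reached the running kernel. A minimal Python sketch, assuming the command line is read from /proc/cmdline; the helper name is hypothetical.

```python
from typing import Iterable, List

# The two parameters recommended above for the guest kernel.
REQUIRED_PARAMS = ("no-kvmclock", "no-kvmclock-vsyscall")


def missing_kernel_params(cmdline: str,
                          required: Iterable[str] = REQUIRED_PARAMS) -> List[str]:
    """Return the required parameters that are absent from a kernel command line."""
    # Compare whole whitespace-separated tokens: a plain substring test would
    # wrongly treat "no-kvmclock" as present when only "no-kvmclock-vsyscall" is.
    tokens = set(cmdline.split())
    return [p for p in required if p not in tokens]
```

Usage: missing_kernel_params(open("/proc/cmdline").read()) returns an empty list when both parameters are active. On CentOS the parameters are typically added with grubby --update-kernel=ALL --args="no-kvmclock no-kvmclock-vsyscall"; other distributions use their own bootloader configuration.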
After rebooting with these parameters, you can verify the change through dmesg:
$ dmesg | grep Switching
Switching to clocksource refined-jiffies
Switching to clocksource acpi_pm
Switching to clocksource tsc
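The kernel may switch clocksources several times during boot, so the last switch is the one that matters (tsc in the output above). A minimal Python sketch that extracts it from dmesg text. The regular expression assumes the wording shown above, plus the “Switched to clocksource” wording used by newer kernels; the function names are illustrative.

```python
import re
from typing import List, Optional

# Older kernels log "Switching to clocksource X"; newer ones log
# "clocksource: Switched to clocksource X". Match both, with or
# without a leading timestamp.
SWITCH_RE = re.compile(r"Switch(?:ing|ed) to clocksource (\S+)")


def clocksource_switches(dmesg_text: str) -> List[str]:
    """All clocksources the kernel switched to, in log order."""
    return SWITCH_RE.findall(dmesg_text)


def final_clocksource(dmesg_text: str) -> Optional[str]:
    """The last clocksource switched to, or None if no switch was logged."""
    switches = clocksource_switches(dmesg_text)
    return switches[-1] if switches else None
```

With the parameters in effect, final_clocksource() on the guest’s dmesg output should no longer return kvm-clock.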
Example
Below is the output of a server running in such an environment. In this case the first NTP server (vA) runs with the extra kernel parameters and the other (vB) runs without them. The clocks of the physical servers (pA and pB) were slowed down by hand using adjtimex in order to test the effect of the physical servers’ clocks on the VM clocks. As you can see, this server is still in sync with vA and has a very large offset from vB. Note that this server is not a VM under pA or pB.
$ ntpq -nc peers
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*10.93.XXX.XXX   216.218.254.202  2 u   81  256  377    0.433  -87.076  20.341
 10.93.XXX.XXX   216.218.254.202  2 u  290  512  377    0.673  11487.6  9868.84
In other words, the first NTP server (vA), running with the extra parameters, kept its clock accurate, while the second (vB) did not. ntpq reports offsets in milliseconds, so vB is about 11.5 seconds off.
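A drifting server like vB is easy to flag automatically from the peers table. A minimal Python sketch, assuming the ten-column ntpq -n -c peers layout shown above with one peer per line (long hostnames or IPv6 addresses can make ntpq wrap a row, which this sketch simply skips). The 1000 ms threshold and all names are arbitrary choices for illustration.

```python
from typing import List, NamedTuple

# Tally codes ntpq prints in the first column (e.g. '*' = system peer,
# 'x' = falseticker, ' ' = rejected).
TALLY_CODES = set(" x.-+#*o")

SAMPLE_PEERS = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*10.93.XXX.XXX   216.218.254.202  2 u   81  256  377    0.433  -87.076  20.341
 10.93.XXX.XXX   216.218.254.202  2 u  290  512  377    0.673  11487.6  9868.84
"""


class Peer(NamedTuple):
    tally: str
    remote: str
    refid: str
    stratum: int
    offset_ms: float  # ntpq reports offset and jitter in milliseconds
    jitter_ms: float


def parse_peers(output: str) -> List[Peer]:
    peers = []
    for line in output.splitlines():
        # Skip blank lines, the column header and the ==== separator.
        if not line.strip() or line.lstrip().startswith(("remote", "=")):
            continue
        if line[0] in TALLY_CODES:
            tally, rest = line[0], line[1:]
        else:
            tally, rest = " ", line  # leading blank lost, e.g. by copy/paste
        fields = rest.split()
        if len(fields) < 10:
            continue  # wrapped or unexpected row; skip instead of guessing
        peers.append(Peer(tally, fields[0], fields[1], int(fields[2]),
                          float(fields[8]), float(fields[9])))
    return peers


def peers_over_threshold(peers: List[Peer],
                         threshold_ms: float = 1000.0) -> List[Peer]:
    """Peers whose absolute offset exceeds threshold_ms."""
    return [p for p in peers if abs(p.offset_ms) > threshold_ms]
```

For the sample above, only vB (offset 11487.6 ms) exceeds the default threshold; vA’s -87.076 ms does not.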