By: Bill Jones, Sr. Solution Architect
I spend a lot of time talking to clients about time – specifically NTP (Network Time Protocol). We often take for granted how time synchronizes in a modern network. For many people, correct network time is assumed to be a given. That is, until the network time is wrong. If network time is wrong, Kerberos authentication can fail, affecting both Windows and Linux systems. So, let’s make some time to talk about time.
How Does Network Time Protocol Work?
At its most basic level, network time protocol (NTP) works in much the same way as asking someone, “What time is it?” For example, I just walked into my coworker’s office and had the following conversation.
Me: What time is it?
Kellen: The time is 10:24 a.m.
That is NTP in a nutshell. I have reason to believe Kellen will know the correct time. I ask Kellen for the time. Kellen tells me the time. At deeper levels, there are lots of other things going on, things like minimum skews, maximum skews, stratum numbers, eras, offsets, etc. But this blog post isn’t a deep dive into NTP. It is about how NTP affects a modern virtualized computing environment.
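For the curious, here is a minimal sketch of the arithmetic hiding behind that conversation. It takes a little while for my question to reach Kellen and for his answer to reach me, so an NTP client records four timestamps per exchange and uses the standard offset and delay formulas to work out how far off its own clock is. The class and function names below are my own, chosen for illustration, and the timestamps are made up.

```python
from dataclasses import dataclass


@dataclass
class Exchange:
    """The four timestamps of one request/reply exchange, in seconds."""
    t1: float  # client clock when the request left
    t2: float  # server clock when the request arrived
    t3: float  # server clock when the reply left
    t4: float  # client clock when the reply arrived


def clock_offset(x: Exchange) -> float:
    """Estimated amount to add to the client clock to match the server."""
    return ((x.t2 - x.t1) + (x.t3 - x.t4)) / 2.0


def round_trip_delay(x: Exchange) -> float:
    """Time spent on the network, excluding the server's processing time."""
    return (x.t4 - x.t1) - (x.t3 - x.t2)


# Hypothetical exchange: the client is 5 seconds slow, each network leg
# takes 0.1 seconds, and the server takes 0.01 seconds to answer.
sample = Exchange(t1=100.0, t2=105.1, t3=105.11, t4=100.21)
print(f"offset: {clock_offset(sample):.2f} s")
print(f"delay:  {round_trip_delay(sample):.2f} s")
```

The offset estimate is exact only when the request and the reply take the same amount of time on the network, which is one reason a nearby, reliable time source is worth having.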
Why Use NTP? Doesn’t Each Computer Have a Clock?
Yes, each computer has a clock, and that clock has a battery. That way, the clock can keep running even when the computer is powered off. When the computer is running, the operating system keeps track of the time by counting CPU cycles. Right now, my laptop is running at 2.1GHz, which means that each CPU cycle lasts about 0.476 nanoseconds. The computer can tell that one (1) second has elapsed when 2.1 billion CPU cycles complete. (Again, the process is more complicated at deeper levels. For example, the operating system has to account for leap seconds. But, again, we’re keeping this at a high level.)
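If you want to check the arithmetic above, here is a quick sketch. It follows the same high-level simplification as this post (time equals counted cycles divided by clock speed), and the function names are mine, not part of any operating system API.

```python
CLOCK_HZ = 2.1e9  # my laptop's clock speed: 2.1 GHz


def cycle_duration_ns(clock_hz: float) -> float:
    """Length of a single CPU cycle in nanoseconds."""
    return 1e9 / clock_hz


def elapsed_seconds(cycles: int, clock_hz: float) -> float:
    """Seconds represented by a given number of counted cycles."""
    return cycles / clock_hz


print(round(cycle_duration_ns(CLOCK_HZ), 3))      # 0.476
print(elapsed_seconds(2_100_000_000, CLOCK_HZ))   # 1.0
```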
Despite all this precision, over time a computer’s clock can become inaccurate. NTP helps to correct this gradual drift within computer clocks. System administrators choose a trusted time source, synchronize a system on their network to the trusted time source, and then have other devices synchronize from that internal system.
What Does This Have to Do with Virtualization?
One of the great advantages of virtualization is the ability to oversubscribe CPUs. By that, I mean, if you add up all of the virtual CPUs assigned to virtual machines on a host, there will be more total virtual CPUs than there will be logical processors on the host. Again, this isn’t just a common occurrence; it is one of the major advantages to running virtual machines.
Since there are more virtual CPUs than the host has logical processors, with every CPU cycle some virtual machines get access to a logical processor, and some don’t. But, when each CPU cycle is measured as fractions of a nanosecond, people seldom notice the skipped clock cycles. Unfortunately, as we said above, computers count clock cycles to keep track of the time. When the hypervisor makes a virtual machine skip a clock cycle, that virtual machine’s clock cannot increment its system time correctly. Over time, that virtual machine’s clock will become less and less accurate.
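To get a feel for how quickly those skipped cycles add up, here is a toy model. It assumes the simplified picture used in this post: the guest only advances its clock for the cycles it actually gets to count. Real hypervisors and guest operating systems are more sophisticated than this, so treat the numbers as an illustration of the trend rather than a measurement. The function name and the 0.1% figure are hypothetical.

```python
def guest_seconds_behind(wall_seconds: float, missed_fraction: float) -> float:
    """Seconds a guest clock falls behind if it misses a fraction of its ticks.

    Toy model: the guest clock advances only for ticks it is scheduled
    to count, and nothing corrects it afterwards.
    """
    if not 0.0 <= missed_fraction <= 1.0:
        raise ValueError("missed_fraction must be between 0 and 1")
    counted_seconds = wall_seconds * (1.0 - missed_fraction)
    return wall_seconds - counted_seconds


# Hypothetical example: a guest that misses 0.1% of its ticks for one day.
SECONDS_PER_DAY = 24 * 60 * 60
behind = guest_seconds_behind(SECONDS_PER_DAY, 0.001)
print(f"{behind:.1f} seconds behind after one day")
```

Even a tiny fraction of missed ticks turns into more than a minute of drift per day in this model, which is why an uncorrected virtual machine clock cannot be trusted for long.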
To address this, virtualization products include tools that regularly synchronize virtual machine system clocks to their physical host’s system clock. With VMware ESXi, this product is called VMware Tools; with Hyper-V it is called Hyper-V Time Synchronization Service. So, as long as the physical host has the correct time, virtual machines on the host will have the correct time, provided the correct tools are installed. But, when the virtualization host has the incorrect time, all of its virtual machines will similarly have the incorrect time.
How Can a Virtualization Host Have the Incorrect Time?
In many environments, Windows Active Directory (AD) synchronizes system time. Within each AD domain, the domain controller with the PDC Emulator FSMO role is in charge of keeping track of the time for the domain. All of the other domain controllers synchronize time to the PDC Emulator, and each member system synchronizes time to the domain controllers. In a multi-domain forest, the PDC Emulator for each domain synchronizes with the PDC Emulator of its parent domain, all the way up to the forest root domain. So, as long as the forest root domain’s PDC Emulator correctly synchronizes to a trusted NTP server, the whole AD forest will have the correct time. It’s a really cool process, and I encourage you to read up on it.
Unfortunately, from time to time (no pun intended), I still encounter virtualization environments with incorrect system clocks. These issues usually arise from two (2) common configuration errors.
The first configuration error is not disabling the host-to-VM time synchronization on virtual domain controllers. In Hyper-V environments, if the Hyper-V hosts are members of the domain, they will synchronize their times with the domain controllers. In many VMware environments, the ESXi hosts synchronize time from an AD domain controller. In both scenarios, the hosts trust the time from the virtual domain controllers, and the domain controllers trust the time from the hosts. Because neither side is anchored to an outside reference, the AD forest’s time gradually drifts from the correct time. This is why I repeatedly mention to my clients that if they virtualize domain controllers, they should either:
- Always disable host time synchronization on their virtual domain controllers, or
- Synchronize ESXi or Hyper-V host times from a trusted time source that is *not* an Active Directory domain controller.
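The loop described above is easy to miss when you look at one system at a time, but it jumps out if you write down who synchronizes from whom. Here is a minimal sketch that follows the "syncs from" links and reports a loop if it finds one. The system names and the dictionary layout are hypothetical; this is a way to reason about a design on paper, not a tool that queries real hosts or domain controllers.

```python
from typing import Dict, List, Optional


def find_time_loop(sources: Dict[str, str], start: str) -> Optional[List[str]]:
    """Follow 'syncs from' links beginning at start.

    sources maps each system to the system it gets its time from.
    A system missing from the mapping is treated as an outside trusted
    source. Returns the loop as a list of names, or None if the chain
    ends at an outside source.
    """
    path: List[str] = []
    position: Dict[str, int] = {}
    node: Optional[str] = start
    while node is not None:
        if node in position:
            # We have come back to a system already visited: a loop.
            return path[position[node]:] + [node]
        position[node] = len(path)
        path.append(node)
        node = sources.get(node)
    return None


# Misconfigured: the virtual DC trusts its host, and the host trusts the DC.
broken = {"dc01-vm": "hyperv-host01", "hyperv-host01": "dc01-vm"}
print(find_time_loop(broken, "dc01-vm"))

# Fixed: host time sync is disabled on the DC, which now uses an outside source.
fixed = {"dc01-vm": "external-ntp", "hyperv-host01": "dc01-vm"}
print(find_time_loop(fixed, "hyperv-host01"))
```

Either recommendation in the list above breaks the loop: the first removes the link from the domain controller to the host, and the second removes the link from the host to the domain controller.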
The second configuration error occurs in VMware environments when ESXi hosts are configured to synchronize with Active Directory domain controllers. To make this synchronization work, the default ESXi host NTP client configuration must be modified. Here is a link to the VMware KB article detailing the necessary changes.
A few years ago, these time synchronization issues came up frequently, and I’d get a call from someone who wanted to know why their computers were running slow by several minutes. As a result, I started proactively telling clients about time synchronization issues in a virtual environment. I often begin the discussion by saying, “I’m probably going to tell you about this over and over again…” Because time is important.
So, thanks for taking the time to read about time (pun intended).