NTP Design Stratum Numbers

I already posted about NTP in a Windows domain. I want to briefly talk about stratum numbers. I’ve been in environments where applications were picky when it came to those numbers. So what is the stratum number?

NTP uses the concept of a stratum to describe how many NTP hops away a machine is from an authoritative time source. A stratum 1 server is directly attached to a reference clock such as GPS, and each hop after that adds one. It's similar to the hop count in a routing protocol, but a lower stratum doesn't guarantee that server will be chosen. The algorithm also uses latency to help determine the best time server.
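
To make the stratum number a little more concrete, here's a minimal sketch of where it lives in an NTP packet. It assumes the standard 48-byte NTP header, where the second byte carries the stratum. The function names are my own for illustration, and the sketch only builds and parses bytes; it doesn't send anything over the network.

```python
def build_sntp_request() -> bytes:
    """Build a bare 48-byte client request (hypothetical helper).

    First byte 0x23 = leap indicator 0, version 4, mode 3 (client).
    The remaining 47 bytes are left as zero.
    """
    return bytes([0x23]) + bytes(47)


def parse_stratum(packet: bytes) -> int:
    """Return the stratum field, which is the second byte of the header."""
    if len(packet) < 48:
        raise ValueError("an NTP packet must be at least 48 bytes")
    return packet[1]


def describe_stratum(stratum: int) -> str:
    """Translate a stratum value into plain English."""
    if stratum == 0:
        return "unspecified or invalid"
    if stratum == 1:
        return "primary server (attached to a reference clock such as GPS)"
    if 2 <= stratum <= 15:
        return f"secondary server, {stratum - 1} hop(s) below a stratum 1 server"
    if stratum == 16:
        return "unsynchronized"
    return "reserved"


# Example with a made-up server reply: mode 4 (server), stratum 2.
sample_reply = bytes([0x24, 2]) + bytes(46)
print(describe_stratum(parse_stratum(sample_reply)))
```

In a real environment you would normally read the stratum from your NTP client's own status output rather than parsing packets by hand; this is just to show that the number is a single field the server reports about itself.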

Why is this important?

1. Security – time-stamps in logs will be inaccurate. This is very bad, especially if there’s a breach. (SSL certs can appear expired, backups can be deleted if they appear expired, time-based one-time passwords can fail, etc…)
2. Troubleshooting – time-stamps in logs will be inaccurate. Good luck troubleshooting issues when the times are all different.
3. HIPAA and SOX Compliance – require accurate time-stamping
4. 3rd party Applications/Services can stop working and/or not work efficiently
5. Windows Domain Services can stop working and/or not work efficiently – If a domain goes out of sync, problems can occur, such as authentication and access issues when Kerberos fails. By default, if a member server is more than 5 minutes off from the DC, Kerberos will fail to authenticate requests.
6. Network devices will have HA Sync Failures
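
The Kerberos point in item 5 is easy to express as a check. This is only a sketch of the 5-minute rule described above, not how Windows implements it; `within_kerberos_skew` and `MAX_SKEW` are hypothetical names, and the tolerance is assumed to be the 5 minutes mentioned in the list.

```python
from datetime import datetime, timedelta, timezone

# Assumed tolerance: the 5 minutes mentioned above.
MAX_SKEW = timedelta(minutes=5)


def within_kerberos_skew(member_time: datetime,
                         dc_time: datetime,
                         max_skew: timedelta = MAX_SKEW) -> bool:
    """Return True if the member server's clock is within the allowed skew."""
    return abs(member_time - dc_time) <= max_skew


dc_clock = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
good_member = dc_clock + timedelta(minutes=2)
bad_member = dc_clock - timedelta(minutes=6)

print(within_kerberos_skew(good_member, dc_clock))  # 2 minutes off
print(within_kerberos_skew(bad_member, dc_clock))   # 6 minutes off
```

The first member is inside the window and the second is not, which is the case where authentication requests start failing.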

There’s more to this list, but I won’t cover everything. NTP and time in your environment are something you need to pay attention to. The goal is to make sure your time is all synced up properly and to check the stratum numbers. You want the most reliable setup possible.

Below is an example of a design. You can see that NTP is not something you just ignore. It should be something that’s planned and designed out. At previous companies I’ve worked for, we always used GPS time servers internally. In the example below, I have one NTP server in the primary data center because we own it and have no issues mounting the GPS antenna outside. But let’s say that your DR site is in a colo where you can’t put up a GPS antenna. In that case, you can pick another site you have that’s safe for your second GPS NTP server.

Some General Notes:
– The number of NTP servers depends on your environment. You may want to use 2 GPS servers and not one, so you have some local redundancy for things like bad hardware or maintenance. I’ve been in environments where one per location was fine because the applications were not that picky, but environments vary. Also keep in mind that there’s signal jamming. If you have 2 GPS servers in the same location, they can both get jammed at the same time, so you want to make sure they are dispersed. There are some features to help mitigate this.
– You want whatever is giving the time CLOSEST to the device or client. For example, you don’t want your Domain Controllers to go across the WAN to get their time. Point the DC to a local time source like core switches, edge routers, etc… You want to reduce the jitter and timing offset errors.
– You want three or more independent NTP time servers for your clients. The reason for three is that when NTP syncs, it checks the offsets between all the times it received. Depending on the offset, it will either reject a source or use it to determine the best time.
– Use your own NTP servers internally to reduce the risk of NTP attacks and/or vulnerabilities
– Use NTP Authentication to help mitigate Man-in-the-Middle attacks. It’s also just good practice to know who your NTP servers are talking to
– If you’re an enterprise, don’t use the internet to get your time. Build out your own internal infrastructure.
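
To illustrate the note above about wanting three or more time servers, here's a simplified sketch of why the third source matters. To be clear, this is NOT the real NTP selection algorithm; it's a stand-in that uses the median offset and a majority vote. The function name, server names, offsets, and the 50 ms tolerance are all made up for the example.

```python
from statistics import median
from typing import Dict


def select_agreeing_sources(offsets: Dict[str, float],
                            tolerance: float = 0.05) -> Dict[str, float]:
    """Keep the sources whose offset (in seconds) is close to the median.

    Simplified illustration only: real NTP uses a more involved
    selection and clustering process.
    """
    if len(offsets) < 3:
        # With one or two sources there is no way to tell which one is wrong.
        raise ValueError("need at least three sources to outvote a bad one")

    middle = median(offsets.values())
    agreeing = {name: off for name, off in offsets.items()
                if abs(off - middle) <= tolerance}

    # Require a strict majority so a single bad clock cannot win.
    if len(agreeing) <= len(offsets) / 2:
        raise RuntimeError("no majority of sources agree on the time")
    return agreeing


# Hypothetical measurements: two servers agree, one is 1.5 seconds off.
measured = {"ntp-a": 0.002, "ntp-b": -0.001, "ntp-c": 1.5}
print(sorted(select_agreeing_sources(measured)))
```

With only two servers that disagree, there's nothing to break the tie. With three, the two that agree outvote the one that doesn't.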

Below is an example, but you’ll notice I didn’t throw in a tertiary. This of course all depends. If you’re an enterprise, then I would not go out over the internet for time. But if you’re much smaller, I would suggest, based on your network layout, that you don’t have every device go over the internet for time. Instead, have another pair of devices serve as the tertiary, and lock that pair down to get its time from the most reliable source over the internet.

[Figure: example NTP stratum design]

So what happens if, for instance, a device or a Windows server does lose sync to an NTP server? Typically these devices have a local/hardware clock that will keep time, but it will drift by roughly 1-10 seconds per day. Drift is much higher in a VM environment. When the device connects back to the NTP server, it will correct itself. This is very high level, and the behavior can differ for certain devices or be changed through configuration. Always look into all the devices that make up your NTP design.
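
To put those drift numbers in perspective, here's a quick back-of-the-envelope calculation. It assumes the drift is constant and always in the same direction, which is a simplification, and it uses the 5-minute (300 second) Kerberos limit from earlier as the threshold. The function name is just for illustration.

```python
def days_until_skew(drift_seconds_per_day: float,
                    limit_seconds: float = 300.0) -> float:
    """Days of unsynced running before a clock drifts past the limit.

    Assumes constant, one-directional drift (a simplification).
    """
    if drift_seconds_per_day <= 0:
        raise ValueError("drift must be a positive number of seconds per day")
    return limit_seconds / drift_seconds_per_day


for drift in (1.0, 10.0):
    print(f"{drift} s/day -> {days_until_skew(drift)} days to reach 300 s")
```

At 10 seconds per day, a server that has lost sync is about 30 days away from the 5-minute mark; at 1 second per day, it's about 300 days. Logs from different machines will stop lining up long before that.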

I just want to finish by saying you’d be surprised how many environments don’t have this correctly configured. Not too long ago I ran into an environment where the servers, network equipment, and DMZs were all running different times with different time zones. Some were pulling time from the internet, and some were not pulling time at all. This just comes from a lack of understanding of how NTP works and why it matters.

NTP is a much bigger topic than just this blog post. I’m hoping this is a good start. Like everything else in IT, it all depends. Size, environment, costs, budget, management decisions, etc… will play a factor. I would just say, try to build it as redundant and reliable as possible. Try not to get the time from the internet if you can.