Introduction: DNS security threats and mitigations
Because of the open, distributed design of the Domain Name System, and its use of the User Datagram Protocol (UDP), DNS is vulnerable to various forms of attack. Public or “open” recursive DNS resolvers are especially at risk, since they do not restrict incoming packets to a set of allowable source IP addresses. We are mostly concerned with two common types of attacks:
Spoofing attacks leading to DNS cache poisoning. Various types of DNS spoofing and forgery exploits abound, which aim to redirect users from legitimate sites to malicious websites. These include so-called “Kaminsky attacks”, in which attackers take authoritative control of an entire DNS zone.
Denial-of-service (DoS) attacks. Attackers may launch DDoS attacks against the resolvers themselves, or hijack resolvers to launch DoS attacks on other systems. Attacks that use DNS servers to launch DoS attacks on other systems by exploiting large DNS record/response size are known as amplification attacks.
Each class of attack is discussed further below.
Cache poisoning attacks
There are several variants of DNS spoofing attacks that can result in cache poisoning, but the general scenario is as follows:
The attacker sends a target DNS resolver multiple queries for a domain name for which s/he knows the server is not authoritative, and that is unlikely to be in the server’s cache.
The resolver sends out requests to other nameservers (whose IP addresses the attacker can also predict).
In the meantime, the attacker floods the victim server with forged responses that appear to originate from the delegated nameserver. The responses contain records that ultimately resolve the requested domain to IP addresses controlled by the attacker. They might contain answer records for the resolved name or, worse, they may further delegate authority to a nameserver owned by the attacker, so that s/he takes control of an entire zone.
If one of the forged responses matches the resolver’s request (for example, by query name, type, ID and resolver source port) and is received before a response from the genuine nameserver, the resolver accepts the forged response and caches it, and discards the genuine response.
Future queries for the compromised domain or zone are answered with the forged DNS resolutions from the cache. If the attacker has specified a very long time-to-live on the forged response, the forged records stay in the cache for as long as possible without being refreshed.
DoS and amplification attacks
DNS resolvers are subject to the usual DoS threats that plague any networked system. However, amplification attacks are of particular concern because DNS resolvers are attractive targets to attackers who exploit the resolvers’ large response-to-request size ratio to gain additional free bandwidth. Resolvers that support EDNS0 (Extension Mechanisms for DNS) are especially vulnerable because of the substantially larger packet size that they can return.
In an amplification scenario, the attack proceeds as follows:
The attacker sends a victim DNS server queries using a forged source IP address. The queries may be sent from a single system or a network of systems all using the same forged IP address. The queries are for records that the attacker knows will result in much larger responses, up to several dozen times the size of the original queries (hence the name “amplification” attack).
The victim server sends the large responses to the source IP address passed in the forged requests, overwhelming the system and causing a DoS situation.
Mitigations
The standard system-wide solution to DNS vulnerabilities is the DNSSEC protocol. However, until it is universally implemented, open DNS resolvers need to take independent measures to mitigate known threats. Many techniques have been proposed; see IETF RFC 5452: Measures for making DNS more resilient against forged answers for an overview of most of them. In Google Public DNS, we have implemented, and we recommend, the following approaches:
Securing your code against buffer overflows, particularly the code responsible for parsing and serializing DNS messages.
Overprovisioning machine resources to protect against direct DoS attacks on the resolvers themselves. Since IP addresses are trivial for attackers to forge, it’s impossible to block queries based on IP address or subnet; the only effective way to handle such attacks is to simply absorb the load.
Implementing basic validity-checking of response packets and of nameserver credibility, to protect against simple cache poisoning. These are standard mechanisms and sanity checks that any standards-compliant caching resolver should perform.
Adding entropy to request messages, to reduce the probability of more sophisticated spoofing/cache poisoning attacks such as Kaminsky attacks. There are many recommended techniques for adding entropy, including randomizing source ports; randomizing the choice of nameservers (destination IP addresses); randomizing case in name requests; and prepending nonce labels to name requests. Below, we give an overview of the benefits, limitations, and challenges of each of these techniques, and discuss how we implemented them in Google Public DNS.
Monitoring the service for the client IPs using the most bandwidth and experiencing the highest response-to-request size ratio.
Supporting the DNSSEC protocol
The Domain Name System Security Extensions (DNSSEC) standard is specified in several IETF RFCs: 4033, 4034, 4035, and 5155.
Resolvers that implement DNSSEC counter cache poisoning attacks by verifying the authenticity of responses received from nameservers. Each DNS zone maintains a set of private/public key pairs, and for each DNS record a unique digital signature is generated using the private key. The corresponding public key is then authenticated via a chain of trust by keys belonging to parent zones. DNSSEC-compliant resolvers reject responses that do not contain the correct signatures. DNSSEC effectively prevents responses from being tampered with, because in practice, signatures are almost impossible to forge without access to private keys.
As of January 2013, Google Public DNS fully supports the DNSSEC protocol. We accept and forward DNSSEC-formatted messages and validate responses for correct authentication. We strongly encourage other resolvers to do the same.
Implementing basic validity checking
Some DNS cache corruption can be due to unintentional, and not necessarily malicious, mismatches between requests and responses (e.g. perhaps because of a misconfigured nameserver, a bug in the DNS software, and so on). At a minimum, DNS resolvers should put in checks to verify the credibility and relevance of nameservers’ responses. We recommend (and implement) all of the following defenses:
Do not set the recursive bit in outgoing requests, and always follow delegation chains explicitly. Disabling the recursive bit ensures that your resolver operates in “iterative” mode so that you query each nameserver in the delegation chain explicitly, rather than allowing another nameserver to perform these queries on your behalf.
Reject suspicious response messages. See below for details of what we consider to be “suspicious”.
Do not return A records to clients based on glue records cached from previous requests. For example, if you receive a client query for ns1.example.com, you should re-resolve the address, rather than sending an A record based on cached glue records returned from a .com TLD nameserver.
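As an illustration of the first recommendation, the following minimal Python sketch builds a query whose header has the recursion-desired (RD) bit cleared. The function names `encode_name` and `build_iterative_query` are hypothetical, chosen for this example, and the sketch covers only a single-question query in class IN; it is not a description of how Google Public DNS constructs its packets.

```python
import secrets
import struct

RD_FLAG = 0x0100  # "recursion desired" bit in the 16-bit DNS header flags word


def encode_name(name: str) -> bytes:
    """Encode a dotted name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError(f"invalid label: {label!r}")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_iterative_query(name: str, qtype: int = 1) -> bytes:
    """Build a one-question query with a random 16-bit ID and RD cleared."""
    query_id = secrets.randbits(16)
    flags = 0x0000  # QR=0 (query), opcode=0 (standard), RD=0 (iterative)
    # Header fields: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    question = encode_name(name) + struct.pack("!HH", qtype, 1)  # class IN
    return header + question
```

A resolver built this way receives referrals rather than final answers from intermediate nameservers, and must follow each delegation itself.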
Rejecting responses that do not meet required criteria
Google Public DNS rejects all of the following:
Unparseable or malformed responses.
Responses in which the query ID, source IP, source port, or query name do not match those of the request.
Records which are not relevant to the request.
Answer records for which we cannot reconstruct the CNAME chain.
Records (in the answer, authority, or additional sections) for which the responding nameserver is not credible. We determine the “credibility” of a nameserver by its place in the delegation chain for a given domain. Google Public DNS caches delegation chain information, and we verify each incoming response against the cached information to determine the responding nameserver’s credibility for responding to a particular request.
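The matching and credibility checks above can be sketched roughly as follows. This is a simplified illustration: the `OutstandingQuery` and `ReceivedResponse` types and both function names are hypothetical, and the credibility test is reduced to a plain "is this record inside the zone the nameserver was delegated" check, which is an assumption and much coarser than verifying against a cached delegation chain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutstandingQuery:
    query_id: int
    server_ip: str   # address the request was sent to
    local_port: int  # UDP source port the request was sent from
    qname: str
    qtype: int


@dataclass(frozen=True)
class ReceivedResponse:
    query_id: int
    source_ip: str   # address the response arrived from
    dest_port: int   # local UDP port the response arrived on
    qname: str
    qtype: int


def matches_request(query: OutstandingQuery, response: ReceivedResponse) -> bool:
    """Return True only if every identifying field matches the request."""
    return (
        response.query_id == query.query_id
        and response.source_ip == query.server_ip
        and response.dest_port == query.local_port
        # Names compare case-insensitively here; a resolver that randomizes
        # case in query names would compare them exactly instead.
        and response.qname.lower() == query.qname.lower()
        and response.qtype == query.qtype
    )


def in_bailiwick(record_name: str, delegated_zone: str) -> bool:
    """Return True if record_name is at or below delegated_zone."""
    name = record_name.lower().rstrip(".")
    zone = delegated_zone.lower().rstrip(".")
    return zone == "" or name == zone or name.endswith("." + zone)
```

Under these assumptions, a response is dropped if `matches_request` fails, and individual records are dropped if `in_bailiwick` fails for the zone the responding nameserver serves.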
Adding entropy to requests
Once a resolver does enforce basic sanity checks, an attacker has to flood the victim resolver with responses in an effort to match the query ID, UDP port (of the request), IP address (of the response), and query name of the original request before the legitimate nameserver does.
Unfortunately, this is not difficult to achieve, as the one uniquely identifying field, the query ID, is only 16 bits long (i.e. a single forged response has a 1 in 65,536 chance of guessing it correctly). The other fields are also limited in range, making the total number of unique combinations a relatively low number. See IETF RFC 5452, Section 7 for a calculation of the combinatorics involved.
Therefore, the challenge is to add as much entropy to the request packet as possible, within the standard format of the DNS message, to make it more difficult for attackers to successfully match a valid combination of fields within the window of opportunity. We recommend, and have implemented, all the techniques discussed in the following sections.
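To make the effect of added entropy concrete, the sketch below counts the bits an attacker has to guess when several independent fields are randomized. It is a back-of-the-envelope model only: the function names are hypothetical, every field is assumed to be uniformly random and independent, and timing (the window of opportunity) is ignored, so it is no substitute for the calculation in RFC 5452.

```python
import math


def entropy_bits(port_count: int, nameserver_count: int,
                 mixed_case_letters: int, id_bits: int = 16) -> float:
    """Approximate bits of entropy in one outgoing request.

    port_count:         number of source ports chosen from at random
    nameserver_count:   number of nameservers chosen from at random
    mixed_case_letters: letters in the query name whose case is randomized
                        (each contributes one bit)
    """
    return (
        id_bits
        + math.log2(port_count)
        + math.log2(nameserver_count)
        + mixed_case_letters
    )


def combinations(bits: float) -> float:
    """Number of equally likely field combinations for the given entropy."""
    return 2.0 ** bits
```

With the query ID alone, `entropy_bits(1, 1, 0)` gives 16 bits. Adding roughly 32,000 random source ports contributes just under 15 more bits, and each randomized letter in the query name contributes one more.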
Randomizing source ports
As a basic step, never allow outgoing request packets to use the default UDP port 53, or to use a predictable algorithm for assigning multiple ports (e.g. simple incrementing). Use as wide a range of ports from 1024 to 65535 as allowable in your system, and use a reliable random number generator to assign ports. For example, Google Public DNS uses ~15 bits, to allow for approximately 32,000 different port numbers.
Note that if your servers are deployed behind firewalls, load-balancers, or other devices that perform network address translation (NAT), those devices may de-randomize ports on outgoing packets. Make sure you configure NAT devices to disable port de-randomization.
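A minimal sketch of random port selection, using Python's `secrets` module as the random number source, might look like this. The helper names and the retry count are assumptions made for illustration; a production resolver would also need to exclude ports reserved by the local system.

```python
import secrets
import socket

PORT_MIN, PORT_MAX = 1024, 65535


def random_source_port() -> int:
    """Pick a source port uniformly at random from the unprivileged range."""
    return PORT_MIN + secrets.randbelow(PORT_MAX - PORT_MIN + 1)


def open_query_socket(max_attempts: int = 16) -> socket.socket:
    """Bind a UDP socket to a randomly chosen port, retrying if it is in use."""
    for _ in range(max_attempts):
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            sock.bind(("0.0.0.0", random_source_port()))
            return sock
        except OSError:
            # Port already in use or not permitted: try another one.
            sock.close()
    raise RuntimeError("no free UDP port found")
```

Choosing the port explicitly, rather than letting the operating system assign one, keeps the selection independent of any predictable allocation scheme.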
Randomizing choice of nameservers
Some resolvers, when sending out requests to root, TLD, or other nameservers, select the nameserver’s IP address based on the shortest distance (latency). We recommend that you randomize destination IP addresses to add entropy to the outgoing requests. In Google Public DNS, we simply pick a nameserver randomly among configured nameservers for each zone, somewhat favoring fast and reliable nameservers.
If you are concerned about latency, you can use round-trip time (RTT) banding, which consists of randomizing within a range of addresses that are below a certain latency threshold (e.g. 30 ms, 300 ms, etc.).
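RTT banding can be sketched as follows. The function name, the shape of the `rtt_ms` mapping, and the fallback to all servers when no band matches are assumptions for this example; the two default thresholds simply reuse the 30 ms and 300 ms figures mentioned above.

```python
import secrets


def pick_nameserver(rtt_ms: dict, bands_ms=(30.0, 300.0)) -> str:
    """Pick a nameserver at random from the fastest non-empty latency band.

    rtt_ms maps each nameserver IP address to its measured round-trip time
    in milliseconds.
    """
    if not rtt_ms:
        raise ValueError("no nameservers configured")
    for limit in bands_ms:
        band = sorted(ip for ip, rtt in rtt_ms.items() if rtt <= limit)
        if band:
            return secrets.choice(band)
    # No server falls inside any band: choose among all of them.
    return secrets.choice(sorted(rtt_ms))
```

The trade-off is visible in the band width: a narrow band keeps latency low but leaves fewer candidates, and therefore fewer bits of entropy.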
Randomizing case in query names
The DNS standards require that nameservers treat names with case-insensitivity. That is, the names example.com and EXAMPLE.COM should resolve to the same IP address. However, in the response, most nameservers echo back the name as it appeared in the request, preserving the original case.
Therefore, another way to add entropy to requests is to randomly vary the case of letters in domain names queried. This technique, also known as “0x20” because bit 0x20 is used to set the case of US-ASCII letters, was first proposed in the IETF internet draft Use of Bit 0x20 in DNS Labels to Improve Transaction Identity. With this technique, the nameserver response must match not only the query name but the case of every letter in the name string; for example, wWw.eXaMpLe.CoM or WwW.ExamPLe.COm. This may add little or no entropy to queries for the top-level and root domains, but it’s effective for most hostnames.
One significant challenge we discovered when implementing this technique is that some nameservers do not follow the expected response behavior:
Some nameservers respond with complete case-insensitivity: that is, they return the same results for equivalent names with different cases in the request; but they do not match the exact case of the name in the response.
Other nameservers respond with complete case-sensitivity (in violation of the DNS standards): that is, they match the exact case of the name in the response; but return different results for equivalent names with different cases in the request (typically NXDOMAIN)!
For both of these types of nameservers, altering the case of the query name would produce undesirable results: for the first group, the response would be indistinguishable from a forged response; for the second group, the response could be totally invalid.
Our current solution to this problem is to create a whitelist of nameservers which we know apply the standards correctly, and to only apply the case randomization technique in requests to those servers. We also list the appropriate exception subdomains for each of them, based on analyzing our logs. If a response that appears to come from those servers does not contain the correct case, we reject the response. The whitelisted nameservers comprise more than 70% of our traffic.
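The case randomization and whitelist check described in this section can be sketched as below. The function names and the `case_safe_servers` set are hypothetical stand-ins for the whitelist; the per-server exception subdomains mentioned above are not modeled.

```python
import secrets


def randomize_case(qname: str) -> str:
    """Flip each ASCII letter to upper or lower case with equal probability."""
    return "".join(
        (ch.upper() if secrets.randbits(1) else ch.lower())
        if ch.isascii() and ch.isalpha() else ch
        for ch in qname
    )


def name_for_server(qname: str, server_ip: str, case_safe_servers: set) -> str:
    """Randomize case only for servers known to echo the name exactly."""
    if server_ip in case_safe_servers:
        return randomize_case(qname)
    return qname


def accept_response_name(sent: str, echoed: str, server_ip: str,
                         case_safe_servers: set) -> bool:
    """Check the echoed query name against the name that was sent."""
    if server_ip in case_safe_servers:
        return echoed == sent                 # exact, case-sensitive match
    return echoed.lower() == sent.lower()     # standard case-insensitive match
```

For a whitelisted server, a response that echoes `www.example.com` when `wWw.eXaMpLe.CoM` was sent is rejected, since a forger who did not see the request cannot know which letters were capitalized.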
Note that while upper and lower case letters are allowed in domain names, no significance is attached to the case. That is, two names with the same spelling but different case are to be treated as if identical.
Prepending nonce labels to query names
If a resolver cannot directly resolve a name from the cache, or cannot directly query an authoritative nameserver, then it must follow referrals from a root or TLD nameserver. In most cases, requests to the root or TLD nameservers will result in a referral to another nameserver, rather than an attempt to resolve the name to an IP address. For such requests, it should therefore be safe to attach a random label to a query name to increase the entropy of the request, while not risking a failure to resolve a non-existent name. That is, sending a request to a referring nameserver for a name prefixed with a nonce label, such as entriih-f10r3.www.google.com, should return the same result as a request for www.google.com.
Although in practice such requests make up less than 3% of outgoing requests, assuming normal traffic (since most queries can be answered directly from the cache or by a single query), these are precisely the types of requests that an attacker tries to force a resolver to issue. Therefore, this technique can be very effective at preventing Kaminsky-style exploits.
Implementing this technique requires that nonce labels only be used for requests that are guaranteed to result in referrals; that is, responses that do not contain records in the answers section. However, we encountered several challenges when attempting to define the set of such requests:
Some country-code TLD (ccTLD) nameservers are actually authoritative for other second-level TLDs (2LDs). Although they have two labels, 2LDs behave just like TLDs, which is why they are often handled by ccTLD nameservers. For example, the .uk nameservers are also authoritative for the mod.uk and nic.uk zones, and, hence, hostnames contained in those zones, such as www.nic.uk, www.mod.uk, and so on. In other words, requests to ccTLD nameservers for resolution of such hostnames will not result in referrals, but in authoritative answers; prepending nonce labels to such hostnames will cause the names to be unresolvable.
Sometimes generic TLD (gTLD) nameservers return non-authoritative responses for nameservers. That is, there are some nameserver hostnames that happen to live in a gTLD zone rather than in the zone for their domain. A gTLD nameserver will return a non-authoritative answer for these hostnames, using whatever glue record it happens to have in its database, rather than returning a referral. For example, the nameserver ns3.indexonlineserver.com lives in a gTLD zone rather than in the indexonlineserver.com zone. If we issue a request to a gTLD server for ns3.indexonlineserver.com, we get an IP address for it, rather than a referral. However, if we prepend a nonce label, we get a referral to indexonlineserver.com, which is then unable to resolve the hostname. Therefore, we cannot prepend nonce labels for nameservers which require a resolution from a gTLD server.
Authorities for zones and hostnames change over time. This can cause a nonce-prepended hostname that was once resolvable to become unresolvable if the delegation chain changes.
To address these challenges, we created a “blacklist” file containing exceptions for which we cannot prepend nonce labels. The file is populated with hostnames for which TLD nameservers return non-referring responses, according to our server logs. We continually review the exceptions list to ensure that it stays valid over time.
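A rough sketch of nonce prepending with an exceptions list follows. The function names, the nonce alphabet and length, and the exact-match lookup are assumptions for illustration; the two hostnames in the usage note are the examples given in this section, not an actual exceptions file.

```python
import secrets

NONCE_ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"
MAX_NAME_LENGTH = 253  # maximum length of a dotted domain name


def make_nonce_label(length: int = 12) -> str:
    """Generate a random label from lowercase letters and digits."""
    return "".join(secrets.choice(NONCE_ALPHABET) for _ in range(length))


def name_for_referral_query(qname: str, exceptions: set) -> str:
    """Prepend a nonce label unless the name is on the exceptions list.

    Intended only for requests to root or TLD nameservers that are expected
    to return a referral rather than an answer.
    """
    normalized = qname.lower().rstrip(".")
    if normalized in exceptions:
        return qname
    candidate = f"{make_nonce_label()}.{qname}"
    if len(candidate.rstrip(".")) > MAX_NAME_LENGTH:
        return qname  # no room for an extra label
    return candidate
```

With `exceptions = {"www.nic.uk", "ns3.indexonlineserver.com"}`, a query for `www.google.com` is sent with a random label in front, while a query for `www.nic.uk` is sent unchanged.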
Removing duplicate queries
DNS resolvers are vulnerable to “birthday attacks”, so called because they exploit the mathematical “birthday paradox”, in which the likelihood of a match does not require a large number of inputs. Birthday attacks involve flooding the victim server not only with forged responses but also with initial queries, counting on the resolver to issue multiple requests for a single name resolution. The greater the number of issued outgoing requests, the greater the probability that the attacker will match one of those requests with a forged response: an attacker only needs on the order of 300 in-flight requests for a 50% success chance at matching a forged response, and 700 requests for close to 100% success.
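The figures quoted above are consistent with the classic birthday bound applied to the 16-bit query ID alone. The sketch below computes that bound; treating the attack this way is an assumption about how the numbers were derived, and it ignores other sources of entropy such as the source port.

```python
def birthday_match_probability(outstanding: int, id_space: int = 65536) -> float:
    """Probability of at least one collision among `outstanding` random IDs.

    Uses the exact birthday product: the chance that all values drawn
    uniformly from `id_space` are distinct is the product of (1 - i/id_space)
    for i from 0 to outstanding - 1.
    """
    if outstanding > id_space:
        return 1.0
    no_match = 1.0
    for i in range(outstanding):
        no_match *= 1.0 - i / id_space
    return 1.0 - no_match
```

Under this model, about 300 values give a match probability close to one half, and about 700 push it above 95%, which is why limiting the number of simultaneous outstanding requests matters.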
To guard against this attack strategy, you should be sure to discard all duplicate queries from the outbound queue. For example, Google Public DNS never allows more than a single outstanding request for the same query name, query type, and destination IP address.
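One way to enforce a single outstanding request per (query name, query type, destination IP address) is to keep a table of in-flight requests and attach later callers to the existing entry. The class below is a hypothetical sketch of that idea; the `waiter` objects stand in for whatever the resolver uses to deliver results to pending client queries.

```python
class OutstandingRequests:
    """Track in-flight upstream requests and coalesce duplicates."""

    def __init__(self):
        # Maps (lowercased qname, qtype, server_ip) to the waiters for it.
        self._waiters = {}

    def register(self, qname: str, qtype: int, server_ip: str, waiter) -> bool:
        """Record interest in a request.

        Returns True if the caller should send a new upstream request, or
        False if an identical request is already outstanding.
        """
        key = (qname.lower(), qtype, server_ip)
        if key in self._waiters:
            self._waiters[key].append(waiter)
            return False
        self._waiters[key] = [waiter]
        return True

    def complete(self, qname: str, qtype: int, server_ip: str) -> list:
        """Remove the entry and return everyone waiting on its response."""
        return self._waiters.pop((qname.lower(), qtype, server_ip), [])
```

However many client queries arrive for the same name, an attacker flooding the resolver with them still has only one outstanding request to match.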
Rate-limiting queries
Preventing denial-of-service attacks poses several particular challenges for open recursive DNS resolvers:
Open recursive resolvers are attractive targets for launching amplification attacks. They are high-capacity, high-reliability servers and can produce larger responses than a typical authoritative nameserver — especially if an attacker can inject a large response into their cache. It is incumbent on any developer of an open DNS service to prevent their servers from being used to launch attacks on other systems.
Amplification attacks can be difficult to detect while they are occurring. Attackers can launch an attack via thousands of open resolvers, so that each resolver only sees a small fraction of the overall query volume and cannot extract a clear signal that it has been compromised.
Malicious traffic must be blocked without any disruption or degradation of the DNS service to normal users. DNS is an essential network service, so shutting down servers to cut off an attack is not an option, nor is denying service to any given client IP for too long. Resolvers must be able to quickly block an attack as soon as it starts, and restore fully operational service as soon as the attack ends.
The best approach for combating DoS attacks is to impose a rate-limiting or “throttling” mechanism. Google Public DNS implements two kinds of rate control:
Rate control of outgoing requests to other nameservers. To protect other DNS nameservers against DoS attacks that could be launched from our resolver servers, Google Public DNS enforces per-nameserver QPS limits on outgoing requests from each serving cluster.
Rate control of outgoing responses to clients. To protect any other systems against amplification and traditional distributed DoS (botnet) attacks that could be launched from our resolver servers, Google Public DNS performs two types of rate limiting on client queries:
To protect against traditional volume-based attacks, each server imposes per-client-IP QPS and average bandwidth limits.
To guard against amplification attacks, in which large responses to small queries are exploited, each server enforces a per-client-IP maximum average amplification factor. The average amplification factor is a configurable ratio of response-to-query size, determined from historical traffic patterns observed in our server logs.
If queries from a specific source IP address exceed the maximum QPS, or exceed the average bandwidth or amplification limit consistently (the occasional large response will pass), we return (small) error responses or no response at all.
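The per-client limits on responses can be sketched with simple fixed-window accounting. Everything here is an illustrative assumption rather than the Google Public DNS implementation: the class and parameter names, the shared fixed window, and the `burst_bytes` allowance, which is one possible way to let the occasional large response through while still capping the average amplification factor.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ClientStats:
    queries: int = 0
    query_bytes: int = 0
    response_bytes: int = 0


class ClientLimiter:
    """Per-client-IP limits on QPS, response bandwidth, and amplification."""

    def __init__(self, max_qps: float, max_bytes_per_s: float,
                 max_amplification: float, burst_bytes: int = 4096,
                 window_s: float = 10.0):
        self.max_qps = max_qps
        self.max_bytes_per_s = max_bytes_per_s
        self.max_amplification = max_amplification
        self.burst_bytes = burst_bytes
        self.window_s = window_s
        self._window_start = None
        self._stats = defaultdict(ClientStats)

    def allow(self, client_ip: str, query_size: int, response_size: int,
              now: float) -> bool:
        """Return True if the full response may be sent to this client."""
        # Fixed window shared by all clients: reset the counters when it ends.
        if self._window_start is None or now - self._window_start >= self.window_s:
            self._stats.clear()
            self._window_start = now
        stats = self._stats[client_ip]
        stats.queries += 1
        stats.query_bytes += query_size
        projected = stats.response_bytes + response_size
        over_qps = stats.queries > self.max_qps * self.window_s
        over_bandwidth = projected > self.max_bytes_per_s * self.window_s
        # Average amplification, with a byte allowance for occasional bursts.
        over_amplification = projected > (
            self.max_amplification * stats.query_bytes + self.burst_bytes
        )
        if over_qps or over_bandwidth or over_amplification:
            return False  # caller sends a small error response or nothing
        stats.response_bytes = projected
        return True
```

Because the counters reset at the end of each window, a client that stops sending abusive traffic regains full service within one window length.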