White Paper: Anti-Keylogger Myths
The paper shows that BIND 9 DNS queries are predictable – i.e. that the source UDP port and DNS transaction ID can be effectively predicted. A prediction algorithm is described that, in optimal conditions, yields very few guesses for the "next" query (10 in the basic attack, and 1 in the advanced attack), thereby overcoming whatever protection is offered by the transaction ID mechanism. This enables much more effective DNS cache poisoning than the currently known attacks against BIND 9. The net effect is that pharming attacks are feasible against BIND 9 caching DNS servers, without the need to directly attack either DNS servers or clients (PCs). The results are applicable to all BIND 9 releases when BIND (the named daemon) is in caching DNS server configuration.
Trusteer makes no representation or warranties, either express or implied by or with respect to anything in this document, and shall not be liable for any implied warranties of merchantability or fitness for a particular purpose or for any indirect special or consequential damages. No part of this publication may be reproduced, stored in a retrieval system or transmitted, in any form or by any means, photocopying, recording or otherwise, without prior written consent of Trusteer. No patent liability is assumed with respect to the use of the information contained herein. While every precaution has been taken in the preparation of this publication, Trusteer assumes no responsibility for errors or omissions. This publication and features described herein are subject to change without notice.
Table of Contents
* 1. Introduction
* 2. Attacking the BIND 9 DNS Cache Server ("named")
o 2.1 Observations on BIND's "named"
o 2.2 The basic attack
o 2.3 An advanced attack: full PRNG state reconstruction
o 2.4 Attack variants
+ 2.4.1 Pre-computed table
+ 2.4.2 Information theoretic results
+ 2.4.3 Linear equations
+ 2.4.4 Earlier versions of BIND 9
+ 2.4.5 Additional ways to force multiple queries
* 3. Conclusions
* 4. Disclosure timeline
* 5. Vendor/product status
* 6. References
* Appendix A – XSL file
* Appendix B – BIND 9 simple prediction script
* Appendix C – BIND 9 PRNG reconstruction script
1. Introduction
Attacks against DNS, and particularly the concept of DNS cache poisoning, have been known for over a decade (e.g.  section 5.3 was published in 1989 and  was published in 1993). A concise threat analysis for the existing DNS infrastructure can be found in . The focus of this paper is on DNS cache poisoning attacks.
Typically, a DNS query is sent over the connectionless UDP protocol. The UDP response is associated with the request via the source and destination host and port (UDP properties), and via the 16-bit transaction ID value (the response's transaction ID should be identical to the request's transaction ID). Assuming that an attacker knows that a DNS query for a specific domain is about to be sent from a specific DNS server/resolver, the attacker can trivially predict the source IP address (the address of the requesting name server/client), the destination IP address (the address of the target name server), and the destination UDP port (53 – the standard UDP port for DNS queries). The attacker needs two additional data items – the source UDP port and the DNS transaction ID – to be able to blindly inject his/her own response (before the target server's response arrives; typically DNS servers use the first matching response and silently discard any further responses).
As mentioned above, the transaction ID is a 16-bit quantity, and the source UDP port is theoretically a 16-bit quantity too (though for practical reasons, only a sub-range is used for UDP source ports – e.g. 1024/1025-4999/5000 in older operating systems, and 49152-65535 in newer operating systems).
So in theory, the total entropy from an attacker's point of view is 32 bits, and practically (in older operating systems) log2(3976·2^16), which is almost 28 bits, or (in newer operating systems) log2(16384·2^16), which is 30 bits.
Note that for practical reasons, it is not a good idea to use a combination of transaction ID and UDP port which are already in the "waiting queue" for a DNS response. Typically there are very few such pending requests, so this has negligible effect on the overall entropy.
In BIND 9 the UDP source port is predictable – it is determined when the daemon is started or shortly thereafter (the UDP port is unchanged, as mentioned in  and its thread).
In general, predictability of the transaction ID can facilitate DNS cache poisoning attacks. This was mentioned in  section 5.3,  and  section 6.1. In April 1997, it was discovered that BIND (4.9.5) generates a sequential transaction ID (); it seems though that the BIND developers (led by Paul Vixie) were aware of this attack vector back in 1995 (see  section 6.1). While the advisory contained a detailed fix suggestion, using a modular arithmetic PRNG, the issue was actually fixed by introducing a hash-table based PRNG in BIND 8.2 (released March 1999). The code was then rewritten in BIND 9.0.0 (released September 2000) to make use of a PRNG based on linear feedback shift registers.
To clarify: the rest of this discussion assumes BIND 9.4.1 (or 9.x in general) wherein those old vulnerabilities do not exist.
In April 2001 a paper () was released, describing the use of a method called "attractors" to outline anomalies and predictability in numeric sequences. In January 2003, this method was applied to BIND 9.2.2rc1 (), concluding that "BIND 9's random number sequence is predictable 20% of the time with a spoofing set size of 5000". However, this result is only about 2.5 times better than what can be achieved using 5000 randomly chosen values, and as will be shown below, a much better result can be obtained by a closer analysis. Note that this analysis was conducted prior to (and perhaps served as a trigger to) the fix introduced in BIND 9.2.3rc1 (August 2003) (see footnote 1).
1 In BIND 9.2.3rc1, an implementation bug was fixed in the PRNG (see , bugs 1406 and 1407)
Combining the above "attractors" attack with the static UDP port yields an attack that requires about 5000 DNS responses to poison the cache. It is doubtful that such an attack will be practical, since a DNS response cannot be a lot shorter than 80 bytes (in reality the attacker would probably need a bit more, so 100-150 is a better assumption, but nevertheless 80 can be used as a lower limit, for the benefit of the doubt), and 5000 such responses yield 400KB. That much data should arrive at the DNS stack between the time it emits the DNS query to be poisoned and the time the genuine server's response arrives. A single DNS round trip typically takes anywhere between a few dozen milliseconds and a few hundred milliseconds (for example, consider the 0-referral latency in table 1 of , or the statistics for the .COM gTLD in ). Assuming a 100ms round trip, the attacker needs a significant uplink bandwidth of 32 megabit/sec (similar calculations can be found in section 6 of ). Even if the attractor method is refined and an order of magnitude improvement is achieved, it would still require an uplink of 3.2 megabit/sec, which is not trivial on one hand, and may still not be enough on the other hand (it assumes a 100ms round trip for the genuine DNS query, and in some cases the genuine DNS server may respond faster). And all this only guarantees a 20% success rate.
Another well known attack against DNS caching/resolution is the "birthday attack". The birthday attack against DNS servers is hinted at in  (July 2001) and described in full in  (November 2002); a more elaborate discussion can be found in  and .
Essentially, where there are N entropy bits, the attack consists of simultaneously sending about 2^(N/2) DNS queries and 2^(N/2) DNS responses in order to make a match (with high enough probability). Unfortunately, the birthday attack cannot be combined with the "attractors" method. That's because the birthday attack needs multiple DNS queries (to the same target server), and each such query results in its own transaction ID. Using the attractor to predict the next transaction ID requires that the previous sequence number be known. Yet after the first query is sent, this condition cannot be met.
Combining the birthday attack with the UDP port information yields an attack that requires simultaneously launching a few hundred DNS queries and responses (we have N=16, so 2^(N/2)=256) to cover the 16 entropy bits of the DNS transaction ID. In order for the attack to be effective, this burst should take no longer than the round trip of the DNS query and answer from the genuine server (say, 100ms). However, forcing the DNS stack to receive several hundred DNS queries in a short period of time is oftentimes not realistic, especially when considering DNS security architectures such as Split-Split DNS. With the Split-Split DNS architecture, the only way to access the caching DNS server is from within the organization (or ISP) – "external" queries are not served, e.g. they may be blocked by a firewall. This is a pretty standard setup nowadays (it is the recommended secure DNS architecture). The paper assumes, therefore, that the attacker has no direct access to the internal network, i.e. that the attacker cannot run home-made executables (attack scripts) from the internal network. This pretty much rules out the option of hitting the DNS stack with thousands of queries per second, thereby rendering the birthday attack impractical.
The attacks described in this paper make use of the predictable nature of BIND 9 transaction IDs to attack the DNS stack. It is assumed that the stack can be forced to perform DNS queries using a malicious web page (the concept of poisoning DNS cache through a malicious web page is described in  and demonstrated in  for a different kind of DNS attack). This is a real-life condition, but of course it is quite limiting in what the attacker can do – the attacker, for example, cannot force a burst of hundreds of queries all for the same hostname to be emitted from the same client. Nevertheless, it will be shown that since the transaction ID (and the UDP source port) is predictable enough, this suffices to mount a successful attack.
2. Attacking the BIND 9 DNS Cache Server ("named")
2.1 Observations on BIND's "named"
The BIND 9 named server uses a static UDP source port (acquired at the startup of the daemon's run), and generates a very predictable transaction ID. A full analysis of the transaction ID generation mechanism was carried out using the freely available BIND source code. The research results were verified using live captures of queries obtained from named (from a standard BIND 9.4.1 installation) running on Windows XP SP2. Since the analysis doesn't rely on the initialization of the transaction ID mechanism, but rather on the way it advances (which is common to all platforms), the results thus obtained are applicable to all hardware and software platforms.
The PRNG in use for generating transaction IDs is implemented in the BIND 9.4.1 source () file ./lib/isc/lfsr.c. In essence, the caller (function qid_allocate() in file ./lib/dns/dispatch.c) calls isc_lfsr_init() at the beginning of the run for each of the two "lfsr" variables to initialize the PRNG. From this point on, the caller (function dns_randomid() in file ./lib/dns/dispatch.c) calls isc_lfsr_generate32() for each transaction ID, obtaining 32 pseudo random bits with each call (and using the least significant 16 bits of these as the transaction ID).
The internal state thus consists of two lfsr variables, which are 32-bit quantities. With each call to isc_lfsr_generate32(), they are advanced as mutual feedback linear feedback shift registers, as follows:
C code (adapted from the above files and modified for clarity):
unsigned int lfsr_generate(unsigned int lfsr_state, unsigned int tap)
{
    /* One LFSR step: shift right; XOR in the tap if the bit shifted out was 1. */
    if (lfsr_state & 1)
        lfsr_state = (lfsr_state >> 1) ^ tap;
    else
        lfsr_state >>= 1;
    return lfsr_state;
}

unsigned int lfsr_skipgenerate(unsigned int lfsr_state,
                               unsigned int tap,
                               unsigned int skip)
{
    /* Advance twice when skip is set, once otherwise. */
    if (skip)
        lfsr_state = lfsr_generate(lfsr_state, tap);
    lfsr_state = lfsr_generate(lfsr_state, tap);
    return lfsr_state;
}

/* For each transaction ID: */
skip1 = lfsr1_state & 1;
skip2 = lfsr2_state & 1;
lfsr1_state = lfsr_skipgenerate(lfsr1_state, tap1, skip2);
lfsr2_state = lfsr_skipgenerate(lfsr2_state, tap2, skip1);
trxid = (lfsr1_state ^ lfsr2_state) & 0xFFFF;
In words, the algorithm is as follows:
- The least significant bit of each variable is saved.
- Each variable is advanced (shifted right) as an LFSR (with hard-wired, constant tap) once if its saved peer bit (see above) is 0 and twice if the saved peer bit is 1.
- Finally, the 16 bit transaction ID is the 16 least significant bits of the XOR value of the two variables. It is serialized with most significant byte first, then least significant byte (big endian style).
It is important to note that the above description does not cover a code branch (in function lfsr_generate(), file ./lib/isc/lfsr.c) that re-seeds a variable whenever its state is 0. In reality, this never happens, because the initial seeding ensures that the initial state of each variable is never 0. And since both LFSR taps are reversible, it can easily be seen that neither variable can ever assume the value 0.
The net result is, therefore, a system comprising two 32-bit mutually clock-controlled LFSRs, whose states are linearly combined to yield a 16-bit output. In essence, this is a weak version (since the output is 16 bits, as opposed to the traditional 1 bit) of the well studied cryptosystem known by many names: "bilateral stop/go (LFSR) generator", "mutually clock controlled (LFSR) generator" and "mutual (or bilateral) step-1/step-2 (LFSR) generator". The variant used in BIND 9 is very weak due to its large output of 16 bits (out of a combined internal state of 64 bits). As such, it lends itself to some trivial attacks, as can be seen below.
An observation that plays an important role later is as follows. When the least significant bit of the transaction ID is 0, the two LFSRs will advance in the same way in the next step (because their peer bits are identical). This can be either one step (when the two bits are 0) or two steps (when the two bits are 1).
Assuming now that the least significant bit of the transaction ID is indeed 0, there are two branches, depending on the actual values of the pair of least significant bits in the two LFSRs:
* When the two bits are 0 (probability ½), it means that the next value of each LFSR is its current value, shifted right, with an unknown most significant bit. The XOR of the least significant 16 bits (i.e. the next transaction ID) is therefore the current transaction ID, shifted right once, with an unknown most significant bit. In other words, when the two least significant bits are 0, there are two candidates for the next transaction ID.
* When the two bits are 1 (probability ½), the situation is slightly more complicated. Both registers are advanced twice. Moreover, in the first step, both registers force their taps to XOR into them (because the least significant bits are 1). However, at the second step, the bits are unknown. But that's not the end of it, because while the exact bits are unknown, their XOR is known, so there are actually only two cases (guesses). And of course, the two most significant bits of the result are unknown too, so there are 8 candidates altogether in this branch.
To summarize, when the least significant bit of the transaction ID is 0, there are 10 possible values (and each such value is easily calculated) for the next transaction ID (2 when both bits are 0, and 8 when both bits are 1). Note that the probability of the values is not uniform: since the probability for two 0 bits is ½, it follows that each of the two values associated with this branch has probability ¼, while the probability of the two 1 bits is ½, which means that each of the eight values associated with this branch has probability 1/16. In information theoretic terms, when the least significant bit of the transaction ID is 0, the entropy of the next transaction ID is 3 bits, instead of the theoretical maximum of 16 bits.
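The 3-bit figure can be checked directly from the two branch probabilities. As a worked calculation, with H denoting the Shannon entropy (in bits) of the next transaction ID given that the current one is even:

```latex
H = 2 \cdot \tfrac{1}{4} \log_2 4 \;+\; 8 \cdot \tfrac{1}{16} \log_2 16 = 1 + 2 = 3 \text{ bits}
```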
2.2 The basic attack
The attack target is an organization with a BIND 9 DNS caching server. This server does not answer DNS queries from the Internet, and no direct access to the internal network is available to the attacker. The goal of the attack is to poison the cache entry for the domain example.com. It is assumed that this domain is not yet cached (or that its cache entry has expired). The attacker needs to make the cache server cache the authoritative name server entry for example.com as the attacker's IP address, rather than the IP address of the real authoritative name server for example.com.
The attacker lures one of the network users to visit the attacker's web page. This page contains an image URL to, say, www1.attacker.com. The discussion below skips the part where the name server obtains the authoritative name server for attacker.com and focuses on the query for www1.attacker.com. It is sent to the attacker's name server. This name server observes the least significant bit of the DNS transaction ID. If it is not 0, it sends back a CNAME record for the next host name (i.e. a CNAME that points at www2.attacker.com). The BIND 9 DNS server will then request www2.attacker.com with the next ID value. This process repeats itself a few times (up to 14 times due to CNAME chaining support by BIND 9) until the bit value is 0. At this point, the attacker's name server returns a CNAME record that points at www.example.com. Note that altogether up to (and possibly including) 15 CNAME "redirections" were performed - the BIND 9 DNS server follows up to (and including) 15 CNAME redirections. However, half of the time, the first DNS query (to www1.attacker.com) already has the least significant bit 0, and statistically speaking, the expected length of the required chain is 2 (up to a small quantity due to the cutoff at chain length 15).
The above practice is called CNAME chaining (see footnote 2). While it is probably the easiest to explain, other methods (possibly better, in some aspects) of forcing a DNS caching server to send multiple queries are discussed later in this document.
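The expected chain length can be made explicit with a short calculation. This is a sketch under two assumptions that go slightly beyond the text: K denotes the number of queries the attacker's name server sees until the first even transaction ID, truncated at 15, and each transaction ID is taken to be even independently with probability ½:

```latex
E[K] = \sum_{k=0}^{14} \Pr[K > k] = \sum_{k=0}^{14} 2^{-k} = 2 - 2^{-14} \approx 2
```

The term 2^{-14} is the "small quantity" caused by the cutoff.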
Note that the BIND 9 DNS server handles CNAME chains (up to 16 "redirections") well, but will only return the first 15 CNAME records (i.e. the 16th CNAME will not be included in the response returned to the client). Therefore, when the chain contains up to (and including) 15 redirections, the response to the client will be functional, i.e. will include the IP address of the final CNAME.
Assuming the attacker received a query whose transaction ID is even and the attacker then redirected to www.example.com, the second phase begins. The attacker needs to prepare the 10 possible DNS answers, corresponding to the 10 possible transaction ID values (as described above), and with the same UDP destination port (which is copied from the query source port), with source port 53, destination IP address being the request's source IP address, and the source IP address should be that of the name server for the .COM gTLD (which will be queried by the DNS caching name server for the www.example.com resolution).
The attacker can start sending those 10 DNS responses as rapidly as possible, cycling through them again and again. Even with a modest 256 Kbit/sec uplink and with as much as 150 bytes per response, it is possible to complete a cycle in less than 50 milliseconds. This increases the likelihood that the spoofed response (from the attacker's server) will reach the DNS server before the genuine DNS response (from the gTLD server).
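The timing figures used in this paper are simple to reproduce. The sketch below is illustrative only: the function names are made up for this example, "256 Kbit" is taken to mean 256,000 bits per second, and the byte counts are treated as the full on-the-wire size of each response (both are assumptions).

```c
#include <assert.h>

/* Seconds needed to send `count` responses of `bytes` bytes each over an
 * uplink of `bits_per_sec`. Hypothetical helper, not part of any tool. */
double burst_seconds(unsigned count, unsigned bytes, double bits_per_sec)
{
    assert(bits_per_sec > 0.0);
    return (double)count * (double)bytes * 8.0 / bits_per_sec;
}

/* Uplink (bits per second) needed to deliver `count` responses of `bytes`
 * bytes each within a window of `window_sec` seconds. */
double required_bits_per_sec(unsigned count, unsigned bytes, double window_sec)
{
    assert(window_sec > 0.0);
    return (double)count * (double)bytes * 8.0 / window_sec;
}
```

Under these assumptions, burst_seconds(10, 150, 256000.0) comes to just under 47 milliseconds for one cycle of the 10 candidates, while required_bits_per_sec(5000, 80, 0.1) gives the 32 megabit/sec figure quoted earlier for the "attractors" attack.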
Note that in order to maximize the likelihood of the attack to succeed, the attacker may order the transaction ID values used in the DNS responses, such that the high probability values (the two values associated with least significant bits being 0) are transmitted first.
The Perl script in Appendix B demonstrates the preparation of the candidate transaction IDs. It takes one command line argument (the current transaction ID, expressed as 4 hexadecimal digits, and is supposed to have least significant bit 0) and it prints the 10 possible next transaction ID values (the two most likely values are printed first).
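For readers who prefer C, the candidate computation can be sketched as follows. This is not the Appendix B script; it is a minimal re-derivation from the description in section 2.1. The struct and function names are hypothetical, and the tap constants are passed in as parameters rather than hard-coded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One LFSR step in the simplified model: shift right, and XOR in the tap
 * when the bit shifted out was 1. */
uint32_t lfsr_step(uint32_t state, uint32_t tap)
{
    return (state & 1u) ? (state >> 1) ^ tap : state >> 1;
}

/* Hypothetical container for the generator state; not a BIND structure. */
struct prng {
    uint32_t s1, s2;     /* register states */
    uint32_t tap1, tap2; /* tap constants */
};

/* Advance both registers (once or twice each, chosen by the peer's low
 * bit) and return the next 16-bit transaction ID. */
uint16_t prng_next(struct prng *g)
{
    unsigned skip1 = g->s1 & 1u, skip2 = g->s2 & 1u;
    if (skip2) g->s1 = lfsr_step(g->s1, g->tap1);
    g->s1 = lfsr_step(g->s1, g->tap1);
    if (skip1) g->s2 = lfsr_step(g->s2, g->tap2);
    g->s2 = lfsr_step(g->s2, g->tap2);
    return (uint16_t)((g->s1 ^ g->s2) & 0xFFFFu);
}

/* Fill out[0..9] with the candidates for the ID that follows `id`, which
 * must be even. The two likelier candidates are placed first. */
void next_id_candidates(uint16_t id, uint32_t tap1, uint32_t tap2,
                        uint16_t out[10])
{
    size_t n = 0;
    assert((id & 1u) == 0);

    /* Branch 1: both low bits were 0, so each register shifts once. */
    out[n++] = (uint16_t)(id >> 1);
    out[n++] = (uint16_t)((id >> 1) | 0x8000u);

    /* Branch 2: both low bits were 1, so each register steps twice. */
    for (uint32_t hi = 0; hi < 4; hi++) {       /* unknown bits 16-17 of s1^s2 */
        uint32_t x = (uint32_t)id | (hi << 16);
        uint32_t y = (x >> 1) ^ tap1 ^ tap2;    /* XOR of states after step one */
        unsigned d = y & 1u;                    /* known XOR of the new low bits */
        for (unsigned c1 = 0; c1 < 2; c1++) {   /* guessed low bit of register 1 */
            unsigned c2 = c1 ^ d;
            uint32_t z = (y >> 1) ^ (c1 ? tap1 : 0u) ^ (c2 ? tap2 : 0u);
            out[n++] = (uint16_t)(z & 0xFFFFu);
        }
    }
}
```

A simple way to sanity-check such a sketch is to run prng_next() from an arbitrary non-zero state and confirm that, whenever the current ID is even, the following ID appears among the 10 values produced by next_id_candidates().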
2 CNAME chains are discouraged per the DNS RFC 1034 (), section 3.6.2. Indeed, "standard" name servers eliminate such indirections from a static DNS configuration by resolving CNAME chains internally and providing a consolidated result. At the same time, CNAME chaining is in use by many good and respectable domains, e.g. when a domain uses Content Delivery Network (CDN) services it typically points at the CDN host (on a different domain) via a CNAME record. Therefore, to implement the above CNAME chain it is advised to use a name server which provides user-controllable runtime configuration, such as .
2.3 An advanced attack: full PRNG state reconstruction
A shortcoming of the basic attack is that it provides 10 candidates for the next transaction ID. Also, it cannot predict sequences of transaction IDs. It merely uses an obvious weakness in the PRNG scheme to predict the next value in half the cases. However, since the BIND 9 PRNG is weak, it is also feasible to completely predict it (i.e. to reproduce its internal state in full). For this, a sequence of 13-15 consecutive DNS queries is needed (possibly using the CNAME chaining technique described above).
An algorithm that reconstructs the state of the two LFSRs after the first entry of the transaction ID sequence is generated is as follows (using straightforward and well known cryptanalysis techniques):
- Guess the 6-7 least significant bits of the first LFSR (hereinafter the state assumed is always the state right after the first transaction ID in the sequence is generated). Since the first transaction ID is the XOR of the least significant 16 bits of the two LFSRs, it immediately follows that the 6-7 (respectively) bits of the second LFSR become known.
- Per each such guess (there are 64/128 such guesses, respectively), advance the LFSRs and observe the XOR of their results, while all the time keeping in mind that as the registers advance, the "window of known bits" shrinks. Each register has its own window (since they do not necessarily advance at the same pace), but since the least significant bits are known (for a few steps, at least), the way they advance is completely known. This can be used to eliminate wrong guesses. At the end of this process, it is expected that very few candidates remain.
- Per each remaining candidate, try guessing alternately another bit of the first LFSR, and possibly eliminate using the above technique (following the LFSRs as they advance), then do so for the second LFSR, alternating between the two. Usually (when 13 or more transaction IDs are available), it is possible to improve by at least one bit per iteration, but occasionally there's no escape from guessing the bit and moving on.
- When one of the registers is fully known (all 32 bits) it can be followed "forever" (its "window" becomes infinite). When the two LFSRs are fully known, the internal state has been completely reconstructed.
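The elimination step (the second bullet above) can be sketched in C as follows. This is an illustrative fragment only, not the Appendix C script: the function names are hypothetical, the taps are passed as parameters, and only the first-round check with shrinking windows is shown, not the later bit-by-bit extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One LFSR step: shift right, XOR in the tap if the bit shifted out was 1. */
uint32_t lfsr_step(uint32_t state, uint32_t tap)
{
    return (state & 1u) ? (state >> 1) ^ tap : state >> 1;
}

/* Mask selecting the low w bits, for 0 <= w <= 16. */
uint32_t low_mask(int w)
{
    return (1u << w) - 1u;
}

/*
 * Test one guess for the low k bits (1 <= k <= 16) of register 1, taken
 * right after ids[0] was generated. Returns 0 if the guess contradicts
 * the observed IDs, 1 if it survives for as long as known bits remain.
 */
int guess_consistent(uint32_t guess, int k, const uint16_t *ids, size_t n,
                     uint32_t tap1, uint32_t tap2)
{
    assert(n >= 1 && k >= 1 && k <= 16);
    uint32_t r1 = guess & low_mask(k);
    uint32_t r2 = (guess ^ ids[0]) & low_mask(k); /* implied by ids[0] */
    int w1 = k, w2 = k;                           /* windows of known bits */

    for (size_t i = 1; i < n; i++) {
        int adv1 = 1 + (int)(r2 & 1u);  /* register 1 steps per peer's low bit */
        int adv2 = 1 + (int)(r1 & 1u);
        if (w1 < adv1 || w2 < adv2)
            break;                      /* a needed low bit is no longer known */
        for (int j = 0; j < adv1; j++) r1 = lfsr_step(r1, tap1);
        for (int j = 0; j < adv2; j++) r2 = lfsr_step(r2, tap2);
        w1 -= adv1;
        w2 -= adv2;
        int w = w1 < w2 ? w1 : w2;      /* bits of the XOR that are still known */
        if (((r1 ^ r2) ^ ids[i]) & low_mask(w))
            return 0;                   /* prediction disagrees: eliminate */
    }
    return 1;
}
```

Calling guess_consistent() for each of the 64 or 128 initial guesses leaves the set of survivors that the later steps then extend one bit at a time. Bits above each window hold arbitrary values during the simulation, which is harmless because every comparison is masked to the known width.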
Note that since each shift register advances once or twice per transaction ID, it follows that it takes 8-16 advances to get the most significant bit of each register to appear in the transaction ID. Because the algorithm above uses the state after the first transaction ID as its initial state, the algorithm actually requires at least 9-17 consecutive queries to fully reconstruct the internal state ("at least", because if, say, both registers advance by exactly 16 steps, the most significant bits will only be observed XORed with each other, hence one bit of information will still be missing). The exact number depends on the advancement schedule of both registers, but the probability of success within m+1 consecutive queries can easily be bounded from above by the probability that the minimum of two binomial random variables m+B(m,½) is >= 16 (keep in mind that the advancement per step is 1+B(1,½)), and this bound is quite close to the actual probability of success. It can easily be seen that good results are therefore expected when m=12 (13 queries), and excellent ones when m=14 (15 queries).
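The bound is easy to evaluate numerically. The sketch below uses hypothetical function names and adds one assumption not stated in the text: the advancement counts of the two registers are treated as independent, so that the probability of the minimum being at least 16 is the square of a single binomial tail.

```c
#include <assert.h>

/* P(B(m, 1/2) >= k): upper tail of a fair binomial, obtained by summing
 * C(m, j) / 2^m for j >= k. Intended for small m only. */
double binom_tail_half(int m, int k)
{
    double term = 1.0;
    double tail = 0.0;
    assert(m >= 0);
    for (int j = 0; j < m; j++)
        term /= 2.0;                         /* term = C(m, 0) / 2^m */
    for (int j = 0; j <= m; j++) {
        if (j >= k)
            tail += term;
        term = term * (double)(m - j) / (double)(j + 1); /* next C(m, j+1) term */
    }
    return tail;
}

/* Upper bound on the success probability with m+1 consecutive IDs: both
 * registers must advance at least 16 times in m steps, where each step
 * advances a register by 1 + B(1, 1/2). Independence of the two
 * registers is an assumption of this sketch. */
double success_bound(int m)
{
    double p = binom_tail_half(m, 16 - m);
    return p * p;
}
```

Under that assumption the bound evaluates to roughly 0.86 for m=12 and roughly 0.998 for m=14, in line with the "good" and "excellent" characterization above.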
The Perl script in Appendix C takes around 10-15 milliseconds (on IBM ThinkPad T60 laptop with Intel Centrino CoreDuo T2400 CPU @1.83GHz and Windows XP SP2 operating system – certainly a moderately powered machine) to extract the internal state from 13-15 consecutive transaction IDs. It takes one command line argument – the name of its input file. This file is assumed to contain lines, where each line describes a single DNS query (4 hex digits for the transaction ID). A file in this format can be produced from a PDML file (one of the export formats of the WireShark protocol analyzer) using the XSL transformation in Appendix A.
Rewriting the algorithm in a compiled language (e.g. C/C++) is expected to yield at least an order of magnitude improvement in performance, thus getting it to run in around 1-2 milliseconds (or less).
2.4 Attack variants
2.4.1 Pre-computed table
The basic attack algorithm calculates the 10 candidates at run time, given the current transaction ID (provided it is even). Another approach is to pre-calculate a table covering all (even) transaction IDs, listing all 10 candidates for each. Such a table has 2^15 entries (since there are 2^15 even transaction IDs), and each entry is a list of 10 candidates, i.e. ten 16-bit quantities (20 bytes altogether). Thus the total storage needed for this table is 640KB. Generating this table takes less than half a second with a Perl script, so it should probably take a few dozen milliseconds (or less) in native C/C++ code.
2.4.2 Information theoretic results
Experiments with the full PRNG state reconstruction script revealed that typically, when there are fewer than 13-15 known transaction IDs, more than one internal state candidate is found. All candidates generate the same transaction ID sequence, and hence are indiscernible from one another. This means that typically around 13-15 transaction IDs are indeed necessary (theoretically!) to reconstruct the internal state, or in other words, that the above algorithm (and script) is optimal from an information theoretic aspect.
2.4.3 Linear equations
Note that the PRNG state reconstruction algorithm makes use of incremental enumeration and elimination, with a base guess of 6-7 bits. An alternative approach is to represent the information as linear equations (while taking into account the non-uniform advance in the registers). Again, this is a well known cryptanalytic technique for attacking such a system. However, in this case it seems that guessing and elimination is faster than solving the set of equations.
2.4.4 Earlier versions of BIND 9
With versions of BIND 9 earlier than 9.2.3rc1, the shift register taps are slightly different (the bug fix introduced in 9.2.3rc1 amounts to changing the tap of the second shift register, as well as changing the way the tap is interpreted in both registers, but the underlying algorithm was not modified). Both attacks described above should work for earlier versions of BIND 9 (though this was not explicitly tested), with the following tap values:
$tap1=0xc000002b; # (0x80000057>>1)|(1<<31)
$tap2=0xc0000061; # (…>>1)|(1<<31)