I understand that the Kademlia routing table is made up of 160 buckets.
Nodes are put into buckets 0-159, depending on their prefix length (the number of leading zero bits in the XOR of the local node's ID and the remote node's ID).
Why is this so? Are there any performance benefits involved (other than the fact that iterating through 160*20 nodes to find the closest would be infeasible)?
Kademlia uses the XOR of two node IDs as the measure of their distance apart. The idea behind the routing table buckets is that a node has detailed knowledge of the network "close" to it, and progressively less knowledge the farther you get from its ID.
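The XOR metric and the resulting bucket index can be sketched in a few lines of Python (the names `xor_distance` and `bucket_index` are illustrative, not from any particular implementation):

```python
ID_BITS = 160  # bit length of a SHA1-derived node ID

def xor_distance(a: int, b: int) -> int:
    """Kademlia distance between two node IDs: their XOR, read as an integer."""
    return a ^ b

def bucket_index(local_id: int, other_id: int) -> int:
    """Bucket for other_id in local_id's table: the shared prefix length,
    i.e. the number of leading zero bits in the XOR of the two IDs."""
    d = xor_distance(local_id, other_id)
    if d == 0:
        raise ValueError("a node does not store itself")
    return ID_BITS - d.bit_length()  # leading zero bits of the 160-bit XOR
```

A contact differing from us in the very first bit lands in bucket 0 (the farthest half of the ID space), while one sharing all but the last bit lands in bucket 159, which is why knowledge gets denser the closer you look to your own ID.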
To find a group of nodes closest to a particular key, a node would first query the closest ones it knows about from its own routing table. On average, half of these lookups will fall into the address space covered by bucket 0. But that bucket is only allowed to contain 20 nodes, so there is little chance that these nodes are the actual closest. However, each node in that bucket will have more detailed knowledge of that part of the address space, and will likely be able to provide from its own routing table a better list of close nodes, which can then also be queried, and so on.
In this way, an iterative lookup very quickly homes in on the actual closest group of nodes.
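That iterative lookup can be sketched roughly as follows. Here `query(node, key)` is a stand-in for a FIND_NODE RPC and is assumed to return the contacts that node considers closest to `key`; real implementations issue α queries in parallel, while this simplified sketch queries sequentially:

```python
K = 20      # bucket size, also the size of the result set
ALPHA = 3   # parallelism factor from the paper (sequential here for simplicity)

def iterative_find_node(key, seeds, query):
    """Repeatedly ask the closest known nodes for closer ones until no progress.
    Node IDs are modelled as ints; distance is XOR, as in the text above."""
    shortlist = sorted(set(seeds), key=lambda n: n ^ key)[:K]
    queried = set()
    while True:
        # Pick up to ALPHA of the closest contacts we have not asked yet.
        candidates = [n for n in shortlist if n not in queried][:ALPHA]
        if not candidates:
            return shortlist  # nobody closer left to ask: lookup has converged
        for node in candidates:
            queried.add(node)
            # Merge the responses and keep only the K closest to the key.
            shortlist = sorted(set(shortlist) | set(query(node, key)),
                               key=lambda n: n ^ key)[:K]
```

Each round either pulls in nodes strictly closer to the key or leaves the shortlist unchanged, so the loop terminates once every shortlisted node has been queried.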
When you say
iterating through 160*20 nodes to find the closest is infeasible
I take it you mean that actually querying each of them would be infeasible, since iterating through such a list to extract the closest ones is exactly how lookup requests (RPCs) are handled by the nodes.
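To illustrate, the server side of such a lookup RPC really is just that iteration, sketched here with a hypothetical `handle_find_node` helper (the expensive part of a lookup is the network round trips, not this local sort):

```python
K = 20  # number of contacts returned per FIND_NODE response

def handle_find_node(target: int, contacts: list[int]) -> list[int]:
    """Answer a FIND_NODE RPC: sort every known contact by XOR distance
    to the target and return the K closest."""
    return sorted(contacts, key=lambda c: c ^ target)[:K]
```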
Note that in real-world scenarios, it would be very unlikely for the number of buckets to get anywhere near 160. For example, for a network of a billion nodes, the average bucket count would be 27.
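A back-of-envelope way to see where a figure of that order comes from (assumption: bucket i covers a 2^-(i+1) slice of the ID space, so it can only fill once that slice contains more than k nodes; deeper buckets stay sparse or empty):

```python
import math

n = 10**9  # network size: a billion nodes
k = 20     # bucket capacity

# Buckets deeper than this cover slices of the ID space expected to
# hold fewer than k nodes, so they never fill up.
depth = math.log2(n / k)
print(f"buckets deep enough to fill: ~{depth:.1f}")  # ~25.6
```

That puts the populated portion of the table in the mid-to-high twenties, the same ballpark as the figure above, rather than anywhere near 160.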
As to the actual values chosen in the original Kademlia paper, I don't know why the bucket size is specified as 20. However, I imagine that 160 is derived from the bit size of a SHA1 hash. Normally a hash function is used to generate the key for a given value to be stored. If the risk of a hash collision using SHA1 is tolerably low, then this is fine. If not, a different hash algorithm could be used; e.g. SHA256 would yield up to 256 buckets.