The first step of the find node operation is as follows (as described in the paper):
The lookup initiator starts by picking α nodes from its closest non-empty k-bucket (or, if that bucket has fewer than α entries, it just takes the α closest nodes it knows of).
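As a rough illustration, the quoted first step might look like the Python sketch below. It is a hypothetical model, not code from the paper. Node IDs are plain integers, `buckets[i]` holds the contacts whose XOR distance from our own ID has its highest set bit at position i, and "the closest bucket" is assumed to mean the bucket that would contain the target ID. Both branches order candidates by XOR distance to the target.

```python
from typing import List

ID_BITS = 8  # hypothetical tiny ID space, for illustration only


def bucket_index(self_id: int, node_id: int) -> int:
    """Bucket that node_id falls into relative to self_id: the position of
    the highest bit in which the two IDs differ."""
    return (self_id ^ node_id).bit_length() - 1


def pick_initial_contacts(self_id: int, target: int,
                          buckets: List[List[int]], alpha: int) -> List[int]:
    """Hypothetical sketch of lookup step 1: choose alpha starting contacts."""
    def by_distance(ids: List[int]) -> List[int]:
        # Kademlia distance is the XOR of two IDs, compared as an integer.
        return sorted(ids, key=lambda n: n ^ target)

    idx = bucket_index(self_id, target)  # -1 if target == self_id
    if idx >= 0 and len(buckets[idx]) >= alpha:
        # The target's bucket has at least alpha entries: take them from there.
        return by_distance(buckets[idx])[:alpha]
    # Otherwise fall back to the alpha closest nodes known in any bucket.
    known = [n for bucket in buckets for n in bucket]
    return by_distance(known)[:alpha]


# Usage: our ID is 0 and we know five contacts.
buckets: List[List[int]] = [[] for _ in range(ID_BITS)]
for contact in [0b10000001, 0b10000110, 0b10010000, 0b01000000, 0b00000011]:
    buckets[bucket_index(0, contact)].append(contact)

print(pick_initial_contacts(0, 0b10000100, buckets, alpha=2))  # [134, 129]
print(pick_initial_contacts(0, 0b10000100, buckets, alpha=4))  # [134, 129, 144, 3]
```

In the second call the target's bucket holds only three contacts, so the fallback branch applies and a contact from a different bucket (3) appears in the result.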
Why does it pick the elements directly from the bucket, as opposed to looking for the k closest elements across all buckets? I believe the latter is what happens in step 2 of the algorithm, and it can be seen in the visualization here.
I guess this simply rests on the assumption that α ≤ k. Under that condition you get the α closest nodes directly from the closest bucket, or, if the bucket contains fewer than α nodes, the parenthetical condition applies:
(or, if that bucket has fewer than α entries, it just takes the α closest nodes it knows of)
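Why the closest bucket is enough can be checked by brute force. The Python sketch below is a hypothetical check, assuming a node's bucket is the position of the highest bit in which its ID differs from our own, and a 6-bit ID space so every case can be enumerated. It confirms that, under the XOR metric, every ID that falls into the target's bucket is strictly closer to the target than any ID in another bucket, so any contact taken from that bucket beats every contact stored elsewhere. It says nothing about the case where that bucket is empty or short, which is what the parenthetical fallback covers.

```python
ID_BITS = 6  # hypothetical small ID space so every case can be enumerated


def bucket_index(self_id: int, node_id: int) -> int:
    """Position of the highest bit in which node_id differs from self_id."""
    return (self_id ^ node_id).bit_length() - 1


def target_bucket_is_closest(self_id: int, target: int) -> bool:
    """True if every ID in the target's bucket is strictly closer (by XOR)
    to the target than every ID in any other bucket."""
    idx = bucket_index(self_id, target)
    others = [n for n in range(2 ** ID_BITS) if n != self_id]
    inside = [n ^ target for n in others if bucket_index(self_id, n) == idx]
    outside = [n ^ target for n in others if bucket_index(self_id, n) != idx]
    return max(inside) < min(outside)


# Check every (self_id, target) pair with target != self_id.
print(all(target_bucket_is_closest(s, t)
          for s in range(2 ** ID_BITS)
          for t in range(2 ** ID_BITS) if s != t))  # True
```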
Also note that you're looking at the pre-proceedings version of the paper, which does not contain the full Kademlia description. You can find the full paper here.