According to the MPI standard, the reorder parameter of the MPI_Cart_create routine may be used "possibly so as to choose a good embedding of the virtual topology onto the physical machine". However, I was not able to find any information on how this is done in Open MPI or MPICH. Could anyone please explain how this reordering could take place, and whether it really yields an optimized virtual topology in any MPI implementation?
There are lots of ways that topology-aware communicators could improve performance, but in reality, no implementation actually reorders the ranks (as Jeff says in the comments).
In theory, an implementation could arrange the ranks so that processes which are close in the physical topology (on the same socket, node, rack, etc.) are also neighbors in the virtual topology. That would reduce communication time, because messages to the ranks you talk to most often would take fewer hops.
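To make the idea concrete, here is a toy model in plain Python (no MPI involved). It is only a sketch of the theoretical benefit, not what any MPI library does. It assumes a 4 x 4 non-periodic process grid, 4 cores per node, and ranks packed onto nodes in rank order. The names cross_node_links, row_major_node and tiled_node are made up for this illustration. The script counts how many nearest-neighbor links of the grid cross a node boundary, first with the default row-major numbering and then with a hypothetical reordering that gives each node a compact 2 x 2 tile.

```python
def cross_node_links(rows, cols, node_of):
    """Count nearest-neighbor pairs in a non-periodic rows x cols grid
    whose two endpoints are placed on different nodes."""
    count = 0
    for i in range(rows):
        for j in range(cols):
            # Link to the right-hand neighbor, if there is one.
            if j + 1 < cols and node_of(i, j) != node_of(i, j + 1):
                count += 1
            # Link to the neighbor below, if there is one.
            if i + 1 < rows and node_of(i, j) != node_of(i + 1, j):
                count += 1
    return count


# Assumed machine and grid sizes (illustrative only).
ROWS, COLS = 4, 4
CORES_PER_NODE = 4
TILE = 2  # a 2 x 2 tile holds 4 processes, i.e. exactly one node


def row_major_node(i, j):
    # No reordering: grid position (i, j) keeps rank i*COLS + j,
    # and ranks are packed onto nodes in rank order.
    return (i * COLS + j) // CORES_PER_NODE


def tiled_node(i, j):
    # Hypothetical reordering: each node owns one compact 2 x 2 tile.
    return (i // TILE) * (COLS // TILE) + (j // TILE)


print("row-major:", cross_node_links(ROWS, COLS, row_major_node))  # 12
print("tiled:    ", cross_node_links(ROWS, COLS, tiled_node))      # 8
```

In this model the row-major layout puts each grid row on its own node, so all 12 vertical links go off-node, while the tiled layout leaves only 8 off-node links. Whether a real implementation performs any such remapping when reorder is true is exactly what the question asks, and the sketch does not answer that.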