I study papers on link prediction in knowledge networks. Authors usually report "Hits@k". I wonder how to calculate hits@k and what does it mean about the model and the results?
In a nutshell, it is the count of how many positive triples are ranked in the top-n positions against a bunch of synthetic negatives.
In the following example, pretend the test set includes two ground truth positive only:
Jack born_in Italy
Jack friend_with Thomas
Let's assume such positive triples (identified by * below) are ranked against four synthetic negatives each.
Now, assign a score to each of the positives and its synthetic negatives using your pre-trained embedding model. Then, sort the triples in descending order. In the example below, the first triple ranks 2nd, and the other triple ranks first (against their respective synthetic negatives):
s p o score rank
Jack born_in Ireland 0.789 1
Jack born_in Italy 0.753 2 *
Jack born_in Germany 0.695 3
Jack born_in China 0.456 4
Jack born_in Thomas 0.234 5
s p o score rank
Jack friend_with Thomas 0.901 1 *
Jack friend_with China 0.345 2
Jack friend_with Italy 0.293 3
Jack friend_with Ireland 0.201 4
Jack friend_with Germany 0.156 5
Then, count how many positives occur in the top-1 or top-3 positions, and divide by the number of triples in the test set (which in this example includes 2 triples):
Hits@3= 2/2 = 1.0
Hits@1= 1/2 = 0.5
AmpliGraph has an API to compute Hits@n - check out the documentation here.