I am trying to clone the popular browser game contexto.me, and I am having trouble working out how to calculate the similarity score between two words (the target word and the user's guess). I can get the cosine similarity between the two words, but I am confused about how to turn it into a clean integer like the one the game shows.
For example, if the target word is 'helicopter' and I guess 'plane', contexto returns a similarity score of something like 13, but if I guess a word like 'king', it returns something like 2000.
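import torch
import torchtext
from flask import Flask, request, render_template
# singularize is assumed to be imported elsewhere; its source is not shown in the original snippet

app = Flask(__name__)
target_word = "helicopter"
glove = torchtext.vocab.GloVe(name="6B", dim=100)

@app.route('/', methods=["GET", "POST"])
def getSimScore():
    if request.method == "POST":
        text = request.form.get("word")
        new_text = singularize(text)
        # Cosine similarity between the target and the guess (closer to 1 = more similar)
        sim_score = torch.cosine_similarity(glove[target_word].unsqueeze(0), glove[new_text].unsqueeze(0)).numpy()[0]
        print(sim_score)
    return render_template('homepage.html', messageText='sample text', gameNum=1, guessNum=1, wordAccuracy=999)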
This is my code so far. sim_score prints ~0.77 for the input 'truck' and ~0.29 for the input 'king' (the closer to 1, the more similar the word is to the target word).
The integer contexto shows is typically called the guess's "rank": its position when every word in the vocabulary is ordered by similarity to the target word. You can calculate it with the following algorithm.
1. Compute the cosine similarity between the target word and every word in the vocabulary.
2. Sort the words by similarity, from most to least similar.
3. The rank of a guess is its position in that sorted list.
For speed, steps 1 and 2 can be computed ahead of time if you want, since they depend only on the target word and not on the guess.
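As a rough illustration, here is a minimal sketch of a rank lookup: score every vocabulary word against the target, sort from most to least similar, and read off the position of the guess. It uses plain Python and a made-up four-word vocabulary so it runs without downloading GloVe. The names `build_rank_table`, `toy_embeddings`, and `rank_table` are hypothetical, and excluding the target word from the ranking is a choice made for this sketch, not something confirmed about how contexto works.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def build_rank_table(target_word, embeddings):
    """Score every word against the target, sort, and return {word: rank}.

    Rank 1 is the word most similar to the target. The target itself is
    left out (an assumption of this sketch).
    """
    target_vec = embeddings[target_word]
    scored = [
        (cosine_similarity(target_vec, vec), word)
        for word, vec in embeddings.items()
        if word != target_word
    ]
    # Most similar first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return {word: rank for rank, (_, word) in enumerate(scored, start=1)}


# Hypothetical 2-d vectors standing in for real GloVe embeddings.
toy_embeddings = {
    "helicopter": [1.0, 0.0],
    "plane": [0.9, 0.1],
    "truck": [0.7, 0.7],
    "king": [0.0, 1.0],
}

# Built once per target word, before any guesses arrive.
rank_table = build_rank_table("helicopter", toy_embeddings)

# Per guess, the rank is then a dictionary lookup.
print(rank_table.get("plane"))
print(rank_table.get("king"))
```

With the real embeddings, the same idea can probably be vectorized: `glove.vectors` holds one row per word and `glove.itos` maps row indices back to words, so one `torch.cosine_similarity` call against the whole matrix followed by a descending `torch.argsort` should give the ordering. Treat that as an assumption to check against your torchtext version.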