I am looking for a way to measure the semantic distance between two sentences. Suppose we have the following sentences:
(S1) The beautiful cherry blossoms in Japan.
(S2) The beautiful Japan.
S2 is created from S1 by removing the words "cherry", "blossoms" and "in". I want to define a function that assigns a large distance to S1 and S2, because their meanings differ significantly: in S1, "beautiful" modifies "cherry blossoms", not "Japan".
As far as I know, research in this area has advanced considerably, and thanks to word vectors and transformers the distance between sentence meanings can now be computed with several methods:
Google Universal Sentence Encoder (USE): https://tfhub.dev/google/universal-sentence-encoder/2
InferSent by Facebook: https://github.com/facebookresearch/InferSent
Averaging the word vectors (with cosine similarity).
spaCy also provides a similarity score between two sentences based on word vectors: https://spacy.io/usage/spacy-101
etc.
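To make the word-vector averaging option concrete, here is a minimal sketch in plain Python. The 3-dimensional vectors in `TOY_VECTORS` are made up for illustration (real embeddings would come from a pretrained model), and the helper names `sentence_vector` and `cosine_distance` are my own, not from any library.

```python
import math

# Hypothetical toy word vectors, invented for illustration only.
# Real embeddings (e.g. from a pretrained model) have hundreds of dimensions.
TOY_VECTORS = {
    "the": [0.1, 0.1, 0.1],
    "beautiful": [0.9, 0.2, 0.1],
    "cherry": [0.2, 0.9, 0.1],
    "blossoms": [0.3, 0.8, 0.2],
    "in": [0.1, 0.1, 0.2],
    "japan": [0.2, 0.3, 0.9],
}


def sentence_vector(sentence, vectors):
    """Average the vectors of all known words in the sentence."""
    tokens = [t.strip(".,!?").lower() for t in sentence.split()]
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        raise ValueError("no known words in sentence")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


s1 = "The beautiful cherry blossoms in Japan."
s2 = "The beautiful Japan."
distance = cosine_distance(
    sentence_vector(s1, TOY_VECTORS),
    sentence_vector(s2, TOY_VECTORS),
)
print(f"cosine distance between S1 and S2: {distance:.3f}")
```

Note that averaging treats a sentence as a bag of words: word order and the information about which noun "beautiful" modifies are discarded, so this baseline on its own cannot capture the distinction I am after.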