As per Section 3.2 of the original FastText paper, the authors state:
In order to bound the memory requirements of our model, we use a hashing function that maps n-grams to integers in 1 to K
Does this mean the model computes only K embeddings, regardless of the number of distinct n-grams extracted from the training corpus? And if two different n-grams collide when hashed, do they share the same embedding?
Thanks.
Yes: the character n-grams go into a collision-oblivious hash table. Some buckets will be trained by just one n-gram, others by several colliding n-grams.
In practice, the most salient and frequent n-grams tend to dominate their slot, and collisions don't do much damage because many n-gram vectors are combined per word. And the OOV word vectors synthesized from n-gram vectors remain better than nothing.
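A toy sketch of that lookup path, just for intuition: the bucket count, dimensionality, and names like `ngram_vectors` are made up, and Python's built-in `hash` stands in for fastText's actual hashing function.

```python
import numpy as np

# Illustrative sizes only; fastText's real bucket count is far larger
num_buckets = 10   # "K" in the paper
dim = 4            # embedding dimensionality
rng = np.random.default_rng(0)

# One trainable vector per bucket; colliding n-grams share a row
ngram_vectors = rng.normal(size=(num_buckets, dim))

def char_ngrams(word, min_n=3, max_n=6):
    """Character n-grams of '<word>', with the boundary markers fastText adds."""
    w = f"<{word}>"
    return [w[i:i + n]
            for n in range(min_n, max_n + 1)
            for i in range(len(w) - n + 1)]

def bucket(ngram):
    """Map an n-gram to one of num_buckets slots (stand-in hash function)."""
    return hash(ngram) % num_buckets

def oov_vector(word):
    """Synthesize an OOV word's vector by averaging its n-grams' bucket vectors."""
    ids = [bucket(g) for g in char_ngrams(word)]
    return ngram_vectors[ids].mean(axis=0)

print(char_ngrams("where")[:5])   # ['<wh', 'whe', 'her', 'ere', 're>']
print(oov_vector("whereish"))     # built entirely from shared bucket vectors
```

Two different n-grams that hash to the same bucket read and update the same row of `ngram_vectors`, which is exactly the sharing asked about above.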
You can change the number of buckets at model-creation time via the bucket parameter, if your use case (especially your overall training-corpus size) is very different from what drove the authors' choice of the default. (The default is 2 million buckets.)
You could potentially use fewer buckets to save memory – but then you'd have more collisions, and likely lower-quality OOV vectors.
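For example, a minimal Gensim sketch (assuming Gensim 4.x; the toy corpus and the smaller bucket value are just for illustration):

```python
from gensim.models import FastText

corpus = [["the", "quick", "brown", "fox"],
          ["jumps", "over", "the", "lazy", "dog"]]   # toy corpus

# A smaller bucket count shrinks the n-gram table but increases hash collisions
model = FastText(
    sentences=corpus,
    vector_size=100,
    min_count=1,
    min_n=3, max_n=6,   # character n-gram lengths
    bucket=100_000,     # default is 2_000_000
)

# The n-gram table is (bucket, vector_size), so memory scales with `bucket`
print(model.wv.vectors_ngrams.shape)   # (100000, 100)

# OOV words still get vectors synthesized from their n-grams' buckets
print(model.wv["foxiest"][:5])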