parameters, word-embedding, fasttext, oov

How to tune FastText parameters for OOV words?


I have heard that FastText generates vectors for OOV words from their character n-grams. Is this built into the FastText architecture automatically, or do we need to tune specific parameters for it, like oov_token in the Keras tokenizer? I have been looking for which parameters to tune in FastText, but I couldn't find any.

If anyone knows and is willing to share their knowledge, I would really appreciate it.

Thank you.


Solution

  • Vector generation for OOV words is integrated into fastText (at least in the original implementation by Facebook).

    To generate these vectors, fastText uses subword n-grams. To learn more, you can read this thread and this visual guide.

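    For this reason, the parameters that most influence the creation of vectors for OOV words are the ones that control the subword n-grams: -minn and -maxn (the minimum and maximum character n-gram length) and -bucket (the number of hash buckets the n-grams are mapped into).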

    For more information about fastText options/parameters, see the official documentation.
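
To make the subword mechanism concrete, here is a minimal pure-Python sketch of how an OOV vector can be built from character n-grams. It is not fastText's actual code: the function names (char_ngrams, fnv1a_bucket, make_table, oov_vector) are hypothetical, the n-gram table holds random numbers instead of trained weights, and the hashing only approximates what the library does. The minn, maxn and bucket arguments play the same role as the fastText options with those names.

```python
import random
from typing import List


def char_ngrams(word: str, minn: int = 3, maxn: int = 6) -> List[str]:
    """Return the character n-grams of `word`, wrapped in '<' and '>' markers."""
    wrapped = f"<{word}>"
    return [
        wrapped[start:start + n]
        for n in range(max(minn, 1), maxn + 1)
        for start in range(len(wrapped) - n + 1)
    ]


def fnv1a_bucket(ngram: str, bucket: int = 2_000_000) -> int:
    """Map an n-gram to a bucket index with a 32-bit FNV-1a hash (a sketch)."""
    h = 2166136261
    for byte in ngram.encode("utf-8"):
        h ^= byte
        h = (h * 16777619) & 0xFFFFFFFF
    return h % bucket


def make_table(bucket: int, dim: int, seed: int = 0) -> List[List[float]]:
    """Stand-in for trained n-gram vectors: one random vector per bucket."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(bucket)]


def oov_vector(word: str, table: List[List[float]],
               minn: int = 3, maxn: int = 6) -> List[float]:
    """Average the bucket vectors of all n-grams of `word`."""
    dim = len(table[0])
    ngrams = char_ngrams(word, minn, maxn)
    if not ngrams:
        # No n-grams (e.g. maxn=0): nothing to build a vector from.
        return [0.0] * dim
    total = [0.0] * dim
    for ngram in ngrams:
        row = table[fnv1a_bucket(ngram, len(table))]
        total = [t + r for t, r in zip(total, row)]
    return [t / len(ngrams) for t in total]


if __name__ == "__main__":
    table = make_table(bucket=1000, dim=4)
    print(char_ngrams("where", minn=3, maxn=3))
    print(oov_vector("unseenword", table))
```

A smaller minn or a larger maxn gives each unseen word more n-grams to draw on, while setting maxn to 0 leaves no n-grams at all, so no meaningful OOV vector can be built. In the official Python binding, the same options can be passed as keyword arguments, for example fasttext.train_unsupervised("data.txt", minn=2, maxn=5), where data.txt is a placeholder for your own corpus.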