Tags: python, bash, gensim, lda, mallet

How do I pass a file path containing spaces to the Gensim LDA Mallet wrapper?


I am attempting to use Gensim's Mallet wrapper. When I run the following code:

import os
import gensim

# Point MALLET_HOME at the Mallet installation directory.
os.environ.update({
    'MALLET_HOME':
    r"C:\Users\me\OneDrive - My Company\Documents\Projects\Current\mallet-2.0.8"
})
# The first argument is the path to the Mallet launcher; note the spaces in it.
lda_mallet = gensim.models.wrappers.LdaMallet(
    r"C:\Users\me\OneDrive - My Company\Documents\Projects\Current\mallet-2.0.8\bin\mallet",
    corpus=corpus,
    num_topics=10,
    id2word=id_dict)

I get the following errors:

'C:\Users\me\OneDrive' is not recognized as an internal or external command,
operable program or batch file.

subprocess.CalledProcessError: Command 'C:\Users\me\OneDrive - My Company\Documents\Projects\Current\mallet-2.0.8\bin\mallet import-file --preserve-case --keep-sequence --remove-stopwords --token-regex "\S+" --input C:\Users\me\AppData\Local\Temp\17fe21_corpus.txt --output C:\Users\me\AppData\Local\Temp\17fe21_corpus.mallet' returned non-zero exit status 1.

After extensive searching online, I found many proposed solutions, but none of them resolves my issue.

The first error message cuts the path off at the first space ('C:\Users\me\OneDrive'), so I believe the spaces in the path are the cause.

Unfortunately, my company requires that I use this directory and I cannot change the name. Is there a way to "escape" the spaces in order to run my code?


Solution

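  • The problem is in the LdaMallet wrapper, not in your code. As the CalledProcessError shows, the wrapper pastes the Mallet path unquoted into a single command string and hands that string to the shell, so the shell splits the path at the first space and tries to run 'C:\Users\me\OneDrive'. That is a bug in the wrapper; report it to the Gensim maintainers.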
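
Until the wrapper itself is fixed, one workaround that may help is to pass the path already wrapped in double quotes, so that the quotes end up inside the command string the wrapper builds. This is a minimal sketch and rests on an assumption: that the wrapper concatenates mallet_path into the shell command unchanged, which is what the error message suggests. The helper name quote_for_shell is hypothetical and not part of Gensim, and I have not verified this against your setup.

```python
def quote_for_shell(path: str) -> str:
    """Wrap a path in double quotes if it contains whitespace (hypothetical helper)."""
    already_quoted = len(path) >= 2 and path.startswith('"') and path.endswith('"')
    if already_quoted or not any(ch.isspace() for ch in path):
        return path
    return '"' + path + '"'


mallet_path = quote_for_shell(
    r"C:\Users\me\OneDrive - My Company\Documents\Projects\Current\mallet-2.0.8\bin\mallet"
)
print(mallet_path)

# Assumed usage, with corpus and id_dict defined as in the question:
# lda_mallet = gensim.models.wrappers.LdaMallet(
#     mallet_path, corpus=corpus, num_topics=10, id2word=id_dict)
```

If the quoted path still fails, other options that leave the directory name untouched are to refer to it through a Windows 8.3 short path, or through a directory junction (mklink /J) created at a location without spaces. Whether either is allowed depends on your company's setup.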