python windows jupyter-notebook gensim mallet

Gensim mallet CalledProcessError: returned non-zero exit status


I'm getting an error while trying to use gensim's MALLET wrapper in a Jupyter notebook. I have the file 'mallet' in the same folder as my notebook, but can't seem to access it. I also tried pointing to it with a path on the C drive, but I still get the same error. Please help :)

import os
from gensim.models.wrappers import LdaMallet

#os.environ.update({'MALLET_HOME':r'C:/Users/new_mallet/mallet-2.0.8/'})

mallet_path = 'mallet' # update this path

# bow_corpus and dictionary are built earlier in the notebook
ldamallet = LdaMallet(mallet_path, corpus=bow_corpus, num_topics=20, id2word=dictionary)

result = ldamallet.show_topics(num_topics=3, num_words=10, formatted=False)
for each in result:
    print(each)

[Screenshot: Mallet Error CalledProcessError]


Solution

  • Update the path to:

    mallet_path = 'C:/mallet/mallet-2.0.8/bin/mallet.bat'
    

    and edit mallet.bat (for example with Notepad) in the mallet-2.0.8\bin folder so that it reads:

    @echo off
    
    rem This batch file serves as a wrapper for several
    rem  MALLET command line tools.
    
    if not "%MALLET_HOME%" == "" goto gotMalletHome
    
    echo MALLET requires an environment variable MALLET_HOME.
    goto :eof
    
    :gotMalletHome
    
    set MALLET_CLASSPATH=C:\mallet\mallet-2.0.8\class;C:\mallet\mallet-2.0.8\lib\mallet-deps.jar
    set MALLET_MEMORY=1G
    set MALLET_ENCODING=UTF-8
    
    set CMD=%1
    shift
    
    set CLASS=
    if "%CMD%"=="import-dir" set CLASS=cc.mallet.classify.tui.Text2Vectors
    if "%CMD%"=="import-file" set CLASS=cc.mallet.classify.tui.Csv2Vectors
    if "%CMD%"=="import-svmlight" set CLASS=cc.mallet.classify.tui.SvmLight2Vectors
    if "%CMD%"=="info" set CLASS=cc.mallet.classify.tui.Vectors2Info
    if "%CMD%"=="train-classifier" set CLASS=cc.mallet.classify.tui.Vectors2Classify
    if "%CMD%"=="classify-dir" set CLASS=cc.mallet.classify.tui.Text2Classify
    if "%CMD%"=="classify-file" set CLASS=cc.mallet.classify.tui.Csv2Classify
    if "%CMD%"=="classify-svmlight" set CLASS=cc.mallet.classify.tui.SvmLight2Classify
    if "%CMD%"=="train-topics" set CLASS=cc.mallet.topics.tui.TopicTrainer
    if "%CMD%"=="infer-topics" set CLASS=cc.mallet.topics.tui.InferTopics
    if "%CMD%"=="evaluate-topics" set CLASS=cc.mallet.topics.tui.EvaluateTopics
    if "%CMD%"=="prune" set CLASS=cc.mallet.classify.tui.Vectors2Vectors
    if "%CMD%"=="split" set CLASS=cc.mallet.classify.tui.Vectors2Vectors
    if "%CMD%"=="bulk-load" set CLASS=cc.mallet.util.BulkLoader
    if "%CMD%"=="run" set CLASS=%1 & shift
    
    if not "%CLASS%" == "" goto gotClass
    
    echo Mallet 2.0 commands: 
    echo   import-dir        load the contents of a directory into mallet instances (one per file)
    echo   import-file       load a single file into mallet instances (one per line)
    echo   import-svmlight   load a single SVMLight format data file into mallet instances (one per line)
    echo   info              get information about Mallet instances
    echo   train-classifier  train a classifier from Mallet data files
    echo   classify-dir      classify data from a single file with a saved classifier
    echo   classify-file     classify the contents of a directory with a saved classifier
    echo   classify-svmlight classify data from a single file in SVMLight format
    echo   train-topics      train a topic model from Mallet data files
    echo   infer-topics      use a trained topic model to infer topics for new documents
    echo   evaluate-topics   estimate the probability of new documents given a trained model
    echo   prune             remove features based on frequency or information gain
    echo   split             divide data into testing, training, and validation portions
    echo   bulk-load         for big input files, efficiently prune vocabulary and import docs
    echo Include --help with any option for more information
    
    
    goto :eof
    
    :gotClass
    
    set MALLET_ARGS=
    
    :getArg
    
    if "%1"=="" goto run
    set MALLET_ARGS=%MALLET_ARGS% %1
    shift
    goto getArg
    
    :run
    
    "C:\Program Files\Java\jdk-12\bin\java" -ea -Dfile.encoding=%MALLET_ENCODING% -classpath %MALLET_CLASSPATH% %CLASS% %MALLET_ARGS%
    
    :eof
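
The batch file above exits early with a message when MALLET_HOME is unset, so it may help to set that variable from Python before building the model. A minimal sketch, assuming MALLET is unpacked at C:/mallet/mallet-2.0.8 as in the path above; `configure_mallet` is a hypothetical helper, not part of gensim:

```python
import os


def configure_mallet(mallet_home):
    """Set MALLET_HOME for this process and return the path to bin/mallet.bat.

    Hypothetical helper (not part of gensim). mallet_home is assumed to be
    the folder that contains MALLET's bin, class and lib subfolders.
    """
    mallet_home = os.path.normpath(mallet_home)
    # Processes started from this notebook inherit the variable.
    os.environ["MALLET_HOME"] = mallet_home
    return os.path.join(mallet_home, "bin", "mallet.bat")


mallet_path = configure_mallet("C:/mallet/mallet-2.0.8")
# mallet_path can then be passed to LdaMallet as in the question:
# ldamallet = LdaMallet(mallet_path, corpus=bow_corpus, num_topics=20, id2word=dictionary)
```

Setting the variable through `os.environ` only affects the current notebook process and anything it launches; it does not change the system-wide environment.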
    

    On the command line, these commands were helpful for figuring out what was going on:

    notepad mallet.bat
    java
    "C:\Program Files\Java\jdk-12\bin\java"
    dir /OD
    cd %USERPROFILE%
    cd\
    cd users
    cd your_username
    cd appdata\local\temp\2
    dir /OD
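
`dir /OD` lists a folder's entries oldest first, which makes it easy to spot the files written most recently. A rough Python equivalent, as a sketch only; `list_by_date` is a hypothetical helper, and the assumption that the files of interest live under `tempfile.gettempdir()` should be checked against your own setup:

```python
import os
import tempfile


def list_by_date(folder):
    """Return the entry names in folder sorted by modification time, oldest first."""
    stamped = []
    with os.scandir(folder) as entries:
        for entry in entries:
            try:
                mtime = entry.stat(follow_symlinks=False).st_mtime
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            stamped.append((mtime, entry.name))
    return [name for _, name in sorted(stamped)]


# Print the ten most recently modified entries of the temp folder, newest last.
for name in list_by_date(tempfile.gettempdir())[-10:]:
    print(name)
```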
    

    The problem is either that Java is not installed correctly, or that the PATH does not include Java and the MALLET classpath is not defined correctly. More info here: https://docs.oracle.com/javase/7/docs/technotes/tools/windows/classpath.html . This solved my error; hopefully it helps someone else :)
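
The causes named above (Java not installed, Java missing from the PATH, a wrong MALLET classpath) can be checked from the notebook before calling MALLET. A minimal sketch, assuming the folder layout used in this answer; `find_mallet_problems` is a hypothetical helper, not part of gensim or MALLET:

```python
import os
import shutil


def find_mallet_problems(mallet_home, java_exe=None):
    """Return a list of human-readable problems; an empty list means none found.

    mallet_home: folder assumed to contain bin/mallet.bat, class/ and
                 lib/mallet-deps.jar (the entries used in the classpath above).
    java_exe:    optional full path to the java executable file (on Windows,
                 include the .exe suffix); if omitted, PATH is searched.
    """
    problems = []
    if java_exe is None:
        if shutil.which("java") is None:
            problems.append("java was not found on PATH")
    elif not os.path.isfile(java_exe):
        problems.append("java executable not found: " + java_exe)

    expected = [
        ("file", os.path.join(mallet_home, "bin", "mallet.bat")),
        ("dir", os.path.join(mallet_home, "class")),
        ("file", os.path.join(mallet_home, "lib", "mallet-deps.jar")),
    ]
    for kind, path in expected:
        exists = os.path.isfile(path) if kind == "file" else os.path.isdir(path)
        if not exists:
            problems.append("missing " + kind + ": " + path)
    return problems


# Print whatever looks wrong for the install location used in this answer.
for problem in find_mallet_problems("C:/mallet/mallet-2.0.8"):
    print(problem)
```

An empty result only means the expected files exist; it does not prove that the installed Java version can actually run MALLET.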