python scikit-learn

Logging SVC/SVM training to log file


I am trying to save the output that sklearn.svm.SVC prints during training with verbose=True to a log file. However, since it uses LibSVM in the back end, I cannot figure out how to do this. Copilot hasn't helped.

Here's a brief example. It isn't the exact problem I am trying to solve or the workflow, but gives the idea:

import numpy as np
import sklearn.datasets
import sklearn.model_selection
import sklearn.svm

if __name__ == '__main__':
    breast_data = sklearn.datasets.load_breast_cancer()

    X = breast_data.data
    y = breast_data.target
   
    np_rand_state = np.random.RandomState(0)

    X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y, test_size=0.33, random_state=np_rand_state)

    model = sklearn.svm.SVC(verbose=True)
    model.fit(X_train, y_train)

Here is the console output from model.fit():

*
optimization finished, #iter = 79
obj = -100.327399, rho = -0.702443
nSV = 114, nBSV = 109
Total nSV = 114

I want to save the console output to a log file using Python's built-in logging module (logging). The console output is not produced by a simple print statement; it comes from the SVM back end behind sklearn.svm.SVC. This means it is not as simple as redirecting print to a log file.


Solution

  • The verbose=True output from sklearn.svm.SVC comes from the underlying LibSVM C library, not from Python’s print() or logging.
    That means the messages are written at the C level to stdout, so regular Python logging or contextlib.redirect_stdout won’t capture them.

    To log that output, you need to temporarily redirect the C-level stdout.
    The cleanest and most reliable way to do that is with the wurlitzer package, which safely captures both C and Python output streams.
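To see why contextlib.redirect_stdout is not enough, here is a minimal sketch that does not need scikit-learn. It assumes that os.write(1, ...) is a fair stand-in for what a C library does when it writes to stdout: both bypass Python's sys.stdout object and go straight to file descriptor 1.

```python
import contextlib
import io
import os

buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    # Goes through sys.stdout, so redirect_stdout catches it.
    print("from Python print()")
    # Goes straight to file descriptor 1, like C-level output;
    # redirect_stdout never sees it, so it still reaches the console.
    os.write(1, b"written straight to fd 1\n")

captured = buf.getvalue()
```

Only the print() line ends up in captured; the second line still reaches the real stdout, which is the same behaviour you see with the LibSVM messages.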


    Example using wurlitzer

    import sklearn.datasets
    import sklearn.model_selection
    import sklearn.svm
    from wurlitzer import pipes
    
    breast_data = sklearn.datasets.load_breast_cancer()
    X = breast_data.data
    y = breast_data.target
    
    X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
        X, y, test_size=0.33, random_state=0
    )
    
    model = sklearn.svm.SVC(verbose=True)
    
    # Capture LibSVM output into a file
    with open("svm_training.log", "w") as f, pipes(stdout=f, stderr=f):
        model.fit(X_train, y_train)
    

    This will write all the training messages (the ones normally printed to the console, like optimization finished, #iter = ...) to svm_training.log.


    If you can’t install wurlitzer

    You can do a manual redirect of the underlying file descriptor:

    import os
    import sys
    import sklearn.datasets
    import sklearn.model_selection
    import sklearn.svm
    
    breast_data = sklearn.datasets.load_breast_cancer()
    X = breast_data.data
    y = breast_data.target
    
    X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
        X, y, test_size=0.33, random_state=0
    )
    
    model = sklearn.svm.SVC(verbose=True)
    
    with open("svm_training.log", "w") as f:
        sys.stdout.flush()          # flush pending Python-level output first
        old_stdout = os.dup(1)      # keep a copy of the real stdout (fd 1)
        os.dup2(f.fileno(), 1)      # point fd 1 at the log file
        try:
            model.fit(X_train, y_train)
        finally:
            os.dup2(old_stdout, 1)  # restore the original stdout
            os.close(old_stdout)
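
Since the question asks for the logging module rather than a raw file, the same file-descriptor trick can be wrapped so that the captured text is passed to a logger. This is only a sketch under some assumptions: the helper name log_fd_output and the logger name "svm.training" are made up for illustration, the output is buffered in a temporary file and logged only after the block finishes (not live), and if the C library buffers its own output some lines may need an explicit libc flush before they show up.

```python
import contextlib
import logging
import os
import sys
import tempfile


@contextlib.contextmanager
def log_fd_output(logger, level=logging.INFO, fd=1):
    """Redirect file descriptor `fd` to a temp file, then log each line written.

    Hypothetical helper for illustration; not part of scikit-learn or logging.
    """
    sys.stdout.flush()                 # flush pending Python-level output first
    saved_fd = os.dup(fd)              # keep a copy of the original descriptor
    with tempfile.TemporaryFile(mode="w+b") as tmp:
        os.dup2(tmp.fileno(), fd)      # point fd at the temporary file
        try:
            yield
        finally:
            sys.stdout.flush()         # push buffered print() output into tmp
            os.dup2(saved_fd, fd)      # restore the original descriptor
            os.close(saved_fd)
            tmp.seek(0)
            text = tmp.read().decode(errors="replace")
            for line in text.splitlines():
                if line.strip():       # skip blank lines
                    logger.log(level, line)


# Usage sketch (assumes `model`, `X_train`, `y_train` from the examples above):
# logging.basicConfig(filename="svm_training.log", level=logging.INFO)
# with log_fd_output(logging.getLogger("svm.training")):
#     model.fit(X_train, y_train)
```

Each captured line becomes one log record, so the usual handlers, formatters and levels apply to the LibSVM messages as well.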