Logging SVC/SVM training to log file

I am trying to save the output from sklearn.svm.SVC training with verbose=True to a log file. However, since it uses LibSVM in the back end, I cannot figure out how to capture it. Copilot hasn't helped.

Here's a brief example. It isn't the exact problem I am trying to solve or the workflow, but gives the idea:

import numpy as np
import sklearn.datasets
import sklearn.model_selection
import sklearn.svm

if __name__ == '__main__':
    breast_data = sklearn.datasets.load_breast_cancer()

    X = breast_data.data
    y = breast_data.target
   
    np_rand_state = np.random.RandomState(0)

    X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y, test_size=0.33, random_state=np_rand_state)

    model = sklearn.svm.SVC(verbose=True)
    model.fit(X_train, y_train)

The console output from model.fit() is:

*
optimization finished, #iter = 79
obj = -100.327399, rho = -0.702443
nSV = 114, nBSV = 109
Total nSV = 114

I want to save this console output to a log file using Python's built-in logging module (logging). The output is not produced by a simple print statement but by the SVM back end behind sklearn.svm.SVC, so it is not as simple as redirecting print to a log file.

Solution

The verbose=True output from sklearn.svm.SVC comes from the underlying LibSVM C library, not from Python’s print() or logging.
That means the messages are written at the C level to stdout, so regular Python logging or contextlib.redirect_stdout won’t capture them.
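
To see the difference without training a model, here is a minimal sketch. It assumes that a direct write to file descriptor 1 via os.write is a fair stand-in for what a C library's printf does; the function name and messages are made up for illustration.

```python
import contextlib
import io
import os
import sys


def capture_with_redirect_stdout():
    """Return what contextlib.redirect_stdout captures from two kinds of writes."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        # Goes through sys.stdout, so redirect_stdout sees it.
        print("from Python print()")
        sys.stdout.flush()
        # Written straight to file descriptor 1, bypassing sys.stdout
        # (a stand-in for output produced by C code).
        os.write(1, b"from file descriptor 1\n")
    return buffer.getvalue()


captured = capture_with_redirect_stdout()
```

Only the print() message ends up in captured; the descriptor-level write still goes to the real stdout.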

To log that output, you need to temporarily redirect the C-level stdout.
A convenient way to do that is with the wurlitzer package, which captures C-level output as well as Python-level output.

Example using `wurlitzer`

import sklearn.datasets
import sklearn.model_selection
import sklearn.svm
from wurlitzer import pipes

breast_data = sklearn.datasets.load_breast_cancer()
X = breast_data.data
y = breast_data.target

X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    X, y, test_size=0.33, random_state=0
)

model = sklearn.svm.SVC(verbose=True)

# Capture LibSVM output into a file
with open("svm_training.log", "w") as f, pipes(stdout=f, stderr=f):
    model.fit(X_train, y_train)

This will write all the training messages (the ones normally printed to the console, like optimization finished, #iter = ...) to svm_training.log.

If you can’t install `wurlitzer`

You can do a manual redirect of the underlying file descriptor:

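import os
import sys

import sklearn.datasets
import sklearn.model_selection
import sklearn.svm

breast_data = sklearn.datasets.load_breast_cancer()
X = breast_data.data
y = breast_data.target

X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    X, y, test_size=0.33, random_state=0
)

model = sklearn.svm.SVC(verbose=True)

with open("svm_training.log", "w") as f:
    sys.stdout.flush()           # flush pending Python-level output first
    old_stdout = os.dup(1)       # keep a copy of the real stdout descriptor
    os.dup2(f.fileno(), 1)       # point file descriptor 1 at the log file
    try:
        model.fit(X_train, y_train)
    finally:
        sys.stdout.flush()
        os.dup2(old_stdout, 1)   # restore the original stdout
        os.close(old_stdout)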
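
Since the question asks for the logging module specifically, the same descriptor trick can feed a logger instead of a plain file. The following is only a sketch: fd_stdout_to_logger is a hypothetical helper name, it was written with POSIX systems in mind, and it logs the captured lines after the block finishes rather than while training runs.

```python
import logging
import os
import sys
import tempfile
from contextlib import contextmanager


@contextmanager
def fd_stdout_to_logger(logger, level=logging.INFO):
    """Redirect file descriptor 1 to a temp file, then log each captured line."""
    sys.stdout.flush()                      # flush pending Python-level output
    saved_fd = os.dup(1)                    # remember the real stdout
    with tempfile.TemporaryFile(mode="w+b") as tmp:
        os.dup2(tmp.fileno(), 1)            # fd 1 now writes into the temp file
        try:
            yield
        finally:
            sys.stdout.flush()
            os.dup2(saved_fd, 1)            # restore the real stdout
            os.close(saved_fd)
            tmp.seek(0)
            text = tmp.read().decode(errors="replace")
            for line in text.splitlines():
                if line.strip():            # skip blank lines
                    logger.log(level, line)


# Intended usage (assumes model, X_train and y_train from the example above):
#
#     logging.basicConfig(filename="svm_training.log", level=logging.INFO)
#     with fd_stdout_to_logger(logging.getLogger("svm")):
#         model.fit(X_train, y_train)
```

Buffering into a temporary file keeps the helper simple. If you need the messages to appear in the log while training is still running, you would need a pipe plus a reader thread instead, which is roughly what wurlitzer does for you.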
