I am trying to save the output that sklearn.svm.SVC prints during training with verbose=True to a log file. However, since it uses LibSVM in the back end, I cannot figure out how to do this. Copilot hasn't helped.
Here's a brief example. It isn't the exact problem or workflow I am working on, but it gives the idea:
import numpy as np
import os
# Import the submodules explicitly; a bare "import sklearn" may not load them.
import sklearn.datasets
import sklearn.model_selection
import sklearn.svm

if __name__ == '__main__':
    breast_data = sklearn.datasets.load_breast_cancer()
    X = breast_data.data
    y = breast_data.target
    np_rand_state = np.random.RandomState(0)
    X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
        X, y, test_size=0.33, random_state=np_rand_state
    )
    model = sklearn.svm.SVC(verbose=True)
    model.fit(X_train, y_train)
The console output from model.fit() is:
*
optimization finished, #iter = 79
obj = -100.327399, rho = -0.702443
nSV = 114, nBSV = 109
Total nSV = 114
I want to save this console output to a log file using Python's built-in logging module (logging). The output to the console is not produced by a simple print statement, but by the SVM back end of sklearn.svm.SVC, so it is not as simple as redirecting print to a log file.
The verbose=True output from sklearn.svm.SVC comes from the underlying LibSVM C library, not from Python’s print() or logging.
That means the messages are written at the C level to stdout, so regular Python logging or contextlib.redirect_stdout won’t capture them.
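To see why contextlib.redirect_stdout is not enough, here is a minimal sketch that uses only the standard library. It uses os.write(1, ...) as a stand-in for a C library writing to stdout (that substitution is my assumption for illustration, not what LibSVM literally calls), and the function name demo_redirect_stdout is made up for this example:

```python
import contextlib
import io
import os


def demo_redirect_stdout():
    """Return what contextlib.redirect_stdout manages to capture."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        # Goes through sys.stdout, so it lands in buf.
        print("from Python print()")
        # Bypasses sys.stdout and writes directly to file descriptor 1,
        # similar to what a C extension does. It is NOT captured in buf.
        os.write(1, b"written straight to fd 1\n")
    return buf.getvalue()


captured = demo_redirect_stdout()
```

Only the print() line ends up in captured; the fd-level write still reaches the real console, which is the same behaviour you see with the LibSVM messages.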
To log that output, you need to temporarily redirect the C-level stdout.
The cleanest and most reliable way to do that is with the wurlitzer package, which safely captures both C and Python output streams.
With wurlitzer:
import numpy as np
import sklearn.datasets
import sklearn.model_selection
import sklearn.svm
from wurlitzer import pipes

breast_data = sklearn.datasets.load_breast_cancer()
X = breast_data.data
y = breast_data.target
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    X, y, test_size=0.33, random_state=0
)
model = sklearn.svm.SVC(verbose=True)

# Capture LibSVM output into a file
with open("svm_training.log", "w") as f, pipes(stdout=f, stderr=f):
    model.fit(X_train, y_train)
This will write all the training messages (the ones normally printed to the console, like optimization finished, #iter = ...) to svm_training.log.
Without wurlitzer, you can manually redirect the underlying file descriptor:
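import os
import sys
import numpy as np
import sklearn.datasets
import sklearn.model_selection
import sklearn.svm

breast_data = sklearn.datasets.load_breast_cancer()
X = breast_data.data
y = breast_data.target
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    X, y, test_size=0.33, random_state=0
)
model = sklearn.svm.SVC(verbose=True)

with open("svm_training.log", "w") as f:
    sys.stdout.flush()            # flush pending Python output before swapping fd 1
    old_stdout = os.dup(1)        # keep a copy of the real stdout
    os.dup2(f.fileno(), 1)        # point fd 1 at the log file
    try:
        model.fit(X_train, y_train)
    finally:
        sys.stdout.flush()
        os.dup2(old_stdout, 1)    # restore the original stdout
        os.close(old_stdout)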
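Since the question asks for the logging module specifically, the same file-descriptor trick can be wrapped in a context manager that hands the captured lines to a logger. This is a rough sketch rather than a polished implementation: the name fd_stdout_to_logger is made up here, it buffers the output in a temporary file and logs it only after the block finishes (so it is not streamed live), and it assumes the C library flushes its own stdout buffer before the block exits.

```python
import logging
import os
import sys
import tempfile
from contextlib import contextmanager


@contextmanager
def fd_stdout_to_logger(logger, level=logging.INFO):
    """Send everything written to file descriptor 1 to `logger`.

    Hypothetical helper: output is collected in a temporary file and
    emitted line by line once the with-block has finished.
    """
    sys.stdout.flush()                 # flush pending Python-level output
    saved_fd = os.dup(1)               # keep a copy of the real stdout
    with tempfile.TemporaryFile(mode="w+b") as tmp:
        os.dup2(tmp.fileno(), 1)       # point fd 1 at the temporary file
        try:
            yield
        finally:
            sys.stdout.flush()
            os.dup2(saved_fd, 1)       # restore the original stdout
            os.close(saved_fd)
            tmp.seek(0)
            text = tmp.read().decode("utf-8", errors="replace")
            for line in text.splitlines():
                if line.strip():       # skip empty lines
                    logger.log(level, line)


# Usage with the model from above (assumes `model`, `X_train`, `y_train` exist):
#
#   logging.basicConfig(filename="svm_training.log", level=logging.INFO)
#   with fd_stdout_to_logger(logging.getLogger("svm")):
#       model.fit(X_train, y_train)
```

Logging into a file (as in the commented usage) rather than to the console avoids the handler writing into the descriptor that is being redirected. If some output appears late or on the console anyway, the C side is probably still buffering it, and wurlitzer is likely the easier route.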