Tags: python-3.x, scikit-learn, libsvm, svmlight

How to specify feature names for sklearn's dump_svmlight_file in Python?


Docs: https://scikit-learn.org/stable/modules/generated/sklearn.datasets.dump_svmlight_file.html

The svmlight data format is:

<target> <feature:value> <feature:value>

With the data:

a = [[1,2,3],[4,5,6]]
b = [8,9]

Running the command:

from sklearn.datasets import dump_svmlight_file
dump_svmlight_file(a, b, 'test.txt')

Outputs the following:

8 0:1 1:2 2:3
9 0:4 1:5 2:6

Is there a way to specify the feature names rather than have them increment from 0? I would like my result to look something like this:

1 10:5 50:15 100:50
0 10:15 25:5 75:15
1 20:5 40:5 60:5

Does the dump_svmlight_file command have such a capability?


Solution

  • No. dump_svmlight_file does not have that option built in, as its source code shows.

    The only thing you can control is whether the feature indices start at 0 or 1, using the zero_based parameter (see the documentation).

    I would suggest not dumping the file with the actual feature names, since that would increase the file size unnecessarily. Instead, pickle the feature names in a separate file and join them back to the indices when you load the data.
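A minimal sketch of the "pickle the names separately, then join" idea. The feature names and the helper names (save_feature_names, load_feature_names, parse_svmlight_line) are hypothetical and not part of scikit-learn. The two sample lines are the dump_svmlight_file output shown in the question, and the sketch assumes each line is non-empty.

```python
import os
import pickle
import tempfile


def save_feature_names(names, path):
    """Pickle the feature names into a side file kept next to the dump."""
    with open(path, "wb") as fh:
        pickle.dump(list(names), fh)


def load_feature_names(path):
    """Read the pickled feature names back."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def parse_svmlight_line(line, names, zero_based=True):
    """Return (target, {feature_name: value}) for one svmlight line."""
    line = line.split("#", 1)[0]      # drop a trailing comment, if any
    target, *pairs = line.split()
    offset = 0 if zero_based else 1   # must match the zero_based used when dumping
    features = {}
    for pair in pairs:
        index, _, value = pair.partition(":")
        if index == "qid":            # skip the optional query id field
            continue
        features[names[int(index) - offset]] = float(value)
    return float(target), features


# Hypothetical names for the three columns of `a` in the question.
names = ["height", "width", "depth"]

# Lines as written by dump_svmlight_file(a, b, 'test.txt') in the question.
lines = ["8 0:1 1:2 2:3", "9 0:4 1:5 2:6"]

with tempfile.TemporaryDirectory() as tmp:
    names_path = os.path.join(tmp, "feature_names.pkl")
    save_feature_names(names, names_path)
    loaded = load_feature_names(names_path)

rows = [parse_svmlight_line(line, loaded) for line in lines]
for target, features in rows:
    print(target, features)
```

The svmlight file keeps its compact integer indices, and the names are attached only when the data is read back. If the file was dumped with zero_based=False, pass zero_based=False to parse_svmlight_line as well so the indices line up with the names.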