Overflow error with load_svmlight_file() from sklearn


I've created an SVMlight file with only one line from a pandas DataFrame:

import numpy as np
from sklearn.datasets import dump_svmlight_file, load_svmlight_file
# toy_data: one-row pandas DataFrame with "Output" and "EventID" columns
dump_svmlight_file(toy_data.drop(["Output"], axis=1), toy_data["Output"], "../data/oneline_pid.txt", query_id=toy_data["EventID"])

The result in the file looks like this:

0 qid:72048431380967004 0:1440446648 1:72048431380967004 2:236784985 3:1477 4:26889 5:22 6:36685162242798766 8:1919947 10:22 11:48985 12:1840689
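Each line has the form `<label> qid:<id> <index>:<value> ...`, with zero-valued features omitted (here indices 7 and 9). To make that structure explicit, here is a minimal sketch of a parser that keeps every value as an exact Python `int`. `parse_svmlight_line` is a hypothetical helper, not part of scikit-learn, and it assumes integer labels and feature values.

```python
def parse_svmlight_line(line):
    """Hypothetical helper: split one SVMlight line into (label, qid, features).

    Assumes integer labels and values; qid is None when the line has none.
    """
    tokens = line.split("#", 1)[0].split()  # drop an optional trailing comment
    label = int(tokens[0])
    qid = None
    features = {}
    for token in tokens[1:]:
        key, value = token.split(":", 1)
        if key == "qid":
            qid = int(value)  # Python ints are arbitrary precision: no overflow
        else:
            features[int(key)] = int(value)
    return label, qid, features


line = ("0 qid:72048431380967004 0:1440446648 1:72048431380967004 2:236784985 "
        "3:1477 4:26889 5:22 6:36685162242798766 8:1919947 10:22 11:48985 12:1840689")
label, qid, features = parse_svmlight_line(line)
missing = sorted(set(range(13)) - set(features))  # indices omitted because they are zero

print(qid)          # 72048431380967004
print(features[6])  # 36685162242798766
print(missing)      # [7, 9]
```

Parsed this way, the qid and feature 1 hold the same value, 72048431380967004.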

When I try to load the file with query_id=True, I get an overflow error:

train = load_svmlight_file("../data/oneline_pid.txt", dtype=np.uint64, query_id=True)

OverflowError: signed integer is greater than maximum
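`signed integer is greater than maximum` is the message CPython gives when a value does not fit a C `int`, for example when appending to an `array.array("i")` on 64-bit Linux or macOS. A plausible explanation, which is an assumption about the 0.16.1 internals rather than something confirmed here, is that the loader collected query IDs in a buffer of 32-bit C `int`s. `dtype` sets the type of the returned X and y, so it would not affect such a buffer. This stdlib-only sketch reproduces the same error and shows that a 64-bit buffer holds the value:

```python
import array

qid = 72048431380967004

# Typecode "i" is a C int, 32 bits on common platforms: this qid cannot fit.
int_buffer = array.array("i")
overflowed = False
try:
    int_buffer.append(qid)
except OverflowError as exc:
    overflowed = True
    print("OverflowError:", exc)

# Typecode "q" is a 64-bit signed integer (C long long): the qid fits.
wide_buffer = array.array("q")
wide_buffer.append(qid)
print(wide_buffer[0] == qid)  # True
```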

If I load the file with query_id=False, no error is raised, but the value 72048431380967004 (the query ID, which also appears as feature 1 because only the Output column was dropped) comes back wrong. This is the output:

[[       1440446648 72048431380967008         236784985              1477
              26889                22 36685162242798768                 0
            1919947                 0                22             48985
            1840689]]

72048431380967004 now appears as 72048431380967008, and 36685162242798766 as 36685162242798768.
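The changed digits are exactly what a round trip through a 64-bit IEEE 754 double produces. Every integer up to 2**53 is exactly representable as a double, but between 2**55 and 2**56 consecutive doubles are 8 apart, so both large values snap to the nearest multiple of 8. This suggests, as an assumption consistent with the output rather than something read from the scikit-learn source, that feature values are parsed as floating point before being cast to the requested dtype:

```python
# Values as they appear in the text file.
texts = ("72048431380967004", "36685162242798766", "1440446648")

# Parse each value as a double (Python float), then convert back to int.
round_tripped = {text: int(float(text)) for text in texts}
for text, value in round_tripped.items():
    print(text, "->", value)
# 72048431380967004 -> 72048431380967008
# 36685162242798766 -> 36685162242798768
# 1440446648 -> 1440446648

# Small values such as 1440446648 are well below 2**53 and survive unchanged.
print(2**53)  # 9007199254740992
```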

How do I avoid this error? The maximum value of np.uint64 is 18446744073709551615 (and of np.int64, 9223372036854775807), so the value should not overflow.

I have also tried loading with np.int64 as the data type, but the output is the same.
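The np.uint64 and np.int64 maximums are not the limits that matter here. Comparing the value against a few limits, computed with plain Python ints, shows that it fits easily in either 64-bit type but exceeds both a 32-bit C int (relevant to the qid overflow, if the assumption above holds) and 2**53 (relevant to the rounding of feature values). Under those assumptions, changing `dtype` cannot avoid either problem:

```python
qid = 72048431380967004

limits = {
    "C int (int32) max": 2**31 - 1,
    "double exact-integer limit": 2**53,
    "np.int64 max": 2**63 - 1,
    "np.uint64 max": 2**64 - 1,
}

# Record whether the qid stays within each limit.
within = {name: qid <= limit for name, limit in limits.items()}
for name, limit in limits.items():
    print(f"{name:28} {limit:>20}  qid within: {within[name]}")
```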

scikit-learn version: 0.16.1
OS: OS X Yosemite 10.10.5


Solution

  • The overflow error was fixed in newer scikit-learn versions.
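If upgrading is not an option, or if feature values above 2**53 must also survive unchanged, one workaround is to parse the file yourself so that no value passes through a float. The following is a minimal sketch assuming non-negative integer values, zero-based feature indices (the dump_svmlight_file default), and a known number of features. `load_svmlight_exact` is hypothetical and returns dense NumPy arrays, whereas load_svmlight_file returns a sparse matrix.

```python
import numpy as np


def load_svmlight_exact(lines, n_features):
    """Hypothetical helper: parse SVMlight lines into exact uint64 arrays.

    Assumes non-negative integer qids and feature values, integer labels,
    and zero-based feature indices.
    """
    X = np.zeros((len(lines), n_features), dtype=np.uint64)
    y = np.zeros(len(lines), dtype=np.int64)
    qid = np.zeros(len(lines), dtype=np.uint64)
    for row, line in enumerate(lines):
        tokens = line.split("#", 1)[0].split()  # drop an optional comment
        y[row] = int(tokens[0])
        for token in tokens[1:]:
            key, value = token.split(":", 1)
            if key == "qid":
                qid[row] = int(value)  # Python int -> uint64, never via float
            else:
                X[row, int(key)] = int(value)
    return X, y, qid


# With a real file (path taken from the question), skip blank lines first:
#   with open("../data/oneline_pid.txt") as f:
#       X, y, qid = load_svmlight_exact([l for l in f if l.strip()], 13)
line = ("0 qid:72048431380967004 0:1440446648 1:72048431380967004 2:236784985 "
        "3:1477 4:26889 5:22 6:36685162242798766 8:1919947 10:22 11:48985 12:1840689")
X, y, qid = load_svmlight_exact([line], n_features=13)
print(int(qid[0]), int(X[0, 1]), int(X[0, 6]))
```

Dense output is fine for a handful of rows; for large files, building a scipy.sparse matrix from the parsed triples would be the more economical design.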