I'm using python-weka-wrapper3. I have just loaded an ARFF dataset:
kc1_class_arff = arff_loader(DATA_PATH, "/kc1_class.arff")
The last column of this dataset is named NUMDEFECTS
and contains floats.
I need this column to be renamed to DEFECTS
and converted to integers.
The function arff_loader is the following:
def arff_loader(DATA_PATH, file_name):
    data = loader.load_file(DATA_PATH + file_name)
    data.class_is_last()
    return data
You can achieve this by creating a short filter pipeline:
For renaming the attribute, you can use the RenameAttribute filter.
For turning the numeric attribute into an indicator attribute, you can use the MathExpression filter.
For combining multiple filters, use MultiFilter.
Assuming an input file like:
@relation defects
@attribute x1 numeric
@attribute x2 numeric
@attribute NUMDEFECTS numeric
@data
0,1,12.0
1,1,25.1
0,0,5.0
1,0,0.0
-1,0,-10.0
You can apply this python-weka-wrapper3 code:
import weka.core.jvm as jvm
from weka.core.converters import load_any_file
from weka.filters import Filter, MultiFilter
jvm.start()
# load data
data = load_any_file("./defects.arff")
# rename
rename = Filter(classname="weka.filters.unsupervised.attribute.RenameAttribute",
                options=["-find", "NUMDEFECTS", "-replace", "DEFECTS"])
# float -> indicator
# NB: the class attribute must be unset for this filter to work!
# -R selects the last attribute; -V inverts the selection so that only this attribute is modified
indicator = Filter(classname="weka.filters.unsupervised.attribute.MathExpression",
                   options=["-E", "ifelse(A < 0, 1, ifelse(A > 0, 1, 0))", "-R", "last", "-V"])
multi = MultiFilter()
multi.filters = [rename, indicator]
multi.inputformat(data)
filtered = multi.filter(data)
filtered.relationname = data.relationname
print(filtered)
jvm.stop()
And you will get something like this:
@relation defects
@attribute x1 numeric
@attribute x2 numeric
@attribute DEFECTS numeric
@data
0,1,1
1,1,1
0,0,1
1,0,0
-1,0,1
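To see why the last column comes out this way, the MathExpression formula can be mirrored in plain Python. This is only an illustration of the logic (any non-zero value becomes 1, zero stays 0), not Weka code; the helper name indicator is made up for this sketch.

```python
# Plain-Python mirror of: ifelse(A < 0, 1, ifelse(A > 0, 1, 0))
# "indicator" is a hypothetical helper name, not part of Weka.
def indicator(a):
    return 1 if a < 0 else (1 if a > 0 else 0)

# NUMDEFECTS values from the sample input file above
num_defects = [12.0, 25.1, 5.0, 0.0, -10.0]
defects = [indicator(a) for a in num_defects]
print(defects)  # [1, 1, 1, 0, 1]
```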
Once you have obtained the filtered data, you can set the class attribute (e.g., with class_is_last()) and return the dataset from your arff_loader
function.
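Putting it together, a revised arff_loader might look like the sketch below. It assumes the JVM has already been started with jvm.start(), uses load_any_file instead of the loader object from the question, and keeps the filter options in two module-level constants (RENAME_OPTIONS and INDICATOR_OPTIONS are names chosen here for illustration). The Weka imports sit inside the function only so that the module can be imported without touching the JVM.

```python
# Option lists copied from the filter setup above; the constant names are illustrative.
RENAME_OPTIONS = ["-find", "NUMDEFECTS", "-replace", "DEFECTS"]
INDICATOR_OPTIONS = ["-E", "ifelse(A < 0, 1, ifelse(A > 0, 1, 0))", "-R", "last", "-V"]


def arff_loader(data_path, file_name):
    # Assumption: jvm.start() has been called before this function runs.
    from weka.core.converters import load_any_file
    from weka.filters import Filter, MultiFilter

    # Load without setting the class attribute, so MathExpression can modify the last column.
    data = load_any_file(data_path + file_name)

    rename = Filter(classname="weka.filters.unsupervised.attribute.RenameAttribute",
                    options=RENAME_OPTIONS)
    indicator = Filter(classname="weka.filters.unsupervised.attribute.MathExpression",
                       options=INDICATOR_OPTIONS)

    multi = MultiFilter()
    multi.filters = [rename, indicator]
    multi.inputformat(data)
    filtered = multi.filter(data)
    filtered.relationname = data.relationname

    # Set the class attribute only after filtering.
    filtered.class_is_last()
    return filtered
```

It would then be called as before, e.g. kc1_class_arff = arff_loader(DATA_PATH, "/kc1_class.arff").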