java csv weka arff

How to specify the order of a nominal attribute's values when converting a CSV file into an ARFF file?


I'm trying to convert a csv file into an arff file using the following code.

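import java.io.File;
import weka.core.converters.ArffSaver;
import weka.core.converters.CSVLoader;

// Load the CSV file into memory
var csvFile = new File("/path/to/input/file.csv");
var arffOutputFile = new File("/path/to/output/file.arff");
var loader = new CSVLoader();
loader.setSource(csvFile);
var instances = loader.getDataSet();

// Write the loaded instances out as ARFF
var saver = new ArffSaver();
saver.setInstances(instances);
saver.setFile(arffOutputFile);
saver.writeBatch();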

This code works, but there is one problem. My attribute list contains a nominal attribute with the values {yes, no}, and I need the ARFF header to list yes first. In other words, I need @attribute nominal_attr {yes,no} rather than @attribute nominal_attr {no,yes} in the header of the ARFF output. The order is determined by the value of the first Instance in instances: if the first row of the CSV input file has the value no, the header will contain @attribute nominal_attr {no,yes}.
Is there a way to force the ArffSaver to use a certain order in the header without changing the order of the Instances?


Solution

  • Instead of fixing the output (i.e. ArffSaver), it is easier to fix the input (i.e. CSVLoader). The -L command-line option (the nominalLabelSpecs property in the GUI) lets you specify the labels for nominal attributes. That way, you can force both the order and the set of available labels, even if one of the CSV files doesn't contain all of the labels.
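
As a rough sketch of how this could look from Java, the -L option can be passed to the loader through setOptions before the source is set. The class name CsvToArffWithLabelOrder and the helper method convert are made up for this illustration, and the spec string "nominal_attr:yes,no" assumes the attribute is named nominal_attr as in the question (the part before the colon names the attribute, the part after it lists the labels in the desired order):

```java
import java.io.File;

import weka.core.Instances;
import weka.core.converters.ArffSaver;
import weka.core.converters.CSVLoader;

// Hypothetical helper class, not part of Weka.
public class CsvToArffWithLabelOrder {

    // Converts csvFile to arffFile, declaring nominal labels up front.
    // labelSpec has the form "<attribute names or index range>:<label1>,<label2>,...".
    public static Instances convert(File csvFile, File arffFile, String labelSpec)
            throws Exception {
        var loader = new CSVLoader();
        // -L fixes the labels (and their order) for the named attribute,
        // regardless of which value appears first in the CSV data.
        loader.setOptions(new String[] {"-L", labelSpec});
        loader.setSource(csvFile);
        var instances = loader.getDataSet();

        // Saving is unchanged from the code in the question.
        var saver = new ArffSaver();
        saver.setInstances(instances);
        saver.setFile(arffFile);
        saver.writeBatch();
        return instances;
    }

    public static void main(String[] args) throws Exception {
        // Paths are the placeholders from the question.
        convert(new File("/path/to/input/file.csv"),
                new File("/path/to/output/file.arff"),
                "nominal_attr:yes,no");
    }
}
```

The order of the instances themselves is not touched; only the label declaration in the header changes.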

    The following filters can be used as well to change the order of your labels: