<dependency>
<groupId>nz.ac.waikato.cms.weka</groupId>
<artifactId>weka-stable</artifactId>
<version>3.8.5</version>
</dependency>
This version no longer bundles the SMOTE class, which I really need, so I also added the following dependency to my pom.xml:
<dependency>
<groupId>nz.ac.waikato.cms.weka</groupId>
<artifactId>SMOTE</artifactId>
<version>1.0.2</version>
</dependency>
In my Java code, I am also implementing the Walk Forward validation technique. I can prepare both the training set and the testing set for each step, so I use them in a loop like the following:
for (...){
var filtered = new FilteredClassifier();
var smote = new SMOTE();
filtered.setFilter(smote);
filtered.setClassifier(new NaiveBayes());
filtered.buildClassifier(trainingDataset);
var currEvaluation = new Evaluation(testingDataset);
currEvaluation.evaluateModel(filtered, testingDataset);
}
Both trainingDataset and testingDataset are of type Instances, and their values change appropriately in each iteration. The first iteration runs without problems, but the second one throws java.lang.IllegalArgumentException: Comparison method violates its general contract! The exception stack trace is:
java.lang.IllegalArgumentException: Comparison method violates its general contract!
at java.base/java.util.TimSort.mergeLo(TimSort.java:781)
at java.base/java.util.TimSort.mergeAt(TimSort.java:518)
at java.base/java.util.TimSort.mergeCollapse(TimSort.java:448)
at java.base/java.util.TimSort.sort(TimSort.java:245)
at java.base/java.util.Arrays.sort(Arrays.java:1441)
at java.base/java.util.List.sort(List.java:506)
at java.base/java.util.Collections.sort(Collections.java:179)
at weka.filters.supervised.instance.SMOTE.doSMOTE(SMOTE.java:637)
at weka.filters.supervised.instance.SMOTE.batchFinished(SMOTE.java:489)
at weka.filters.Filter.useFilter(Filter.java:708)
at weka.classifiers.meta.FilteredClassifier.setUp(FilteredClassifier.java:719)
at weka.classifiers.meta.FilteredClassifier.buildClassifier(FilteredClassifier.java:794)
Does anyone know how to solve the problem?
Thanks in advance.
EDIT: I forgot to mention that I'm using Java 11.0.11.
EDIT 2: Based on @fracpete's answer, I suspect the problem may lie in how the sets are created. For context, I'm trying to predict the bugginess of classes in another open-source project. Because of Walk Forward, I have 19 steps and would need 19 different training files and 19 testing files. To avoid this, I keep a list of InfoKeeper objects, each of which holds the training and testing Instances for one step. While building this list, I do the following:
InfoKeeper related to step 1. InfoKeeper related to step 2. The code iterates on step 2 to create all the remaining InfoKeeper objects. Could this operation be the problem?
I also tried @fracpete's snippet, but the same error occurs. The files I used are the following:
training set file
testing set file
EDIT 3: This is how I create the files:
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.logging.Level;
import java.util.logging.Logger;
import weka.core.Attribute;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.CSVLoader;
public class FilesCreator {
private File basicArff;
private Instances totalData;
private ArrayList<Instance> testingInstances;
private File testingSet;
private File trainingSet;
/* *******************************************************************/
public FilesCreator(File csvFile, File arffFile, File training, File testing)
throws IOException {
var loader = new CSVLoader();
loader.setSource(csvFile);
this.totalData = loader.getDataSet(); // get instances object
this.basicArff = arffFile;
this.testingSet = testing;
this.trainingSet = training;
}
private ArrayList<Attribute> getAttributesList(){
var attributes = new ArrayList<Attribute>();
int i;
for (i = 0; i < this.totalData.numAttributes(); i++)
attributes.add(this.totalData.attribute(i));
return attributes;
}
private void writeHeader(PrintWriter pf) {
// write the ARFF header (relation name and attributes) to the given writer.
// pf wraps either this.testingSet or this.trainingSet
pf.append("@relation " + this.totalData.relationName() + "\n\n");
pf.flush();
var attributes = this.getAttributesList();
for (Attribute line : attributes){
pf.append(line.toString() + "\n");
pf.flush();
}
pf.append("\n@data\n");
pf.flush();
}
/* *******************************************************************/
/* testing file */
// testing instances
private void computeTestingSet(int indexRelease){
int i;
int currIndex;
// re-initialize the list
this.testingInstances = new ArrayList<>();
for (i = 0; i < this.totalData.numInstances(); i++){
// first attribute is the release index
currIndex = (int) this.totalData.instance(i).value(0);
if (currIndex == indexRelease)
testingInstances.add(this.totalData.instance(i));
else if (currIndex > indexRelease)
break;
}
}
// testing file
private void computeTestingFile(int indexRelease){
this.computeTestingSet(indexRelease);
try(var fp = new PrintWriter(this.testingSet)) {
this.writeHeader(fp);
for (Instance line : this.testingInstances){
fp.append(line.toString() + "\n");
fp.flush();
}
} catch (IOException e) {
var logger = Logger.getLogger(FilesCreator.class.getName());
// Level.OFF is intended for disabling logging; report the failure as SEVERE
logger.log(Level.SEVERE, "Could not write the testing set file", e);
}
}
/* *******************************************************************/
// training file
private void computeTrainingFile(int indexRelease){
int i;
try(var fw = new FileWriter(this.trainingSet, true);
var fp = new PrintWriter(fw)) {
if (indexRelease == 1) {
// first iteration: need the header.
fp.print("");
fp.flush();
this.writeHeader(fp);
for (i = 0; i < this.totalData.numInstances(); i++) {
if ( (int) this.totalData.instance(i).value(0) > indexRelease)
break;
fp.append(this.totalData.instance(i).toString() + "\n");
fp.flush();
}
}
else {
// later iterations: just append the testing instances computed in the
// previous call, i.e. the rows of release indexRelease:
for (Instance obj : this.testingInstances){
fp.append(obj.toString() + "\n");
fp.flush();
}
}
} catch (IOException e) {
var logger = Logger.getLogger(FilesCreator.class.getName());
// Level.OFF is intended for disabling logging; report the failure as SEVERE
logger.log(Level.SEVERE, "Could not write the training set file", e);
}
}
/* *******************************************************************/
// public method
public void computeFiles(int indexRelease){
this.computeTrainingFile(indexRelease);
this.computeTestingFile(indexRelease + 1);
}
}
The public method computeFiles is invoked inside a loop in another class, with i running from 1 to 19:
FilesCreator filesCreator = new FilesCreator(csvFile, arffFile, training, testing);
for (i = 1; i < 20; i++) {
filesCreator.computeFiles(i);
/* do something with the files, such as loading Instances from them and
using those for the SMOTE computation */
}
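To make the intended split explicit: at step i the training set should contain every row of releases 1 to i, and the testing set the rows of release i + 1. The following is only a minimal in-memory sketch of that rule, not the FilesCreator code itself. WalkForwardSplit and split are hypothetical names, the sample rows are made up, and each row is modelled as an int array whose first element is the release index.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Weka: walk-forward split by release index.
final class WalkForwardSplit {

    // Returns two lists: index 0 holds the training rows (releases 1..step),
    // index 1 holds the testing rows (release step + 1).
    static List<List<int[]>> split(List<int[]> rows, int step) {
        List<int[]> training = new ArrayList<>();
        List<int[]> testing = new ArrayList<>();
        for (int[] row : rows) {
            int release = row[0]; // first column is the release index
            if (release <= step) {
                training.add(row);
            } else if (release == step + 1) {
                testing.add(row);
            }
        }
        return List.of(training, testing);
    }

    public static void main(String[] args) {
        // Made-up sample data: {release index, class label}
        List<int[]> rows = List.of(
                new int[]{1, 0}, new int[]{1, 1},
                new int[]{2, 0},
                new int[]{3, 1}, new int[]{3, 0});
        for (int step = 1; step <= 2; step++) {
            List<List<int[]>> parts = split(rows, step);
            System.out.println("step " + step
                    + ": train=" + parts.get(0).size()
                    + ", test=" + parts.get(1).size());
        }
    }
}
```

Computing both sets from the full data at every step avoids carrying state between iterations, at the cost of rescanning all rows each time.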
EDIT 4: I removed the duplicated instances from totalData in FilesCreator by doing the following:
var currDir = Paths.get(".").toAbsolutePath().normalize().toFile();
var ext = ".arff";
var tmpFile = File.createTempFile("without_replicated", ext, currDir);
RemoveDuplicates.main(new String[]{"-i", this.basicArff.toPath().toString(), "-o", tmpFile.toPath().toString()});
// the output file contains no duplicated instances
var arffLoader = new ArffLoader();
arffLoader.setSource(tmpFile);
this.totalData = arffLoader.getDataSet();
Files.delete(tmpFile.toPath());
I cannot modify the file manually because it is the output of a previous computation. The code now works for iteration 2, but I get the same error in iteration 3.
The files for this iteration are:
train_iteration4.arff
test_iteration4.arff
This is the complete ARFF file produced by the previous snippet; it is the one loaded by arffLoader.setSource(tmpFile):
full.arff
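For clarity, the deduplication in EDIT 4 amounts to keeping the first occurrence of each data row. Below is a minimal plain-Java sketch of that idea, assuming rows are compared as exact strings (Weka's RemoveDuplicates may compare values differently). DedupRows and distinctRows are hypothetical names, and the sample rows are made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical helper, not part of Weka: drops repeated rows.
final class DedupRows {

    // Keeps the first occurrence of each row and preserves the original order.
    static List<String> distinctRows(List<String> rows) {
        return new ArrayList<>(new LinkedHashSet<>(rows));
    }

    public static void main(String[] args) {
        // Made-up sample rows: release index, a metric, bugginess label
        List<String> rows = List.of("1,10,yes", "1,10,yes", "2,7,no", "1,10,yes");
        System.out.println(distinctRows(rows));
    }
}
```

Order is preserved on purpose: FilesCreator stops scanning at the first row whose release index is too large, so it relies on the rows staying sorted by release.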
I solved the problem by changing the SMOTE dependency in my pom.xml to:
<dependency>
<groupId>nz.ac.waikato.cms.weka</groupId>
<artifactId>SMOTE</artifactId>
<version>1.0.3</version>
</dependency>
With this version the exception no longer occurs and my code runs as expected. I hope this helps others.
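Some background on the exception itself: the stack trace shows it is thrown by Collections.sort inside SMOTE.doSMOTE, and TimSort raises it when the comparator it is given returns inconsistent results for some elements. I have not verified what the comparator in SMOTE 1.0.2 actually does, so the BROKEN comparator below is purely a hypothetical illustration of one way a distance comparison can violate the contract (rounding the difference), next to a safe alternative. ComparatorContractDemo and isAntisymmetric are names invented for this sketch.

```java
import java.util.Comparator;

// Hypothetical demo, not taken from the SMOTE source code.
final class ComparatorContractDemo {

    // Broken: ceil(0.5) is 1, but ceil(-0.5) is -0.0, which becomes 0 as an int,
    // so compare(a, b) > 0 does not imply compare(b, a) < 0.
    static final Comparator<Double> BROKEN = (a, b) -> (int) Math.ceil(a - b);

    // Safe: Double.compare defines a consistent total order.
    static final Comparator<Double> SAFE = Double::compare;

    // The Comparator contract requires sgn(compare(a, b)) == -sgn(compare(b, a)).
    static boolean isAntisymmetric(Comparator<Double> c, double a, double b) {
        return Integer.signum(c.compare(a, b)) == -Integer.signum(c.compare(b, a));
    }

    public static void main(String[] args) {
        double a = 0.5;
        double b = 0.0;
        System.out.println("broken comparator antisymmetric: " + isAntisymmetric(BROKEN, a, b));
        System.out.println("safe comparator antisymmetric:   " + isAntisymmetric(SAFE, a, b));
    }
}
```

Whether TimSort actually detects such a violation depends on the data and its order, which would be consistent with the error appearing only in some iterations.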