java rapidminer

How to convert an ArrayList to an ExampleSet in RapidMiner?


I'm creating an extension for RapidMiner using Java. I have a list of elements of type Example and I need to convert it to a dataset of type ExampleSet.

RapidMiner's ExampleSet definition looks like this:

public interface ExampleSet extends ResultObject, Cloneable, Iterable<Example>

I need to pick certain elements from the dataset and send them back, still as an ExampleSet. However, casting does not work, and I can't simply create a new ExampleSet object since it's an interface.

private ExampleSet generateSet(ExampleSet dataset){
    List<Example> list = new ArrayList<Example>();
    // pick elements from sent dataset and add them to newly created list above
    return (ExampleSet)list;
}

Solution

  • You will need more than a simple explicit cast. In RapidMiner, an ExampleSet is not just a collection of Example objects. It contains more complex information and logic.

    Therefore, you need another approach to work with ExampleSets. As you already said, ExampleSet is only an interface, which leads us to choosing the right subtype.
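
    To see why the cast in the question cannot work, here is a minimal sketch in plain Java with no RapidMiner dependency. RowSet and ListRowSet are hypothetical stand-ins invented for this illustration (they are not RapidMiner types): the cast compiles, but it fails at runtime because ArrayList does not implement the target interface, so a concrete implementation has to be built from the list instead.

    ```java
    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;

    public class CastSketch {
        // Hypothetical stand-in for an interface such as ExampleSet.
        interface RowSet extends Iterable<String> {
            int size();
        }

        // A concrete implementation that wraps a copy of the given rows.
        static final class ListRowSet implements RowSet {
            private final List<String> rows;

            ListRowSet(List<String> rows) {
                this.rows = new ArrayList<>(rows);
            }

            @Override
            public int size() {
                return rows.size();
            }

            @Override
            public Iterator<String> iterator() {
                return rows.iterator();
            }
        }

        // Returns true if casting the list to RowSet throws at runtime.
        static boolean castFails(List<String> list) {
            try {
                RowSet ignored = (RowSet) list; // compiles, but ArrayList is not a RowSet
                return false;
            } catch (ClassCastException e) {
                return true;
            }
        }

        public static void main(String[] args) {
            List<String> picked = new ArrayList<>();
            picked.add("row-1");
            picked.add("row-2");

            System.out.println("cast fails: " + castFails(picked));

            // Building a real implementation from the list works.
            RowSet set = new ListRowSet(picked);
            System.out.println("rows in set: " + set.size());
        }
    }
    ```

    The same reasoning applies to ExampleSet: the list has to be turned into an object of a class that actually implements the interface, which is the job a builder does.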

    For starters (since version 7.3), simply use one of the methods of the ExampleSets class.
    You also need to define each Attribute, that is, each column, that this ExampleSet is going to have.

    Below, I create one with a single Attribute called First:

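    // Define the single column of the new ExampleSet.
    Attribute attributeFirst = AttributeFactory.createAttribute("First", Ontology.POLYNOMINAL);
    ExampleSetBuilder builder = ExampleSets.from(attributeFirst);
    // Add the data row backing an existing Example (here: example).
    builder.addDataRow(example.getDataRow());
    ExampleSet result = builder.build();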
    

    You can also get the Attributes in a more generic way using:

    Attribute[] attributes = example.getAttributes().createRegularAttributeArray();
    ExampleSetBuilder builder = ExampleSets.from(attributes);
    ...
    

    If you have many cases where you have to create or alter an ExampleSet, I encourage you to write your own ExampleSetBuilder, since the original implementation has many drawbacks.

    You can also try searching for other extensions that may already meet your requirements, so that you do not need to create one of your own (believe me, it's not headache-free).