How can I implement hill climbing using DRL exclusively?

I'm trying to build a local search algorithm using rules (DRL) only in order to always apply the "best" possible rule out of a rule-base at a given point in time which maximizes a fitness value. Therefore, I need to simulate the application of the rule within its preconditions and check whether its application would lead to the best possible outcome out of all available rules. I need to keep it in DRL exclusively, so I cannot use planners like OptaPlanner.

So in theory, the preconditions of every rule should be evaluated before any rule is fired. Therefore, I had the idea of introducing an object *Test* to the fact base, which simply stores the current best fitness value along with the corresponding rule name. Additionally, it has a boolean flag that indicates whether all rules have already been evaluated. To every regular rule I added the condition that all rules have to be checked first (isAllRulesChecked()) before a rule can fire, as the evaluation of a rule triggers predicates that simulate the application of the rule and store its fitness value along with the rule ID in the test object if it is a new best. An additional rule "allRulesChecked" with negative salience should be evaluated last, so I used it to toggle the boolean flag "isAllRulesChecked()", since by the time this rule is reached the test object should contain the rule with the highest fitness value. Finally, the last condition for a rule to fire is that it is marked as the best applicable rule in the test object (isBestRule()).

rule "1"
    when
        //Patterns that compute fitness and return true here...

        $t: Test()

        //Update $t if rule is new best
        eval($t.isAllRulesChecked())
        eval($t.isBestRule("1"))
        
    then
        System.out.println("Rule 1 fired!");
end

rule "2"
    when
        //Patterns that compute fitness and return true here...

        $t: Test()

        //Update $t if rule is new best
        eval($t.isAllRulesChecked())
        eval($t.isBestRule("2"))
        
    then
        System.out.println("Rule 2 fired!");
end

rule "3"
    when
        //Patterns that compute fitness and return true here...

        $t: Test()

        //Update $t if rule is new best
        eval($t.isAllRulesChecked())
        eval($t.isBestRule("3"))
        
    then
        System.out.println("Rule 3 fired!");
end


rule "allRulesChecked"
    when
        $t: Test(!isAllRulesChecked())
    then
        modify($t){setAllRulesChecked(true)};
end

The snippet above shows my conceptual idea; however, I'm afraid my thinking is flawed, as I didn't manage to trigger the behavior I expected.

So let's say:

Rule 1 would yield fitness of 0.3
Rule 2 would yield fitness of -0.1
Rule 3 would yield fitness of 0.4

Then I would want to only fire rule 3 and start all over again, as the execution of rule 3 would cause the fitness values to change.

I hope my intentions are understandable, and I would be very grateful for hints about how I could tackle this problem or where I'm going wrong.

Solution

This is a very interesting use-case!
This is a draft of a workable solution, given what you shared so far.

In essence I believe you want to separate the steps of "evaluate fitness(es)" and "apply best fitness".

The problem with your original approach is that you are trying to perform side effects from the LHS (the "when" part) alone, which the manual warns against, and that you are using eval(), which the manual also describes as inefficient.

So I would strongly encourage you to first use rules to evaluate the fitness criteria, and then decide which one to apply in a separate phase:

rule "criteria 1"
agenda-group "evaluate fitness"
when
    //Patterns that compute fitness here...
then
    insertLogical( new Criteria("C1", 0.3d) );
end

rule "criteria 2"
agenda-group "evaluate fitness"
when
    //Patterns that compute fitness and return true here...
then
    insertLogical( new Criteria("C2", -0.1d) );
end

rule "criteria 3"
agenda-group "evaluate fitness"
when
    //Patterns that compute fitness and return true here...
then
    insertLogical( new Criteria("C3", 0.4d) );
end

rule "Select best criteria and evaluate side-effects"
when
    // Find the highest fitness among all Criteria currently in working memory
    accumulate(
        Criteria($fit: fitness);
        $bestFit: max( $fit )
    )
    $best: Criteria( fitness == $bestFit )
then
    // bestCriteria is assumed to be a global collection declared elsewhere
    bestCriteria.add( $best );
    // evaluate side-effects, ensuring update callbacks
    // to put the agenda-group in focus again:
    // drools.getKieRuntime().getAgenda().getAgendaGroup( "evaluate fitness" ).setFocus();
end

You can kick-start the session evaluation with

session.getAgenda().getAgendaGroup("evaluate fitness").setFocus();
session.fireAllRules();

after having inserted the necessary data.

Once the agenda-group "evaluate fitness" has been popped off the focus stack, working memory already holds the criteria, which remain valid until side effects are performed. At that point your rule for finding "the best one" can fire (here, as you requested, it selects the one with 0.4d).

You can perform the side effects by capturing them in a lambda stored on each criteria. Don't forget that modifications to your domain model must be reported to the engine via update.
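As a rough illustration of the lambda idea, here is a minimal sketch of what such a Criteria fact class could look like. The field names, the Runnable hook and the apply() method are assumptions made for this example, not something taken from your model. The equals/hashCode pair is included because, as far as I recall from the documentation, logical insertions rely on them being implemented correctly.

```java
// Hypothetical fact class: field names and the Runnable hook are assumptions.
// When used as a Drools fact, declare it public in its own source file.
final class Criteria {
    private final String name;
    private final double fitness;
    private final Runnable sideEffect; // deferred change to the domain model

    Criteria(String name, double fitness, Runnable sideEffect) {
        this.name = name;
        this.fitness = fitness;
        this.sideEffect = sideEffect;
    }

    // Matches the two-argument form used in the rules above; no side effect.
    Criteria(String name, double fitness) {
        this(name, fitness, () -> { });
    }

    public String getName() { return name; }

    public double getFitness() { return fitness; }

    // Runs the captured side effect; nothing happens until this is called.
    public void apply() { sideEffect.run(); }

    // Equality is based on name and fitness only, not on the lambda.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Criteria)) return false;
        Criteria other = (Criteria) o;
        return name.equals(other.name)
                && Double.compare(fitness, other.fitness) == 0;
    }

    @Override
    public int hashCode() {
        return 31 * name.hashCode() + Double.hashCode(fitness);
    }
}
```

With something like this, the RHS of the selection rule could call $best.apply() and then update the facts that the lambda changed, so the engine re-evaluates the criteria rules.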

You might then consider putting the agenda-group "evaluate fitness" back in focus. The advantage is that, if you wired everything correctly, the logically inserted criteria are automatically retracted once the conditions that produced them no longer hold, so the criteria rules fire again only when needed.
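To sanity-check the intended behavior before wiring it into rules, the overall loop (evaluate every candidate without side effects, apply only the best, then re-evaluate) can be modeled in plain Java. This is only a sketch outside the engine: the Move class, the integer state, the toy fitness function and the "stop when nothing improves" condition are all assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntUnaryOperator;

// Hypothetical plain-Java model of the two-phase loop; not Drools API.
final class HillClimbSketch {

    // A candidate move: a name plus the state change it would perform.
    static final class Move {
        final String name;
        final IntUnaryOperator effect;

        Move(String name, IntUnaryOperator effect) {
            this.name = name;
            this.effect = effect;
        }
    }

    // Toy fitness with a single peak at x == 3 (an assumption for the demo).
    static int fitness(int x) {
        return -(x - 3) * (x - 3);
    }

    // Repeats "evaluate all, apply best" until no move improves the fitness.
    static int climb(int start, List<Move> moves, List<String> trace) {
        int state = start;
        while (true) {
            Move best = null;
            int bestGain = 0;
            // Phase 1: simulate every move; the state itself is not changed.
            for (Move m : moves) {
                int gain = fitness(m.effect.applyAsInt(state)) - fitness(state);
                if (gain > bestGain) {
                    bestGain = gain;
                    best = m;
                }
            }
            if (best == null) {
                return state; // local optimum: no move has a positive gain
            }
            // Phase 2: apply only the best move, then start over.
            state = best.effect.applyAsInt(state);
            trace.add(best.name);
        }
    }

    public static void main(String[] args) {
        List<Move> moves = new ArrayList<>();
        moves.add(new Move("inc", x -> x + 1));
        moves.add(new Move("dec", x -> x - 1));
        moves.add(new Move("jump", x -> x + 2));
        List<String> trace = new ArrayList<>();
        int end = climb(0, moves, trace);
        System.out.println(trace + " -> " + end); // prints "[jump, inc] -> 3"
    }
}
```

In the rule-based version, phase 1 corresponds to the "evaluate fitness" agenda-group and phase 2 to the selection rule, with the focus being set again taking the place of the while loop.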

Don't forget we have a mailing list and a support forum if you want to engage with the Drools community further.