I would like to understand how to implement NEAT-Python so that it retrains after each prediction it makes, with the training set growing in size after every prediction.
I'm trying to set up neat-python through the configuration file to retrain after each prediction on the test/unseen set. For instance, my understanding is that the XOR "evolve-minimal" example can be adjusted so that it trains on part of the data (to a particular fitness level, obtaining the best genome) and then predicts on the remaining data that was set aside as a test set. See the code below for what I mean:
from __future__ import print_function
import neat
import visualize
# 2-input XOR inputs and expected outputs. Training set
xor_inputs = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 1.0), (0.0, 0.0, 1.0), (1.0, 1.0, 0.0)]
xor_outputs = [(1.0,), (1.0,), (1.0,), (0.0,), (0.0,)]
# Test set
xor_inputs2 = [(1.0, 0.0, 1.0), (1.0, 1.0, 0.0), (1.0, 0.0, 0.0)]
xor_outputs2 = [(1.0,), (0.0,), (0.0,)]
def eval_genomes(genomes, config):
    for genome_id, genome in genomes:
        genome.fitness = 5
        net = neat.nn.FeedForwardNetwork.create(genome, config)
        for xi, xo in zip(xor_inputs, xor_outputs):
            output = net.activate(xi)
            genome.fitness -= (output[0] - xo[0]) ** 2
# Load configuration.
config = neat.Config(neat.DefaultGenome, neat.DefaultReproduction,
                     neat.DefaultSpeciesSet, neat.DefaultStagnation,
                     'config-feedforward')
# Create the population, which is the top-level object for a NEAT run.
p = neat.Population(config)
# Add a stdout reporter to show progress in the terminal.
p.add_reporter(neat.StdOutReporter(True))
stats = neat.StatisticsReporter()
p.add_reporter(stats)
# Run until a solution is found.
winner = p.run(eval_genomes)
# Display the winning genome.
print('\nBest genome:\n{!s}'.format(winner))
# Show output of the most fit genome against the test data.
print('\nOutput:')
winner_net = neat.nn.FeedForwardNetwork.create(winner, config)
count = 0
#To make predictions using the best genome
for xi, xo in zip(xor_inputs2, xor_outputs2):
    prediction = winner_net.activate(xi)
    print(" input {!r}, expected output {!r}, got {!r}".format(
        xi, xo[0], round(prediction[0])))
    # To get prediction accuracy
    if int(xo[0]) == int(round(prediction[0])):
        count = count + 1
accuracy = count / len(xor_outputs2)
print('\nAccuracy: ', accuracy)
node_names = {-1: 'A', -2: 'B', 0: 'A XOR B'}
visualize.draw_net(config, winner, True, node_names=node_names)
visualize.plot_stats(stats, ylog=False, view=True)
visualize.plot_species(stats, view=True)
Config file is:
#--- parameters for the XOR-2 experiment ---#
[NEAT]
fitness_criterion = max
fitness_threshold = 4.8
pop_size = 150
reset_on_extinction = True
[DefaultGenome]
# node activation options
activation_default = sigmoid
activation_mutate_rate = 0.0
activation_options = sigmoid
# node aggregation options
aggregation_default = sum
aggregation_mutate_rate = 0.0
aggregation_options = sum
# node bias options
bias_init_mean = 0.0
bias_init_stdev = 1.0
bias_max_value = 30.0
bias_min_value = -30.0
bias_mutate_power = 0.5
bias_mutate_rate = 0.7
bias_replace_rate = 0.1
# genome compatibility options
compatibility_disjoint_coefficient = 1.0
compatibility_weight_coefficient = 0.5
# connection add/remove rates
conn_add_prob = 0.5
conn_delete_prob = 0.5
# connection enable options
enabled_default = True
enabled_mutate_rate = 0.01
feed_forward = True
initial_connection = full_direct
# node add/remove rates
node_add_prob = 0.2
node_delete_prob = 0.2
# network parameters
num_hidden = 0
num_inputs = 3
num_outputs = 1
# node response options
response_init_mean = 1.0
response_init_stdev = 0.0
response_max_value = 30.0
response_min_value = -30.0
response_mutate_power = 0.0
response_mutate_rate = 0.0
response_replace_rate = 0.0
# connection weight options
weight_init_mean = 0.0
weight_init_stdev = 1.0
weight_max_value = 30
weight_min_value = -30
weight_mutate_power = 0.5
weight_mutate_rate = 0.8
weight_replace_rate = 0.1
[DefaultSpeciesSet]
compatibility_threshold = 3.0
[DefaultStagnation]
species_fitness_func = max
max_stagnation = 20
species_elitism = 2
[DefaultReproduction]
elitism = 2
survival_threshold = 0.2
However, the issue here is that no retraining takes place after each prediction on the test set. I believe the parameters in the config file are static and cannot change once the training process begins. If your fitness level is based on the number of correct classifications of the training set (which is what I'm trying to implement, very similar to the one used here), this is a problem, so I would like to understand whether a model that retrains can be implemented by adjusting a setting in the config file, or whether there is more to it than that.
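For concreteness, the kind of fitness function I mean would reward the number of correct classifications rather than subtracting squared error, roughly like this (just a sketch, not what's in my script above):

def eval_genomes(genomes, config):
    for genome_id, genome in genomes:
        net = neat.nn.FeedForwardNetwork.create(genome, config)
        correct = 0
        for xi, xo in zip(xor_inputs, xor_outputs):
            output = net.activate(xi)
            # Count a prediction as correct if the rounded output matches the label.
            if int(round(output[0])) == int(xo[0]):
                correct += 1
        # fitness_threshold in the config would then need to be set relative to len(xor_outputs)
        genome.fitness = correct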
If I understand what you're asking correctly, this can't be done through the config file alone.
The parameters defined in the config file only control what happens when the model runs straight through the data, making predictions without any retraining.
If you want the model to retrain after every prediction, you'd have to implement that yourself in the eval_genomes and/or run functions. You could add another loop around the prediction step that takes each output, appends it to the training data, and retrains the model. However, this will probably increase the compute time significantly, since you're no longer just computing outputs but running another set of training generations for each prediction.
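As a rough sketch of what that could look like, reusing config, xor_inputs/xor_outputs and xor_inputs2/xor_outputs2 from your script (the make_eval_genomes helper and the 300-generation cap are my own additions, and I haven't run this):

import neat

# Start with the original training data and grow it as predictions are made.
train_inputs = list(xor_inputs)
train_outputs = list(xor_outputs)

def make_eval_genomes(inputs, outputs):
    # Build an eval_genomes function bound to the current training set.
    def eval_genomes(genomes, config):
        for genome_id, genome in genomes:
            genome.fitness = len(outputs)  # maximum achievable fitness
            net = neat.nn.FeedForwardNetwork.create(genome, config)
            for xi, xo in zip(inputs, outputs):
                output = net.activate(xi)
                genome.fitness -= (output[0] - xo[0]) ** 2
    return eval_genomes

for xi, xo in zip(xor_inputs2, xor_outputs2):
    # Retrain from scratch on the current (enlarged) training set,
    # capped at 300 generations in case the fitness threshold isn't reached.
    p = neat.Population(config)
    winner = p.run(make_eval_genomes(train_inputs, train_outputs), 300)
    winner_net = neat.nn.FeedForwardNetwork.create(winner, config)

    prediction = winner_net.activate(xi)
    print(" input {!r}, expected output {!r}, got {!r}".format(
        xi, xo[0], round(prediction[0])))

    # Add the sample that was just predicted to the training set
    # before the next round of retraining.
    train_inputs.append(xi)
    train_outputs.append(xo)

Note that fitness_threshold in the config file stays fixed while the maximum achievable fitness grows with the training set, so you would probably want to scale the threshold (or rely on the generation cap). If retraining from scratch each time is too slow, neat.Checkpointer can save the population so a later run can continue from that state instead of starting over.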