
How to predict on new data using Pybrain?


What I want to do is ask Pybrain to predict on new data, for example predict(0,1,0,1,1,0), and have it output the answer it thinks is correct.

The question is, what code do I need to paste to make this happen?

Additional info: the weather.csv file that Pybrain is learning on has 6 attributes and the answer can only be 1 or 0. No other number.

Again, all I want is to ask Pybrain, after it has learned, to predict on numbers I give it, for example predict(0,1,0,1,1,0), and have it output an answer. I am very new to Python and Pybrain.

This is my code so far:

from pybrain.datasets import SupervisedDataSet
from pybrain.tools.shortcuts import buildNetwork
from pybrain.supervised.trainers import BackpropTrainer

from pybrain.datasets            import ClassificationDataSet
from pybrain.utilities           import percentError
from pybrain.structure.modules   import SoftmaxLayer

from pylab import ion, ioff, figure, draw, contourf, clf, show, hold, plot
from scipy import diag, arange, meshgrid, where
from numpy.random import multivariate_normal

ds = SupervisedDataSet(6,1)

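# Read each CSV row: the first 6 values are inputs, the 7th is the target.
with open('weather.csv', 'r') as tf:
    for lineno, line in enumerate(tf, 1):
        try:
            data = [float(x) for x in line.strip().split(',') if x != '']
            indata = tuple(data[:6])
            outdata = tuple(data[6:])
            ds.addSample(indata, outdata)
        except ValueError as e:
            # Report which line could not be parsed and keep going.
            print "error", e, "on line", lineno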


n = buildNetwork(ds.indim,8,8,ds.outdim,recurrent=True)
t = BackpropTrainer(n,learningrate=0.001,momentum=0.05,verbose=True)
t.trainOnDataset(ds,3000)
t.testOnData(verbose=True)

Update:

My weather.csv file has a total of only 7 observations (just for testing purposes for now). It looks like this inside the csv file (the data was extracted from one week in 1970):

1   0   1   1   1   1   1
0   0   0   1   1   1   0
1   0   1   1   1   1   1
0   0   0   1   1   1   0
0   0   0   1   1   1   0
0   0   0   1   1   1   0
0   0   0   1   1   1   0

The last column (far right) is the one Pybrain needs to predict. When I run the code and train Pybrain on this small data set 3000 times (I want to overfit), the output I get is:

Total error: 0.0140074590407
Total error: 0.0139930126505
Total error: 0.0139796724323
Total error: 0.0139656881439

Testing on data:
out:     [  0.732]
correct: [  1.000]
error:  0.03581333
out:     [  0.101]
correct: [  0.000]
error:  0.00511758
out:     [  0.732]
correct: [  1.000]
error:  0.03581333
out:     [  0.101]
correct: [  0.000]
error:  0.00511758
out:     [  0.101]
correct: [  0.000]
error:  0.00511758
out:     [  0.101]
correct: [  0.000]
error:  0.00511758
out:     [  0.101]
correct: [  0.000]
error:  0.00511758

Now I just want to use the overfitted model to predict on new data from 2014, but I don't know how. My goal is to see how well the overfitted model does on that new data.


Solution

  • If I understand your question correctly, you want the network's activate method, which takes one input row and returns the network's output for it. For example, add these two lines to the end of your code above:

    data2014 = n.activate([0,1,0,1,0,1])
    print 'data2014',data2014
    

    ...and it will print the network's output for that single row. The result is a raw value (like the 0.732 and 0.101 in your test output), not exactly 0 or 1, so you would round or threshold it to get a class. To predict for more than one row, read in a second csv and call activate in a loop. But this should give you the basic idea.
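
    The loop over a second csv could look something like the sketch below. It is only an illustration: the helper names parse_rows, predict_rows and accuracy are made up here, the 0.5 cut-off is an assumption rather than anything PyBrain prescribes, and fake_activate is a stand-in so the snippet runs without a trained network. In your script you would pass n.activate instead.

```python
import csv

def parse_rows(lines, n_inputs=6):
    """Turn comma-separated text lines into lists of n_inputs floats."""
    rows = []
    for fields in csv.reader(lines):
        values = [float(x) for x in fields if x.strip() != '']
        if len(values) < n_inputs:
            continue  # skip blank or short lines
        rows.append(values[:n_inputs])
    return rows

def predict_rows(activate, rows, threshold=0.5):
    """Call activate(row) for each row; return (raw_output, 0-or-1 label) pairs.

    The threshold of 0.5 is an assumed cut-off for turning the raw
    network output into a class.
    """
    results = []
    for row in rows:
        raw = float(activate(row)[0])  # one output unit, so take element 0
        results.append((raw, 1 if raw >= threshold else 0))
    return results

def accuracy(results, targets):
    """Fraction of thresholded predictions that match the known 0/1 answers."""
    hits = sum(1 for (_, label), t in zip(results, targets) if label == int(t))
    return hits / float(len(targets))

# Hypothetical stand-in for a trained network (NOT part of PyBrain);
# replace it with n.activate from the question's script.
def fake_activate(row):
    return [0.732 if row[0] == 1 else 0.101]

sample = ["1,0,1,1,1,1", "0,0,0,1,1,1"]
predictions = predict_rows(fake_activate, parse_rows(sample))
score = accuracy(predictions, [1, 0])
```

    With a real file you would open it and pass the file object to parse_rows, then call predict_rows(n.activate, rows). If the 2014 file also contains the true answer in a seventh column, you can collect those values separately and pass them to accuracy to see how well the overfitted model does.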