How do I build a dataset from strings with PyBrain's addSample()? I'm getting an error that says "could not convert string to float: gas".
Am I missing something, like an index value or a defined link between the input and target? I'm not sure how to read the documentation on this. Thanks for your help.
import pybrain
from pybrain.datasets import ClassificationDataSet
# set up input and target variables
ds = ClassificationDataSet(inp=2, target=1)
# add data to dataset
ds.addSample(('gas', 'blue'), ('car',))
ds.addSample(('diesel', 'brown'), ('truck',))
# error
ValueError: could not convert string to float: gas
It looks like PyBrain datasets only hold numeric values: every input and target is converted to float when the sample is stored, which is where the ValueError comes from. So you need to turn each string into a number first, for example by assigning a unique float value to each unique string. One option is to apply the ord() function to each character of each string in the tuple. A list comprehension is generally considered more readable than map() with a lambda for this.
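A minimal sketch of the first idea, one unique float per unique string, using a plain dictionary. encode_column is a hypothetical helper (not part of PyBrain), and codes are assigned in order of first appearance. Each resulting pair could then be passed to ds.addSample() in place of the strings; the target codes 0.0 and 1.0 also work as class indices, which, as far as I know, is the form ClassificationDataSet expects for targets.

```python
# A minimal sketch: give every distinct string in a column its own float code.
# encode_column is a hypothetical helper, not part of PyBrain.

def encode_column(values):
    """Map each distinct string to 0.0, 1.0, 2.0, ... in order of first appearance."""
    codes = {}
    for v in values:
        if v not in codes:
            codes[v] = float(len(codes))
    return codes

# The samples from the question: ((fuel, colour), vehicle)
rows = [(('gas', 'blue'), 'car'),
        (('diesel', 'brown'), 'truck')]

# Build one lookup table per column so codes stay consistent across samples
fuel_codes = encode_column(inp[0] for inp, _ in rows)
colour_codes = encode_column(inp[1] for inp, _ in rows)
target_codes = encode_column(t for _, t in rows)

# Replace every string with its numeric code
numeric = [((fuel_codes[f], colour_codes[c]), (target_codes[t],))
           for (f, c), t in rows]
print(numeric)
```

Keeping the dictionaries around matters: new samples (and predictions) must be encoded and decoded with the same tables.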
>>> ord('a')
97
>>> ord('\u00c2')
194
Or, for every character in a string:
>>> [ord(c) for c in 'Hello World!']
[72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100, 33]
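For comparison, a short sketch of the same conversion written as a list comprehension and with map(). In Python 3, map() returns an iterator, so it has to be wrapped in list(); since ord is already a function, the lambda is not even needed.

```python
text = 'Hello World!'

# List comprehension (the style suggested above)
codes_listcomp = [ord(c) for c in text]

# Equivalent map() + lambda version; list() turns the iterator into a list
codes_map = list(map(lambda c: ord(c), text))

# map() can take ord directly, no lambda required
codes_plain_map = list(map(ord, text))

print(codes_listcomp == codes_map == codes_plain_map)  # True
```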
So, applied to each string in the tuple, something like this (Python 3):
x = [('gas', 'blue')]
for var in x:
    # for each word in the tuple
    for c in var:
        # list of ord() values for each letter of the word
        letter = [ord(i) for i in c]
        # convert each code to a string
        number = [str(i) for i in letter]
        # join() the digit strings into a single string
        word = ''.join(number)
        print(c, word)
# Output:
# gas 10397115
# blue 98108117101
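The loop above produces a digit string, not a float, so it still needs float() before addSample() will accept it. A sketch of that last step, wrapped in a hypothetical helper word_to_code. One caveat worth checking: a Python float (a 64-bit double) carries only about 15 to 17 significant decimal digits, so for longer words the conversion rounds and distinct words can end up with the same value.

```python
def word_to_code(word):
    """Concatenate the ord() values of each character, as in the loop above,
    and convert the resulting digit string to a float.
    word_to_code is a hypothetical helper, not part of PyBrain."""
    return float(''.join(str(ord(ch)) for ch in word))

print(word_to_code('gas'))    # 10397115.0
print(word_to_code('blue'))   # 98108117101.0

# 'diesel' -> 100105101115101108 and 'diesem' -> 100105101115101109 have
# 18 digits, more than a double can hold exactly, so both round to the
# same float and the two words become indistinguishable.
print(word_to_code('diesel') == word_to_code('diesem'))  # True
```

For short, fixed vocabularies this is workable; for longer strings a lookup table of codes avoids the precision problem.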
Representing the strings as floats, and using the Natural Language Toolkit (NLTK) to count word occurrences, may help you prepare your data for training a neural network.
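A minimal sketch of the word-occurrence idea, using collections.Counter from the standard library as a stand-in for NLTK (NLTK's tokenizers could replace the simple str.split() here). The vocabulary list and the bag_of_words helper are hypothetical; each sample becomes one float per vocabulary word, so the dataset's input dimension would be len(vocabulary) instead of 2.

```python
from collections import Counter

# Hypothetical vocabulary built from the strings in the question
vocabulary = ['blue', 'brown', 'diesel', 'gas']

def bag_of_words(text, vocab):
    """Return one float per vocabulary word: how often that word occurs in text."""
    counts = Counter(text.lower().split())
    # Counter returns 0 for words it has not seen
    return tuple(float(counts[w]) for w in vocab)

print(bag_of_words('gas blue', vocabulary))       # (1.0, 0.0, 0.0, 1.0)
print(bag_of_words('diesel brown', vocabulary))   # (0.0, 1.0, 1.0, 0.0)
```

Unlike a single ord()-based number, this representation does not imply that one word is "larger" than another, which tends to suit categorical inputs better.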
Python3 convert Unicode String to int representation
https://stackoverflow.com/questions/36680250/pybrain-neural-network-nominal-string-inputs
https://datascience.stackexchange.com/questions/869/neural-network-parse-string-data