So let's say I have a simple matrix stored as an ndarray (just an example of what part of the data might look like):
import numpy as np
a = np.asarray([['1.0', 'Miami'],
                ['2.0', 'Boston'],
                ['1.4', 'Miami']])
I want to do data analysis on this complex data set ;) - I want to transform 'Miami' into 0 and 'Boston' into 1 in order to use a really fancy ML algorithm.
What is a good way to accomplish this in Python?
(I am not asking for the obvious approach of iterating and using a dictionary or if statement to replace each entry, but rather whether there's a better way using NumPy or native Python to do this.)
pandas is a good tool for this.
First convert the array to a DataFrame:
In [11]: import pandas as pd
In [12]: df = pd.DataFrame(a, columns=['value', 'city'])
and then replace entries from the city column:
In [13]: df.city = df.city.replace({'Miami': 0, 'Boston': 1})
In [14]: df
Out[14]:
  value  city
0   1.0     0
1   2.0     1
2   1.4     0
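If you would rather not write the mapping by hand, here is a minimal sketch of two alternatives, assuming the same array `a` as in the question. `pd.factorize` numbers the labels in order of first appearance, so it happens to reproduce the Miami = 0, Boston = 1 mapping here. `np.unique` with `return_inverse=True` is a NumPy-only option, but it sorts the labels first, so the codes come out the other way around. The names `codes`, `labels`, `np_labels` and `np_codes` are just illustrative. Note also that the value column still holds strings after the conversion, so it probably needs a cast before going into an ML algorithm.

```python
import numpy as np
import pandas as pd

a = np.asarray([['1.0', 'Miami'],
                ['2.0', 'Boston'],
                ['1.4', 'Miami']])

df = pd.DataFrame(a, columns=['value', 'city'])

# The source array holds strings, so 'value' needs an explicit cast.
df['value'] = df['value'].astype(float)

# pd.factorize assigns integer codes in order of first appearance
# and also returns the unique labels, so the mapping can be inverted.
codes, labels = pd.factorize(df['city'])
df['city'] = codes

# NumPy-only alternative: np.unique sorts the labels alphabetically,
# so here 'Boston' gets code 0 and 'Miami' gets code 1.
np_labels, np_codes = np.unique(a[:, 1], return_inverse=True)
```

Either way you keep the list of labels around (`labels` or `np_labels`), which lets you translate the integer codes back to city names later.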