
How to replace specific entries of a Numpy array based on its content


So let's say I have a simple 2-D NumPy array of strings (just an example of what part of the data might look like):

import numpy as np
a = np.asarray([['1.0', 'Miami'],
                ['2.0', 'Boston'],
                ['1.4', 'Miami']])

I want to do data analysis on this complex data set ;) and need to transform 'Miami' into 0 and 'Boston' into 1 in order to use a really fancy ML algorithm.

What is a good way to accomplish this in Python?
I am not asking about the obvious approach of iterating and using a dictionary or an if statement to replace each entry, but rather whether there is a better way using NumPy or native Python.


Solution

  • pandas is a good tool for this.
    First convert the array to a DataFrame:

    In [11]: import pandas as pd
    
    In [12]: df = pd.DataFrame(a, columns=['value', 'city'])
    

    and then replace the entries in the city column:

    In [13]: df.city = df.city.replace({'Miami': 0, 'Boston': 1})
    
    In [14]: df
    Out[14]:
      value city
    0   1.0    0
    1   2.0    1
    2   1.4    0
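Since the question also asks about plain NumPy, here is a minimal sketch of one vectorized alternative that avoids a per-element Python loop. It is not part of the answer above: the names `mapping`, `city_codes` and `numeric` are illustrative, and the lookup technique (sorting the mapping's keys once, then binary-searching every city with `np.searchsorted`) is a swapped-in approach. The code assumes every city is in the mapping and raises an error otherwise. If any consistent integer codes will do, `np.unique(..., return_inverse=True)` is simpler, but it assigns codes in sorted order (Boston -> 0, Miami -> 1), not the order the question asks for.

```python
import numpy as np

a = np.asarray([['1.0', 'Miami'],
                ['2.0', 'Boston'],
                ['1.4', 'Miami']])

cities = a[:, 1]                  # string column to encode
values = a[:, 0].astype(float)    # numeric column, converted from strings

# Illustrative mapping from city name to code; extend as needed.
mapping = {'Miami': 0, 'Boston': 1}
keys = np.array(list(mapping.keys()))
codes_for_keys = np.array(list(mapping.values()))

# Sort the keys once, then binary-search all cities in a single vectorized call.
order = np.argsort(keys)
pos = np.searchsorted(keys, cities, sorter=order)
pos = np.clip(pos, 0, len(keys) - 1)   # keep indices valid for unknown cities
found = keys[order][pos] == cities     # True where the city really is a key
if not found.all():
    raise ValueError(f"unmapped cities: {np.unique(cities[~found])}")
city_codes = codes_for_keys[order][pos]
print(city_codes)  # [0 1 0]

# Combine into a purely numeric float matrix suitable for an ML library.
numeric = np.column_stack([values, city_codes])

# Alternative when the specific codes don't matter: codes follow sorted order.
uniq, inverse = np.unique(cities, return_inverse=True)
print(uniq, inverse)  # ['Boston' 'Miami'] [1 0 1]
```

The sort-and-search lookup keeps the explicit `'Miami': 0, 'Boston': 1` assignment from the question while running in NumPy rather than a Python loop. The membership check guards against `np.searchsorted` silently returning a neighbouring key for a city missing from the mapping.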