I am working with numpy and pandas on Python to learn how to work on dataframes.
I'm coding on Collaboratory and I have loaded the Iris dataset but for some reason, there is no "Species" column in my dataframe. Maybe I've loaded it in an incorrect fashion? I'd appreciate help on the matter.
I added an image, if the code is still needed then this is what I have:
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
df = pd.DataFrame(load_iris().data, columns=load_iris().feature_names)
Try:
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
iris = load_iris()
df = pd.DataFrame(data=np.c_[iris['data'], iris['target']],
columns= iris['feature_names'] + ['target']).astype({'target': int}) \
.assign(species=lambda x: x['target'].map(dict(enumerate(iris['target_names']))))
Output:
>>> df
sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) target species
0 5.1 3.5 1.4 0.2 0 setosa
1 4.9 3.0 1.4 0.2 0 setosa
2 4.7 3.2 1.3 0.2 0 setosa
3 4.6 3.1 1.5 0.2 0 setosa
4 5.0 3.6 1.4 0.2 0 setosa
.. ... ... ... ... ... ...
145 6.7 3.0 5.2 2.3 2 virginica
146 6.3 2.5 5.0 1.9 2 virginica
147 6.5 3.0 5.2 2.0 2 virginica
148 6.2 3.4 5.4 2.3 2 virginica
149 5.9 3.0 5.1 1.8 2 virginica
[150 rows x 6 columns]
How to create the species
column from target
and target_names
columns?
>>> iris['target_names']
array(['setosa', 'versicolor', 'virginica'], dtype='<U10')
# index 0: setosa
# index 1: versicolor
# index 2: virginica
>>> iris['target']
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
You just need a dict mapping to replace 0 by 'setosa', 1 by 'versicolor' and 2 by 'virginica'. Use enumerate
to create a list of tuples [(0, 'setosa'), (1, 'versicolor), (2, 'virginica')] then
dict` to convert as a dictionary:
>>> dict(enumerate(iris['target_names']))
{0: 'setosa', 1: 'versicolor', 2: 'virginica'}
Now Series.map
will map the corresponding values:
>>> df['target'].map(dict(enumerate(iris['target_names'])))
0 setosa
1 setosa
2 setosa
3 setosa
4 setosa
...
145 virginica
146 virginica
147 virginica
148 virginica
149 virginica
Name: target, Length: 150, dtype: object