Consider the following multi-level column index dataframe:
import numpy as np
import pandas as pd
arrays = [
["A", "A", "B", "B"],
["one", "two", "one", "two"],
["1", "2", "1", "pd.NA"],
]
idx = pd.MultiIndex.from_arrays(arrays, names=["level_0", "level_1", "level_2"])
data = np.random.randn(3, 4)
df = pd.DataFrame(data, columns=idx)
print(df)
level_0 A B
level_1 one two one two
level_2 1 2 1 pd.NA
0 -1.249285 0.314225 0.011139 0.675274
1 -0.654808 -0.492350 0.596338 -0.087334
2 0.113570 0.566687 -0.361334 0.085368
level_2
holds values of type object
(str
really).
df.columns.get_level_values(2)
Index(['1', '2', '1', 'pd.NA'], dtype='object', name='level_2')
I need to parse it to the correct data type and change this particular column level.
new_level_2 = [
pd.NA if x == "pd.NA" else int(x) for x in df.columns.get_level_values(2)
]
I am looking for a pythonic way to replace the old level_2
with new_level_2
.
You could convert the MultiIndex.to_frame
then back to MultiIndex.from_frame
, change the values with replace
and as_type
:
df.columns = pd.MultiIndex.from_frame(df.columns.to_frame()
.replace({'level_2': {'pd.NA': pd.NA}})
.astype({'level_2': 'Int64'}))
Output:
level_0 A B
level_1 one two one two
level_2 1 2 3 <NA>
0 0.144044 1.454274 0.761038 0.121675
1 0.443863 0.333674 1.494079 -0.205158
2 0.313068 -0.854096 -2.552990 0.653619