Is it possible to copy a dataframe in the middle of a method chain to a new variable? Something like:
import pandas as pd

df = (pd.DataFrame([[2, 4, 6],
                    [8, 10, 12],
                    [14, 16, 18],
                    ])
      .assign(something_else=100)
      .div(2)
      .copy_to_new_variable(df_imag)  # Imaginary method to copy df to df_imag.
      .div(10)
      )
print(df_imag)
would then print:
     0    1    2  something_else
0  1.0  2.0  3.0            50.0
1  4.0  5.0  6.0            50.0
2  7.0  8.0  9.0            50.0
The imaginary .copy_to_new_variable(df_imag) call could be replaced by splitting the chain and writing df_imag = df.copy(), but that would break up the method chain.
This is what I was looking for. The idea comes from Matt Harrison (author of several books on pandas), who uses it to debug method chains. The same approach is also recommended in Aidan Cooper's article "4 Pandas Anti-Patterns to Avoid and How to Fix Them".
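import pandas as pd

def to_df(df, name):
    # Store a copy of the intermediate frame as a global variable called `name`,
    # then return df unchanged so the chain can continue.
    globals()[name] = df.copy()
    return df

df = (
    pd.DataFrame([[1, 2, 3],
                  [10, 10, 10]],
                 columns=["A", "B", "C"])
    .set_index("C")
    .pipe(to_df, "df_imag")  # snapshot the frame at this point as df_imag
    .sum()
)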
df_imag is then the intermediate dataframe described in the question.
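If writing into globals() feels too magical, a variant that collects snapshots in an ordinary dictionary is sketched below. This is an assumption-based alternative, not Matt Harrison's version: the helper name tap and the dictionary snapshots are hypothetical. It relies on DataFrame.pipe passing extra positional arguments through to the function, and it uses the data from the question.

```python
import pandas as pd

# Hypothetical helper (not part of pandas): store a copy of the intermediate
# frame under `key` in `store`, then return the frame unchanged.
def tap(df, store, key):
    store[key] = df.copy()
    return df

snapshots = {}

df = (
    pd.DataFrame([[2, 4, 6],
                  [8, 10, 12],
                  [14, 16, 18]])
    .assign(something_else=100)
    .div(2)
    .pipe(tap, snapshots, "df_imag")  # snapshot taken after .div(2)
    .div(10)
)

df_imag = snapshots["df_imag"]
print(df_imag)
```

The snapshot is a copy, so later steps in the chain do not change it, and the dictionary keeps all snapshots in one place instead of scattering names across the module namespace.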
In Jupyter notebooks, if you want to view the dataframe midway through the chain without interrupting the rest of it, you can use .pipe(lambda df_: display(df_) or df_), which is also explained in the article mentioned above:
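import pandas as pd
# display() is available by default in Jupyter; importing it explicitly
# also makes the snippet work in other IPython sessions.
from IPython.display import display

df = (
    pd.DataFrame(
        [
            [2, 4, 6],
            [8, 10, 12],
            [14, 16, 18],
        ]
    )
    .assign(something_else=100)
    .div(2)
    .pipe(lambda df_: display(df_) or df_)  # show the frame, then pass it on
    .div(10)
)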
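The display trick works because display() returns None, and None or df_ evaluates to df_, so the lambda shows the frame and hands it on unchanged. Outside a notebook, where display() may not be available, the same idea can be sketched with the built-in print(), which also returns None. This is a plain-script adaptation, not something taken from the article:

```python
import pandas as pd

# print() returns None, so `print(df_) or df_` prints the frame
# and then evaluates to df_, letting the chain continue unchanged.
df = (
    pd.DataFrame([[2, 4, 6],
                  [8, 10, 12],
                  [14, 16, 18]])
    .assign(something_else=100)
    .div(2)
    .pipe(lambda df_: print(df_) or df_)
    .div(10)
)
```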