python pandas method-chaining

Copy a DataFrame to a new variable within a method chain


Is it possible to copy a dataframe in the middle of a method chain to a new variable? Something like:

import pandas as pd

df = (pd.DataFrame([[2, 4, 6],
                    [8, 10, 12],
                    [14, 16, 18],
                    ])
      .assign(something_else=100)
      .div(2)
      .copy_to_new_variable(df_imag)  # Imaginary method that copies the intermediate df to df_imag.
      .div(10)
      )

print(df_imag) would then output:

     0    1    2  something_else
0  1.0  2.0  3.0            50.0
1  4.0  5.0  6.0            50.0
2  7.0  8.0  9.0            50.0

.copy_to_new_variable(df_imag) could be replaced by a separate df_imag = df.copy() statement, but that would mean breaking the method chain into two parts.


Solution

  • This is what I was looking for. The idea comes from Matt Harrison (author of several books on pandas), who uses it for debugging method chains. The same approach is recommended in the article "4 Pandas Anti-Patterns to Avoid and How to Fix Them" by Aidan Cooper.

    import pandas as pd
    
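    def to_df(df, name):
        # Store a copy of the intermediate DataFrame as a module-level
        # variable called `name`, then return df so the chain continues.
        globals()[name] = df.copy()
        return df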
    
    df = (pd.DataFrame([[1, 2, 3],
                        [10, 10, 10],
                        ], columns=["A", "B", "C"]
                       )
          .set_index("C")
          .pipe(to_df, "df_imag")
          .sum()
          )
    

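    df_imag then holds a copy of the intermediate DataFrame at that point in the chain (here, the state after set_index("C") and before sum()), which is the behavior asked for in the question. Note that globals() refers to the module in which to_df is defined, so the new variable appears in that module's namespace.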
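
If writing into globals() feels too implicit, one possible variant is to pass in an ordinary dictionary and store the snapshot there. This is only a sketch: `save_to` and `snapshots` are hypothetical names chosen for illustration, not part of pandas or of the article. It uses the data from the question so the result can be compared with the expected output.

```python
import pandas as pd

# Hypothetical registry for intermediate results (not a pandas feature).
snapshots = {}

def save_to(df, store, name):
    """Store a copy of df in `store` under `name` and return df unchanged."""
    store[name] = df.copy()
    return df

df = (
    pd.DataFrame([[2, 4, 6], [8, 10, 12], [14, 16, 18]])
    .assign(something_else=100)
    .div(2)
    .pipe(save_to, snapshots, "df_imag")  # snapshot taken after .div(2)
    .div(10)
)

df_imag = snapshots["df_imag"]
print(df_imag)
```

DataFrame.pipe passes the DataFrame as the first argument and forwards the remaining arguments, so the chain itself stays intact; the only extra step is reading the snapshot out of the dictionary afterwards.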

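    In Jupyter notebooks, to view the DataFrame midway through the chain without interrupting the rest of it, you can use .pipe(lambda df_: display(df_) or df_), which is also explained in the article mentioned above. Because display returns None, the expression display(df_) or df_ evaluates to df_, so the chain continues with the unchanged DataFrame: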

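    import pandas as pd
    # display is predefined in Jupyter; the import makes the dependency explicit.
    from IPython.display import display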
    
    df = (
        pd.DataFrame(
            [
                [2, 4, 6],
                [8, 10, 12],
                [14, 16, 18],
            ]
        )
        .assign(something_else=100)
        .div(2)
        .pipe(lambda df_: display(df_) or df_)
        .div(10)
    )
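
Outside a notebook, where display is not defined by default, the same trick should work with print or with a small named helper. The following is a minimal sketch; `peek` and its `show` parameter are hypothetical names introduced here, not taken from the article.

```python
import pandas as pd

def peek(df, show=print):
    """Pass df to `show`, then return df unchanged so the chain continues."""
    show(df)
    return df

seen = []  # collects intermediate frames instead of printing them

df = (
    pd.DataFrame([[2, 4, 6], [8, 10, 12], [14, 16, 18]])
    .assign(something_else=100)
    .div(2)
    .pipe(peek)                    # prints the intermediate DataFrame
    .pipe(peek, show=seen.append)  # or hands it to any other callable
    .div(10)
)
```

A named helper reads a little more clearly than the lambda with `or`, and the `show` argument makes it easy to switch between printing, logging, or collecting the intermediate result.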