python pandas method-chaining

Breaking long method chains into multiple lines in Python


I'm learning Python and pandas and I very often end up with long chains of method calls. I know how to break lists and chains of operators in a way that compiles, but I can't find a way to break method chains in a way that doesn't feel like cheating.

There are plenty of examples online of breaking up operator chains and lists, but I can't find anything decent for method chains.

How can I break a long chain of method calls into multiple lines?

Say a line like this one:

t_values = df_grouped_by_day.sort_values('day_of_week').groupby(['day_of_week', 'day_of_week_name'])['Show_up'].apply(lambda sample: ttest_ind(population, sample)).reset_index()

Solution

  • Black -- a Python code formatter

    Black wraps chained calls like this:

    t_values = (
        df_grouped_by_day.sort_values('day_of_week')
        .groupby(
            [
                'day_of_week',
                'day_of_week_name',
                "foo",
                "bar",
                "buzz",
                "foobar",
                "foobarbuz",
            ]
        )['Show_up']
        .apply(
            lambda sample: ttest_ind(
                population,
                sample,
                foo,
                bar,
                buzz,
                foobar,
                foobarbuz,
            ),
        )
        .reset_index()
    )
    

    I added a few extra arguments to the example above so that it stretches far enough to wrap; the example below drops them again to keep the point clear.

    Wrapping at all brackets but without outer parentheses

    Personally, I used to prefer something more like the following. To me, though, it gets weird when some calls take no arguments, and when square-bracket accessor syntax is mixed in, as in the example above.

    t_values = df_grouped_by_day.sort_values(
        'day_of_week',
    ).groupby(
        [
            'day_of_week',
            'day_of_week_name',
        ]
    )[
        'Show_up',
    ].apply(
        lambda sample: ttest_ind(population, sample),
    ).reset_index(
    )
    

    [Edit] Oddly, I find mixing square-bracket accessors with call parentheses both subtle and jarring. Even so, I have put them back into the second example.
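    Both layouts above are legal for the same reason: Python joins lines implicitly while any parenthesis or bracket is still open. A minimal, dependency-free sketch of the two styles follows; the `raw` string and the variable names are made up for illustration.

```python
# Hypothetical input; any chain of method calls behaves the same way.
raw = "  Monday,tuesday,MONDAY,Friday  "

# Style 1: outer parentheses, one call per line with a leading dot.
# Each step can carry its own comment.
days_parenthesized = (
    raw.strip()  # drop surrounding whitespace
    .lower()  # normalize case
    .split(",")  # one entry per day
)

# Style 2: no outer parentheses; break only inside each call's own brackets.
days_bracketed = raw.strip(
).lower(
).split(
    ",",
)

# Both layouts are equivalent to the original one-liner.
assert days_parenthesized == days_bracketed == raw.strip().lower().split(",")
```

    The second style shows the awkwardness mentioned earlier: calls without arguments end up as an opening parenthesis with its closing partner alone on the next line.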

    Separate statements with intermediate variables

    This does not answer the question itself, but offers an alternative.

    When a chain falls into logically distinct steps, that is usually reason enough to split it into multiple statements with intermediate variables: for example, one statement for the query together with its sorting and grouping, then separate statements that manipulate those results, and so on.

    I do not know the actual reasoning behind the original statement, but here is a possible example of separate statements:

    df = df_grouped_by_day  # alias.
    pop = population  # alias.
    groupings = [
        'day_of_week',
        'day_of_week_name',
    ]
    
    t_sorted = df.sort_values('day_of_week').groupby(groupings)
    t_show_up = t_sorted['Show_up']
    t_values = t_show_up.apply(lambda sample: ttest_ind(pop, sample)).reset_index()
    

    I do often use the heavily "vertical" code style, but mainly when it makes it easier to comment on the logic of each step. Sometimes the code starts as several chained calls and is later refactored into intermediate variables or separate functions, to make the purpose of each step clear to the reader.
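    As one possible shape for such a refactoring, here is a sketch that moves the steps into two small functions. The helper names `show_up_by_day` and `summarize`, the toy data, and the plain mean standing in for the `ttest_ind` lambda are all assumptions for illustration; none of them come from the original code.

```python
import pandas as pd

# Hypothetical toy data; column names mirror the question, values are invented.
df_grouped_by_day = pd.DataFrame(
    {
        "day_of_week": [1, 0, 1, 0],
        "day_of_week_name": ["Tue", "Mon", "Tue", "Mon"],
        "Show_up": [1, 0, 1, 1],
    }
)


def show_up_by_day(df):
    """Sort by weekday, group by weekday, and select the 'Show_up' column."""
    groupings = ["day_of_week", "day_of_week_name"]
    return df.sort_values("day_of_week").groupby(groupings)["Show_up"]


def summarize(grouped, statistic):
    """Apply a per-group statistic and turn the group keys back into columns."""
    return grouped.apply(statistic).reset_index()


# Stand-in statistic (assumption): a plain mean instead of the t-test lambda.
t_values = summarize(
    show_up_by_day(df_grouped_by_day),
    lambda sample: sample.mean(),
)
print(t_values)
```

    Each function name documents one step, so the final statement reads as a summary of the whole pipeline and stays short enough that it hardly needs wrapping.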