Tags: python, pandas, dataframe, list-comprehension, humanize

Use the humanize.intword function on every row and numeric column of a dataframe


I have a dataset with very big numbers. To make them easier to read, I would like to apply the humanize.intword function to all columns except the date.


When I select only one column, it works:

pred_df["Predictions"].apply(lambda x: humanize.intword(x))

When I try to select other numeric columns, I get an error:

pred_df.apply(lambda row : humanize.intword(row['Predictions'],row['Lower'], row['Upper']), axis = 1)

TypeError: sequence item 0: expected str instance, float found

I also tried list comprehensions as suggested in this post https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas but I am probably doing something wrong. It works for one single column:

[humanize.intword(x) for x in pred_df["Predictions"]]

When I try over different columns I get an error:

[humanize.intword(row1, row[11]) for row in zip(pred_df["Predictions"],pred_df["Lower"])]

IndexError: tuple index out of range

My dataframe contains 12 rows and 4 columns. Can you help me understand what the problem is?


Solution

  • The problem is that humanize.intword converts a single value per call, whereas the aim here is to convert many numbers. One way is to use applymap, which calls the function on every element of the dataframe:

    df.set_index("fiscal_date").applymap(humanize.intword)
    

    where we first set the date as the index so that it is left out of the conversion. You can turn it back into a column with reset_index() afterwards if you wish.


    As to why you get errors or not:

    When I select only one column, it works:

    Because you select a Series, apply passes the entries of that column to intword one at a time, which is exactly what it expects.

    When I try to select other numeric columns, I get an error:

    Because you'd be supplying 3 values to intword, but it accepts at most 2: the first is the value to convert and the second is the optional format. (The error message should've been something like "this function takes 1 to 2 arguments but you gave 3", I believe.)

    It works for one single column:

    Again, this is akin to the first apply over one column: each value is passed to intword on its own.

    When I try over different columns I get an error:

    Again, intword works with one value at a time. (The error itself, though, arises because you used 11 as an index into row, which has only 2 elements, one from each of the 2 columns.)