I have a dataset with very big numbers. I would like to make them easier to read by applying the humanize.intword
function to all columns except the date.
When I select only one column, it works:
pred_df["Predictions"].apply(lambda x: humanize.intword(x))
When I try to select other numeric columns, I get an error:
pred_df.apply(lambda row : humanize.intword(row['Predictions'],row['Lower'], row['Upper']), axis = 1)
TypeError: sequence item 0: expected str instance, float found
I also tried list comprehensions as suggested in this post https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas
but I am probably doing something wrong. It works for one single column:
[humanize.intword(x) for x in pred_df["Predictions"]]
When I try over different columns I get an error:
[humanize.intword(row1, row[11]) for row in zip(pred_df["Predictions"],pred_df["Lower"])]
IndexError: tuple index out of range
My dataframe contains 12 rows and 4 columns. Can you help me understand what the problem is?
The problem is that humanize.intword works with a single value and converts it, but the aim here is to convert many numbers. One way is to use applymap:
df.set_index("fiscal_date").applymap(humanize.intword)
Here we first set the date as the index so that it is left out of the conversion. You can turn it back into a column with reset_index() afterwards if you wish.
As to why you get errors or not:
When I select only one column, it works:
Because you select a Series, and what gets passed to the function given to apply are the single entries of that column, so it works.
When I try to select other numeric columns, I get an error:
Because you'd be supplying 3 values to intword, but it accepts at most 2 arguments: the first is the value to convert and the second is the optional format. (The error message should've been something like "this function takes 1 to 2 arguments but you gave 3", I believe.)
It works for one single column:
Again, this is akin to the first apply over one column.
When I try over different columns I get an error:
Again, intword can work with only one value at a time. (But the error itself is because you gave 11 as an index to row, which has only 2 elements, one from each of those 2 columns.)