I switched from NumPy arrays to Pandas DataFrames (dfs) many years ago because the latter have column names, which .json or .csv file.
From time to time, I need the last row ([-1]) of some column col of some df1, and want to combine it with the last row of the same column col of another df2. I know the names of the columns, not their position/order (I could know, but it might change, and I want code that is robust against changes in the order of the columns).
So what I have been doing for years in a number of Python scripts is something that looks like
import numpy as np
import pandas as pd
# In reality, these are read from json files - the order
# of the columns may change, their names may not:
df1 = pd.DataFrame(np.random.random((2,3)), columns=['col2','col3','col1'])
df2 = pd.DataFrame(np.random.random((4,3)), columns=['col1','col3','col2'])
df1.col2.iloc[-1] = df2.col2.iloc[-1]
but for some time now my mailbox has been flooded with mails from cron jobs going wrong, telling me that
You are setting values through chained assignment. Currently this works in certain cases, but when using Copy-on-Write (which will become the default behaviour in pandas 3.0) this will never work to update the original DataFrame or Series, because the intermediate object on which we are setting values will behave as a copy. A typical example is when you are setting values in a column of a DataFrame, like:
df["col"][row_indexer] = value
Use
df.loc[row_indexer, "col"] = values
instead, to perform the assignment in a single step and ensure this keeps updating the original df.
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df1.col2.iloc[-1] = df2.col2.iloc[-1]
Of course, this error message is incorrect, and replacing the last line in my example with either of
df1.loc[-1, 'col2'] = df2.loc[-1, 'col2'] # KeyError: -1
df1.iloc[-1, 'col2'] = df2.iloc[-1, 'col2'] # ValueError (can't handle 'col2')
does not work either, since .iloc[] cannot handle column names and .loc[] cannot handle relative numbers.
How can I handle the last (or any other relative number) row and a column with given name of a Pandas DataFrame?
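You can look up the label of the last row with df.index[-1] and pass it to .loc[] together with the column name, so the assignment happens in a single step: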
df1.loc[df1.index[-1], 'col1'] = df2.loc[df2.index[-1], 'col1']
On my machine with pandas version 2.2.3, it gives no warnings.
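As a small sketch of the two single-step variants, reusing the DataFrames from the question: the first one is the label-based lookup from above, the second one stays purely positional by translating the column name into a position with columns.get_loc(). I am assuming here that the column names are unique (otherwise get_loc() does not return a plain integer), and that the positional variant is what you want if the index may contain duplicate labels, since .loc[] would then address every row carrying that label.

```python
import numpy as np
import pandas as pd

# Same toy data as in the question: column order differs, names do not.
df1 = pd.DataFrame(np.random.random((2, 3)), columns=['col2', 'col3', 'col1'])
df2 = pd.DataFrame(np.random.random((4, 3)), columns=['col1', 'col3', 'col2'])

# Variant 1 (label-based): turn the relative row position into its index label.
df1.loc[df1.index[-1], 'col1'] = df2.loc[df2.index[-1], 'col1']

# Variant 2 (positional): turn the column name into its integer position.
# Assumes unique column names, so get_loc() returns a single int.
df1.iloc[-1, df1.columns.get_loc('col2')] = df2.iloc[-1, df2.columns.get_loc('col2')]
```

Both lines assign through one indexer call on df1 itself, so there is no intermediate object involved as in df1.col2.iloc[-1] = ....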