I have a Pandas dataframe that looks like this:
1 | 2
-------|-------
123 | 123
456 | 456
NaN | 789
The data comes from a spreadsheet which is sent to me quarterly in the same format. I need to convert the dataframe into this format:
list_number | item_code
------------|----------
1 | 123
1 | 456
2 | 123
2 | 456
2 | 789
I have some code that does this, which has worked fine for a couple of years. However, this quarter I get the following error:
ValueError: Unable to parse string "nan"
Here is the bit of code which causes the problem:
df = ...  # my data
df.dropna(axis=1, how='all', inplace=True)
df = pd.melt(df, var_name='LIST_NUMBER', value_name='ITEM_CODE')
df["LIST_NUMBER"] = pd.to_numeric(
    df["LIST_NUMBER"].astype(str).str.replace(r"[^\d]", ""),
    errors='raise')
df["ITEM_CODE"] = pd.to_numeric(
    df["ITEM_CODE"].astype(str).str.replace(r"[^\d]", ""),
    errors='raise')
The dropna call does not seem to remove the NaN value; the dataframe looks the same afterwards as it did before. The NaN then causes to_numeric to fail, resulting in the error message above.
Strangely, the code still works when my colleague runs it on her laptop, which suggests to me that the problem may be my modules rather than the code or data. I recently upgraded Pandas to 2.1.4 (I don't remember the previous version, but it was earlier than 2.0). She's using Pandas 1.4.4.
Does anyone know why this code is failing, please?
EDIT: For clarity, this is what the dataframe looks like after pd.melt. Melt is not the problem; dropna is.
list_number | item_code
------------|----------
1 | 123
1 | 456
1 | NaN
2 | 123
2 | 456
2 | 789
The answer was provided by wjandrea in the comments: my code did not specify the regex parameter of .str.replace(). In Pandas 2.0.0 the default for this parameter was changed from True to False, so the pattern [^\d] was treated as a literal string, the text "nan" produced by astype(str) was never stripped, and to_numeric failed on it. Specifying regex=True fixed this.
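For anyone hitting the same thing, here is a minimal sketch of the difference on a small Series rather than my real data. The values in codes are made up to stand in for what astype(str) produces when one cell is missing, and the variable names are only for illustration.

```python
import pandas as pd

# Made-up stand-in for one column after .astype(str): the missing cell
# has become the literal text "nan".
codes = pd.Series(["123", "456", "nan"])

# regex=False (the default since pandas 2.0.0): the pattern is searched for
# as a literal string, which never occurs, so nothing is replaced.
literal = codes.str.replace(r"[^\d]", "", regex=False)

# regex=True (the default before 2.0.0): every non-digit character is
# removed, so "nan" becomes an empty string.
cleaned = codes.str.replace(r"[^\d]", "", regex=True)

# to_numeric turns the empty string into NaN instead of raising.
numeric = pd.to_numeric(cleaned, errors="raise")

print(literal.tolist())
print(cleaned.tolist())
print(numeric.tolist())
```

Passing regex explicitly keeps the behaviour the same on both sides of the 2.0.0 change. If the leftover NaN rows are not wanted, as in the target table at the top, they can be removed afterwards with dropna().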