I have a function which processes dataframe of 6 columns. It looks like this:
def Process_DF():
DF_6cols = "some data"
#Two functions to split column containing Column Val1 and Column Val2
def split_column_1(string):
str_len = len(string)
range_obj = range(str_len)
character = ","
for index in range_obj:
if string[index] == character:
position = index
return string[0:position]
def split_column_2(string):
str_len = len(string)
range_obj = range(str_len)
character = ","
for index in range_obj:
if string[index] == character:
position = index
return string[position+1:]
#Splitting column 6 into two proper columns and dropping it
DF_7cols["Column Val1"] = DF_6cols.apply(lambda x: split_column_1(x["Splittable Column"]), axis=1)
DF_8cols["Column Val2"] = DF_7cols.apply(lambda x: split_column_2(x["Splittable Column"]), axis=1)
DF_ToReturn = PTT_DataFrame_6Columns.drop("Splittable Colum", axis=1)
return DF_ToReturn
When trying to use from subscript import Process_DF
and calling it from main script, I'm getting following error:
File "subscript.py", line x, in Process_DF
DF_7cols["Column Val1"] = DF_6cols.apply(lambda x: split_column_1(x["Splittable Column"]), axis=1)
File "~\Python\Python310\site-packages\pandas\core\frame.py", line 4301, in __setitem__
self._set_item_frame_value(key, value)
File "~\Python\Python310\site-packages\pandas\core\frame.py", line 4459, in _set_item_frame_value
raise ValueError(
ValueError: Cannot set a DataFrame with multiple columns to the single column Column Val1
On the other hand, when trying to take it out of function and run same script - no error is happening and it is running ok. I also tried to take functions split_column_1
and split_column_2
out as well and declare them in main script or declare them in subscript file and import. Not sure why pd is trying to put multiple columns in one in case of function, when it clearly doesn't outside of it. Python used - 3.10, Pandas - 2.2.3
I checked other questions about similar problem, but they had solutions which doesn't solve my case.
For reprex you can use dataframe:
Index Splittable Column
1 ,FALSE
2 Not Acceptable, FALSE
3 Not Acceptable, FALSE
4 ,FALSE
5 ,FALSE
6 ,FALSE
Is there a fix for this or way around it?
UPD
I noticed that many people suggest to use str.splt
as a solution. Unfortunately in my reprex I failed to show that there could be multiple instances of "," being present, and I need only last part of it to be in column 7 and first part to be in column 6.
Better reprex of data I have would be this:
data = {'Index': ['1', '2', '3','4'], 'Splittable column': [',FALSE', 'Not Acceptable,Failed,NAN,FALSE', 'Not acceptable,FALSE',',FALSE']}
DF_6cols = pd.DataFrame(data)
I need last column to be certain status and data in previous column with explanation for what happened. There could be list of reasons or error response from SQL server. Both cases can contain multiple instances with "," in it, that's why I had to resort to such encoding. If there is better way to do this - I would be happy to apply it
I don't know if I understand what you try to do but I think you don't nees all this code.
To split on last ,
you can use .str.rsplit(",", n=1)
(right-split on first ,
from the end)
And you can use .str[0]
to get first part, and .str[1]
to get second part
df['Column Val1'] = df['Splittable Column'].str.rsplit(',', n=1).str[0]
df['Column Val2'] = df['Splittable Column'].str.rsplit(',', n=1).str[1]
or
splitted = df['Splittable Column'].str.rsplit(',', n=1)
df['Column Val1'] = splitted.str[0]
df['Column Val2'] = splitted.str[1]
You can also use .str.rsplit( ..., n=1, expand=True)
to create two columns at once
df[['Column Val1','Column Val2']] = df['Splittable Column'].str.rsplit(',', n=1, expand=True)
EDIT:
Because there is space after ,
so it creates strings with space in " FALSE"
.
In first method you can add .str.strip()
to remove it
splitted = df['Splittable Column'].str.rsplit(',', n=1)
df['Column Val1'] = splitted.str[0].str.strip()
df['Column Val2'] = splitted.str[1].str.strip()
In second method it needs .apply(lambda x: x.str.strip())
df[['Column Val1','Column Val2']] = df['Splittable Column'].str.rsplit(',', n=1, expand=True).apply(lambda x: x.str.strip())
Full working example used for tests:
I used io.StringIO
only to create file-like object in memmory - so everyone can test it without creating file .csv
import pandas as pd
def ver1(df):
print('--- ver 1 ---')
df['Column Val1'] = df['Splittable column'].str.rsplit(',', n=1).str[0].str.strip()
df['Column Val2'] = df['Splittable column'].str.rsplit(',', n=1).str[1].str.strip()
return df
def ver2(df):
print('--- ver 2 ---')
splitted = df['Splittable column'].str.rsplit(',', n=1)
#print(splitted)
df['Column Val1'] = splitted.str[0].str.strip()
df['Column Val2'] = splitted.str[1].str.strip()
return df
def ver3(df):
print('--- ver 3 ---')
df[['Column Val1','Column Val2']] = df['Splittable column'].str.rsplit(',', n=1, expand=True).apply(lambda x: x.str.strip())
return df
# --- main ---
import pandas as pd
data = {
'Index': ['1', '2', '3','4'],
'Splittable column': [',FALSE', 'Not Acceptable,Failed,NAN,FALSE', 'Not acceptable,FALSE',',FALSE']
}
df = DF_6cols = pd.DataFrame(data)
print(df)
df = ver1(df)
#df = ver2(df)
#df = ver3(df)
df.drop(columns='Splittable column', inplace=True)
print(df)
#print('Val2:', df['Column Val2'].to_list()) # check if there are spaces
Result:
Index Splittable column
0 1 ,FALSE
1 2 Not Acceptable,Failed,NAN,FALSE
2 3 Not acceptable,FALSE
3 4 ,FALSE
--- ver 1 ---
Index Column Val1 Column Val2
0 1 FALSE
1 2 Not Acceptable,Failed,NAN FALSE
2 3 Not acceptable FALSE
3 4 FALSE