pythonpandas

"Cannot set a DataFrame with multiple columns to the single column" when script is in function


I have a function which processes dataframe of 6 columns. It looks like this:

def Process_DF():

   DF_6cols = "some data"

    #Two functions to split column containing Column Val1 and Column Val2
    def split_column_1(string):
        str_len = len(string)
        range_obj = range(str_len)
        character = ","
        for index in range_obj:
            if string[index] == character:
                position = index
        return string[0:position]


    def split_column_2(string):
        str_len = len(string)
        range_obj = range(str_len)
        character = ","
        for index in range_obj:
            if string[index] == character:
                position = index
        return string[position+1:]


    #Splitting column 6 into two proper columns and dropping it
    DF_7cols["Column Val1"] = DF_6cols.apply(lambda x: split_column_1(x["Splittable Column"]), axis=1)
    DF_8cols["Column Val2"] = DF_7cols.apply(lambda x: split_column_2(x["Splittable Column"]), axis=1)
    DF_ToReturn = PTT_DataFrame_6Columns.drop("Splittable Colum", axis=1)

    return DF_ToReturn

When trying to use from subscript import Process_DF and calling it from main script, I'm getting following error:

  File "subscript.py", line x, in Process_DF
    DF_7cols["Column Val1"] = DF_6cols.apply(lambda x: split_column_1(x["Splittable Column"]), axis=1)
  File "~\Python\Python310\site-packages\pandas\core\frame.py", line 4301, in __setitem__
    self._set_item_frame_value(key, value)
  File "~\Python\Python310\site-packages\pandas\core\frame.py", line 4459, in _set_item_frame_value
    raise ValueError(
ValueError: Cannot set a DataFrame with multiple columns to the single column Column Val1

On the other hand, when trying to take it out of function and run same script - no error is happening and it is running ok. I also tried to take functions split_column_1 and split_column_2 out as well and declare them in main script or declare them in subscript file and import. Not sure why pd is trying to put multiple columns in one in case of function, when it clearly doesn't outside of it. Python used - 3.10, Pandas - 2.2.3

I checked other questions about similar problem, but they had solutions which doesn't solve my case.

For reprex you can use dataframe:

Index   Splittable Column
1           ,FALSE
2      Not Acceptable, FALSE
3      Not Acceptable, FALSE
4           ,FALSE
5           ,FALSE
6           ,FALSE

Is there a fix for this or way around it?

UPD

I noticed that many people suggest to use str.splt as a solution. Unfortunately in my reprex I failed to show that there could be multiple instances of "," being present, and I need only last part of it to be in column 7 and first part to be in column 6. Better reprex of data I have would be this:

data = {'Index': ['1', '2', '3','4'], 'Splittable column': [',FALSE', 'Not Acceptable,Failed,NAN,FALSE', 'Not acceptable,FALSE',',FALSE']}
DF_6cols = pd.DataFrame(data)

I need last column to be certain status and data in previous column with explanation for what happened. There could be list of reasons or error response from SQL server. Both cases can contain multiple instances with "," in it, that's why I had to resort to such encoding. If there is better way to do this - I would be happy to apply it


Solution

  • I don't know if I understand what you try to do but I think you don't nees all this code.

    To split on last , you can use .str.rsplit(",", n=1) (right-split on first , from the end)

    And you can use .str[0] to get first part, and .str[1] to get second part

    df['Column Val1'] = df['Splittable Column'].str.rsplit(',', n=1).str[0]
    df['Column Val2'] = df['Splittable Column'].str.rsplit(',', n=1).str[1]
    

    or

    splitted = df['Splittable Column'].str.rsplit(',', n=1)
    
    df['Column Val1'] = splitted.str[0]
    df['Column Val2'] = splitted.str[1]
    

    You can also use .str.rsplit( ..., n=1, expand=True) to create two columns at once

    df[['Column Val1','Column Val2']] = df['Splittable Column'].str.rsplit(',', n=1, expand=True)
    

    EDIT:

    Because there is space after , so it creates strings with space in " FALSE" .

    In first method you can add .str.strip() to remove it

    splitted = df['Splittable Column'].str.rsplit(',', n=1)
    
    df['Column Val1'] = splitted.str[0].str.strip()
    df['Column Val2'] = splitted.str[1].str.strip()
    

    In second method it needs .apply(lambda x: x.str.strip())

    df[['Column Val1','Column Val2']] = df['Splittable Column'].str.rsplit(',', n=1, expand=True).apply(lambda x: x.str.strip())
    

    Full working example used for tests:

    I used io.StringIO only to create file-like object in memmory - so everyone can test it without creating file .csv

    import pandas as pd
    
    
    def ver1(df):
        print('--- ver 1 ---')
    
        df['Column Val1'] = df['Splittable column'].str.rsplit(',', n=1).str[0].str.strip()
        df['Column Val2'] = df['Splittable column'].str.rsplit(',', n=1).str[1].str.strip()
    
        return df
    
    def ver2(df):
        print('--- ver 2 ---')
    
        splitted = df['Splittable column'].str.rsplit(',', n=1)
        #print(splitted)
        df['Column Val1'] = splitted.str[0].str.strip()
        df['Column Val2'] = splitted.str[1].str.strip()
    
        return df
    
    def ver3(df):
        print('--- ver 3 ---')
    
        df[['Column Val1','Column Val2']] = df['Splittable column'].str.rsplit(',', n=1, expand=True).apply(lambda x: x.str.strip())
    
        return df
    
    # --- main ---
    
    import pandas as pd
    
    data = {
        'Index': ['1', '2', '3','4'], 
        'Splittable column': [',FALSE', 'Not Acceptable,Failed,NAN,FALSE', 'Not acceptable,FALSE',',FALSE']
    }
    
    df = DF_6cols = pd.DataFrame(data)
    
    print(df)
    
    df = ver1(df)
    #df = ver2(df)
    #df = ver3(df)
    
    df.drop(columns='Splittable column', inplace=True)
    
    print(df)
    
    #print('Val2:', df['Column Val2'].to_list())  # check if there are spaces
    

    Result:

     Index                Splittable column
    0     1                           ,FALSE
    1     2  Not Acceptable,Failed,NAN,FALSE
    2     3             Not acceptable,FALSE
    3     4                           ,FALSE
    --- ver 1 ---
      Index                Column Val1 Column Val2
    0     1                                  FALSE
    1     2  Not Acceptable,Failed,NAN       FALSE
    2     3             Not acceptable       FALSE
    3     4                                  FALSE
    

    Doc: pandas.Series.str.split, pandas.Series.str.rsplit