python pandas split conditional-statements

Why is my column split not working correctly?

Why is my python3 code to split column 4 at the bases of "," to the column 7, 8, 9 with condition if the column 5 value is "LUB" not working?

source data: source data screenshot

20240801    CASH MAN1   120.00  Z20/1   0.5 , Z20/1 ; 120   1   LUB
20240801    PAYTM   15720.00    CASH MAN1   0   1   JRNLCR
20240801    PAYTM   81343.00    CASH MAN2   0   2   JRNLCR

my code:

import numpy as np
import pandas as pd

var001_sf = input("Enter file name to import : ")
var002_sf = var001_sf + ".csv"
var002_ar1 = pd.read_csv(var002_sf, sep='\t', header=None)
var002_ar1 = var002_ar1.dropna(axis=0)  # remove rows with na value
var002_ar1.iloc[:, [0, 1, 2, 3, 4, 5, 6]] = var002_ar1.iloc[:, [0, 1, 2, 3, 4, 5, 6]].map(str)
var002_ar1.loc[var002_ar1.iloc[:, 6] == 'LUB', [7, 8, 9]] = var002_ar1.iloc[:, 4].str.split(', ', n=2, expand=True)
var002_ar1.to_csv("t250221.csv", sep='\t')

output data: output data screenshot

20240801    CASH MAN1   120.0   Z20/1   0.5 , Z20/1 ; 120   1   LUB         
1   20240801    PAYTM   15720.0 CASH MAN1   0   1   JRNLCR          
2   20240801    PAYTM   81343.0 CASH MAN2   0   2   JRNLCR

I can't get it to split "0.5 , Z20/1 ; 120" in column 7, 8, 9

What should I do for the desired result?

Solution

Input data and code issues

There's two problems with your example.

Your input has an issue with it. In column 4, row 1: "0.5 , Z20/1 ; 120" the second delimiter is a semicolon (";") instead of a comma (",").
If your raw data has these issues, then you may want to clean or preprocess it initially; for instance, converting semicolons to commas.
Your split dataframe has columns [0, 1, 2]. You'll want to modify the columns to match your output of [7, 8, 9].
You can do this with the set_axis method. As in:
```
splitDf = var002_ar1.iloc[:, 4].str.split(', ', n=2, expand=True).set_axis([7, 8, 9], axis=1)
```
Now you can set it as you did before:
```
var002_ar1.loc[var002_ar1.iloc[:, 6] == 'LUB', [7, 8, 9]] = splitDf
```
_{Note: with your input issue in 1., set_axis will error as the semicolon will result in the split dataframe having only two columns instead of three (Length mismatch). You will have to fix that first.}

Fixed results

Putting the new input and new code all together:

input:

20240801    CASH MAN1   120.00  Z20/1   0.5 , Z20/1 , 120   1   LUB
20240801    PAYTM   15720.00    CASH MAN1   0   1   JRNLCR
20240801    PAYTM   81343.00    CASH MAN2   0   2   JRNLCR

code:

import pandas as pd

var001_sf = input("Enter file name to import : ")
var002_sf = var001_sf + ".csv"
var002_ar1 = pd.read_csv(var002_sf, sep='\t', header=None)

var002_ar1 = var002_ar1.dropna(axis=0)  # remove rows with na value
var002_ar1.iloc[:, [0, 1, 2, 3, 4, 5, 6]] = var002_ar1.iloc[:, [0, 1, 2, 3, 4, 5, 6]].map(str)

splitDf = var002_ar1.iloc[:, 4].str.split(', ', n=2, expand=True).set_axis([7, 8, 9], axis=1)
var002_ar1.loc[var002_ar1.iloc[:, 6] == 'LUB', [7, 8, 9]] = splitDf

var002_ar1.to_csv("t250221.csv", sep='\t')

output:

    0   1   2   3   4   5   6   7   8   9
0   20240801    CASH MAN1   120.0   Z20/1   0.5 , Z20/1 , 120   1   LUB 0.5     Z20/1   120
1   20240801    PAYTM   15720.0 CASH MAN1   0   1   JRNLCR          
2   20240801    PAYTM   81343.0 CASH MAN2   0   2   JRNLCR