pythonpandasstringcsvsplit

How can I split a DataFrame column into different columns?


I have below CSV file

timestamp,store,customer_name,basket_items,total_price,cash_or_card,card_number
06/06/2022 09:00,Chesterfield,Stephanie Neyhart,"Large Flat white - 2.45, Large Flavoured iced latte - Vanilla - 3.25, Large Flavoured iced latte - Hazelnut - 3.25",8.95,CASH,

I want to split basket_items like

product                                   price 
Large Flat white                          2.45
Large Flavoured iced latte - Vanilla      3.25
Large Flavoured iced latte - Hazelnut     3.25

How can I do that with pandas dataframe?


Solution

  • We simply need to split that column twice; once on comma and explode it; that would put each item in its own row. Then split again in ' - ' to separate the product name and its price into separate columns.

    #data
    df = pd.DataFrame([{
        'timestamp': '06/06/2022 09:00',
        'store': 'Chesterfield',
        'customer_name': 'Stephanie Neyhart',
        'basket_items': "Large Flat white - 2.45, Large Flavoured iced latte - Vanilla - 3.25, Large Flavoured iced latte - Hazelnut - 3.25",
        'total_price': 8.95,
        'cash_or_card': 'CASH'}])
    # split by comma and explode (to separate products into multi-rows)
    # split by dash once from the right side to separate product from price
    res = (
        df['basket_items'].str.split(', ')
        .explode()
        .str.rsplit(' - ', n=1, expand=True)
    )
    # set column names
    res.columns = ['product', 'price']
    

    enter image description here