
CSV to JSON using Python: keep only selected columns and create a new entry for each value in a CSV cell that holds multiple values?


I am very new to Python and I am trying to produce the JSON output shown further down from a CSV file. The CSV file looks like this:

date      customer Ex employer    emailID        CIN     CIN2            BatcheSource
9-Jul-24   ABC1    EmployerAnme1  abc1@abc1.com  123456  9087690,345678  payment
9-Oct-24   BCD1    EMP2           bcd1@bcd1.com  234566                  adasd

The JSON output I am looking for is

[
   {
      "Cin":"123456",
      "Date":"9-Jul-24",
      "Ex Employer":"Employer Name 1",
      "Batche Source":"Payment"
   },
   {
      "Cin":"9087690",
      "Date":"9-Jul-24",
      "Ex Employer":"Employer Name 1",
      "Batche Source":"Payment"
   },
   {
      "Cin":"345678",
      "Date":"9-Jul-24",
      "Ex Employer":"Employer Name 1", 
      "Batche Source":"Payment"
   },
   {
      "Cin":"234566",
      "Date":"9-Oct-24",
      "Ex Employer":"EMP2",
      "Batche Source":"adasd"
   }

]

This is the code I have tried so far, but I'm not sure how to create a new entry for each of the comma-separated values in a cell:

import csv
import json

with open('test.csv') as infile:
    reader = csv.DictReader(infile)
    out = [{"CIN": row['CIN'],"Date": row["Date"], "Ex Employer": row["Ex Employer"],"CIN2": row["CIN2"],"Batche Source": row["Batche Source"]} for row in reader]

with open('test1.json', 'w') as outfile:
    json.dump(out, outfile) 

Solution

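  • You'll want to split your CIN2 value into a list, put that together with CIN, and create a new output entry for each of those CINs. Drop blank values along the way, because splitting an empty CIN2 cell gives [''] and would otherwise add an entry with an empty CIN:

    out = []
    for row in reader:
        # An empty CIN2 cell splits to [''], so keep only non-blank values
        extra = [c.strip() for c in row['CIN2'].split(',') if c.strip()]
        for cin in [row['CIN'], *extra]:
            out.append({"CIN": cin, "Date": row["Date"], "Ex Employer": row["Ex Employer"], "Batche Source": row["Batche Source"]})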
    

    You can condense that into a list comprehension, but I'd advise against it for readability.
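
For reference, here is a self-contained sketch of the whole approach that you can run without a file. It makes a few assumptions that are not in the question: the sample data is inlined as a string, the header names are taken to match the keys used in the question's code ("Date", "Ex Employer", "CIN", "CIN2", "Batche Source"), and the multi-value CIN2 cell is assumed to be quoted in the real CSV. The names SAMPLE_CSV, KEEP, expand_rows and expand_rows_comprehension are made up for illustration.

```python
import csv
import io
import json

# Hypothetical sample data; header names are assumed to match the keys used in
# the question's code. The multi-value CIN2 cell is quoted so that its comma is
# not read as a column separator.
SAMPLE_CSV = '''Date,customer,Ex Employer,emailID,CIN,CIN2,Batche Source
9-Jul-24,ABC1,EmployerAnme1,abc1@abc1.com,123456,"9087690,345678",payment
9-Oct-24,BCD1,EMP2,bcd1@bcd1.com,234566,,adasd
'''

KEEP = ("Date", "Ex Employer", "Batche Source")  # columns copied unchanged


def expand_rows(rows):
    """Return one dict per CIN found in the CIN and CIN2 columns of each row."""
    out = []
    for row in rows:
        # Split CIN2 on commas and drop blanks (an empty cell splits to ['']).
        extra = [c.strip() for c in row["CIN2"].split(",") if c.strip()]
        for cin in [row["CIN"], *extra]:
            out.append({"CIN": cin, **{key: row[key] for key in KEEP}})
    return out


def expand_rows_comprehension(rows):
    """Same result as expand_rows, condensed into a single list comprehension."""
    return [
        {"CIN": cin, **{key: row[key] for key in KEEP}}
        for row in rows
        for cin in [row["CIN"], *(c.strip() for c in row["CIN2"].split(",") if c.strip())]
    ]


# Materialize the rows once so both functions can read them.
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
result = expand_rows(rows)
print(json.dumps(result, indent=3))
```

To run this against the real file, you would replace io.StringIO(SAMPLE_CSV) with open('test.csv', newline='') inside a with block and write the result with json.dump(result, outfile), as in the question. The comprehension variant is only there to show what the condensed form could look like; the explicit loop is easier to follow.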