pythonjsonpython-3.xexcelopenpyxl

How to convert Excel file with multiple header and sub header to Nested Json as show


enter image description here

i have tried to convert the excel file data as shown in the picture from excel to Json format but not able to properly convert to the structure that i required, the structure I required is as below

{
  "ID": "000 001 234",
  "Label": "XYZ",
  "Category 1": { "Sub1": "129.75","Sub2" : "0.00","Sub3":"0.00" }
  "Order Amount": "234.00",
  "Penalty": "111.00",
  "Fees": "3,456.00",
  "Category2":  { "Sub21": "0.00","Sub22" : "0.00","Sub23": "0.00" }
  "Invoice": "11.00"
},

{
  "ID": "000 001 235",
  "Label": "XYZ",
  "Category 1": { "Sub1": "1.75","Sub2" : "0.00","Sub3":"0.00" }
  "Order Amount": "111.00",
  "Penalty": "0.00",
  "Fees": "2,343.00",
  "Category2":  { "Sub21": "0.00","Sub22" : "0.00","Sub23": "0.00" }
  "Invoice": "2.00"
},

code i tried is as below

from openpyxl import load_workbook
from json import dumps

# Load Excel workbook
wb = load_workbook("sample.xlsx")

# Choose a specific sheet
sheet = wb["Sheet1"]

# Find the number of rows and columns in the sheet
rows = sheet.max_row
columns = sheet.max_column
# List to store all rows as dictionaries
lst = []

# Iterate over rows and columns to extract data
for i in range(1, rows):
    row = {}
    for j in range(1, columns):
        column_name = sheet.cell(row=1, column=j)
        row_data = sheet.cell(row=i+1, column=j)

        row.update(
            {
                column_name.value: row_data.value
            }
        )
    lst.append(row)

# Convert extracted data into JSON format
json_data = dumps(lst)

# Print the JSON data
print(json_data)

Output I am getting is as below

[{
  "ID": null,
  "Label": null,
  "Category 1": "Sub3",
  "Order Amount": null,
  "Penalty": null,
  "Fees": null,
  "Category2": "Sub23"
}, 
{"ID": 1234, "Label": "XYZ", "Category 1": 0, "Order Amount": 234, "Penalty": 111, "Fees": 3456, "Category2": 0}, 
{"ID": 1235, "Label": "XYZ", "Category 1": 0, "Order Amount": 111, "Penalty": 0, "Fees": 2343, "Category2": 0}]

I am not able to get the nested json in the proper format that i actually require, Any help would be appreciated.


Solution

  • You can handle multi-row headers easily with pandas, which automatically detects hierarchical columns.
    To get exactly your desired nested JSON structure you can do this:

    import pandas as pd
    import json
    
    # Read Excel file with two header rows
    df = pd.read_excel("sample.xlsx", header=[0, 1])
    
    records = []
    for _, row in df.iterrows():
        record = {}
        for (main, sub), value in row.items():
            # If the subheader is empty or "Unnamed", treat it as a top-level key
            if pd.isna(sub) or str(sub).startswith("Unnamed"):
                record[main] = str(value)
            else:
                record.setdefault(main, {})[sub] = str(value)
        records.append(record)
    
    # Convert to JSON
    print(json.dumps(records, indent=2, ensure_ascii=False))
    

    Output:

    [
      {
        "ID": "000 001 234",
        "Label": "XYZ",
        "Category 1": {"Sub1": "129.75", "Sub2": "0.00", "Sub3": "0.00"},
        "Order Amount": "234.00",
        "Penalty": "111.00",
        "Fees": "3,456.00",
        "Category2": {"Sub21": "0.00", "Sub22": "0.00", "Sub23": "0.00"},
        "Invoice": "11.00"
      },
      {
        "ID": "000 001 235",
        "Label": "XYZ",
        "Category 1": {"Sub1": "1.75", "Sub2": "0.00", "Sub3": "0.00"},
        "Order Amount": "111.00",
        "Penalty": "0.00",
        "Fees": "2,343.00",
        "Category2": {"Sub21": "0.00", "Sub22": "0.00", "Sub23": "0.00"},
        "Invoice": "2.00"
      }
    ]
    

    Explanation:

    This produces exactly the nested JSON format you want.