pythonexcelarcpy

How to split a row with multiple sets of data in different columns into mutiple rows


Everyone! I have an issue with the data collected from the field, where every single row has multiple sets of data in different columns (seen Input). I tried to split each set of data into a row (seen Expected Output).

Input:

Row1: ID A1 B1 C1 D1 A2 B2 C2 D2 A3 B3 C3 D3 A4 B4 C4 D4 Name1 Name2 Name3 Name4 Date

Expected Output:

Row1: ID A1 B1 C1 D1 Name1 Date

Row2: ID A2 B2 C2 D2 Name2 Date

Row3: ID A3 B3 C3 D3 Name3 Date

Row4: ID A4 B4 C4 D4 Name4 Date

I found there is a similar post, but it is using VBA Scripting. I am a GIS guy and I have some experience in the ArcPy site package of python. I have some trouble constructing a logical workflow using ArcPy. I was wondering does anyone has experienced a similar situation and know if this could be done using Python and ArcPy or any other site-packages. If so, could you please provide me with some hints on what site package and tools that I should look into?


Solution

  • One way of doing this with arcpy is using cursor. Create a new table, iterate throught your data, split them and insert each line in the new table.

    This is using arcpy from ArcMap (not ArcGIS Pro) :

    import arcpy
    import os
    
    src_data = ''  # Path to your data
    dst_data = ''  # New table that will be create. 
    arcpy.CreateTable_management(os.path.dirname(dst_data),
                                 os.path.basename(dst_data))
    
    # Create the correct fields in the new table
    # Assuming the fields have the same name as your data in Row1
    # Also assuming the A1, A2, A3 fields .... have the same type/scale/precision/length/etc
    # Index split table (index from the Row1 you gave in your example)
    dst_idx_split = [[0, 1, 5, 9, 13, 17, 21],  # ID A1 B1 C1 D1 Name1 Date
                     [0, 2, 6, 10, 14, 18, 21], # ID A2 B2 C2 D2 Name2 Date
                     [0, 3, 7, 11, 15, 19, 21], # etc.
                     [0, 4, 8, 12, 16, 20, 21]]
    
    src_fields = arcpy.ListFields(src_data)
    for idx in dst_idx_split[0]:
        field = src_fields[idx]
        field_name = field.baseName if field_name in ['ID', 'Data'] else field_name[:-1]  # remove the 1/2/3/4 in the field name
        arcpy.AddField_management(dst_data, field_name, field.type, field.precision, 
                                  field.scale, field.length, field.alias, 
                                  field.isNullable, field.required, field.domain)
    
    # Copy the data
    i_cur = arcpy.da.InsertCursor(dst_data, '*')
    
    with arcpy.da.SearchCursor(src_data, '*') as s_cur:
        for row in s_cur:  # for each row of your source data
            for idxs in dst_idx_split: 
                dst_row = [row[idx] for idx in idxs] # extract the data for the new line 
                i_cur.insertRow(dst_row)  # insert it in new table
    del i_cur