python pandas gis

Split an integer cell into two columns based on digit location


I am dealing with thousands of CSV GPS files which I want to plot in GIS software. To do this I need decimal degrees, but the data was collected in degrees decimal minutes. I have a function that converts degrees decimal minutes into the format I need; however, the layout of the data in each CSV file requires manipulation first.

Here is an image of the dataframe which requires manipulation: [image of the dataframe]

The format I would like columns 0 and 2 to be in is 57 10.7240 N, split across three columns. As there is no whitespace between 57 and 10, and the decimal point is already in the correct place, I am struggling to find a way to split the value after the 7 and insert 57 as a new column in the dataframe. This could be done manually, but with over 10,000 CSV files of 600 rows each that is not feasible.

Any help on how to split the integer from 5710.7242 into 57 10.7242 N would be really appreciated as it is the last part of my code to correct before the whole conversion works.


Solution

  • My solution is not very efficient, since it iterates over the rows one at a time, but you can try this:

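    import pandas as pd
    
    # Sample rows in degrees decimal minutes (DDMM.MMMM) with hemisphere letters
    data = [[5710.7240,'N',917.3222,'W'],[710.7239,'N',917.3225,'W']]
    df = pd.DataFrame(data, columns=['L2','NS','L4','EW'])
    
    # Latitude: whole degrees go into a new column L1, the minutes stay in L2
    split_L1, split_L2 = [],[]
    for _, row in df.iterrows():
        split_L1.append(int(row['L2'] // 100))
        # round() removes the floating-point noise left by the subtraction
        split_L2.append(round(row['L2'] - split_L1[-1]*100, 5))
    df.insert(0,'L1',split_L1, True)
    df['L2'] = split_L2
    
    # Longitude: same split, with the degrees inserted as L3 just before L4
    split_L3, split_L4 = [],[]
    for _, row in df.iterrows():
        split_L3.append(int(row['L4'] // 100))
        split_L4.append(round(row['L4'] - split_L3[-1]*100, 5))
    df.insert(3,'L3',split_L3, True)
    df['L4'] = split_L4
    
    df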
    

    Input:

    input

    Output:

    output
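
With 10,000 files the row-by-row loop may become slow, so here is a rough vectorized sketch of the same split using floor division on whole columns. The helper names `split_ddm` and `to_decimal_degrees` are made up for this illustration and are not part of pandas; the second one only stands in for the conversion function mentioned in the question, which may well work differently. It assumes the values are non-negative and that the hemisphere is given by the N/S/E/W column, as in the sample data.

```python
import numpy as np
import pandas as pd


def split_ddm(values):
    """Split non-negative DDMM.MMMM values into (whole degrees, decimal minutes).

    Hypothetical helper for illustration only.
    """
    degrees = (values // 100).astype(int)         # 5710.7240 -> 57
    minutes = (values - degrees * 100).round(5)   # 5710.7240 -> 10.724
    return degrees, minutes


def to_decimal_degrees(degrees, minutes, hemisphere):
    """Hypothetical stand-in for the asker's own conversion function.

    Assumes S and W should be negative, N and E positive.
    """
    sign = np.where(hemisphere.isin(["S", "W"]), -1.0, 1.0)
    return sign * (degrees + minutes / 60.0)


# Same sample data as in the loop version above
df = pd.DataFrame(
    [[5710.7240, "N", 917.3222, "W"], [710.7239, "N", 917.3225, "W"]],
    columns=["L2", "NS", "L4", "EW"],
)

lat_deg, lat_min = split_ddm(df["L2"])
lon_deg, lon_min = split_ddm(df["L4"])

# Rebuild the frame in the requested column order: L1 L2 NS L3 L4 EW
out = pd.DataFrame({
    "L1": lat_deg, "L2": lat_min, "NS": df["NS"],
    "L3": lon_deg, "L4": lon_min, "EW": df["EW"],
})

# Optional: decimal degrees for plotting in GIS software
out["lat_dd"] = to_decimal_degrees(out["L1"], out["L2"], out["NS"])
out["lon_dd"] = to_decimal_degrees(out["L3"], out["L4"], out["EW"])
print(out)
```

The split itself gives the same L1 to L4 values as the loop; it just operates on each column at once instead of on each row. If the CSV files have no header row, the column names used here would need to be set when reading them.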