pythonarrayspandaspyproj

Get clean Array output from pyproj


(if you don't need all details, please see short explanation below...) What I want to do: I'm currently trying to convert a bunch of coordinates from epsg:4326 to epsg: 25832. The conversion part works perfectly fine and I was able to save my array to excel with the help of pandas. Problem: I am now trying to add the old coordinates to the array for reference but I can't seem to get a clean output that I can save. My Code:

import pyproj as pp
import numpy as np
import os

print('Whats the name of the file you are trying to convert (e.g. "coordinates.xlsx")?')
xlsx_file_name = "Mappe1.xlsx" #input()

try:
    import_arr = pd.read_excel(os.path.join(os.getcwd(), xlsx_file_name), sheet_name='Tabelle1')
except:
    print(f'Were not able to find a file named {xlsx_file_name}. Please restart and make sure it is written correctly.')

try:
    lat_long_arr = np.array(import_arr)
except:
    print(f'Were not able to import the data from {xlsx_file_name}. Please make sure it is formated properly.')

xy_arr = []

#epsg:4326 to epsg:25832
epsg = pp.Transformer.from_crs(4326, 25832)

#THIS IS THE PART I STRUGGLE WITH-----------------------------
for i in range(len(lat_long_arr)):
        xy_arr.extend([epsg.transform(lat_long_arr[i,0], lat_long_arr[i,1]), lat_long_arr[i,0], lat_long_arr[i,1]])
#-------------------------------------------------------------

#print(xy_arr)

try:
    pd.DataFrame(xy_arr).to_excel('converted_coordinates.xlsx')
except:
    print('The file "converted_coordinates.xlsx" is already existing and could not be overwritten')

My Otput: [(562146.0879263151, 5407897.591536528), 48.82095040000001, 9.8466309, (562228.5649670326, 5407950.483323202), 48.8214179, 9.8477624, ...] Those are two pairs of coordinates. The first ones in the brakets are converted and the two numbers after them are the refference. Question: Does someone know how I can get rid of the brakets or put the other values inside the brakets? Keep in mind that I need the array to be saved with pd.DataFrame(xy_arr).to_excel('converted_coordinates.xlsx'). Thanks for any suggestions!

Short Version: I extend an array like this: array.extend([epsg.transform(x, y), lat, long]) and get an output e.g. like this [(562146.0879263151, 5407897.591536528), 48.82095040000001, 9.8466309]. I now need the array formated in a way to save it with pandas: pd.DataFrame(array).to_excel('save.xlsx')


Solution

  • We can simplify your code using vectorized numpy operations, eliminating the for-loop.

    Let's use your provided two sets of coordinates in the lat_long_arr array:

    import numpy as np
    import pyproj as pp
    
    lat_long_arr = np.array([[48.82095040000001, 9.8466309], [48.8214179, 9.8477624]])
    
    print(lag_long_arr)
    # array([[48.8209504,  9.8466309],
    #        [48.8214179,  9.8477624]])
    

    Now, we can pass the entire columns of the array to the epsg.transform() method, as it accepts arrays, not just scalars:

    converted_arr = epsg.transform(lag_long_arr[:, 0], lag_long_arr[:, 1])
    
    print(converted_arr)
    # (array([562146.08792632, 562228.56496703]),
    #  array([5407897.59153653, 5407950.4833232 ]))
    

    The result is a tuple of arrays, and we just need to manipulate it to get it into the right format:

    print(np.vstack(converted_arr).T)
    # array([[ 562146.08792632, 5407897.59153653],
    #        [ 562228.56496703, 5407950.4833232 ]])
    

    We can join the resulting array with the original array and pass the resulting 4-column array to the pd.DataFrame() constructor. You can use pandas options to display float values to the desired precision:

    pd.set_option('display.float_format', lambda x: '%.8f' % x)
    
    res = pd.DataFrame(
        np.hstack((np.vstack(converted_arr).T, lat_long_arr)),
        columns=['converted_x', 'converted_y', 'x', 'y']
    )
    
    print(res)
    #       converted_x        converted_y            x            y
    # 0 562146.08792632   5407897.59153653  48.82095040   9.84663090
    # 1 562228.56496703   5407950.48332320  48.82141790   9.84776240