[SOLVED] Sorting CSV with IP address column in pandas

Sorting CSV with IP address column in pandas

I have a CSV file with a column of IP addresses, MAC addresses and some other data. I want to sort all of the data by the IP addresses in ascending order

Input:


| IP Address   | MAC Address | ID    |
| ---          | ---         | ---   |
| 10.10.10.1   | {mac1}      | {id1} |
| 10.10.10.112 | {mac2}      | {id2} |
| 10.10.10.17  | {mac3}      | {id3} |
| 10.10.10.7   | {mac4}      | {id4} |

What I expect the output to be:

| IP Address   | MAC Address | ID    |
| ---          | ---         | ---   |
| 10.10.10.1   | {mac1}      | {id1} |
| 10.10.10.7   | {mac4}      | {id4} |
| 10.10.10.17  | {mac3}      | {id3} |
| 10.10.10.112 | {mac2}      | {id2} |

Where the IP addresses are in proper ascending order with their corresponding MAC and ID

I have tried using the pandas Dataframe.sort_values() which without a key puts the IP addresses in incorrect ascending order because it looks at the IPs as integers and not as IP addresses. I have also tried to use sort_values() with the parameter key=ipaddress.IPv4Address but I just get a KeyError

df_new = df.sort_values(df['IP_Address'], key=ipaddress.IPv4Address)

I also tried sorted() which puts the IP addresses in the correct order and into a list and then when I put that back into the data frame it doesn't sort the corresponding MAC and ID for each IP

Solution

This this approach using Series.str.split and df.apply()

new_df = (df.sort_values(by=["IP Address"], key=lambda x: x.str.split(".")
               .apply(lambda y: [int(z) for z in y])))
print(new_df)

     IP Address MAC Address     ID
0    10.10.10.1      {mac1}  {id1}
3    10.10.10.7      {mac4}  {id4}
2   10.10.10.17      {mac3}  {id3}
1  10.10.10.112      {mac2}  {id2}