pythoncsvnatural-sortnatsort

Sort list of dictionaries based on relative time


I have a CSV file that I create from an API response. I get it unsorted and the API response does not have a query parameter to sort the returned payload. The CSV looks like below, where column A has some device identifiers and column B has the configuration last sync of these devices. The time is represented in relative time as seen in the screenshot.

CSV file

How can I naturally sort the CSV file to have the rows ordered in the following sequence seconds, minutes, hours, days, weeks, months, years, and Never?

I have tried to sort the entries using natsort but I didn't get what I am trying to achieve:

import csv
import os

from natsort import natsort, natsort_keygen

natsort_key = natsort_keygen(key=lambda k: k["last_sync"])

fieldnames = ["device", "last_sync"]
with open("Devices_161846.csv", "rt") as csvfile:
    csvreader = csv.DictReader(f=csvfile, fieldnames=fieldnames)
    next(csvreader)
    identities = list(csvreader)
    sorted_identities = natsort.natsorted(identities, key=natsort_key)

new_file, ext = os.path.splitext(csvfile.name)
with open(f"{new_file}_Sorted.{ext}", "wt") as f:
    csvwriter = csv.DictWriter(f=f, fieldnames=fieldnames)
    csvwriter.writeheader()
    csvwriter.writerows(sorted_identities)

I expect to get the same result, but sorted from newest to oldest.


Solution

  • I would use parse from as a sorting key, this way :

    # pip install dateparser
    import csv, datetime, dateparser
    
    def parse(st):
        if st == "Never": return datetime.datetime.min
        else:
            if (dt:=dateparser.parse(st.replace("a few", "0"))):
                return dt
            else:
                return datetime.datetime.now()
    
    header = ["device", "last_sync"]
    
    with (
        open("/Devices_161846.csv", "r") as inpf,
        open("Devices_161846_Sorted.csv", "w", newline="") as outf
    ):
        reader = csv.DictReader(inpf, fieldnames=header)
        writer = csv.DictWriter(outf, fieldnames=header)
        
        writer.writeheader()
        writer.writerows(
            sorted(reader, key=lambda x: parse(x["last_sync"]), reverse=True)
        )
    

    Output (Devices_161846_Sorted.csv) :

    device,last_sync
    j,a few seconds ago
    f,29 minutes ago
    g,38 minutes ago
    c,an hour ago
    a,an hour ago
    d,13 hours ago
    h,a day ago
    e,4 days ago
    b,17 days ago
    i,Never
    

    Input used (Devices_161846.csv) :

    a,an hour ago
    b,17 days ago
    c,an hour ago
    d,13 hours ago
    e,4 days ago
    f,29 minutes ago
    g,38 minutes ago
    h,a day ago
    i,Never
    j,a few seconds ago