sortingrecordsetopenrefine

openrefine - Sorting column contents within a record?


Scoured the internet as best as I could but couldn't find an answer -- I was wondering, is there some way to sort the contents of a row by record? E.g. take the following table:

Key Row to sort Other row
a bca A
cab
cab
abc f
b zyx
yxz u
c def h
fed h

and turn it into:

Key Row to sort Other row
a abc A
bca
cab
cab f
b yxz
zyx u
c def h
fed h

The ultimate goal is to sort all of the columns for each record alphabetically, and then blank up so that each record is a single row.

I've tried doing a sort on the column to sort within the record itself, but that orders records by whichever record has an entry that comes in alphabetical order (regardless of whether it's the 1st entry for the record or not, interestingly).


Solution

  • Here is a solution using sort

    Prerequisite: assuming that the values in the "Key" column are unique.

    1. Switch to rows mode
    2. Fill down the "Key" column via Key=> Edit cells => Fill down.
    3. Sort the "Key" column via Key=> Sort...
    4. Sort the "Row to sort" column via Row to sort => Sort... as additional sort
    5. Make the sorting permanent by selecting Reorder rows permanently in the sort menu.
    6. Blank down the "Key" and "Row to sort" column.

    Here is a solution using GREL

    As deduplicating and sorting records is quite a common task I have a GREL expression reducing this task to two steps:

    1. Transform the "Row to sort" column with the following GREL expression:
    if(
       row.index - row.record.fromRowIndex == 0,
       row.record.cells[columnName].value.uniques().sort().join(","),
       null
    )
    
    1. Split the multi-valued cells in the "Row to sort" column on the separator ,.

    The GREL expression will take all the record cells of the current column, extract their values into an array, make the values in the array unique, sort the remaining value in the array and join it into a string using , as separator.

    The joining into a string is necessary as OpenRefine currently has no support for displaying arrays in the GUI.