sqldistinctspark-window-function

To find distinct rows even the column values are interchanged in SQL


I have a table like this:

strt           end         no of km
------------------------------------
California     India         9000
Singapore      India         3000
India          Frankfurt     6700
Frankfurt      Singapore     4500
India          Indonesia     3000
India          California    9000
India          Singapore     3000
Frankfurt      India         6700
Singapore      Frankfurt     4500
Indonesia      India         3000

I need to get distinct rows from this by considering start and end point (we should get single entry even If we interchange the start and end values ). I tried concat the columns start and end as a list and sort the elements in the list and then applied distinct.

Is there any SQL query for this? Thanks in advance.

Output should be like this

strt          end       no of km
-------------------------------
California    India      9000
Singapore     India      3000
India         Frankfurt  6700
Frankfurt     Singapore  4500
India         Indonesia  3000

Solution

  • Welcome to SO!

    The column could not be named end since that is a reserve word in SQL, so I used dest instead. And with that I could create this query that do solve your problem:

    select
        distinct
        case when strt>dest then dest else strt end as strt,
        case when strt>dest then strt else dest end as dest,
        nr_of_km
    from data
    order by 1,2;
    

    It will swap strt and dest if needed and and then use distinct.

    NOTE: If there is an error in the data so that the distance differs between the directions for a pair then the query will produce two rows for that pair.

    NOTE 2: The sorting distincts between upper- and lower-case characters. So if you need the query to treat a and A as the same character when comparing you could do something like this:

    case when lower(strt)>lower(dest) then dest else strt end as strt,
    ...