I have a table containing three columns: city ID, language code and city:
city_id | language_code | city
----------------------------------
1 | en | London
1 | es | Londres
1 | pt | Londres
2 | de | Köln
2 | en | Cologne
3 | it | Cologne
In some foreign languages cities can be spelled the same way, e.g. the Spanish and Portuguese name for London is Londres
.
But there are cases where the same name can refer to completely different locations, e.g. Cologne
is an English name for the German city of Köln but there's also a town of the same name in Italy.
I would like to be able to detect cities that have more than one entry in the table but only those that are linked to different city_id
values. In my case this would be Cologne
but not Londres
as both es
and pt
language versions point to the same city_id
.
I thought this would be a fairly easy thing to do but I haven't been able to get the results in a single query. Instead, I am deduping the results first and then aggregating them:
WITH deduped_cities AS (
SELECT DISTINCT city, city_id
FROM cities
ORDER BY city
)
SELECT city, COUNT(city_id) AS total
FROM deduped_cities
GROUP BY city
HAVING COUNT(city_id) > 1;
This gives me the expected result:
city | total
----------------
Cologne | 2
I was just wondering if it is possible to achieve the same effect with a single SELECT statement.
you have basically just one SELECT
, but you can use DISTINCT
' with the COUNT
SELECT city, COUNT(city_id) AS total
FROM cities
GROUP BY city
HAVING COUNT( DISTINCT city_id) > 1;
See fiddle