[SOLVED] How to identify non-identical values being aggregated with COUNT()?

How to identify non-identical values being aggregated with COUNT()?

I have a table containing three columns: city ID, language code and city:

city_id | language_code | city
----------------------------------
1       | en            | London
1       | es            | Londres
1       | pt            | Londres
2       | de            | Köln
2       | en            | Cologne
3       | it            | Cologne

In some foreign languages cities can be spelled the same way, e.g. the Spanish and Portuguese name for London is Londres.

But there are cases where the same name can refer to completely different locations, e.g. Cologne is an English name for the German city of Köln but there's also a town of the same name in Italy.

I would like to be able to detect cities that have more than one entry in the table but only those that are linked to different city_id values. In my case this would be Cologne but not Londres as both es and pt language versions point to the same city_id.

I thought this would be a fairly easy thing to do but I haven't been able to get the results in a single query. Instead, I am deduping the results first and then aggregating them:

WITH deduped_cities AS (
  SELECT DISTINCT city, city_id
  FROM cities
  ORDER BY city
)
SELECT city, COUNT(city_id) AS total
FROM deduped_cities
GROUP BY city
HAVING COUNT(city_id) > 1;

This gives me the expected result:

city    | total
----------------
Cologne | 2

I was just wondering if it is possible to achieve the same effect with a single SELECT statement.

DB Fiddle

Solution

you have basically just one SELECT, but you can use DISTINCT' with the COUNT

SELECT city, COUNT(city_id) AS total
FROM cities
GROUP BY city
HAVING COUNT( DISTINCT city_id) > 1;

See fiddle