So I'm trying to clean some phone records in a database table.
I've found out how to find exact matches in 2 fields using:
/* DUPLICATE first & last names */
SELECT
`First Name`,
`Last Name`,
COUNT(*) c
FROM phone.contacts
GROUP BY
`Last Name`,
`First Name`
HAVING c > 1;
Wow, great.
I want to expand it further to look at numerous fields to see if a phone number in 1 of 3 phone fields is a duplicate.
So I want to check 3 fields (general mobile
, general phone
, business phone
).
1.to see that they are not empty ('') 2.to see if the data (number) in any of them appears in the other 2 phone fields anywhere in the table.
So pushing my limited SQL past its limit I came up with the following which seems to return records with 3 empty phone fields & also records that don't have duplicate phone numbers.
/* DUPLICATE general & business phone nos */
SELECT
id,
`first name`,
`last name`,
`general mobile`,
`general phone`,
`general email`,
`business phone`,
COUNT(CASE WHEN `general mobile` <> '' THEN 1 ELSE NULL END) as gen_mob,
COUNT(CASE WHEN `general phone` <> '' THEN 1 ELSE NULL END) as gen_phone,
COUNT(CASE WHEN `business phone` <> '' THEN 1 ELSE NULL END) as bus_phone
FROM phone.contacts
GROUP BY
`general mobile`,
`general phone`,
`business phone`
HAVING gen_mob > 1 OR gen_phone > 1 OR bus_phone > 1;
Clearly my logic is flawed & I wondered if someone could point me in the right direction/take pity etc...
Many thanks
The first thing you should do shoot the person that named your columns with spaces in them.
Now then, try this:
SELECT DISTINCT
c.id,
c.`first name`,
c.`last name`,
c.`general mobile`,
c.`general phone`,
c.`business phone`
from contacts_test c
join contacts_test c2
on (c.`general mobile`!= '' and c.`general mobile` in (c2.`general phone`, c2.`business phone`))
or (c.`general phone` != '' and c.`general phone` in (c2.`general mobile`, c2.`business phone`))
or (c.`business phone`!= '' and c.`business phone` in (c2.`general mobile`, c2.`general phone`))
See a live demo of this query in SQLFiddle.
Note the extra check for phone != ''
, which is required because the phone numbers are not nullable, so their "unknown" value is blank. Without this check, false matches are returned because of course blank equals blank.
The DISTINCT
keyword was added in case there are multiple other rows that match, which would result in a nxn result set.