I'm trying to understand this example at https://dedupeio.github.io/dedupe-examples/docs/mysql_example.html. How is having < comparison different than != in following case?
read_cur.execute("""
select a.donor_id,
json_object('city', a.city,
'name', a.name,
'zip', a.zip,
'state', a.state,
'address', a.address),
b.donor_id,
json_object('city', b.city,
'name', b.name,
'zip', b.zip,
'state', b.state,
'address', b.address)
from (select DISTINCT l.donor_id as east, r.donor_id as west
from blocking_map as l
INNER JOIN blocking_map as r
using (block_key)
**where l.donor_id < r.donor_id) ids**
INNER JOIN processed_donors a on ids.east=a.donor_id
INNER JOIN processed_donors b on ids.west=b.donor_id
""")
VS
read_cur.execute("""
select a.donor_id,
json_object('city', a.city,
'name', a.name,
'zip', a.zip,
'state', a.state,
'address', a.address),
b.donor_id,
json_object('city', b.city,
'name', b.name,
'zip', b.zip,
'state', b.state,
'address', b.address)
from (select DISTINCT l.donor_id as east, r.donor_id as west
from blocking_map as l
INNER JOIN blocking_map as r
using (block_key)
**where l.donor_id != r.donor_id) ids**
INNER JOIN processed_donors a on ids.east=a.donor_id
INNER JOIN processed_donors b on ids.west=b.donor_id
""")
When you use <
you are assured that every pair will be included only once.
l.donor_id r.donor_id
1 2
3 4
Whereas if you use !=
every pair will be included twice.
l.donor_id r.donor_id
1 2
2 1
3 4
4 3