python, python-2.7, vcf-vcard, vobject

Compare two vcards


I have two vCards:

vcard1 = """BEGIN:VCARD
VERSION:3.0
N;CHARSET=UTF-8:Name;;;;
TEL:0005555000
END:VCARD"""

vcard2 = """BEGIN:VCARD
VERSION:3.0
N;CHARSET=UTF-8:Name;;;;
TEL:0005555000
EMAIL;CHARSET=UTF-8:my_email@email.com
END:VCARD"""

As you can see, the only difference is that the second vCard has an additional EMAIL attribute. Can these two vCards be considered equal in code?

import vobject
print(vobject.readOne(vcard1).serialize()==vobject.readOne(vcard2).serialize())

Solution

    The first rule of any comparison is to define its basis. You can even compare apples and oranges, provided you compare a well-defined quantity, such as "how many apples vs. oranges" or "the weight of 5 apples vs. 5 oranges". The point is that the basis of comparison must be unambiguous.

    Note: I will use the data from the Dummy Data section below.

    Extending this concept to your use case, you can compare the vCards field by field, over whichever set of fields you choose. The examples below show three ways to do that: comparing only the fields common to both cards (A1), comparing all fields found in either card (A2), and comparing a user-specified list of fields (A3).

    In this case, comparing the serialized versions of vcard1 and vcard2 returns False, because the contents of the two vCards differ.

    vc1.serialize()==vc2.serialize() # False
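
    To see which lines make the two serialized strings differ, a line-level diff can help. The following is a minimal sketch using the standard-library difflib module. The two strings are simplified stand-ins for the serialize() output (real vobject output uses CRLF line endings and may format lines differently), and changed_lines is a hypothetical helper, not part of vobject.

```python
import difflib

# Simplified stand-ins for vc1.serialize() and vc2.serialize() (assumption:
# the real output uses CRLF line endings and may keep parameters such as CHARSET).
text1 = "BEGIN:VCARD\nVERSION:3.0\nN:Name;;;;\nTEL:0005555000\nEND:VCARD\n"
text2 = ("BEGIN:VCARD\nVERSION:3.0\nEMAIL:my_email@email.com\n"
         "N:Name;;;;\nTEL:0005555000\nEND:VCARD\n")

def changed_lines(a, b):
    """Return (only_in_a, only_in_b) lines from a line-level diff of two texts."""
    diff = list(difflib.ndiff(a.splitlines(), b.splitlines()))
    only_a = [line[2:] for line in diff if line.startswith("- ")]
    only_b = [line[2:] for line in diff if line.startswith("+ ")]
    return only_a, only_b

removed, added = changed_lines(text1, text2)
print(removed)  # lines found only in text1
print(added)    # lines found only in text2
```

    With real cards, passing vc1.serialize() and vc2.serialize() to changed_lines would list the content lines that prevent the string comparison from succeeding.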
    

    Example

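    In each case (A1, A2, A3), the custom function compare_vcards() returns two things: match, a dict mapping each compared field to True or False, and summary, a dict holding abs_match (True only if every compared field matches) and rel_match (the fraction of compared fields that match).

    You will still have to define your own business logic to decide what counts as a match. What I have shown here should help you get started.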

    ## Example - A1
    #  Compare ONLY COMMON fields b/w vc1 and vc2
    match, summary = compare_vcards(vc1, vc2, mode='common')
    print(f'match:   \t{match}')
    print(f'summary: \t{summary}')
    
    ## Output
    # match:    {'n': True, 'tel': True, 'version': True}
    # summary:  {'abs_match': True, 'rel_match': 1.0}
    
    ## Example - A2
    #  Compare ALL fields b/w vc1 and vc2
    match, summary = compare_vcards(vc1, vc2, mode='all')
    print(f'match:   \t{match}')
    print(f'summary: \t{summary}')
    
    ## Output
    # match:    {'tel': True, 'email': False, 'n': True, 'version': True}
    # summary:  {'abs_match': False, 'rel_match': 0.75}
    
    ## Example - A3
    #  Compare ONLY COMMON USER-SPECIFIED fields b/w vc1 and vc2
    match, summary = compare_vcards(vc1, vc2, fields=['email', 'n', 'tel'])
    print(f'match:   \t{match}')
    print(f'summary: \t{summary}')
    
    ## Output
    # match:    {'email': False, 'n': True, 'tel': True}
    # summary:  {'abs_match': False, 'rel_match': 0.6666666666666666}
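
    For the two cards in the question, one plausible rule is to call them equal when every field of the shorter card appears in the other card with the same value. The sketch below assumes that rule. is_subset_match is a hypothetical helper, not part of vobject, and it relies only on the sortChildKeys() and getChildValue() calls used elsewhere in this answer.

```python
def is_subset_match(small, large):
    """Return True when every field of `small` has an equal value in `large`.

    Both arguments are expected to be parsed vcards (e.g. from vobject.readOne),
    but any object offering sortChildKeys() and getChildValue() will do.
    """
    missing = object()  # sentinel marking a field that `large` does not have
    for field in small.sortChildKeys():
        other = large.getChildValue(field, missing)
        if other is missing:
            return False
        # Compare string forms, mirroring compare_vcards() in this answer
        if str(small.getChildValue(field)).strip() != str(other).strip():
            return False
    return True
```

    With the dummy data, is_subset_match(vc1, vc2) should be True and is_subset_match(vc2, vc1) should be False, since vc1 has no EMAIL field. Whether that asymmetry is acceptable depends on your own definition of equality.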
    

    Code

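    def get_fields(vc1, vc2, mode='common'):
        """Return the set of field names to compare for the given mode."""
        keys1 = set(vc1.sortChildKeys())
        keys2 = set(vc2.sortChildKeys())
        if mode == 'common':
            # Only the fields present in both vcards
            return keys1 & keys2
        # mode == 'all': the fields present in either vcard
        return keys1 | keys2
    
    def compare_vcards(vc1, vc2, fields=None, mode='common'):
        """Compare two vcards field by field and return (match, summary)."""
        if fields is None:
            fields = get_fields(vc1, vc2, mode=mode)
        # getChildValue() returns None for a missing field, so a field that
        # only one vcard has compares as 'None' against a real value: False.
        match = dict(
            (field, str(vc1.getChildValue(field)).strip() == str(vc2.getChildValue(field)).strip())
            for field in fields
        )
        summary = {
            'abs_match': all(match.values()),               # True if every field matches
            'rel_match': sum(match.values()) / len(match)   # fraction of matching fields
        }
        return match, summary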
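
    Two edge cases may be worth guarding in your own version: a user-specified field that neither card has compares as 'None' == 'None' (a match), and an empty field list makes the rel_match division fail. The following is a hedged sketch of a stricter variant. compare_vcards_strict is a hypothetical name, and treating absent fields as mismatches is an assumption about the desired business logic.

```python
_MISSING = object()  # sentinel: the field is absent from a card

def compare_vcards_strict(vc1, vc2, fields):
    """Compare the given fields; a field absent from either card is a mismatch."""
    match = {}
    for field in fields:
        v1 = vc1.getChildValue(field, _MISSING)
        v2 = vc2.getChildValue(field, _MISSING)
        if v1 is _MISSING or v2 is _MISSING:
            match[field] = False
        else:
            match[field] = str(v1).strip() == str(v2).strip()
    total = len(match)
    summary = {
        # With no fields to compare, report "no match" instead of raising
        'abs_match': total > 0 and all(match.values()),
        'rel_match': sum(match.values()) / total if total else 0.0,
    }
    return match, summary
```

    It can be called like compare_vcards() with an explicit field list, for example compare_vcards_strict(vc1, vc2, ['n', 'tel', 'email']).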
    

    Dummy Data

    vcard1 = """
    BEGIN:VCARD
    VERSION:3.0
    N;CHARSET=UTF-8:Name;;;;
    TEL:0005555000
    END:VCARD
    """
    
    vcard2 = """
    BEGIN:VCARD
    VERSION:3.0
    N;CHARSET=UTF-8:Name;;;;
    TEL:0005555000
    EMAIL;CHARSET=UTF-8:my_email@email.com
    END:VCARD
    """
    
    # pip install vobject
    import vobject
    
    vc1 = vobject.readOne(vcard1)
    vc2 = vobject.readOne(vcard2)
    
