pythonexcelpandascomparexlrd

Compare 2 excel files using Python


I have two xlsx files as follows:

value1   value2   value3
0.456   3.456    0.4325436
6.24654 0.235435 6.376546
4.26545 4.264543 7.2564523

and

value1   value2  value3
0.456   3.456    0.4325436
6.24654 0.23546  6.376546
4.26545 4.264543 7.2564523

I need to compare all cells, and if a cell from file1 != a cell from file2 print that.

import xlrd
rb = xlrd.open_workbook('file1.xlsx')
rb1 = xlrd.open_workbook('file2.xlsx')
sheet = rb.sheet_by_index(0)
for rownum in range(sheet.nrows):
    row = sheet.row_values(rownum)
    for c_el in row:
        print c_el

How can I add the comparison cell of file1 and file2 ?


Solution

  • The following approach should get you started:

    from itertools import zip_longest
    import xlrd
    
    rb1 = xlrd.open_workbook('file1.xlsx')
    rb2 = xlrd.open_workbook('file2.xlsx')
    
    sheet1 = rb1.sheet_by_index(0)
    sheet2 = rb2.sheet_by_index(0)
    
    for rownum in range(max(sheet1.nrows, sheet2.nrows)):
        if rownum < sheet1.nrows:
            row_rb1 = sheet1.row_values(rownum)
            row_rb2 = sheet2.row_values(rownum)
    
            for colnum, (c1, c2) in enumerate(zip_longest(row_rb1, row_rb2)):
                if c1 != c2:
                    print("Row {} Col {} - {} != {}".format(rownum+1, colnum+1, c1, c2))
        else:
            print("Row {} missing".format(rownum+1))
    

    This will display any cells which are different between the two files. For your given two files, this will display:

    Row 3 Col 2 - 0.235435 != 0.23546
    

    If you prefer cell names, then use xlrd.formular.colname():

    print "Cell {}{}  {} != {}".format(rownum+1, xlrd.formula.colname(colnum), c1, c2)
    

    Giving you:

    Cell 3B  0.235435 != 0.23546