pythonpandasexcelopenpyxlstrikethrough

Pandas read excel and flag rows with striketrough


I have to work on an Excel sheet with python so I can later put it in a database.

My problem is, that I have quite a lot of rows/cells with striketrough. I want to read the excel normally and flag every row with one or more striketrough cells. I just found a solution for skipping these rows but I want to flag it.

I did find a way to print the strikethrough rows using openpyxl but I wanted to use pandas because I have to add and cut a lot of columns later on and pandas is best for that.


Solution

  • It's hard to imagine how looks like your spreadsheet but here is an example to give you the general logic. Basically, we use Font.strikethrough from to check if a the font of a cell is striked or not.

    from openpyxl import load_workbook
    
    ws = load_workbook("file.xlsx")["Sheet1"]
    
    data = [[cell for cell in row] for row in ws.iter_rows(values_only=True)]
    
    df = pd.DataFrame(data[1:], columns=data[0])
    
    df["Flag"] = [
        any(cell.font.strikethrough for cell in row) for row in ws.iter_rows(min_row=2)
    ]
    

    Output :

    print(df)
    
      col1  col2   col3   Flag
    0  foo     1  test1   True
    1  bar     2  test2  False
    2  qux     3  test3   True
    3  baz     4  test4  False
    

    Input used :

    enter image description here