pythonexcellinuxmacosopenpyxl

Dealing with struck-through and non-struck-through characters using Python on Linux and/or MacOS


I'm looking for an up to date way of determining whether or not text inside an Excel xlsx workbook cell is struck-through using Python on Linux and Mac.

I understand that Openpyxl will determine if a cell has the struck-through font-type active, but it will only differentiate if the font-type is set at the cell level not at the text level within the cell, which is why I'm stuck.

I've also read that I can import xlrd v1.2 though I've not tested this as it's not something I want to do if I can help it given that 1.2 is an old out of support version.

Is there any way that I'm missing of using Python to differentiate struck-through and non-struck-though text within in excel xlsx workbook cell?

Code tested so far on Mac:

main_xlsx = "test_wb.xlsx"
wb = openpyxl.load_workbook(main_xlsx, rich_text=True)
ws = wb["Sheet1"]

for i in range(1, ws.max_row+1):
    my_cell = ws["C"+str(i)]
    if type(my_cell.value) == CellRichText:
        for text in my_cell.value:
            if text.font.stike:
                print(text)
    else:
        print("no rich text")

------------------------------

Traceback (most recent call last):
    File "test_main.py", line 17, in <module>
      if text.font.stike:
         ^^^^^^^^^
AttributeError: 'str' object has no attribute 'font'. Did you mean: 'count'?

Solution

  • Openpyxl is a good choice if you're using Linux. MACOS can use Xlwings but that module would be a problem on Linux.
    Fundamentally your code would work, though the typo suggests this is not what you have in your code;

    if text.font.stike:
    

    However there is an issue with how your code operates.

    A cell containing rich text will be separated into blocks where each rich text block is of type TextBlock when rich_text=True is enabled. The TextBlock object has two main attributes, font and text.
    text contains the characters that are only in that TextBlock
    font contains all the font attributes including the 'strike' and 'strikethrough' attributes.

    However, not all blocks may be of this type but rather some can be type str and a string does not have an attribute called font.

    If the cell contains the text;
    Example cell 1
    this is a cell with rich text that includes two TextBlocks with text, 'This' and ' is Strikethrough' because the first character in the cell is rich text. As these are both TextBlocks they both have font as an attribute.

    If the cell contains the text;
    Example cell 2
    this is also a cell with rich text but this time it includes only one TextBlock which contains just the text 'Strikethrough'.
    Since the characters before this text, 'This is ' are not themselves rich text, they are not placed in a TextBlock and are in fact just a string, str.
    And when you attempt to check the font attribute on the string you get the error;

    AttributeError: 'str' object has no attribute 'font
    

    So any leading non rich text in the cell is treated as a string.
    But at the first rich text character then all characters, rich text or not, from then on will be in a TextBlock based on the rich text (or not) applied to the characters.

    You need to change your code to check if text is a TextBlock or str before testing the font attribute.

    Example

    import openpyxl
    from openpyxl.cell.rich_text import CellRichText, TextBlock
    
    
    main_xlsx = "test_wb.xlsx"
    wb = openpyxl.load_workbook(main_xlsx, rich_text=True)
    ws = wb["Sheet1"]
    
    for my_cell in ws['C']:
        if type(my_cell.value) == CellRichText:
            for text in my_cell.value:
                # Is 'text' type str or type TextBlock
                if isinstance(text, TextBlock):
                    if text.font.strike:
                        print(text)
        else:
            print("no rich text")