I'm looking for an up to date way of determining whether or not text inside an Excel xlsx workbook cell is struck-through using Python on Linux and Mac.
I understand that Openpyxl will determine if a cell has the struck-through font-type active, but it will only differentiate if the font-type is set at the cell level not at the text level within the cell, which is why I'm stuck.
I've also read that I can import xlrd v1.2 though I've not tested this as it's not something I want to do if I can help it given that 1.2 is an old out of support version.
Is there any way that I'm missing of using Python to differentiate struck-through and non-struck-though text within in excel xlsx workbook cell?
Code tested so far on Mac:
main_xlsx = "test_wb.xlsx"
wb = openpyxl.load_workbook(main_xlsx, rich_text=True)
ws = wb["Sheet1"]
for i in range(1, ws.max_row+1):
my_cell = ws["C"+str(i)]
if type(my_cell.value) == CellRichText:
for text in my_cell.value:
if text.font.stike:
print(text)
else:
print("no rich text")
------------------------------
Traceback (most recent call last):
File "test_main.py", line 17, in <module>
if text.font.stike:
^^^^^^^^^
AttributeError: 'str' object has no attribute 'font'. Did you mean: 'count'?
Openpyxl is a good choice if you're using Linux. MACOS can use Xlwings but that module would be a problem on Linux.
Fundamentally your code would work, though the typo suggests this is not what you have in your code;
if text.font.stike:
However there is an issue with how your code operates.
A cell containing rich text will be separated into blocks where each rich text block is of type TextBlock
when rich_text=True
is enabled. The TextBlock
object has two main attributes, font
and text
.
text
contains the characters that are only in that TextBlock
font
contains all the font attributes including the 'strike' and 'strikethrough' attributes.
However, not all blocks may be of this type but rather some can be type str
and a string does not have an attribute called font
.
If the cell contains the text;
this is a cell with rich text that includes two TextBlocks
with text
, 'This' and ' is Strikethrough' because the first character in the cell is rich text. As these are both TextBlocks
they both have font
as an attribute.
If the cell contains the text;
this is also a cell with rich text but this time it includes only one TextBlock
which contains just the text
'Strikethrough'.
Since the characters before this text, 'This is ' are not themselves rich text, they are not placed in a TextBlock
and are in fact just a string, str
.
And when you attempt to check the font attribute on the string you get the error;
AttributeError: 'str' object has no attribute 'font
So any leading non rich text in the cell is treated as a string.
But at the first rich text character then all characters, rich text or not, from then on will be in a TextBlock
based on the rich text (or not) applied to the characters.
You need to change your code to check if text
is a TextBlock
or str
before testing the font attribute.
Example
import openpyxl
from openpyxl.cell.rich_text import CellRichText, TextBlock
main_xlsx = "test_wb.xlsx"
wb = openpyxl.load_workbook(main_xlsx, rich_text=True)
ws = wb["Sheet1"]
for my_cell in ws['C']:
if type(my_cell.value) == CellRichText:
for text in my_cell.value:
# Is 'text' type str or type TextBlock
if isinstance(text, TextBlock):
if text.font.strike:
print(text)
else:
print("no rich text")