I am currently trying to extract cells' background color from a xlsx file: image
I've tried two ways obtained from other stackoverflow posts:
1)
wb = load_workbook(excel_file, data_only = True)
sh = wb[wb.sheetnames[0]]
rows = sh.max_row
cols = sh.max_column
bckg_color = np.empty(shape=(rows, cols), dtype=object)
for i in range(1,rows+1):
for j in range(1,cols+1):
cell = sh.cell(column=j, row=i)
color_in_hex = cell.fill.start_color.index
bckg_color[i-1, j-1] = str(color_in_hex)
pd.DataFrame(bckg_color)
sf = StyleFrame.read_excel(excel_file, read_style=True, use_openpyxl_styles=False, header=None)
bckg_color = StyleFrame(sf.applymap(lambda cell: cell.style.bg_color)).data_df.astype(str)
bckg_color
Both of them give the same result: df screenshot
The expected result was the same color on the 4th row, but it's not because of merged cells in that row. Is there a robust way (expect using bfill on the color dataframe) to get the colors so that the whole row would have the color I see on the screenshot? I suspect it can be done with getting information about merged cells from openpyxl, but I would not like to resort to that.
I don't think you can avoid dealing with merged cells via openpyxl
. But it's quite simple, just check whether the cell belongs to a merged range or not:
for i in range(1,rows+1):
for j in range(1,cols+1):
cell = None
for mcl in sh.merged_cells:
if mcl.min_row <= i <= mcl.max_row and mcl.min_col <= j <= mcl.max_col:
cell = mcl.start_cell # that's where the background info is stored
break
if cell is None: cell = sh.cell(column=j, row=i)
color_in_hex = cell.fill.start_color.index
bckg_color[i-1, j-1] = str(color_in_hex)