I am parsing data from an Excel sheet using Python. The parsing itself works, but I don't need some of the rows from the sheet. How can I skip them (maybe with a loop)? Here is the code I wrote to parse the Excel sheet:
import xlrd
book = xlrd.open_workbook("Excel.xlsx")
sheet = book.sheet_by_index(0)
firstcol = sheet.col_values(0)
data = [[sheet.cell_value(r, c) for c in range(sheet.ncols)]
        for r in range(sheet.nrows)]
ele = ''
year = []
for j in range(len(data)):
    if j == 1:
        year = data[j]
    if j > 2:
        ele = data[j][0]
        for i in range(1, len(data[j])):
            if ele != "":
                if data[j][i] != "":
                    if year[i] != "":
                        print([ele, data[j][i], year[i]])
With this, all rows are parsed into the list format I want, but I don't want some rows (like Total age, Total IDs, Total Result) from the Excel file. How can I exclude them in the same code? Alternatively, is there a more effective way (maybe pandas) that would reduce the code? The Excel file I'm referring to: Excel.xlsx
Thanks in advance.
If I understand correctly, you can do this much more simply. You have some list of rows to exclude:
rows_to_exclude = ['Total age', 'Total IDs', 'Total Result']
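If you would rather keep the loop from the question, a list like this can also be used to skip rows by label. The following is only a minimal sketch: the `data` list is made-up sample data standing in for what xlrd returns (years in row 1, labelled rows from index 3 on, as the question's code implies), and `extract_records` is a hypothetical helper name.

```python
# Hypothetical stand-in for the `data` list built with xlrd in the question:
# row 1 holds the years, rows from index 3 on hold a label plus values.
data = [
    ["Report", "", ""],
    ["", 2019, 2020],
    ["", "", ""],
    ["age", 30, 31],
    ["Total age", 30, 31],
    ["IDs", 7, ""],
    ["Total IDs", 7, ""],
]

# A set gives fast membership checks; the labels are taken from the question.
excluded_labels = {"Total age", "Total IDs", "Total Result"}


def extract_records(data, excluded):
    """Return [label, value, year] triples, skipping excluded row labels."""
    years = data[1]
    records = []
    for row in data[3:]:
        label = row[0]
        # Skip rows without a label and rows the caller wants excluded.
        if label == "" or label in excluded:
            continue
        for value, year in zip(row[1:], years[1:]):
            if value != "" and year != "":
                records.append([label, value, year])
    return records


records = extract_records(data, excluded_labels)
for record in records:
    print(record)
```

The only change from the original logic is the `label in excluded` check; everything else mirrors the nested `if` statements in the question.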
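With pandas, you can read the file into a DataFrame using pd.read_excel instead of calling xlrd directly (there is no need to specify the sheet index if it's the first sheet, which is read by default). Passing index_col=0 makes the first column the index, assuming that is where the row labels are, as the code in the question suggests. Then you can drop the rows with missing values, and drop all rows whose index label is in your list of excluded row labels:
import pandas as pd
df = pd.read_excel('Excel.xlsx', index_col=0)
df = df.dropna().drop(rows_to_exclude)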
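To get the same [label, value, year] lists that the question prints, something along these lines may work. This is a sketch under assumptions: the DataFrame below is made-up sample data with the row labels as the index and the years as columns, and the real sheet may need extra read_excel arguments to reach that shape. It filters with `isin` rather than `drop`, which avoids a KeyError when a label such as 'Total Result' is not present.

```python
import pandas as pd

# Hypothetical data standing in for the sheet: row labels as the index,
# years as columns. The layout of the real file may differ.
df = pd.DataFrame(
    {2019: [30.0, 30.0, 7.0, 7.0],
     2020: [31.0, 31.0, float("nan"), float("nan")]},
    index=["age", "Total age", "IDs", "Total IDs"],
)

rows_to_exclude = ["Total age", "Total IDs", "Total Result"]

# Keep only rows whose label is not in the exclusion list.
kept = df[~df.index.isin(rows_to_exclude)]

# Build [label, value, year] triples, skipping individual empty cells.
records = [
    [label, value, year]
    for label, row in kept.iterrows()
    for year, value in row.items()
    if pd.notna(value)
]
print(records)
```

Note that `dropna()` with its default arguments removes every row that has at least one missing cell, so the sketch checks cells one by one in order to keep partially filled rows such as "IDs" here.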