
Is it possible in Python to convert the strange .XLS file below, which is actually in an HTML/XML format, to .XLSX?


I'm quite puzzled by the format of these .xls files, as they are not really .xls files. I've put the first few lines of one file below for reference; the full file is here.

Converting a normal .xls is no problem with pyexcel: p.save_book_as(file_name=fname, dest_file_name=fname+'x').

I would like to convert these files to .xlsx in bulk with Python. Is this even possible with the format below?

MIME-Version: 1.0
X-Document-Type: Workbook
Content-Type: multipart/related; boundary="----=_NextPart_86ab7b61_9054_45ca_a3a6_49bc8ebc61db"

This document is a Single File Web Page, also known as a Web Archive file.  If you are seeing this message, your browser or editor doesn't support Web Archive files.  Please download a browser that supports Web Archive, such as Microsoft Internet Explorer.

------=_NextPart_86ab7b61_9054_45ca_a3a6_49bc8ebc61db
Content-Location: file:///C:/86ab7b61_9054_45ca_a3a6_49bc8ebc61db/Workbook.html
Content-Transfer-Encoding: quoted-printable
Content-Type: text/html; charset="us-ascii"

<html xmlns:v=3D"urn:schemas-microsoft-com:vml" xmlns:o=3D"urn:schemas-microsoft-com:office:office" xmlns:x=3D"urn:schemas-microsoft-com:office:excel" xmlns=3D"http://www.w3.org/TR/REC-html40">
<head>
<meta name=3D"Excel Workbook Frameset">

<meta name=3DProgId content=3DExcel.Sheet>
<link rel=3DFile-List href=3D"Worksheets/filelist.xml">

<!--[if gte mso 9]><xml>
 <x:ExcelWorkbook>
  <x:ExcelWorksheets>
   <x:ExcelWorksheet>

Solution

  • This seems to be "Excel-compatible HTML", saved as a Single File Web Page (the file's own header says so). While I do not know of a pure Python converter, you could try using Excel as an external converter, i.e. open those files and save them as .xlsx, as described here and copied below. This requires Windows with Excel installed, plus the pywin32 package, which lets Python drive Excel through COM.

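    import win32com.client as win32

    fname = "full+path+to+xls_file"  # placeholder: use the absolute path to the file
    excel = win32.gencache.EnsureDispatch('Excel.Application')
    excel.DisplayAlerts = False  # keep confirmation dialogs from blocking the script
    try:
        wb = excel.Workbooks.Open(fname)
        # FileFormat=51 is for the .xlsx extension; FileFormat=56 is for .xls
        wb.SaveAs(fname + "x", FileFormat=51)
        wb.Close()
    finally:
        excel.Application.Quit()  # always shut Excel down, even if the conversion fails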
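
Since the question asks about converting in bulk, here is a rough sketch of how the snippet above might be wrapped in a loop. It is not from the original answer. The function names (is_web_archive, find_web_archives, convert_all) are made up for illustration, and it assumes that every disguised file starts with the "MIME-Version:" header shown in the question, that the files carry a lowercase .xls extension, and that Excel and pywin32 are available on the machine doing the conversion.

```python
from pathlib import Path


def is_web_archive(path):
    """Return True if the file starts with the MIME header seen in the question's excerpt."""
    with open(path, "rb") as fh:
        return fh.read(64).lstrip().startswith(b"MIME-Version:")


def find_web_archives(folder):
    """Collect the .xls files in `folder` that are really Single File Web Pages."""
    return sorted(p for p in Path(folder).glob("*.xls") if is_web_archive(p))


def convert_all(folder):
    """Open each disguised file in Excel and save it as .xlsx next to the original."""
    # Imported lazily: pywin32 only exists on Windows, the helpers above work anywhere.
    import win32com.client as win32

    excel = win32.gencache.EnsureDispatch("Excel.Application")
    excel.DisplayAlerts = False  # keep confirmation dialogs from blocking the batch
    try:
        for src in find_web_archives(folder):
            src = src.resolve()  # hand Excel an absolute path
            wb = excel.Workbooks.Open(str(src))
            wb.SaveAs(str(src.with_suffix(".xlsx")), FileFormat=51)  # 51 = .xlsx
            wb.Close()
    finally:
        excel.Application.Quit()  # close Excel even if one file fails
```

Calling convert_all on the folder that holds the files should leave an .xlsx next to each original. Real binary .xls files are skipped by the header check, so they can still go through the pyexcel route mentioned in the question. Starting Excel once and reusing it for every file is a deliberate choice here, since launching it per file is slow.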