python html html-table export-to-csv

Script that converts HTML tables to CSV (preferably Python)


I have a large number of HTML tables that I'd like to convert to CSV. Pasting individual tables into Excel and saving them as .csv works, as does pasting the HTML into simple online converters. But I have thousands of individual tables, so I need a script that can automate the conversion.

Does anyone have suggestions for how I could go about this? Python is the only language I know reasonably well, so a Python script would be ideal. I've searched for similar questions, but all the Python examples I've found are too complicated for me and go beyond my basic level of understanding.

Any advice would be much appreciated.


Solution

  • Use pandas. Its `read_html` function reads HTML tables into DataFrames, and each DataFrame's `to_csv` method writes it to a CSV file. Note that `read_html` needs an HTML parser library installed, such as lxml.

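    import pandas as pd

    url = 'http://myurl.com/mypage/'

    # read_html returns a list of DataFrames, one per <table> found on the page
    for i, df in enumerate(pd.read_html(url)):
        # one CSV per table: myfile_0.csv, myfile_1.csv, ...
        df.to_csv('myfile_%s.csv' % i)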

    Because an HTML page may contain more than one table, `read_html` always returns a list of tables, even when there is only one. That is why the code loops over the result.
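
Since the question mentions thousands of tables, the same loop can be wrapped in a second loop over files. The following is only a sketch under a few assumptions that are not in the question: the tables are saved as local `.html` files in one folder, the files are UTF-8 encoded, and the function name `convert_folder` and the output naming scheme are made up for illustration.

```python
from pathlib import Path

import pandas as pd


def convert_folder(src_dir, out_dir):
    """Write each table in each .html file under src_dir to its own CSV in out_dir.

    Hypothetical helper: the folder layout and file names are assumptions.
    """
    src_dir, out_dir = Path(src_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for html_path in sorted(src_dir.glob("*.html")):
        try:
            # Assumes UTF-8 files; read_html accepts an open file handle.
            with open(html_path, encoding="utf-8") as fh:
                tables = pd.read_html(fh)
        except ValueError:
            # pandas raises ValueError when it finds no tables; skip that file.
            continue
        for i, df in enumerate(tables):
            # e.g. report.html -> report_0.csv, report_1.csv, ...
            csv_path = out_dir / f"{html_path.stem}_{i}.csv"
            # index=False leaves out the DataFrame's row numbers.
            df.to_csv(csv_path, index=False)
            written.append(csv_path)
    return written
```

Calling `convert_folder("html_tables", "csv_out")` (both folder names are placeholders) would process every `.html` file in the first folder and return the list of CSV paths it wrote. Drop `index=False` if you want the row numbers in the output, as in the snippet above.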