Tags: html, web-scraping, html-table, rvest

Scrape table using rvest - Embedded symbols/links


I tried to scrape the table on the following webpage: http://www.comstats.de/squad/1-FC+Bayern+München

At first glance, my approach works with the following code:

library(rvest)  # provides read_html(), html_node(), html_table() and the %>% pipe

read_html("http://www.comstats.de/squad/1-FC+Bayern+München") %>% 
html_node("#inhalt > table.rangliste.autoColor.tablesorter.zoomable") %>%
html_table(header = TRUE, fill = TRUE)

However, the second column contains a varying number of linked symbols per row. As a result, the rows have different numbers of elements and the resulting table is corrupted (which is why fill = TRUE is needed).

I have been researching this for hours. Can anyone help me out?


Solution

  • In case someone else is looking for an answer to this kind of question: one possible solution is to use the htmltab package (https://cran.r-project.org/web/packages/htmltab/vignettes/htmltab.html):

    library(htmltab)
    
    htmltab(doc = "http://www.comstats.de/squad/1-FC+Bayern+München", which = '//*[@id="inhalt"]/table[2]')
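
To see which rows are responsible for the differing number of elements described in the question, it can help to count the cells per row before calling html_table(). The following is only a minimal sketch: the inline HTML, the table id `squad` and its contents are made up for illustration and merely stand in for the real page, whose markup may differ.

```r
library(rvest)  # read_html(), html_nodes() and the %>% pipe
library(xml2)   # xml_length()

# Hypothetical stand-in for the real page: the last row has one extra cell
html <- '
<table id="squad">
  <tr><th>Name</th><th>Info</th><th>Goals</th></tr>
  <tr><td>Player A</td><td><a href="#"><img alt="x"></a></td><td>3</td></tr>
  <tr><td>Player B</td><td><a href="#"><img alt="x"></a></td><td><a href="#"><img alt="y"></a></td><td>5</td></tr>
</table>'

rows <- read_html(html) %>% html_nodes("#squad tr")

# Number of direct child elements (th/td cells) in each row
cells_per_row <- xml_length(rows)

# Positions of rows whose cell count differs from the header row
irregular <- which(cells_per_row != cells_per_row[1])

print(cells_per_row)
print(irregular)
```

On the real page, the selector from the question would replace `#squad`; rows flagged as irregular are the ones that fill = TRUE pads.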