Tags: html, r, web-scraping, rvest

Web scraping multiple tables on a page in R


I am trying to scrape the data from all of the tables on the following page in R using rvest.

https://report-nle.dephub.go.id/dashboard/detail/27019186

When I run the following code, only the first table has any contents. The other four come back with their column headers but zero rows.

library(rvest)
tables <- read_html("https://report-nle.dephub.go.id/dashboard/detail/27019186") %>% html_nodes(css = "table") %>% html_table()

[[1]]
# A tibble: 6 × 6
  X1                        X2    X3                    X4                     X5    X6                                               
  <chr>                     <chr> <chr>                 <chr>                  <chr> <chr>                                            
1 Nama Kapal                :     MV. BRILLIANT EXPRESS Nama Perusahaan        :     PT. LAJU DINAMIKA UTAMA                          
2 Nahkoda                   :     BANAL ELVIN REGALDO   Jenis Kapal            :     WOOD CHIP CARRIER                                
3 Bendara / Call Sign / IMO :     PA / 3ETN5 / 9502570  GT / DWT               :     40269 / 49802                                    
4 Panjang / Lebar           :     199.91 / 0            Draft Depan / Belakang :     6 / 8                                            
5 Draft Tengah / Max        :     / 11.787              Status NLE             :     Ya dengan nomor : 101018F3C3E66                  
6 Asal / Tujuan             :     RIZHAO PT / RIZHAO PT PKK / SPB              :     PKK.LN.IDBPN.2403.000682 / SPB.IDBPN.0324.0000951

[[2]]
# A tibble: 0 × 10
# ℹ 10 variables: No <lgl>, Nama Barang <lgl>, Jenis barang <lgl>, Unit/Ton/M3 <lgl>, No BL <lgl>, Shipper <lgl>, Consignee <lgl>,
#   B/M <lgl>, LHV <lgl>, NTPN <lgl>

[[3]]
# A tibble: 0 × 10
# ℹ 10 variables: No <lgl>, Nama Barang <lgl>, Jenis barang <lgl>, Unit/Ton/M3 <lgl>, No BL <lgl>, Shipper <lgl>, Consignee <lgl>,
#   B/M <lgl>, LHV <lgl>, NTPN <lgl>

[[4]]
# A tibble: 0 × 7
# ℹ 7 variables: Nomor PKK <lgl>, Pelabuhan <lgl>, ETA <lgl>, Asal <lgl>, ETD <lgl>, Tujuan <lgl>, Action <lgl>

[[5]]
# A tibble: 0 × 5
# ℹ 5 variables: Kode Billing <lgl>, Nama Wajib Bayar <lgl>, NTPN <lgl>, NTB <lgl>, Total <lgl>

How can I go about getting the contents of all the other tables as well?


Solution

  • Try read_html_live(). It was added in rvest 1.0.4 and needs the chromote package plus a local Chrome installation.

    From the docs:

    read_html() operates on the HTML source code downloaded from the server. This works for most websites but can fail if the site uses javascript to generate the HTML. read_html_live() provides an alternative interface that runs a live web browser (Chrome) in the background. This allows you to access elements of the HTML page that are generated dynamically by javascript and to interact with the live page by clicking on buttons or typing in forms.

    In the code below I use html_elements(), which is the newer name for html_nodes() (superseded since rvest 1.0.0). The two behave the same.

    library(rvest)
    url <- "https://report-nle.dephub.go.id/dashboard/detail/27019186"
    html <- read_html_live(url)
    
    # Selecting by the <table> tag returned more tables than are visible on the page
    tables.from.tag <- html_elements(html, css = "table")
    
    # Selecting by the .table class returned the 5 tables visible on the page
    tables.from.class <- html_elements(html, css = ".table")
    
    # Parse the tables
    html_table(tables.from.class)
    
    packageVersion("rvest")
    # [1] ‘1.0.4’
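
To see why the tag selector and the class selector can return different counts, and how to spot tables that have headers but no rows, here is a minimal offline sketch. The HTML is a hypothetical stand-in I made up for illustration (it is not the real page's markup), and the names page, by_tag, by_class, parsed and n_rows are my own.

```r
library(rvest)

# Hypothetical static HTML (an assumption, not the real page):
#  - one filled table with class "table"
#  - one class "table" whose rows would be added later by JavaScript
#  - one table without the "table" class
page <- minimal_html('
  <table class="table">
    <tr><th>Nama Kapal</th><th>Jenis Kapal</th></tr>
    <tr><td>MV. BRILLIANT EXPRESS</td><td>WOOD CHIP CARRIER</td></tr>
  </table>
  <table class="table">
    <thead><tr><th>No</th><th>Nama Barang</th></tr></thead>
    <tbody></tbody>
  </table>
  <table>
    <tr><th>Hidden</th></tr>
    <tr><td>layout helper</td></tr>
  </table>
')

by_tag   <- html_elements(page, "table")   # every <table> element
by_class <- html_elements(page, ".table")  # only elements with class "table"

# Parse the class-selected tables and count the data rows in each
parsed <- html_table(by_class)
n_rows <- vapply(parsed, nrow, integer(1))

length(by_tag)
length(by_class)
n_rows
```

A table with zero rows here corresponds to what read_html() showed in the question: the header is in the downloaded source, but the rows are not. If a table is still empty after read_html_live(), one possible cause is that the page had not finished loading its data when you parsed it.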