rweb-scraping

Web scraping using R to get price report


I need to get the "price report" from https://fcainfoweb.nic.in/reports/report_menu_web.aspx for each day of the months of February - March 2020. I tried opening these data from the website by plugging in the date filter of the website. Each of them open as a .aspx but I'm unable to figure out why there are no differences in their urls. I need to write a code in R to save each of these as a separate excel sheet. I truly appreciate any help in this regard.


Solution

  • Here is an approach that can be considered :

    library(rvest)
    library(RDCOMClient)
    url <- "https://fcainfoweb.nic.in/reports/report_menu_web.aspx"
    IEApp <- COMCreate("InternetExplorer.Application")
    IEApp[['Visible']] <- TRUE
    IEApp$Navigate(url)
    Sys.sleep(5)
    doc <- IEApp$Document()
    
    clickEvent <- doc$createEvent("MouseEvent")
    clickEvent$initEvent("click", TRUE, FALSE)
    
    web_Obj <- doc$querySelector('#ctl00_MainContent_Rbl_Rpt_type > tbody > tr:nth-child(1) > td > label')
    web_Obj$dispatchEvent(clickEvent)
    Sys.sleep(2)
    
    web_Obj_Select <- doc$querySelector('#ctl00_MainContent_Ddl_Rpt_Option0')
    web_Obj_Select[["Value"]] <- "Daily Prices"
    web_Obj_Select$FireEvent("onchange")
    Sys.sleep(2)
    
    web_Obj_Date <- doc$querySelector('#ctl00_MainContent_Txt_FrmDate')
    web_Obj_Date[["Value"]] <- "14/09/2022"
    Sys.sleep(2)
    
    web_Obj_Get_Data <- doc$querySelector('#ctl00_MainContent_btn_getdata1')
    web_Obj_Get_Data$dispatchEvent(clickEvent)
    
    Sys.sleep(120)
    
    html_Content <- doc$documentElement()$innerHtml()
    read_html(html_Content) %>% rvest::html_table()
    
    [[1]]
    # A tibble: 276 x 6,177
       X1    X2    X3    X4    X5    X6    X7    X8    X9    X10   X11   X12   X13   X14   X15   X16   X17   X18   X19   X20   X21   X22   X23   X24  
       <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
     1 ""    NA    NA    NA    NA     NA   NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA   
     2 "Dep~ Depa~ Date~ Dail~ Unit~ "Cen~ Cent~ Rice  Wheat Atta~ Gram~ Tur/~ Urad~ Moon~ Maso~ Sugar Milk~ Grou~ Must~ Vana~ Soya~ Sunf~ Palm~ Gur  
     3 "Dep~ Date~ Dail~ Unit~ NA     NA   NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA   
     4 "Dat~ Dail~ Unit~ NA    NA     NA   NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA   
     5 "Cen~ Cent~ Rice  Wheat Atta~ "Gra~ Tur/~ Urad~ Moon~ Maso~ Sugar Milk~ Grou~ Must~ Vana~ Soya~ Sunf~ Palm~ Gur   Tea ~ Salt~ Pota~ Onion Toma~
     6 "Cen~ Rice  Wheat Atta~ Gram~ "Tur~ Urad~ Moon~ Maso~ Sugar Milk~ Grou~ Must~ Vana~ Soya~ Sunf~ Palm~ Gur   Tea ~ Salt~ Pota~ Onion Toma~ NA   
     7 "Ave~ 38.15 31.15 36.07 74.28 "111~ 108.~ 103.~ 97.12 42.57 53.68 187.~ 170.4 152.~ 152.~ 173.~ 126.~ 51.41 278.~ 20.74 28.36 26.23 41.81 NA   
     8 "Max~ 61    45    65    120   "142" 175   145   134   55    80    250   255   227   218   238   188   120   547   30    53    60    73    NA   
     9 "Min~ 22    18    24    57    "75"  73    80    77    36    32    155   140   103   125   141   98    35    123   8     13    12    17    NA   
    10 "Mod~ 40    25    35    70    "110" 100   97.5  90    40    48    186.~ 162   160   150   190   137   46    240   25    25    28    40    NA   
    # ... with 266 more rows, and 6,153 more variables: X25 <chr>, X26 <chr>, X27 <chr>, X28 <chr>, X29 <chr>, X30 <chr>, X31 <dbl>, X32 <dbl>,
    #   X33 <dbl>, X34 <dbl>, X35 <dbl>, X36 <dbl>, X37 <dbl>, X38 <dbl>, X39 <dbl>, X40 <dbl>, X41 <dbl>, X42 <dbl>, X43 <dbl>, X44 <dbl>,
    #   X45 <dbl>, X46 <dbl>, X47 <dbl>, X48 <chr>, X49 <dbl>, X50 <dbl>, X51 <dbl>, X52 <dbl>, X53 <chr>, X54 <int>, X55 <int>, X56 <int>,
    #   X57 <int>, X58 <int>, X59 <int>, X60 <int>, X61 <int>, X62 <int>, X63 <int>, X64 <int>, X65 <int>, X66 <int>, X67 <int>, X68 <int>,
    #   X69 <int>, X70 <int>, X71 <chr>, X72 <int>, X73 <int>, X74 <int>, X75 <int>, X76 <chr>, X77 <int>, X78 <int>, X79 <int>, X80 <int>,
    #   X81 <int>, X82 <int>, X83 <int>, X84 <int>, X85 <int>, X86 <int>, X87 <int>, X88 <int>, X89 <int>, X90 <int>, X91 <int>, X92 <int>,
    #   X93 <int>, X94 <chr>, X95 <int>, X96 <int>, X97 <int>, X98 <int>, X99 <chr>, X100 <int>, X101 <dbl>, X102 <int>, X103 <int>, X104 <int>, ...
    # i Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
    
    [[2]]
    # A tibble: 274 x 6,165
       X1    X2    X3    X4    X5    X6    X7    X8    X9    X10   X11   X12   X13   X14   X15   X16   X17   X18   X19   X20   X21   X22   X23   X24  
       <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
     1 "Dep~ "Dat~ "Dai~ "Uni~  NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA   NA   
     2 "Dat~ "Dai~ "Uni~  NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA   NA   
     3 "Cen~ "Cen~ "Ric~ "Whe~ "Att~ "Gra~ "Tur~ "Ura~ "Moo~ "Mas~ "Sug~ "Mil~ "Gro~ "Mus~ "Van~ "Soy~ "Sun~ "Pal~ "Gur" "Tea~ "Sal~ "Pot~ "Oni~ Toma~
     4 "Cen~ "Ric~ "Whe~ "Att~ "Gra~ "Tur~ "Ura~ "Moo~ "Mas~ "Sug~ "Mil~ "Gro~ "Mus~ "Van~ "Soy~ "Sun~ "Pal~ "Gur" "Tea~ "Sal~ "Pot~ "Oni~ "Tom~ NA   
     5 "Ave~ "38.~ "31.~ "36.~ "74.~ "111~ "108~ "103~ "97.~ "42.~ "53.~ "187~ "170~ "152~ "152~ "173~ "126~ "51.~ "278~ "20.~ "28.~ "26.~ "41.~ NA   
     6 "Max~ "61"  "45"  "65"  "120" "142" "175" "145" "134" "55"  "80"  "250" "255" "227" "218" "238" "188" "120" "547" "30"  "53"  "60"  "73"  NA   
     7 "Min~ "22"  "18"  "24"  "57"  "75"  "73"  "80"  "77"  "36"  "32"  "155" "140" "103" "125" "141" "98"  "35"  "123" "8"   "13"  "12"  "17"  NA   
     8 "Mod~ "40"  "25"  "35"  "70"  "110" "100" "97.~ "90"  "40"  "48"  "186~ "162" "160" "150" "190" "137" "46"  "240" "25"  "25"  "28"  "40"  NA   
     9 "NOR~ ""    ""    ""    ""    ""    ""    ""    ""    ""    ""    ""    ""    ""    ""    ""    ""    ""    ""    ""    ""    ""    ""    NA   
    10 "CHA~ "38"  "24"  "28"  "78"  "113" "111" "112" "90"  "42"  "57"  "177" "177" "175" "168" "170" "145" "48"  "229" "23"  "26"  "28"  "48"  NA   
    # ... with 264 more rows, and 6,141 more variables: X25 <chr>, X26 <dbl>, X27 <dbl>, X28 <dbl>, X29 <dbl>, X30 <dbl>, X31 <dbl>, X32 <dbl>,
    #   X33 <dbl>, X34 <dbl>, X35 <dbl>, X36 <dbl>, X37 <dbl>, X38 <dbl>, X39 <dbl>, X40 <dbl>, X41 <dbl>, X42 <dbl>, X43 <dbl>, X44 <dbl>,
    #   X45 <dbl>, X46 <dbl>, X47 <dbl>, X48 <chr>, X49 <int>, X50 <int>, X51 <int>, X52 <int>, X53 <int>, X54 <int>, X55 <int>, X56 <int>,
    #   X57 <int>, X58 <int>, X59 <int>, X60 <int>, X61 <int>, X62 <int>, X63 <int>, X64 <int>, X65 <int>, X66 <int>, X67 <int>, X68 <int>,
    #   X69 <int>, X70 <int>, X71 <chr>, X72 <int>, X73 <int>, X74 <int>, X75 <int>, X76 <int>, X77 <int>, X78 <int>, X79 <int>, X80 <int>,
    #   X81 <int>, X82 <int>, X83 <int>, X84 <int>, X85 <int>, X86 <int>, X87 <int>, X88 <int>, X89 <int>, X90 <int>, X91 <int>, X92 <int>,
    #   X93 <int>, X94 <chr>, X95 <int>, X96 <int>, X97 <int>, X98 <int>, X99 <int>, X100 <int>, X101 <dbl>, X102 <int>, X103 <int>, X104 <int>, ...
    # i Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
    
    [[3]]
    # A tibble: 1 x 3
      X1               X2                                           X3           
      <chr>            <chr>                                        <chr>        
    1 Date  14/09/2022 Daily Retail Prices Of Essential Commodities Unit: (?/Kg.)
    
    [[4]]
    # A tibble: 267 x 23
       Centre  Rice  Wheat Atta ~1 Gram ~2 Tur/A~3 Urad ~4 Moong~5 Masoo~6 Sugar Milk ~7 Groun~8 Musta~9 Vanas~* Soya ~* Sunfl~* Palm ~* Gur   Tea L~*
       <chr>   <chr> <chr> <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr> <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr> <chr>  
     1 Averag~ "38.~ "31.~ "36.07" "74.28" "111.5~ "108.2~ "103.0~ "97.12" "42.~ "53.68" "187.7~ "170.4" "152.1~ "152.8~ "173.3~ "126.5~ "51.~ "278.4~
     2 Maximu~ "61"  "45"  "65"    "120"   "142"   "175"   "145"   "134"   "55"  "80"    "250"   "255"   "227"   "218"   "238"   "188"   "120" "547"  
     3 Minimu~ "22"  "18"  "24"    "57"    "75"    "73"    "80"    "77"    "36"  "32"    "155"   "140"   "103"   "125"   "141"   "98"    "35"  "123"  
     4 Modal ~ "40"  "25"  "35"    "70"    "110"   "100"   "97.5"  "90"    "40"  "48"    "186.6~ "162"   "160"   "150"   "190"   "137"   "46"  "240"  
     5 NORTH ~ ""    ""    ""      ""      ""      ""      ""      ""      ""    ""      ""      ""      ""      ""      ""      ""      ""    ""     
     6 CHANDI~ "38"  "24"  "28"    "78"    "113"   "111"   "112"   "90"    "42"  "57"    "177"   "177"   "175"   "168"   "170"   "145"   "48"  "229"  
     7 DELHI   "32"  "25"  "30"    "72"    "115"   "127"   "100"   "93"    "41"  "52"    "203"   "179"   "154"   "154"   "207"   "126"   "58"  "224"  
     8 HISAR   "30"  "24"  "27"    "60"    "105"   "90"    "90"    "90"    "40"  "56"    "185"   "145"   "135"   "145"   "185"   "132"   "42"  "240"  
     9 KARNAL  "37"  "26"  "32"    "76"    "105"   "106"   "106"   "104"   "42"  "62"    "176"   "171"   "156"   "161"   "165"   "140"   "45"  "278"  
    10 PANCHK~ "29"  "24"  "29"    "66"    "108"   "92"    "96"    "85"    "43"  "55"    "213"   "180"   "185"   "164"   "197"   "131"   "45"  "255"  
    # ... with 257 more rows, 4 more variables: `Salt Pack (Iodised)` <chr>, Potato <chr>, Onion <chr>, Tomato <chr>, and abbreviated variable names
    #   1: `Atta (Wheat)`, 2: `Gram Dal`, 3: `Tur/Arhar Dal`, 4: `Urad Dal`, 5: `Moong Dal`, 6: `Masoor Dal`, 7: `Milk @`,
    #   8: `Groundnut Oil (Packed)`, 9: `Mustard Oil (Packed)`, *: `Vanaspati (Packed)`, *: `Soya Oil (Packed)`, *: `Sunflower Oil (Packed)`,
    #   *: `Palm Oil (Packed)`, *: `Tea Loose`
    # i Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
    
    [[5]]
    # A tibble: 2 x 4
      X1                                               X2                                                                                  X3    X4   
      <chr>                                            <chr>                                                                               <chr> <lgl>
    1 Source:-  State civil supplies Dept.             Note:- All India Average Prices of Oils are on Weightage (Consumption) basis w.e.f~ NR -~ NA   
    2 Developed by:-  National Informatics Centre(NIC) Developed by:-  National Informatics Centre(NIC)                                    NA    NA