I'm trying to extract the names and prices of products from the AKI supermarket in Ecuador. There is a site called Tipti that aggregates products from several supermarkets. However, it requires a login and the page seems to be dynamic.

First, a login is required. After logging in, the different supermarkets are listed and I choose AKI. Finally, I try to extract the names and prices of the products:
```r
library(rvest)
library(hayalbaz)

url <- "https://app.tipti.market/Gran%20Aki/Productos%20Ak%C3%AD"
webpage <- puppet$new(url = url)
webpage
```

Printing `webpage` lists the available methods:

```
<puppet>
  Public:
    attach_file: function (selector, file)
    click: function (selector, set_focus = TRUE, scroll = TRUE, wait_for_selector = TRUE)
    clone: function (deep = FALSE)
    close: function ()
    content: function ()
    download_enable: function (path, report = TRUE, progress = report)
    focus: function (selector)
    get_cookies: function ()
    get_element: function (selector, as_xml2 = TRUE)
    get_elements: function (selector, as_xml2 = TRUE)
    get_js_object: function (name)
    get_source: function ()
    goto: function (url)
    initialize: function (url = NULL, cookies = NULL)
    screenshot: function (filename = "screenshot.png", selector = "html", cliprect = NULL,
    set_cookies: function (cookies)
    set_debug_msgs: function (flag)
    set_user_agent: function (user_agent)
    set_value: function (selector, value)
    type: function (selector = NULL, text)
    view: function ()
    wait_for_selector: function (selector, timeout = 30, polling = 0.1)
    wait_on_load: function ()
  Private:
    download_path: NULL
    download_pb: NULL
    get_all_nodes: function (selector)
    get_document: function ()
    get_node: function (selector, all = FALSE)
    get_node_box: function (node_id)
    get_node_center: function (node_id)
    get_node_html: function (node_id)
    key_down: function (key)
    key_press: function (key)
    key_up: function (key)
    mouse_down: function (x, y, button = "left", click_count = 1)
    mouse_up: function (x, y, button = "left", click_count = 1)
    press: function (key)
    session: ChromoteSession, R6
    watch_download: function (start = TRUE, report = TRUE, progress = report)
```

But extracting the product names returns nothing:

```r
webpage$get_elements(".card-product__name") |> html_text(trim = TRUE)
#> character(0)
```
Any idea how to extract the information?
You can inspect the fetch requests in the browser's developer tools (F12, Network tab). Using the headers and credentials from those requests, we can fetch the underlying JSON with a GET request for different categories. For example, `category_id` 1655 returns the 99-cent products. To map other `category_id` values to categories such as "Wonder Woman" or "Abarrotes", click the categories in the left-hand menu of the webpage and watch which fetch requests are made.
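Since only the query string changes between categories, a small helper can build the request URL for any `category_id`. This is a minimal base R sketch: the name `build_category_url` is hypothetical, and the defaults (`retailer_id = 276`, `page = 1`, `limit`/`page_size = 250`) are simply copied from the example request, assuming 276 is the retailer id for Gran Akí.

```r
# Hypothetical helper: rebuilds the category_v3 request URL for a given
# category_id. Defaults are copied from the example request; 276 is
# assumed (not confirmed) to be the retailer id for Gran Aki.
build_category_url <- function(category_id, retailer_id = 276, page = 1,
                               page_size = 250) {
  base <- "https://api.tipti.market/misuper/v3/product/recommendations/category_v3/"
  query <- c(
    page        = page,
    retailer_id = retailer_id,
    category_id = category_id,
    limit       = page_size,
    page_size   = page_size
  )
  # sprintf("%d", ...) avoids scientific notation such as "1e+05"
  values <- sprintf("%d", as.integer(query))
  paste0(base, "?", paste(names(query), values, sep = "=", collapse = "&"))
}

build_category_url(1655)
```

With the defaults, `build_category_url(1655)` reproduces the URL used in the request.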
```r
library(httr)

# set category_id=1655 and limit=250 in the request URL
url <- "https://api.tipti.market/misuper/v3/product/recommendations/category_v3/?page=1&retailer_id=276&category_id=1655&limit=250&page_size=250"

headers <- add_headers(
  "Accept" = "*/*",
  "Accept-Encoding" = "gzip, deflate, br, zstd",
  "Accept-Language" = "en-US;q=0.8,en;q=0.7",
  "Referer" = "https://www.tipti.market/",
  "Origin" = "https://www.tipti.market",
  "Authorization" = "JWT eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJ1c2VyX2lkIjo0OCwiZW1haWwiOiJpbnZpdGFkb0B0aXB0aS5tYXJrZXQiLCJ0eXBlIjoxLCJ1c2VybmFtZSI6Imludml0YWRvQHRpcHRpLm1hcmtldCIsImV4cCI6ODgwODkwMDQzMjR9.GZJXL3HTvI6GNDsPTyvpimmAkWn2ZELeSrJGnBKbP-o",
  "User-Agent" = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/133.0.0.0 Safari/537.36"
)

response <- GET(url, headers)

# check the status code (should be 200)
print(status_code(response))

# decode the body as UTF-8 text, then parse and flatten the nested JSON
res <- jsonlite::fromJSON(content(response, as = "text", encoding = "UTF-8"), flatten = TRUE)
ninetyNine_cent_products <- res[["results"]]
```
which gives:

```r
kableExtra::kable(head(ninetyNine_cent_products[, c("item.price", "item.product.name")]))
```

| item.price | item.product.name |
|---:|---|
| 0.99 | Tomate Cherry Funda La original 0,99 Ctvs. |
| 0.99 | Limón Meyer Funda Frutos De Mi Tierra 0,99 Ctvs. |
| 0.99 | Naranja Malla Divino Niño 0,99 Ctvs. |
| 0.99 | Limón Malla La Original 0,99 Ctvs. |
| 0.99 | Ajo Pelado Akí 0,99 Ctvs. |
| 0.99 | Tomate Cherry La original 0,99 Ctvs. |
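To gather several categories into one name/price table, one option is to wrap the request in a function and loop over a named vector of category ids. This is a minimal sketch, assuming every category response keeps the `results` structure shown above and that the `headers` object from the request code is in scope. `fetch_tipti_category` and `collect_categories` are hypothetical names, and 1655 is the only category id confirmed here; the others have to be looked up in the Network tab.

```r
# Hypothetical fetcher for one category, reusing the endpoint and query
# parameters from the request above (retailer_id = 276 copied from it).
# Relies on the `headers` object defined earlier; needs network access.
fetch_tipti_category <- function(category_id) {
  url <- sprintf(
    "https://api.tipti.market/misuper/v3/product/recommendations/category_v3/?page=1&retailer_id=276&category_id=%s&limit=250&page_size=250",
    category_id
  )
  response <- httr::GET(url, headers)
  httr::stop_for_status(response)
  txt <- httr::content(response, as = "text", encoding = "UTF-8")
  jsonlite::fromJSON(txt, flatten = TRUE)[["results"]]
}

# Loops over a *named* vector of category ids, calls `fetch_category` for
# each one and stacks name/price rows, tagging them with the category label.
# Categories that return no results are skipped.
collect_categories <- function(category_ids, fetch_category) {
  pieces <- lapply(names(category_ids), function(label) {
    results <- fetch_category(category_ids[[label]])
    if (is.null(results) || NROW(results) == 0) return(NULL)
    data.frame(
      category = label,
      name = results[["item.product.name"]],
      price = as.numeric(results[["item.price"]]),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, pieces)
}

# Usage (requires network access; add further ids once identified):
# products <- collect_categories(c("99 cent" = 1655), fetch_tipti_category)
```

Passing the fetch function as an argument keeps `collect_categories()` independent of the network, so it can be checked with a stub that returns a small data frame.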