When I open this URL in the browser, it opens as a PDF:
https://processo.stj.jus.br/processo/dj/documento/?=&sequencial=300060606&num_registro=202500087810&data=20250313&data_pesquisa=20250313&componente=MON
But when I request it with httr or httr2, I get HTML instead:
url1 <- "https://processo.stj.jus.br/processo/dj/documento/?=&sequencial=300060606&num_registro=202500087810&data=20250313&data_pesquisa=20250313&componente=MON"
response <- httr::GET(url1)
print(response)
Response [https://processo.stj.jus.br/processo/dj/documento/?=&sequencial=300060606&num_registro=202500087810&data=20250313&data_pesquisa=20250313&componente=MON]
Date: 2025-04-01 07:51
Status: 200
Content-Type: text/html; charset=UTF-8
Size: 203 kB
<!doctype html>
<html>
<head>
<title></title>
<style>
html, body {
margin: 0;
padding: 0;
background-color: white;
}
Can someone help me figure out how to get the PDF?
If you keep DevTools open in your browser session, the Network tab shows that the first response contains an odd JavaScript challenge, which triggers the same request again, this time with additional headers and cookies. The PDF content arrives in that second response.
There's a good chance this triggers something on the server side and is only reproducible within a short time window, but for now it seems we can ignore the JavaScript, the cookies and most of the extra headers entirely. We only need to make sure the istl-infinite-loop request header is set:
library(httr2)
url_ <- "https://processo.stj.jus.br/processo/dj/documento/?=&sequencial=300060606&num_registro=202500087810&data=20250313&data_pesquisa=20250313&componente=MON"
resp <-
  request(url_) |>
  req_headers(`istl-infinite-loop` = "1") |>
  req_perform()
resp
#> <httr2_response>
#> GET
#> https://processo.stj.jus.br/processo/dj/documento/?=&sequencial=300060606&num_registro=202500087810&data=20250313&data_pesquisa=20250313&componente=MON
#> Status: 200 OK
#> Content-Type: application/pdf
#> Body: In memory (208932 bytes)
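# save, taking the file name from the Content-Disposition header
# (the `_[[1]][2]` extraction after the pipe needs R >= 4.3)
filename <-
  resp_header(resp, "content-disposition") |>
  print() |>
  strsplit("=") |>
  _[[1]][2]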
#> [1] "inline; filename=stj_dje_20250313_0_46045183.pdf"
resp_body_raw(resp) |> writeBin(filename)
# check
pdftools::pdf_info(filename) |> str()
#> List of 11
#> $ version : chr "1.7"
#> $ pages : int 8
#> $ encrypted : logi FALSE
#> $ linearized : logi FALSE
#> $ keys :List of 1
#> ..$ Producer: chr "iText® 7.1.2 ©2000-2018 iText Group NV (AGPL-version)"
#> $ created : POSIXct[1:1], format: "2025-03-11 00:39:40"
#> $ modified : POSIXct[1:1], format: "2025-04-01 12:04:04"
#> $ metadata : chr ""
#> $ locked : logi FALSE
#> $ attachments: logi FALSE
#> $ layout : chr "no_layout"
pdftools::pdf_text(filename)[1] |>
  substr(1, 350) |>
  cat()
#> HABEAS CORPUS Nº 974679 - SP (2025/0008781-0)
#>
#> RELATOR : MINISTRO REYNALDO SOARES DA FONSECA
#> IMPETRANTE : GLAUCIO DALPONTE MATTIOLI
#> ADVOGADO : GLAUCIO DALPONTE MATTIOLI - SP253642
#>
Created on 2025-04-01 with reprex v2.1.1
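Since the server may go back to serving the HTML challenge page at some point, it could be worth checking the body before writing it to disk. Below is a minimal sketch in base R; `filename_from_disposition()` and `looks_like_pdf()` are hypothetical helper names, not part of httr2, and the fallback name `document.pdf` is an arbitrary choice. The parser only handles the plain `filename=` form seen above, not the extended `filename*=` form.

```r
# Hypothetical helper: pull the file name out of a Content-Disposition value,
# e.g. "inline; filename=stj_dje_20250313_0_46045183.pdf".
filename_from_disposition <- function(disposition, fallback = "document.pdf") {
  # resp_header() returns NULL when the header is absent.
  if (is.null(disposition) || is.na(disposition)) return(fallback)
  # Match filename=..., with or without surrounding double quotes.
  m <- regmatches(disposition, regexec('filename="?([^";]+)"?', disposition))[[1]]
  if (length(m) < 2) return(fallback)
  # basename() drops any directory part the server might send.
  basename(trimws(m[2]))
}

# Hypothetical helper: TRUE if a raw vector starts with the PDF signature "%PDF-".
looks_like_pdf <- function(bytes) {
  length(bytes) >= 5 && identical(bytes[1:5], charToRaw("%PDF-"))
}

# Usage with the `resp` object from the reprex (needs network access):
# bytes <- httr2::resp_body_raw(resp)
# if (looks_like_pdf(bytes)) {
#   out <- filename_from_disposition(httr2::resp_header(resp, "content-disposition"))
#   writeBin(bytes, out)
# } else {
#   stop("Got something other than a PDF, probably the HTML challenge page")
# }
```

Checking the first bytes rather than only the Content-Type header guards against the case where the challenge page is returned with status 200, as in the question.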