I am working in RStudio (RStudio 2023.03.0+386 "Cherry Blossom" Release) and trying to use readLines()
to read from an HTTP address that I know is correct.
The code is as follows:
con <- url("http://biostat.jhsph.edu/~jleek/contact.html")
htmlCode <- readLines(con)
close(con)
And the error I get is:
Error in readLines(con) :
cannot open the connection to 'https://biostat.jhsph.edu/~jleek/contact.html'
In addition: Warning message:
In readLines(con) :
URL 'https://biostat.jhsph.edu/~jleek/contact.html': status was 'SSL connect error'
Following is the sessionInfo() output:
R version 4.2.3 (2023-03-15 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)
Matrix products: default
Random number generation:
RNG: Mersenne-Twister
Normal: Inversion
Sample: Rounding
locale:
[1] LC_COLLATE=English_United States.utf8 LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8 LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] RMySQL_0.10.25 DBI_1.1.3 sqldf_0.4-11 RSQLite_2.3.1 gsubfn_0.7 proto_1.0.0 httpuv_1.6.9
[8] httr_1.4.5 readr_2.1.4
loaded via a namespace (and not attached):
[1] Rcpp_1.0.10 rstudioapi_0.14 magrittr_2.0.3 hms_1.1.3 bit_4.0.5 R6_2.5.1
[7] rlang_1.1.0 fastmap_1.1.1 fansi_1.0.4 blob_1.2.4 tcltk_4.2.3 tools_4.2.3
[13] utf8_1.2.3 cli_3.6.0 bit64_4.0.5 tibble_3.2.0 lifecycle_1.0.3 tzdb_0.3.0
[19] later_1.3.0 vctrs_0.6.0 promises_1.2.0.1 cachem_1.0.7 memoise_2.0.1 glue_1.6.2
[25] compiler_4.2.3 pillar_1.9.0 chron_2.3-60 pkgconfig_2.0.3
Actually your code works fine for me, but I'm running Linux, so it's hard to say. Perhaps you need to install OpenSSL.
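Before installing anything, it may help to check what your R build reports about its own libcurl and TLS support. This is only a diagnostic sketch using base R functions (capabilities() and libcurlVersion()); the variable name ver is just for illustration, and what the output means for your particular Windows setup is something you would have to judge yourself.

```r
# Does this R build include libcurl support at all? (TRUE/FALSE)
capabilities("libcurl")

# libcurl version string; the TLS library it was built against
# is attached as the "ssl_version" attribute.
ver <- libcurlVersion()
print(ver)
attr(ver, "ssl_version")
```

If the TLS library shown there looks unexpected or very old, that could be a hint about where the 'SSL connect error' comes from.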
You could try a different method in url,
con <- url("https://biostat.jhsph.edu/~jleek/contact.html", method='libcurl')
htmlCode <- readLines(con)
close(con)
head(htmlCode, 5)
# [1] "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
# [2] ""
# [3] "<html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"en\" lang=\"en\">"
# [4] ""
# [5] "<head>"
or without url,
htmlCode <- readLines('https://biostat.jhsph.edu/~jleek/contact.html')
head(htmlCode, 1)
# [1] "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
or, as a workaround, try downloading the file first and then reading it (note that download.file also has a method argument).
tmp <- tempfile()
download.file('https://biostat.jhsph.edu/~jleek/contact.html', tmp)
htmlCode <- readLines(tmp)
unlink(tmp)
head(htmlCode, 1)
# [1] "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
Or use one of the available packages, e.g.
XML::htmlTreeParse(RCurl::getURL('https://biostat.jhsph.edu/~jleek/contact.html'))$children$html
# <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
# <head>
# <meta name="Description" content="Welcome to Jeff Leek's Research Group"/>
# ...
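If you are not sure which of the approaches above will work on your machine, you could chain them and keep the first one that succeeds. The following is only a rough sketch: read_first_success, readers and page are hypothetical names I made up for illustration, and the two readers simply wrap the url(method='libcurl') and download.file variants shown earlier.

```r
# Try each reader function in turn; return the result of the first one
# that does not raise an error. (Hypothetical helper, not part of base R.)
read_first_success <- function(readers) {
  for (name in names(readers)) {
    result <- tryCatch(readers[[name]](), error = function(e) NULL)
    if (!is.null(result)) {
      message("succeeded with: ", name)
      return(result)
    }
  }
  stop("all approaches failed")
}

page <- "https://biostat.jhsph.edu/~jleek/contact.html"

readers <- list(
  # Variant 1: explicit libcurl connection
  libcurl = function() {
    con <- url(page, method = "libcurl")
    on.exit(close(con))
    readLines(con)
  },
  # Variant 2: download to a temporary file, then read it
  download = function() {
    tmp <- tempfile()
    on.exit(unlink(tmp))
    download.file(page, tmp, quiet = TRUE)
    readLines(tmp)
  }
)

# Usage (needs network access, so it is left commented out here):
# htmlCode <- read_first_success(readers)
```

Note that tryCatch only swallows the errors here; any warnings (such as the 'SSL connect error' one) are still shown, which is useful for seeing why a given variant failed.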
Hope this helps.