rweb-scrapinghttr

httr package POST function times out when using the same headers as my browser


I am trying to download a file off of a web site using the httr package in R. This is the web site:

https://www.franklintempleton.com/investments/options/exchange-traded-funds/products/91616/SINGLCLASS/clearbridge-all-cap-growth-esg-etf/CACG

There is a "Download Holdings (XLS)" button under the Portfolio Holdings section that downloads the file in your web browser.

Using the developer mode on my browser I found that the data is retrieved via a POST command. I copied all of the headers and payload information exactly as my browser has them and created this POST command using the httr package:

dta <- POST("https://www.franklintempleton.com/api/pds/price-and-performance?apikey=4ef35821-5244-41bc-a699-0192d002c3d1p&op=Holdings&id=14",
            body='{"operationName":"Holdings","variables":{"countrycode":"US","languagecode":"en_US","fundid":"91616"},"query":"query Holdings($fundid: String!, $countrycode: String!, $languagecode: String!) {\n  Portfolio(\n    fundid: $fundid\n    countrycode: $countrycode\n    languagecode: $languagecode\n  ) {\n    fundname\n    producttype\n    assetclass\n    portfolio {\n      topholdings {\n        asofdate\n        hldngname\n        geocode\n        sectorname\n        brkdwnpct\n        frequency\n        calcbasislocal\n        calctypelocal\n        allocflag\n        hasderivatives\n      }\n      dailyholdings {\n        asofdate\n        frequency\n        secticker\n        isinsecnbr\n        cusipnbr\n        secname\n        quantityshrpar\n        origcouponrate\n        sctrname\n        pctofnetassets\n        mktvalue\n        notionalmktvalue\n        secexpdate\n        assetclasscatg\n        mktcurr\n        contracts\n      }\n      fullholdings {\n        asofdate\n        asofdatestd\n        frequency\n        secticker\n        isinsecnbr\n        cusipnbr\n        secname\n        quantityshrpar\n        origcouponrate\n        sctrname\n        pctofnetassets\n        mktvalue\n        notionalmktvalue\n        secexpdate\n        assetclasscatg\n        mktcurr\n        contracts\n        finalmaturitydate\n        investmentcategory\n      }\n    }\n  }\n}\n"}',
            add_headers(.headers=c(
              'Host'='www.franklintempleton.com',
              'User-Agent'='Mozilla/5.0 (Windows NT 10.0; rv:121.0) Gecko/20100101 Firefox/121.0',
              'Accept'='application/json',
              'Accept-Language'='en-US,en;q=0.5',
              'Accept-Encoding'='gzip, deflate, br',
              'Content-Type'='application/json',
              'Content-Length'=1401,
              'Origin'='https://www.franklintempleton.com',
              'DNT'=1,
              'Connection'='keep-alive',
              'Referer'='https://www.franklintempleton.com/investments/options/exchange-traded-funds/products/91616/SINGLCLASS/clearbridge-all-cap-growth-esg-etf/CACG',
              'Sec-Fetch-Dest'='empty',
              'Sec-Fetch-Mode'='cors',
              'Sec-Fetch-Site'='same-origin',
              'Sec-GPC'=1,
              'Pragma'='no-cache',
              'Cache-Control'='no-cache',
              'TE'='trailers')))

When I run that command it times out. I think everything with the command is ok because I've done something similar on other web sites in the past. The web site has no problem loading in my browser on the same machine, ip, etc. Even with the same headers, can the web server still detect that this is not a browser and then decide to ignore request? Or is there another reason why this isn't working? I would rather not use something like RSelenium to do this if possible. Is there a way to make this command work using the httr package or something similar? Thank you in advance


Solution

  • You can get a curl command from your browser's dev. tools network tab context menu (Copy > Copy as cURL (bash) in Chrome), which you can feed into httr2::curl_translate() , https://curlconverter.com/r/ or some similar tool to translate it to httr / httr2. The result is not always perfect, but often good enough. Trimmed-down curlconverter.com/r/ result works as expected:

    library(httr)
    
    headers = c(
      `content-type` = "application/json"
    )
    params = list(
      `apikey` = "4ef35821-...",
      `op` = "Holdings",
      `id` = "14"
    )
    data = '{"operationName":"Holdings","variables":{"countrycode":"US","languagecode":"en_US","fundid":"91616"},"query":"query Holdings($fundid: String!, $countrycode: String!, $languagecode: String!) {\\n  Portfolio(\\n    fundid: $fundid\\n    countrycode: $countrycode\\n    languagecode: $languagecode\\n  ) {\\n    fundname\\n    producttype\\n    assetclass\\n    portfolio {\\n      topholdings {\\n        asofdate\\n        hldngname\\n        geocode\\n        sectorname\\n        brkdwnpct\\n        frequency\\n        calcbasislocal\\n        calctypelocal\\n        allocflag\\n        hasderivatives\\n      }\\n      dailyholdings {\\n        asofdate\\n        frequency\\n        secticker\\n        isinsecnbr\\n        cusipnbr\\n        secname\\n        quantityshrpar\\n        origcouponrate\\n        sctrname\\n        pctofnetassets\\n        mktvalue\\n        notionalmktvalue\\n        secexpdate\\n        assetclasscatg\\n        mktcurr\\n        contracts\\n      }\\n      fullholdings {\\n        asofdate\\n        asofdatestd\\n        frequency\\n        secticker\\n        isinsecnbr\\n        cusipnbr\\n        secname\\n        quantityshrpar\\n        origcouponrate\\n        sctrname\\n        pctofnetassets\\n        mktvalue\\n        notionalmktvalue\\n        secexpdate\\n        assetclasscatg\\n        mktcurr\\n        contracts\\n        finalmaturitydate\\n        investmentcategory\\n      }\\n    }\\n  }\\n}\\n"}'
    httr::POST(url = "https://www.franklintempleton.com/api/pds/price-and-performance", 
               httr::add_headers(.headers=headers), query = params, body = data) |>
      content() |>
      purrr::pluck("data", "Portfolio", "portfolio", "dailyholdings") |>
      dplyr::bind_rows() |>
      dplyr::arrange(desc(pctofnetassets))
    #> # A tibble: 70 × 16
    #>    asofdate   frequency secticker isinsecnbr   cusipnbr  secname  quantityshrpar
    #>    <chr>      <chr>     <chr>     <chr>        <chr>     <chr>    <chr>         
    #>  1 03/20/2024 DAILY     AMZN      US0231351067 023135106 AMAZON.… 49,114.00     
    #>  2 03/20/2024 DAILY     NVDA      US67066G1040 67066G104 NVIDIA … 7,818.00      
    #>  3 03/20/2024 DAILY     MSFT      US5949181045 594918104 MICROSO… 15,920.00     
    #>  4 03/20/2024 DAILY     META      US30303M1027 30303M102 META PL… 12,164.00     
    #>  5 03/20/2024 DAILY     AVGO      US11135F1012 11135F101 BROADCO… 3,804.00      
    #>  6 03/20/2024 DAILY     UNH       US91324P1021 91324P102 UNITEDH… 9,458.00      
    #>  7 03/20/2024 DAILY     NFLX      US64110L1061 64110L106 NETFLIX… 6,340.00      
    #>  8 03/20/2024 DAILY     VRTX      US92532F1003 92532F100 VERTEX … 9,132.00      
    #>  9 03/20/2024 DAILY     V         US92826C8394 92826C839 VISA IN… 13,094.00     
    #> 10 03/20/2024 DAILY     CRWD      US22788C1053 22788C105 CROWDST… 10,770.00     
    #> # ℹ 60 more rows
    #> # ℹ 9 more variables: origcouponrate <chr>, sctrname <chr>,
    #> #   pctofnetassets <chr>, mktvalue <chr>, notionalmktvalue <chr>,
    #> #   secexpdate <chr>, assetclasscatg <chr>, mktcurr <chr>, contracts <chr>
    

    Created on 2024-03-21 with reprex v2.1.0

    And just in case, there is no XLSX download, Excel file is constructed from JSON data locally by your browser.