I’m making requests to the Reddit API. First, I set a subreddit top URL:
reddit_url = URI.parse('https://www.reddit.com/r/pixelart/top.json')
All of these correctly get the contents:
Net::HTTP.get(reddit_url, 'User-Agent' => 'My agent')
Open3.capture2('/usr/bin/curl', '--user-agent', 'My agent', reddit_url.to_s)[0]
URI.open(reddit_url, 'User-Agent' => 'My agent').read
But then I try it with a URL for a specific post:
reddit_url = URI.parse('https://reddit.com/r/PixelArt/comments/lkaiqf/another_watercolour_pixelart_tree.json')
And both Net::HTTP
and Open3
/curl
fail, getting only empty strings. URI.open
continues to work, as does opening the URL in a web browser.
Why doesn’t the second request work with two of the solutions? And why does it work with URI.open
, when that’s supposed to be “an easy-to-use wrapper for Net::HTTP”? What does it do differently, and how to replicate it with Net::HTTP
an curl
?
Working with your example, and focussing on Net::HTTP for simplicity, the first example doesn't work as written:
require 'net/http'
reddit_url = URI.parse('https://www.reddit.com/r/pixelart/top.json')
Net::HTTP.get(reddit_url, 'User-Agent' => 'My agent')
# => Type Error - no implicit conversion of URI::HTTPS into String
Instead I used this as my starting point:
require 'net/http'
reddit_url = URI.parse('https://www.reddit.com/r/pixelart/top.json')
http = Net::HTTP.new(reddit_url.host, reddit_url.port)
http.use_ssl = true
result = http.get(reddit_url.request_uri, 'User-Agent' => 'My agent')
puts result
# => #<Net::HTTPOK:0x00007fc3ea8e7320>
puts result.body.size
# => 167,394
With that working we can try the second URL. Interestingly, I get different results depending on whether I re-use the initial connection or make a new one:
require 'net/http'
reddit_url = URI.parse('https://www.reddit.com/r/pixelart/top.json')
reddit_url_two = URI.parse('https://reddit.com/r/PixelArt/comments/lkaiqf/another_watercolour_pixelart_tree.json')
http = Net::HTTP.new(reddit_url.host, reddit_url.port)
http.use_ssl = true
result = http.get(reddit_url.request_uri, 'User-Agent' => 'My agent')
puts result
# => #<Net::HTTPOK:0x00007f931a143390>
puts result.body.size
# => 174,615
http_two = Net::HTTP.new(reddit_url_two.host, reddit_url_two.port)
http_two.use_ssl = true
result_two = http_two.get(reddit_url_two.request_uri, 'User-Agent' => 'My agent')
puts result_two
# => #<Net::HTTPMovedPermanently:0x00007f931a148818>
puts result_two.body.size
# => 0
result_reusing_connection = http.get(reddit_url_two.request_uri, 'User-Agent' => 'My agent')
puts result_reusing_connection
# => #<Net::HTTPOK:0x00007f931a0fb3b0>
puts result_reusing_connection.body.size
# => 141,575
So I suspect you're getting a 301 redirect sometimes and that's causing the confusion. There's another question and answer here for how to follow redirects.