Tags: ruby, rdf, url-encoding, redland

RDF::Reader > Problems with URI::InvalidURIError


I have problems with this code:

require 'rubygems'
require 'rdf'
require 'rdf/raptor'

RDF::Reader.open("http://reegle.info/countries/IN.rdf") do |reader|
  reader.each_statement do |statement|
    puts statement.inspect
  end
end

When I try to open the URL above, I get redirected to a URL that URI.parse evidently doesn't like:

http://sparql.reegle.info?query=CONSTRUCT+{+%3Chttp://reegle.info/countries/IN%3E+?p+?o.+%3Chttp://reegle.info/countries/IN.rdf%3E+foaf:primaryTopic+%3Chttp://reegle.info/countries/IN%3E;+cc:license+%3Chttp://www.nationalarchives.gov.uk/doc/open-government-licence%3E;+cc:attributionName+"REEEP";+cc:attributionURL+%3Chttp://reegle.info/countries/IN%3E.+}+WHERE+{+%3Chttp://reegle.info/countries/IN%3E+?p+?o.}&format=application/rdf%2Bxml

So I get the following error:

URI::InvalidURIError: bad URI(is not URI?)

Any ideas on how to get around this issue?

Thanks

P.S. Doing something like URI.parse(URI.encode([url])) has no effect here.


Solution

  • URI doesn't like the double quotes or braces in that URL. You can fix the URI by hand with something like this:

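    # broken_uri holds the redirect URL as a String.
    # The auto-populating cache isn't strictly necessary; it just avoids
    # re-encoding a character that has already been seen.
    replacements = Hash.new { |h,k| h[k] = URI.encode(k) }
    broken_uri.gsub!(/[{}"]/) { replacements[$&] }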
    

    From RFC 1738: Uniform Resource Locators (URL):

    Thus, only alphanumerics, the special characters "$-_.+!*'(),", and reserved characters used for their reserved purposes may be used unencoded within a URL.

    So I'd say that reegle.info should be URL-encoding more things than they are. OTOH, Ruby's URI class could be a little more forgiving (Perl's URI class, for example, will accept that URI as input but it converts the double quote and braces to their percent-encoded form on output).
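For readers on a recent Ruby, here is a minimal sketch of the same idea that does not rely on URI.encode, which was removed in Ruby 3.0. The helper name `escape_unsafe`, the `UNSAFE` constant, and the shortened URL are assumptions made for illustration; none of them come from the uri or rdf libraries, and the URL is only an abbreviated stand-in for the redirect target in the question.

```ruby
require 'uri'

# Hypothetical pattern: only the characters the answer identifies as the
# ones URI.parse objected to (braces and double quotes).
UNSAFE = /[{}"]/

# Hypothetical helper: percent-encode each offending character and leave
# everything else, including existing %XX escapes, untouched.
def escape_unsafe(url)
  url.gsub(UNSAFE) { |ch| format('%%%02X', ch.ord) }
end

# Abbreviated stand-in for the redirect URL from the question.
broken = 'http://sparql.reegle.info?query=CONSTRUCT+{+?s+cc:attributionName+"REEEP".+}&format=application/rdf%2Bxml'

fixed = escape_unsafe(broken)
puts fixed

# The escaped form parses as an ordinary URI.
uri = URI.parse(fixed)
puts uri.host
```

The sketch uses the non-mutating `gsub` so that it also works on frozen string literals. Whether the unescaped URL is rejected at all depends on the Ruby version and the parser it uses by default, so newer versions may accept it where the version in the question did not.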