Tags: ruby, uri, ruby-2.0, idn, addressable-gem

Display IDNs from normalized URIs with Ruby (using the Addressable gem)


In my Ruby app I need to handle URIs from user input (which are actually IRIs):

    str = "http://उदाहरण.परीक्षा/मुख्य_पृष्ठ"

I normalize these using Addressable, and only store the normalized form:

    normalized = Addressable::URI.parse(str).normalize
    normalized.to_s
    #=> http://xn--p1b6ci4b4b3a.xn--11b5bs3a9aj6g/%E0%A4%AE%E0%A5%81%E0%A4%96%E0%A5%8D%E0%A4%AF_%E0%A4%AA%E0%A5%83%E0%A4%B7%E0%A5%8D%E0%A4%A0

This is nice to work with, but obviously not nice to display to end users.

For display, I'd like to convert this URI back to its original form (no Punycode host, no percent-encoded path).

Addressable has display_uri, but that only converts the host:

    nicer = normalized.display_uri.to_s
    #=> http://उदाहरण.परीक्षा/%E0%A4%AE%E0%A5%81%E0%A4%96%E0%A5%8D%E0%A4%AF_%E0%A4%AA%E0%A5%83%E0%A4%B7%E0%A5%8D%E0%A4%A0

The following appears to work:

    display_s = Addressable::URI.parse(str).display_uri.to_s
    pretty = Addressable::URI.unencode(display_s.force_encoding("ASCII-8BIT"))

However, that code looks wrong (I should not need to use force_encoding) and I'm not at all confident that it is correct.

Code: https://gist.github.com/levinalex/6115764

tl;dr

How do I convert this:

    "http://xn--p1b6ci4b4b3a.xn--11b5bs3a9aj6g/" +
    "%E0%A4%AE%E0%A5%81%E0%A4%96%E0%A5%8D%E0%A4" +
    "%AF_%E0%A4%AA%E0%A5%83%E0%A4%B7%E0%A5%8D%E0%A4%A0"

to this:

    "http://उदाहरण.परीक्षा/मुख्य_पृष्ठ"

Solution

  • You should not need any forced (re-)encoding to recover the original URI. Let display_uri convert the host, then unencode the result to restore the path:

    normalised_s = "http://xn--p1b6ci4b4b3a.xn--11b5bs3a9aj6g/%E0%A4%AE%E0%A5%81%E0%A4%96%E0%A5%8D%E0%A4%AF_%E0%A4%AA%E0%A5%83%E0%A4%B7%E0%A5%8D%E0%A4%A0"
    Addressable::URI.unencode(Addressable::URI.parse(normalised_s).display_uri)
    #=> "http://उदाहरण.परीक्षा/मुख्य_पृष्ठ"


    To repeat what Bob said in the comments, normalisation is definitely a good way of guaranteeing uniqueness for storage.
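
For intuition about the path half of the conversion, here is a minimal stdlib-only sketch of what a percent-decoding step has to do: turn each %XX into a raw byte, then label the resulting bytes as UTF-8. The helper name percent_decode_for_display is hypothetical and is not part of Addressable; it only illustrates the idea and does not touch the Punycode host, which still needs display_uri. As an extra assumption, it falls back to the encoded input when the decoded bytes are not valid UTF-8, so nothing garbled is shown to users.

```ruby
# Hypothetical helper (not part of Addressable): percent-decode a string for
# display, accepting the result only if the decoded bytes are valid UTF-8.
def percent_decode_for_display(encoded)
  # Work on a binary copy and replace each %XX with the byte it stands for.
  bytes = encoded.b.gsub(/%([0-9A-Fa-f]{2})/) { $1.hex.chr }
  # The decoded bytes carry no encoding of their own; label them as UTF-8.
  decoded = bytes.force_encoding(Encoding::UTF_8)
  # Fall back to the encoded form rather than display invalid text.
  decoded.valid_encoding? ? decoded : encoded
end

path = "/%E0%A4%AE%E0%A5%81%E0%A4%96%E0%A5%8D%E0%A4%AF_%E0%A4%AA%E0%A5%83%E0%A4%B7%E0%A5%8D%E0%A4%A0"
puts percent_decode_for_display(path)
```

Note that decoding for display is lossy: reserved characters such as %2F are decoded too, so the pretty string is for showing to users, and the normalised form remains the one to store and compare.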