java, java-8, uri

Resolving relative URIs in Java 8 while working around a JDK bug


I am stuck on Java 8 and have a program that creates and resolves URIs using java.net.URI. These will almost always use the http(s) scheme, but we may receive other schemes that we also need to handle.

import java.net.URI;

class Scratch {
    public static void main(String[] args) throws Exception {
        URI base = new URI("https://example.com/");
        URI other = new URI("path/to/resource?query=hello");
        URI result = base.resolve(other);
        System.out.println(result);
    }
}

This resolves to the correct URI, as expected:

https://example.com/path/to/resource?query=hello

However, due to JDK-8272702, resolution behaves unexpectedly for other combinations of base and relative URI, most visibly when the base URI has no path (no trailing slash).

Examples

Here are a handful of relative resolutions, along with the result I expect for each:

import java.net.URI;

class Scratch {
    private static final String F = "%-20s %-15s %-35s %-35s%n";
    public static void main(String[] args) throws Exception {
        System.out.printf(F, "Base", "Resolve part", "Expected", "Actual (if different)");
        test("https://a.com", ".", "https://a.com/");
        test("https://a.com", "./", "https://a.com/");
        test("https://a.com", "./path", "https://a.com/path");
        test("https://a.com", "path", "https://a.com/path");
        test("https://a.com", "path/", "https://a.com/path/");
        test("https://a.com", "./path/", "https://a.com/path/");
        test("https://a.com", "../", "https://a.com/../");
        test("https://a.com", "../path", "https://a.com/../path");
        test("https://a.com", "../path/", "https://a.com/../path/");

        System.out.println("\nTrailing slash");
        test("https://a.com/", ".", "https://a.com/");
        test("https://a.com/", "./", "https://a.com/");
        test("https://a.com/", "./path", "https://a.com/path");
        test("https://a.com/", "path", "https://a.com/path");
        test("https://a.com/", "path/", "https://a.com/path/");
        test("https://a.com/", "./path/", "https://a.com/path/");
        test("https://a.com/", "../", "https://a.com/../");
        test("https://a.com/", "../path", "https://a.com/../path");
        test("https://a.com/", "../path/", "https://a.com/../path/");
    }

    private static void test(String base, String resolve, String expected) throws Exception {
        URI baseUri = new URI(base);
        URI resolveUri = new URI(resolve);
        URI actual = baseUri.resolve(resolveUri);
        URI expectedUri = new URI(expected);
        String difference = actual.equals(expectedUri) ? "" : actual.toString();
        System.out.printf(F, baseUri, resolveUri, expectedUri, difference);
    }
}

On Java 21, where the linked bug is "fixed" and assuming the output of each resolution is correct, we get the following output:

Base                 Resolve part    Expected                            Actual (if different)              
https://a.com        .               https://a.com/                                                         
https://a.com        ./              https://a.com/                                                         
https://a.com        ./path          https://a.com/path                                                     
https://a.com        path            https://a.com/path                                                     
https://a.com        path/           https://a.com/path/                                                    
https://a.com        ./path/         https://a.com/path/                                                    
https://a.com        ../             https://a.com/../                                                      
https://a.com        ../path         https://a.com/../path                                                  
https://a.com        ../path/        https://a.com/../path/                                                 

Trailing slash
https://a.com/       .               https://a.com/                                                         
https://a.com/       ./              https://a.com/                                                         
https://a.com/       ./path          https://a.com/path                                                     
https://a.com/       path            https://a.com/path                                                     
https://a.com/       path/           https://a.com/path/                                                    
https://a.com/       ./path/         https://a.com/path/                                                    
https://a.com/       ../             https://a.com/../                                                      
https://a.com/       ../path         https://a.com/../path                                                  
https://a.com/       ../path/        https://a.com/../path/

Java 8 invocation

However, when run on Java 8 (1.8.0_322):

Base                 Resolve part    Expected                            Actual (if different)              
https://a.com        .               https://a.com/                      https://a.com                      
https://a.com        ./              https://a.com/                      https://a.com                      
https://a.com        ./path          https://a.com/path                  https://a.compath                  
https://a.com        path            https://a.com/path                  https://a.compath                  
https://a.com        path/           https://a.com/path/                 https://a.compath/                 
https://a.com        ./path/         https://a.com/path/                 https://a.compath/                 
https://a.com        ../             https://a.com/../                   https://a.com../                   
https://a.com        ../path         https://a.com/../path               https://a.com../path               
https://a.com        ../path/        https://a.com/../path/              https://a.com../path/              

Trailing slash
https://a.com/       .               https://a.com/                                                         
https://a.com/       ./              https://a.com/                                                         
https://a.com/       ./path          https://a.com/path                                                     
https://a.com/       path            https://a.com/path                                                     
https://a.com/       path/           https://a.com/path/                                                    
https://a.com/       ./path/         https://a.com/path/                                                    
https://a.com/       ../             https://a.com/../                                                      
https://a.com/       ../path         https://a.com/../path                                                  
https://a.com/       ../path/        https://a.com/../path/

You can see that the actual results differ from the expected ones for every resolution where the base has no trailing slash. This is not a complete set of potential inputs, just enough to demonstrate the problem.

What does Python do?

As an aside, running the same cases in Python (3.11 here) with urllib.parse.urljoin produces different output:

from urllib.parse import urljoin

FORMAT = "{:20s} {:15s} {:35s} {:35s}"

def main():
    print(FORMAT.format("Base", "Resolve part", "Expected", "Actual (if different)"))
    test("https://a.com", ".", "https://a.com/")
    test("https://a.com", "./", "https://a.com/")
    test("https://a.com", "./path", "https://a.com/path")
    test("https://a.com", "path", "https://a.com/path")
    test("https://a.com", "path/", "https://a.com/path/")
    test("https://a.com", "./path/", "https://a.com/path/")
    test("https://a.com", "../", "https://a.com/../")
    test("https://a.com", "../path", "https://a.com/../path")
    test("https://a.com", "../path/", "https://a.com/../path/")

    print("\nTrailing slash")
    test("https://a.com/", ".", "https://a.com/")
    test("https://a.com/", "./", "https://a.com/")
    test("https://a.com/", "./path", "https://a.com/path")
    test("https://a.com/", "path", "https://a.com/path")
    test("https://a.com/", "path/", "https://a.com/path/")
    test("https://a.com/", "./path/", "https://a.com/path/")
    test("https://a.com/", "../", "https://a.com/../")
    test("https://a.com/", "../path", "https://a.com/../path")
    test("https://a.com/", "../path/", "https://a.com/../path/")

def test(base, resolve, expected):
    base_uri = urljoin(base, "")  # No-op: an empty reference returns the base unchanged
    resolve_uri = urljoin("", resolve)  # No-op: an empty base returns the reference unchanged
    actual = urljoin(base_uri, resolve_uri)
    difference = "" if actual == expected else actual
    print(FORMAT.format(base_uri, resolve_uri, expected, difference))

main()

Output:

Base                 Resolve part    Expected                            Actual (if different)              
https://a.com        .               https://a.com/                                                         
https://a.com        ./              https://a.com/                                                         
https://a.com        ./path          https://a.com/path                                                     
https://a.com        path            https://a.com/path                                                     
https://a.com        path/           https://a.com/path/                                                    
https://a.com        ./path/         https://a.com/path/                                                    
https://a.com        ../             https://a.com/../                   https://a.com/                     
https://a.com        ../path         https://a.com/../path               https://a.com/path                 
https://a.com        ../path/        https://a.com/../path/              https://a.com/path/                

Trailing slash
https://a.com/       .               https://a.com/                                                         
https://a.com/       ./              https://a.com/                                                         
https://a.com/       ./path          https://a.com/path                                                     
https://a.com/       path            https://a.com/path                                                     
https://a.com/       path/           https://a.com/path/                                                    
https://a.com/       ./path/         https://a.com/path/                                                    
https://a.com/       ../             https://a.com/../                   https://a.com/                     
https://a.com/       ../path         https://a.com/../path               https://a.com/path                 
https://a.com/       ../path/        https://a.com/../path/              https://a.com/path/ 

The only differences are in the cases with double-dot segments. This could be due to an RFC difference between Java and Python (java.net.URI is specified against RFC 2396, while urllib.parse follows RFC 3986), or to a slight difference in how normalization works.
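
To illustrate the double-dot difference: RFC 3986 (section 5.2.4) defines a remove_dot_segments step in which a ".." that would climb above the root is simply dropped, which matches the Python output above. The following is a minimal Java sketch of that step, written from the RFC text rather than taken from either library. DotSegments and removeDotSegments are names made up for this illustration, not JDK API.

```java
import java.net.URI;

final class DotSegments {
    private DotSegments() {}

    /** Sketch of RFC 3986 section 5.2.4 "remove_dot_segments" on a raw path. */
    static String removeDotSegments(String path) {
        String in = path;
        StringBuilder out = new StringBuilder();
        while (!in.isEmpty()) {
            if (in.startsWith("../")) {                      // step A
                in = in.substring(3);
            } else if (in.startsWith("./")) {                // step A
                in = in.substring(2);
            } else if (in.startsWith("/./")) {               // step B
                in = in.substring(2);
            } else if (in.equals("/.")) {                    // step B
                in = "/";
            } else if (in.startsWith("/../")) {              // step C
                in = in.substring(3);
                dropLastSegment(out);
            } else if (in.equals("/..")) {                   // step C
                in = "/";
                dropLastSegment(out);
            } else if (in.equals(".") || in.equals("..")) {  // step D
                in = "";
            } else {                                         // step E
                // Move the first segment (with its leading "/", if any) to the output.
                int next = in.indexOf('/', 1);
                if (next < 0) {
                    next = in.length();
                }
                out.append(in, 0, next);
                in = in.substring(next);
            }
        }
        return out.toString();
    }

    /** Removes the last output segment and its preceding "/", if there is one. */
    private static void dropLastSegment(StringBuilder out) {
        out.setLength(Math.max(out.lastIndexOf("/"), 0));
    }

    public static void main(String[] args) {
        // RFC 3986 style: the ".." above the root is dropped.
        System.out.println(removeDotSegments("/../path"));
        // java.net.URI, for comparison on the same path.
        System.out.println(URI.create("https://a.com/../path").normalize());
    }
}
```

If the Python-style result is what you want, a step like this could be applied to the raw path of a resolved URI. Whether that is desirable depends on which behaviour the rest of your system expects.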

Question

How can I write a Java 8-safe resolve method that works for all java.net.URI instances in spite of the above-mentioned bug?


Solution

  • This is the best approach I've landed on given the link to the JDK source code.

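        // Works around JDK-8272702: on affected JDKs, resolving a relative path
        // against a base URI with an empty path drops the "/" separator.
        private static URI safeResolve(URI baseUri, URI otherUri) throws Exception {
            boolean baseHasNoPath = baseUri.getPath() == null || baseUri.getPath().isEmpty();
            boolean otherIsRelativePath =
                otherUri.getPath() != null &&
                !otherUri.getPath().isEmpty() &&
                !otherUri.getPath().startsWith("/");
            if (!baseUri.isOpaque() && !otherUri.isOpaque() && baseHasNoPath && otherIsRelativePath) {
                // Rebuild the URI by hand with an explicit "/" separator, then
                // normalize so that "." segments are removed.
                return new URI(
                    baseUri.getScheme(),
                    baseUri.getAuthority(),
                    "/" + otherUri.getPath(),
                    otherUri.getQuery(),
                    otherUri.getFragment()
                ).normalize();
            }
            // All other cases use the built-in resolution.
            return baseUri.resolve(otherUri);
        }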
    

    And updating the test method slightly to use our new approach:

    
            // URI actual = baseUri.resolve(resolveUri);
            URI actual = safeResolve(baseUri, resolveUri);
    

    We see the following output:

    Base                 Resolve part    Expected                            Actual (if different)              
    https://a.com        .               https://a.com/                                                         
    https://a.com        ./              https://a.com/                                                         
    https://a.com        ./path          https://a.com/path                                                     
    https://a.com        path            https://a.com/path                                                     
    https://a.com        path/           https://a.com/path/                                                    
    https://a.com        ./path/         https://a.com/path/                                                    
    https://a.com        ../             https://a.com/../                                                      
    https://a.com        ../path         https://a.com/../path                                                  
    https://a.com        ../path/        https://a.com/../path/                                                 
    
    Trailing slash
    https://a.com/       .               https://a.com/                                                         
    https://a.com/       ./              https://a.com/                                                         
    https://a.com/       ./path          https://a.com/path                                                     
    https://a.com/       path            https://a.com/path                                                     
    https://a.com/       path/           https://a.com/path/                                                    
    https://a.com/       ./path/         https://a.com/path/                                                    
    https://a.com/       ../             https://a.com/../                                                      
    https://a.com/       ../path         https://a.com/../path                                                  
    https://a.com/       ../path/        https://a.com/../path/                                                 
    

    Although we've seen some edge cases where constructing the result with new URI does not match every expected output, this has worked well enough across our code base.
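
One possible source of those edge cases is that getPath(), getQuery() and getFragment() return decoded values, and the multi-argument URI constructor then re-encodes them, so an escaped %26 in a query can come back as a literal &. A variant that avoids this, sketched here under the assumption that the bug only bites when the base has an authority and an empty path, is to give the base a "/" path and let the built-in resolver do the rest on the raw components. SafeResolveSketch and its resolve method are hypothetical names for this illustration.

```java
import java.net.URI;

final class SafeResolveSketch {
    private SafeResolveSketch() {}

    /**
     * Resolves other against base. If base has an authority but an empty path
     * and other is a relative path reference, base is first rewritten to end
     * in "/" so the JDK never joins the authority and the path directly.
     */
    static URI resolve(URI base, URI other) {
        String basePath = base.getRawPath();
        String otherPath = other.getRawPath();
        boolean affected =
                !base.isOpaque()
                && !other.isOpaque()
                && base.getRawAuthority() != null
                && (basePath == null || basePath.isEmpty())
                && other.getScheme() == null
                && other.getRawAuthority() == null
                && otherPath != null
                && !otherPath.isEmpty()
                && !otherPath.startsWith("/");
        if (!affected) {
            // Assumption: every other combination is handled correctly by the JDK.
            return base.resolve(other);
        }
        // Rebuild only scheme and authority, with an explicit root path.
        StringBuilder rooted = new StringBuilder();
        if (base.getScheme() != null) {
            rooted.append(base.getScheme()).append(':');
        }
        rooted.append("//").append(base.getRawAuthority()).append('/');
        return URI.create(rooted.toString()).resolve(other);
    }

    public static void main(String[] args) {
        System.out.println(resolve(URI.create("https://a.com"), URI.create("path")));
        System.out.println(resolve(URI.create("https://a.com"), URI.create("path?q=a%26b")));
    }
}
```

Because the relative reference is passed to the built-in resolver untouched, its query and fragment keep their original escaping. The base's own query and fragment are discarded, which is what resolution against a relative path does anyway.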