I'm stuck on Java 8. I have a program that creates and resolves URIs using java.net.URI. These will almost always use the http scheme, but we may get others that we need to handle.
import java.net.URI;

class Scratch {
    public static void main(String[] args) throws Exception {
        URI base = new URI("https://example.com/");
        URI other = new URI("path/to/resource?query=hello");
        URI result = base.resolve(other);
        System.out.println(result);
    }
}
This resolves to the correct URI, as expected: https://example.com/path/to/resource?query=hello
However, due to JDK-8272702, this behaves rather unexpectedly when resolving relative URIs in various other ways.
Here are a handful of examples of different relative resolutions and their expected results:
import java.net.URI;

class Scratch {
    private static final String F = "%-20s %-15s %-35s %-35s%n";

    public static void main(String[] args) throws Exception {
        System.out.printf(F, "Base", "Resolve part", "Expected", "Actual (if different)");
        test("https://a.com", ".", "https://a.com/");
        test("https://a.com", "./", "https://a.com/");
        test("https://a.com", "./path", "https://a.com/path");
        test("https://a.com", "path", "https://a.com/path");
        test("https://a.com", "path/", "https://a.com/path/");
        test("https://a.com", "./path/", "https://a.com/path/");
        test("https://a.com", "../", "https://a.com/../");
        test("https://a.com", "../path", "https://a.com/../path");
        test("https://a.com", "../path/", "https://a.com/../path/");
        System.out.println("\nTrailing slash");
        test("https://a.com/", ".", "https://a.com/");
        test("https://a.com/", "./", "https://a.com/");
        test("https://a.com/", "./path", "https://a.com/path");
        test("https://a.com/", "path", "https://a.com/path");
        test("https://a.com/", "path/", "https://a.com/path/");
        test("https://a.com/", "./path/", "https://a.com/path/");
        test("https://a.com/", "../", "https://a.com/../");
        test("https://a.com/", "../path", "https://a.com/../path");
        test("https://a.com/", "../path/", "https://a.com/../path/");
    }

    private static void test(String base, String resolve, String expected) throws Exception {
        URI baseUri = new URI(base);
        URI resolveUri = new URI(resolve);
        URI actual = baseUri.resolve(resolveUri);
        URI expectedUri = new URI(expected);
        // Leave the last column empty when the result matches the expectation
        String difference = actual.equals(expectedUri) ? "" : actual.toString();
        System.out.printf(F, baseUri, resolveUri, expectedUri, difference);
    }
}
On Java 21, where the linked bug is "fixed" and assuming the output of each resolution is correct, we get the following output:
Base Resolve part Expected Actual (if different)
https://a.com . https://a.com/
https://a.com ./ https://a.com/
https://a.com ./path https://a.com/path
https://a.com path https://a.com/path
https://a.com path/ https://a.com/path/
https://a.com ./path/ https://a.com/path/
https://a.com ../ https://a.com/../
https://a.com ../path https://a.com/../path
https://a.com ../path/ https://a.com/../path/
Trailing slash
https://a.com/ . https://a.com/
https://a.com/ ./ https://a.com/
https://a.com/ ./path https://a.com/path
https://a.com/ path https://a.com/path
https://a.com/ path/ https://a.com/path/
https://a.com/ ./path/ https://a.com/path/
https://a.com/ ../ https://a.com/../
https://a.com/ ../path https://a.com/../path
https://a.com/ ../path/ https://a.com/../path/
However, when run on Java 8 (1.8.0_322):
Base Resolve part Expected Actual (if different)
https://a.com . https://a.com/ https://a.com
https://a.com ./ https://a.com/ https://a.com
https://a.com ./path https://a.com/path https://a.compath
https://a.com path https://a.com/path https://a.compath
https://a.com path/ https://a.com/path/ https://a.compath/
https://a.com ./path/ https://a.com/path/ https://a.compath/
https://a.com ../ https://a.com/../ https://a.com../
https://a.com ../path https://a.com/../path https://a.com../path
https://a.com ../path/ https://a.com/../path/ https://a.com../path/
Trailing slash
https://a.com/ . https://a.com/
https://a.com/ ./ https://a.com/
https://a.com/ ./path https://a.com/path
https://a.com/ path https://a.com/path
https://a.com/ path/ https://a.com/path/
https://a.com/ ./path/ https://a.com/path/
https://a.com/ ../ https://a.com/../
https://a.com/ ../path https://a.com/../path
https://a.com/ ../path/ https://a.com/../path/
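You can see that the actual results differ for every resolution where the base has no trailing slash: the relative path is appended directly to the authority without a separating slash. This is also not a complete set of potential inputs, just some to demonstrate the problem.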
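The first failing row can also serve as a runtime check for whether a given JVM is affected. This is only a minimal sketch: the class name ResolveBugProbe and the method hasResolveBug are hypothetical, and it assumes that a single probe (empty base path, relative child) is representative of the bug.

```java
import java.net.URI;

class ResolveBugProbe {
    // Hypothetical helper: returns true when resolving "path" against a base
    // with an empty path does NOT produce "https://a.com/path" on this JVM.
    static boolean hasResolveBug() {
        URI resolved = URI.create("https://a.com").resolve(URI.create("path"));
        return !"https://a.com/path".equals(resolved.toString());
    }

    public static void main(String[] args) {
        System.out.println(System.getProperty("java.version")
                + " affected: " + hasResolveBug());
    }
}
```

A check like this could be logged at startup so that it is obvious which behaviour a deployment is getting.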
As an aside, running the same thing in Python (3.11 here) using urllib.parse produces different output:
from urllib.parse import urljoin

FORMAT = "{:20s} {:15s} {:35s} {:35s}"


def main():
    print(FORMAT.format("Base", "Resolve part", "Expected", "Actual (if different)"))
    test("https://a.com", ".", "https://a.com/")
    test("https://a.com", "./", "https://a.com/")
    test("https://a.com", "./path", "https://a.com/path")
    test("https://a.com", "path", "https://a.com/path")
    test("https://a.com", "path/", "https://a.com/path/")
    test("https://a.com", "./path/", "https://a.com/path/")
    test("https://a.com", "../", "https://a.com/../")
    test("https://a.com", "../path", "https://a.com/../path")
    test("https://a.com", "../path/", "https://a.com/../path/")
    print("\nTrailing slash")
    test("https://a.com/", ".", "https://a.com/")
    test("https://a.com/", "./", "https://a.com/")
    test("https://a.com/", "./path", "https://a.com/path")
    test("https://a.com/", "path", "https://a.com/path")
    test("https://a.com/", "path/", "https://a.com/path/")
    test("https://a.com/", "./path/", "https://a.com/path/")
    test("https://a.com/", "../", "https://a.com/../")
    test("https://a.com/", "../path", "https://a.com/../path")
    test("https://a.com/", "../path/", "https://a.com/../path/")


def test(base, resolve, expected):
    base_uri = urljoin(base, "")  # Ensure base is a proper URL
    resolve_uri = urljoin("", resolve)  # Resolve treats empty string as base
    actual = urljoin(base_uri, resolve_uri)
    difference = "" if actual == expected else actual
    print(FORMAT.format(base_uri, resolve_uri, expected, difference))


main()
Output:
Base Resolve part Expected Actual (if different)
https://a.com . https://a.com/
https://a.com ./ https://a.com/
https://a.com ./path https://a.com/path
https://a.com path https://a.com/path
https://a.com path/ https://a.com/path/
https://a.com ./path/ https://a.com/path/
https://a.com ../ https://a.com/../ https://a.com/
https://a.com ../path https://a.com/../path https://a.com/path
https://a.com ../path/ https://a.com/../path/ https://a.com/path/
Trailing slash
https://a.com/ . https://a.com/
https://a.com/ ./ https://a.com/
https://a.com/ ./path https://a.com/path
https://a.com/ path https://a.com/path
https://a.com/ path/ https://a.com/path/
https://a.com/ ./path/ https://a.com/path/
https://a.com/ ../ https://a.com/../ https://a.com/
https://a.com/ ../path https://a.com/../path https://a.com/path
https://a.com/ ../path/ https://a.com/../path/ https://a.com/path/
This produces slightly different output. This could be due to an RFC difference between Java and Python or a slight difference in how the normalization works, because the only differences are with the double-dot segments.
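One way to test the RFC hypothesis is to compare java.net.URI's normalize() against the remove_dot_segments algorithm from RFC 3986, section 5.2.4, which discards ".." segments that would climb above the root. The java.net.URI Javadoc cites the older RFC 2396, while the Python output above looks consistent with RFC 3986, though that is an inference from the table rather than something confirmed here. The sketch below is a plain Java 8 rendering of the RFC algorithm; the class name DotSegments and its method names are hypothetical.

```java
import java.net.URI;

class DotSegments {
    // Sketch of RFC 3986 section 5.2.4 (remove_dot_segments), steps A to E.
    static String removeDotSegments(String path) {
        StringBuilder input = new StringBuilder(path);
        StringBuilder output = new StringBuilder();
        while (input.length() > 0) {
            String s = input.toString();
            if (s.startsWith("../")) {            // A: drop leading "../"
                input.delete(0, 3);
            } else if (s.startsWith("./")) {      // A: drop leading "./"
                input.delete(0, 2);
            } else if (s.startsWith("/./")) {     // B: "/./" becomes "/"
                input.replace(0, 3, "/");
            } else if (s.equals("/.")) {          // B: trailing "/." becomes "/"
                input.replace(0, 2, "/");
            } else if (s.startsWith("/../")) {    // C: go up one segment
                input.replace(0, 4, "/");
                removeLastSegment(output);
            } else if (s.equals("/..")) {         // C: trailing "/.."
                input.replace(0, 3, "/");
                removeLastSegment(output);
            } else if (s.equals(".") || s.equals("..")) { // D: bare dot segment
                input.setLength(0);
            } else {                              // E: move first segment to output
                int next = s.indexOf('/', 1);
                int end = next < 0 ? s.length() : next;
                output.append(s, 0, end);
                input.delete(0, end);
            }
        }
        return output.toString();
    }

    // Removes the last segment and its preceding "/" from the output, if any.
    private static void removeLastSegment(StringBuilder output) {
        int idx = output.lastIndexOf("/");
        output.setLength(idx < 0 ? 0 : idx);
    }

    public static void main(String[] args) {
        for (String p : new String[] {"/../", "/../path", "/../path/"}) {
            String javaPath = URI.create("https://a.com" + p).normalize().getRawPath();
            System.out.printf("%-12s java: %-12s rfc3986: %s%n",
                    p, javaPath, removeDotSegments(p));
        }
    }
}
```

If the RFC 3986 column matches the Python table while the java.net.URI column keeps the leading "..", that would support the RFC-difference explanation over a normalization quirk.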
How can I write a Java 8-safe resolve method that works for all java.net.URIs in spite of the above-mentioned bug?
This is the best approach I've landed on given the link to the JDK source code.
private static URI safeResolve(URI baseUri, URI otherUri) throws Exception {
    // Work around the bug only for the affected shape: a hierarchical base with
    // an empty path, resolved against a non-empty relative path.
    if (
        !baseUri.isOpaque() &&
        !otherUri.isOpaque() &&
        (baseUri.getPath() == null || baseUri.getPath().isEmpty()) &&
        otherUri.getPath() != null &&
        !otherUri.getPath().isEmpty() &&
        !otherUri.getPath().startsWith("/")) {
        // Insert the missing "/" between the authority and the relative path,
        // then normalize to remove "." segments.
        return new URI(
            baseUri.getScheme(),
            baseUri.getAuthority(),
            "/" + otherUri.getPath(),
            otherUri.getQuery(),
            otherUri.getFragment()
        ).normalize();
    } else {
        // all other cases use built-in resolution
        return baseUri.resolve(otherUri);
    }
}
And updating the test method slightly to use our new approach:
// URI actual = baseUri.resolve(resolveUri);
URI actual = safeResolve(baseUri, resolveUri);
We see the following output:
Base Resolve part Expected Actual (if different)
https://a.com . https://a.com/
https://a.com ./ https://a.com/
https://a.com ./path https://a.com/path
https://a.com path https://a.com/path
https://a.com path/ https://a.com/path/
https://a.com ./path/ https://a.com/path/
https://a.com ../ https://a.com/../
https://a.com ../path https://a.com/../path
https://a.com ../path/ https://a.com/../path/
Trailing slash
https://a.com/ . https://a.com/
https://a.com/ ./ https://a.com/
https://a.com/ ./path https://a.com/path
https://a.com/ path https://a.com/path
https://a.com/ path/ https://a.com/path/
https://a.com/ ./path/ https://a.com/path/
https://a.com/ ../ https://a.com/../
https://a.com/ ../path https://a.com/../path
https://a.com/ ../path/ https://a.com/../path/
Although we've seen some edge cases when constructing with new URI, and the results don't match every expected output, this has worked well enough across our code base.
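One possible source of those edge cases is percent-encoding: getPath(), getQuery(), and getFragment() return decoded text, and the multi-argument constructor only re-encodes characters that are illegal in that component, so an escaped delimiter such as %26 in a query comes back as a literal &. The variant below is a sketch that rebuilds the URI from the raw components instead, assuming this is the kind of edge being hit. The names RawSafeResolve and safeResolveRaw are hypothetical, and the extra check for a non-null authority is an added assumption about when the workaround should apply.

```java
import java.net.URI;

class RawSafeResolve {
    // Hypothetical variant of safeResolve that keeps percent-encoding intact.
    static URI safeResolveRaw(URI base, URI other) {
        String basePath = base.getRawPath();
        String otherPath = other.getRawPath();
        boolean affected = !base.isOpaque() && !other.isOpaque()
                && base.getRawAuthority() != null   // assumption: only bases with an authority
                && (basePath == null || basePath.isEmpty())
                && otherPath != null && !otherPath.isEmpty()
                && !otherPath.startsWith("/");
        if (!affected) {
            return base.resolve(other);             // built-in resolution elsewhere
        }
        StringBuilder sb = new StringBuilder();
        if (base.getScheme() != null) {
            sb.append(base.getScheme()).append(':');
        }
        sb.append("//").append(base.getRawAuthority());
        sb.append('/').append(otherPath);           // the slash the bug leaves out
        if (other.getRawQuery() != null) {
            sb.append('?').append(other.getRawQuery());
        }
        if (other.getRawFragment() != null) {
            sb.append('#').append(other.getRawFragment());
        }
        // Parsing the full string keeps the existing escapes as they are.
        return URI.create(sb.toString()).normalize();
    }

    public static void main(String[] args) throws Exception {
        URI base = URI.create("https://a.com");
        URI other = URI.create("path?q=a%26b");
        // Decoded components: the escaped ampersand turns into a real delimiter.
        System.out.println(new URI(base.getScheme(), base.getAuthority(),
                "/" + other.getPath(), other.getQuery(), other.getFragment()));
        // Raw components: the escape survives.
        System.out.println(safeResolveRaw(base, other));
    }
}
```

The trade-off is that the raw variant never repairs characters the inputs had wrong, which is acceptable here because both inputs are already parsed java.net.URI instances.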