
Java URL Class getPath(), getQuery() and getFile() inconsistent with RFC3986 URI Syntax


I am writing a utility class that semi-wraps Java's URL class, and I have written a bunch of test cases to verify the methods I have wrapped with a customized implementation. I don't understand the output of some of Java's getters for certain URL strings.

According to the RFC 3986 specification, a path component is defined as follows:

The path is terminated by the first question mark ("?") or number sign   
("#") character, or by the end of the URI.

A query component is defined as follows:

The query component is indicated by the first question
mark ("?") character and terminated by a number sign ("#") character
or by the end of the URI.
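
Read literally, these two rules can be sketched as a small splitter. This is only an illustration: the class `Rfc3986Split` and its methods are hypothetical (not part of the JDK), and the input is assumed to be just the part of the URI that follows the authority, so scheme and host handling are left out.

```java
// Hypothetical helper illustrating the two RFC 3986 rules quoted above.
// Assumption: the argument is only the part of the URI after the authority.
final class Rfc3986Split {

    /** Index of the first '?' or '#', or the input length if neither occurs. */
    private static int endOfPath(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '?' || c == '#') {
                return i;
            }
        }
        return s.length();
    }

    /** The path runs up to the first '?' or '#', or to the end of the input. */
    static String path(String afterAuthority) {
        return afterAuthority.substring(0, endOfPath(afterAuthority));
    }

    /** The query without its leading '?', or null if the input has no query. */
    static String query(String afterAuthority) {
        int end = endOfPath(afterAuthority);
        if (end == afterAuthority.length() || afterAuthority.charAt(end) != '?') {
            return null; // the path was ended by '#' or by the end of the input
        }
        int hash = afterAuthority.indexOf('#', end);
        int stop = hash < 0 ? afterAuthority.length() : hash;
        return afterAuthority.substring(end + 1, stop);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"/?param1=val1", "?param1=val1"}) {
            System.out.println("path=[" + path(s) + "] query=[" + query(s) + "]");
        }
    }
}
```

Under this reading, a "?" that appears after a "#" belongs to the fragment, so `query` returns null in that case.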

I have a couple of test cases that Java treats as valid URLs, but the getters for path, file and query don't return the values I expected:

URL url = new URL("https://www.somesite.com/?param1=val1");

System.out.print(url.getPath());
System.out.println(url.getFile());
System.out.println(url.getQuery());

The above results in the following output:

//?param1=val1
param1=val1
<empty string>

My other test case:

URL url = new URL("https://www.somesite.com?param1=val1");

System.out.print(url.getPath());
System.out.println(url.getFile());
System.out.println(url.getQuery());

The above results in the following output:

?param1=val1
param1=val1
<empty string>

According to the documentation for Java URL:

public String getFile()

Gets the file name of this URL. The returned file portion will be the  
same as getPath(), plus the concatenation of the value of getQuery(), if 
any. If there is no query portion, this method and getPath() will return 
identical results.

Returns:
    the file name of this URL, or an empty string if one does not exist

So, in my test cases getQuery() returns an empty string, in which case I would have expected getFile() to return the same value as getPath(). This is not the case.

I had expected the following output for both test cases:

<empty string>
?param1=val1
param1=val1

Maybe my interpretation of RFC 3986 is not correct, but the output I have seen does not line up with the documentation for the URL class either. Can anyone explain what I am seeing?


Solution

  • Here is some executable code based on your fragments:

    import java.net.MalformedURLException;
    import java.net.URL;
    
    public class URLExample {
      public static void main(String[] args) throws MalformedURLException {
        printURLInformation(new URL("https://www.somesite.com/?param1=val1"));
        printURLInformation(new URL("https://www.somesite.com?param1=val1"));
      }
    
      private static void printURLInformation(URL url) {
        System.out.println(url);
        System.out.println("Path:\t" + url.getPath());
        System.out.println("File:\t" + url.getFile());
        System.out.println("Query:\t" + url.getQuery() + "\n");
      }
    
    }
    

    This works fine, and the result is what you might have expected. The only difference is that you printed the path with System.out.print and the file with System.out.println, so both values ended up on the same line and every later value appeared shifted up by one line.

    https://www.somesite.com/?param1=val1
    Path:   /
    File:   /?param1=val1
    Query:  param1=val1
    
    https://www.somesite.com?param1=val1
    Path:   
    File:   ?param1=val1
    Query:  param1=val1
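
As a further cross-check, the same two strings can be parsed with java.net.URI, which follows the generic URI syntax, and compared with the URL getters. The sketch below also rebuilds the file from path and query the way the getFile() Javadoc describes it. The class `UrlVersusUri` and its helper methods are hypothetical names chosen for this illustration, and the URL is obtained through `URI.toURL()` rather than the `new URL(String)` constructor used above.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

// Hypothetical demo class; only URI and URL themselves come from the JDK.
final class UrlVersusUri {

    /** Converts a string to a URL, wrapping the checked exception for brevity. */
    static URL toUrl(String spec) {
        try {
            return URI.create(spec).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Rebuilds the "file" from path and query as the getFile() Javadoc describes. */
    static String fileFromParts(URL url) {
        String query = url.getQuery();
        return query == null ? url.getPath() : url.getPath() + "?" + query;
    }

    /** One line per input, with each value in brackets so empty strings stay visible. */
    static String describe(String spec) {
        URL url = toUrl(spec);
        URI uri = URI.create(spec);
        return "URL path=[" + url.getPath() + "] file=[" + url.getFile()
                + "] query=[" + url.getQuery() + "]"
                + " | URI path=[" + uri.getRawPath()
                + "] query=[" + uri.getRawQuery() + "]";
    }

    public static void main(String[] args) {
        String[] specs = {
            "https://www.somesite.com/?param1=val1",
            "https://www.somesite.com?param1=val1"
        };
        for (String spec : specs) {
            System.out.println(describe(spec));
            // Check the documented relation: file == path + "?" + query
            System.out.println("file matches path + query: "
                    + fileFromParts(toUrl(spec)).equals(toUrl(spec).getFile()));
        }
    }
}
```

Putting each value in brackets on a single line avoids the print/println mix-up from the question, because an empty path shows up as `[]` instead of silently merging with the next value.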