java html jsoup sanitization html-sanitizing

How do I prevent Jsoup from removing 'href' attribute of anchor element?


I want to use Jsoup to cleanse input while still allowing anchor elements with an "href" attribute to remain untouched; however, I've found that no matter what I do, Jsoup.clean() removes the "href" attribute. Test code follows:

    public static void main(String[] args)
    {
        final String foo = "<a href='/foo/'>Foo</a>";
        final String cleansedOutput = Jsoup.clean(foo, Safelist.relaxed().addTags("a").addAttributes("a", "href"));

        System.out.println("foo: " + foo);
        System.out.println("cleansedOutput: " + cleansedOutput);
    }

The output of the code is as follows:

    foo: <a href='/foo/'>Foo</a>
    cleansedOutput: <a>Foo</a>

As you can see, the "href" attribute is stripped even though, as shown above, I explicitly tell Jsoup to preserve anchor elements and the "href" attribute. (I initially used the default Safelist.relaxed() and only then added addTags() and addAttributes(); the attribute was removed in every case.)

Am I doing something wrong? Or is this a bug in Jsoup? (It's hard to believe it's a bug, as their unit tests would have failed early on.)


Solution

  • From the documentation of Jsoup.clean(java.lang.String, org.jsoup.safety.Safelist):

    Note that as this method does not take a base href URL to resolve attributes with relative URLs against, those URLs will be removed, unless the input HTML contains a <base href> tag. If you wish to preserve those, use the clean(String html, String baseHref, Safelist) method instead, and enable Safelist.preserveRelativeLinks(boolean).

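    String html = "<a href='/foo/'>Foo</a>";
    Safelist safelist = Safelist.relaxed();
    // Keep relative URLs as written instead of converting them to absolute ones.
    safelist.preserveRelativeLinks(true);
    // A base URI is needed so the relative href can be resolved and its protocol checked.
    String clean = Jsoup.clean(html, "http://", safelist);
    System.out.println(clean);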
    

    This prints:

    <a href="/foo/">Foo</a>
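
To see why a base URI matters, here is a minimal sketch of the idea behind the protocol check, written with java.net.URI only. It is not Jsoup's actual implementation: the class RelativeLinkCheck, the methods absUrl and keepsHref, the protocol list, and the base URI http://example.com/ are all hypothetical and chosen for illustration. A relative href has no scheme of its own, so it can only be matched against an allowed protocol after it has been resolved against a base.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Hypothetical illustration; not part of Jsoup.
final class RelativeLinkCheck {
    // Assumed allow-list of protocols for <a href>.
    private static final String[] ALLOWED_PROTOCOLS = {"http", "https", "mailto"};

    // Resolves href against baseUri; returns "" when no absolute URL can be formed.
    public static String absUrl(String baseUri, String href) {
        try {
            URI resolved = baseUri.isEmpty()
                    ? new URI(href)
                    : new URI(baseUri).resolve(href);
            return resolved.isAbsolute() ? resolved.toString() : "";
        } catch (URISyntaxException | IllegalArgumentException e) {
            return "";
        }
    }

    // True when the resolved URL starts with one of the allowed protocols.
    public static boolean keepsHref(String baseUri, String href) {
        String abs = absUrl(baseUri, href).toLowerCase(Locale.ROOT);
        for (String protocol : ALLOWED_PROTOCOLS) {
            if (abs.startsWith(protocol + ":")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // No base URI: "/foo/" cannot be made absolute, so the check fails.
        System.out.println(keepsHref("", "/foo/"));                    // false
        // With a base URI, "/foo/" resolves to http://example.com/foo/ and passes.
        System.out.println(keepsHref("http://example.com/", "/foo/")); // true
    }
}
```

In this sketch an attribute that fails the check would be dropped, which matches the `<a>Foo</a>` output in the question. With preserveRelativeLinks(true), Jsoup still needs the base URI for validation but writes the original relative value back to the output.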