There are two different kinds of wikitext hyperlinks:
[[stack]]
[[heap (memory region)|heap]]
I would like to remove the hyperlinks but keep the text:
stack
heap
Currently, I am running two phases, employing two different regular expressions:
import java.util.regex.Pattern;

public class LinkRemover
{
    // [[target|label]] -> label
    private static final Pattern
    renamingLinks = Pattern.compile("\\[\\[[^\\]]+?\\|(.+?)\\]\\]");
    // [[label]] -> label
    private static final Pattern
    simpleLinks = Pattern.compile("\\[\\[(.+?)\\]\\]");

    public static String removeLinks(String input)
    {
        // Phase 1 strips renaming links, phase 2 strips the remaining simple links.
        String temp = renamingLinks.matcher(input).replaceAll("$1");
        return simpleLinks.matcher(temp).replaceAll("$1");
    }
}
Is there a way to "fuse" the two regular expressions into one, achieving the same result?
If you want to check your proposed solutions for correctness, here is a simple test class:
import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class LinkRemoverTest
{
    @Test
    public void test()
    {
        String input = "A sheep's [[wool]] is the most widely used animal fiber, and is usually harvested by [[Sheep shearing|shearing]].";
        String expected = "A sheep's wool is the most widely used animal fiber, and is usually harvested by shearing.";
        String output = LinkRemover.removeLinks(input);
        assertEquals(expected, output);
    }
}
You can make the part up to and including the pipe optional:
\\[\\[(?:[^\\]|]*\\|)?([^\\]]+)\\]\\]
To make sure a match always stays between one pair of square brackets, use negated character classes rather than .+? for both parts.
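To illustrate why the character classes matter, here is a small sketch comparing the pattern above with a naive fusion that keeps .+? from the question. The class name NaiveVsStrict and the NAIVE pattern are hypothetical, made up only for this comparison.

```java
import java.util.regex.Pattern;

class NaiveVsStrict
{
    // Hypothetical naive fusion: '.' is free to run across "]]" while searching for a pipe.
    static final Pattern NAIVE =
        Pattern.compile("\\[\\[(?:.+?\\|)?(.+?)\\]\\]");

    // Pattern from this answer: neither part can cross a closing bracket.
    static final Pattern STRICT =
        Pattern.compile("\\[\\[(?:[^\\]|]*\\|)?([^\\]]+)\\]\\]");

    public static void main(String[] args)
    {
        String input = "[[wool]] by [[Sheep shearing|shearing]]";
        // The naive version swallows both links as one match.
        System.out.println(NAIVE.matcher(input).replaceAll("$1"));  // prints "shearing"
        System.out.println(STRICT.matcher(input).replaceAll("$1")); // prints "wool by shearing"
    }
}
```

In the naive version the optional group starts at the first [[ and lazily expands until it reaches the pipe inside the second link, so everything in between is lost.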
Pattern details:
\\[\\[       # literal opening square brackets
(?:          # open a non-capturing group
  [^\\]|]*   # zero or more characters that are neither ] nor |
  \\|        # literal |
)?           # make the group optional
([^\\]]+)    # capture everything up to the closing square brackets
\\]\\]       # literal closing square brackets
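Put together, a single-pass version of removeLinks could look like the sketch below. The class name FusedLinkRemover and the constant name LINKS are assumptions, chosen so the code does not clash with the original LinkRemover. The main method simply reuses the sentence from the question's test.

```java
import java.util.regex.Pattern;

class FusedLinkRemover
{
    // Optional "target|" prefix (non-capturing), then the visible label in group 1.
    private static final Pattern LINKS =
        Pattern.compile("\\[\\[(?:[^\\]|]*\\|)?([^\\]]+)\\]\\]");

    public static String removeLinks(String input)
    {
        // One pass: every link is replaced by its visible label.
        return LINKS.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args)
    {
        String input = "A sheep's [[wool]] is the most widely used animal fiber, "
                     + "and is usually harvested by [[Sheep shearing|shearing]].";
        System.out.println(removeLinks(input));
    }
}
```

Because the label is always captured in group 1, whether or not the optional prefix matched, the replacement string stays the same "$1" used in the two-phase version.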