java regex intellij-idea

Is there a way to get RegEx highlighting for a method parameter in Java?


I am making a method that receives a RegEx. I want to know if there is a way to tell Java that the string I am receiving is a RegEx, so that it gets highlighted as one.

Similar to what happens when you have the following:

"StringExample0123".replaceAll("[0-9]","");

The previous line of code would highlight both the "[]" and the "-" since they are part of RegEx formatting.

So what I am looking for is that when I do:

customMethod("[0-9]");

the argument is also highlighted with proper RegEx syntax.

Currently, customMethod() only has the replaceAll documentation, which I copied over to experiment and see whether there was a way to highlight a parameter:

/**
 * Replaces each substring of this string that matches the given <a
 * href="../util/regex/Pattern.html#sum">regular expression</a> with the
 * given replacement.
 *
 * <p> An invocation of this method of the form
 * <i>str</i>{@code .replaceAll(}<i>regex</i>{@code ,} <i>repl</i>{@code )}
 * yields exactly the same result as the expression
 *
 * <blockquote>
 * <code>
 * {@link java.util.regex.Pattern}.{@link
 * java.util.regex.Pattern#compile(String) compile}(<i>regex</i>).{@link
 * java.util.regex.Pattern#matcher(java.lang.CharSequence) matcher}(<i>str</i>).{@link
 * java.util.regex.Matcher#replaceAll(String) replaceAll}(<i>repl</i>)
 * </code>
 * </blockquote>
 *
 *<p>
 * Note that backslashes ({@code \}) and dollar signs ({@code $}) in the
 * replacement string may cause the results to be different than if it were
 * being treated as a literal replacement string; see
 * {@link java.util.regex.Matcher#replaceAll Matcher.replaceAll}.
 * Use {@link java.util.regex.Matcher#quoteReplacement} to suppress the special
 * meaning of these characters, if desired.
 *
 * @param   regex
 *          the regular expression to which this string is to be matched
 *
 * @return  The resulting {@code String}
 * @throws  PatternSyntaxException
 *          if the regular expression's syntax is invalid
 *
 * @see java.util.regex.Pattern
 *
 * @since 1.4
 */

There are no configurations other than the defaults you get with a fresh install of IntelliJ IDEA Community Edition and a freshly created Maven project.

I tried copying the Javadoc of String.replaceAll() and pasting it onto my method, hoping that some formatting, specific class, or similar in it was what achieved this. While the documentation showed up properly, it did not add the parameter highlighting.

Expected result ([] and - are highlighted because they are part of regex syntax):
snippet of code that reads "StringExample0123".replaceAll("[0-9]","");

Result (String formatting only, giving it only the green color):

snippet of code that reads customMethod("[0-9]");

Additional note: I am using IntelliJ IDEA. If the solution is IDEA-dependent and only works locally, that's a shame, but it is still useful. Thanks in advance.


Solution

  • There is no 'this is regex' marker at all in the Java Language Specification.

    However, you're doing it wrong. Please read to the end for the true solution.

    But.. intellij has it!

    That's an IntelliJ specific thing.

    It has an 'external annotations' system baked into it, like many IDEs, where an external source indicates which annotations should have been on well-known APIs (including java.* itself), primarily for @Nullable / @NonNull purposes. Many libraries aren't annotated with such annotations, so somebody contributes a list of 'well, if it had been, these annotations would appear on these methods', and IntelliJ uses that and ships with lists of such definitions. It includes an annotation that marks a parameter as 'this is a regex', and this is what powers IntelliJ's regex highlighting: it treats e.g. Pattern.compile's String pattern argument as being annotated to indicate it's a regex parameter via the external annotations mechanism. Other IDEs may employ a similar strategy.

    For IntelliJ this is described in the JetBrains IntelliJ IDEA docs on Java annotations.

    The annotation you are looking for is called org.intellij.lang.annotations.RegExp.

    You can either use the external annotations feature (see the docs), or just add that annotation to the regex parameter in your own code, e.g. void customMethod(@RegExp String regex). But it's not standard Java, and it would have no effect on other IDEs.

    The Maven coordinates you need to add as a dependency to your pom to get these annotations are org.jetbrains:annotations (groupId org.jetbrains, artifactId annotations).

    Doing it right

    Optimally you're looking for a non-IDE-specific annotation. There are quite a few efforts (really suffering from that XKCD thing about competing standards), but the one with lots of momentum behind it is JSpecify. Unfortunately, so far JSpecify is only working on nullity annotations. You could check with them if they have any interest in adding this.

    Doing it right, take 2

    But, all of this is wrong. replaceAll is bad API design. It should never exist. The sheer number of SO questions that are confused about replaceAll is countable in percentages of all java questions, it is that bad.

    (For context: .replace also replaces all occurrences; its argument is a literal string, and that exact string is replaced, with no regexp involved. The existence of replaceAll wrongly suggests that replace must not be doing 'all'. In reality, replaceAll also replaces all occurrences, but its argument is a regex, and replaceFirst replaces only the first occurrence, with a regex as its first argument. There is no non-regex replaceFirst.)
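To make the difference concrete, here is a minimal sketch that applies the three methods to the same input. The class and method names (ReplaceVariantsDemo, replaceThreeWays) are made up for illustration; only the String methods themselves come from the JDK.

```java
final class ReplaceVariantsDemo {

    /** Returns the results of replace, replaceAll and replaceFirst, in that order. */
    static String[] replaceThreeWays(String input, String target, String replacement) {
        return new String[] {
            // replace: target is a literal string; every occurrence is replaced.
            input.replace(target, replacement),
            // replaceAll: target is a regex; every match is replaced.
            input.replaceAll(target, replacement),
            // replaceFirst: target is a regex; only the first match is replaced.
            input.replaceFirst(target, replacement)
        };
    }

    public static void main(String[] args) {
        // "." is a literal dot for replace, but "any character" for the regex variants.
        for (String result : replaceThreeWays("a.b.c", ".", "-")) {
            System.out.println(result);
        }
        // prints:
        // a-b-c
        // -----
        // -.b.c
    }
}
```

Nothing in the method names hints that the same "." argument is treated so differently.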

    The name of the method gives absolutely no indication that it's a regex.

    Therefore this is a security leak [1]. And one that confuses a ton (literally - see SO for proof of that) of users. Truly, if there is such a thing as 'crime against humanity' level idiotic API design, replaceAll is a real contender for this golden raspberry-esque award. And you want to emulate it. What a terrible idea.

    Java is nominal. Types should have names and those names should explain how things work. That's so obvious, there's a name for APIs that break this rule: "Stringly typed API" (it's a pejorative).

    Thus, if an argument is a regexp, then.. declare it properly.

    This method is correct:

    /**
     * Additional information not already implied by the notion 'the method is named "foobar" and it has one parameter: a regex'.
     */
    void foobar(Pattern pattern) { ... }
    

    and this method is absolutely terrible:

    /**
     * {@code pattern} is a regexp! Surprise!!
     * 
     * Additional information not already implied by the notion 'the method is named "foobar" and it has one parameter: a regex'.
     */
    void foobar(String pattern) {
      Pattern p = Pattern.compile(pattern);
    }
    

    And if you do it that way, you get what you want automatically. Because intellij already knows that Pattern.compile takes a regex, as do all other IDEs that have regexp-specific colouring available.

    It even has the considerable advantage that one can preconstruct the patterns. Pattern creation is not necessarily cheap. .replaceAll requires that the string is 'compiled' into a regular expression every time. By making the argument a pattern you both make that more clear and you offer a solution to it; something that .replaceAll simply doesn't have. A user can write:

    private static final Pattern MY_PATTERN = Pattern.compile("...>");
    

    and just use MY_PATTERN as argument. Now the regex is only compiled once-per-VM-boot instead of millions of times during the VM's lifetime.
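Putting the two ideas together, a rough sketch could look like the following. The names PatternParameterDemo, removeMatches and DIGITS are hypothetical; removeMatches merely stands in for the customMethod from the question, whose real behavior is not known.

```java
import java.util.regex.Pattern;

final class PatternParameterDemo {

    // Compiled once when the class is initialized, then reused for every call.
    // The literal inside Pattern.compile is where an IDE can apply regex highlighting.
    static final Pattern DIGITS = Pattern.compile("[0-9]");

    /** Hypothetical stand-in for customMethod: removes every match of the pattern. */
    static String removeMatches(String input, Pattern pattern) {
        return pattern.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(removeMatches("StringExample0123", DIGITS)); // prints StringExample
    }
}
```

The parameter type already tells the caller that a regex is expected, so the Javadoc does not need to.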


    [1] Regexes, huh? Parsing untrusted user input as a regexp (i.e. the code Pattern.compile(someWebTextField.get())) is a security leak because running a regexp has essentially unbounded CPU impact: a small regex run against a small input can take an hour of CPU time, literally, given a sufficiently evilly crafted regex. Regexes can be used to calculate primes, for example. It's not even a matter of 'write a better optimized regex engine'. Regexes are that complex -inherently-. Most people don't know that. Even if you do, if you don't know that the argument to replaceAll is a regex, a hacker can take your server down with a single simple query lazily rerun every second or so. No need for a big DDoS network, and DDoS mitigation services such as Cloudflare or similar won't help you.

    Separately, if you thought it was a literal intermixed with user input, then regexes can obliterate your protections. Something like:

    someValue.replace(">>" + untrustedUserInput.replace("<<", "") + "<<", newUserName);
    

    Might be secure (though probably not a great idea). However, change those calls to replaceAll and you have a serious vulnerability: I can e.g. input foo|bar, and now the scan for the >> is eliminated because | has special meaning in regex.
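A small sketch of that foo|bar scenario follows. The class name, method names and sample text are all invented for illustration. The third method is an addition that is not part of the answer above: it uses Pattern.quote and Matcher.quoteReplacement, which are one way to make replaceAll treat its arguments as literals again.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexInjectionDemo {

    // Literal version: the user input is matched character for character.
    static String replaceLiteral(String text, String userInput, String newName) {
        return text.replace(">>" + userInput.replace("<<", "") + "<<", newName);
    }

    // Same code with replaceAll: the user input is now interpreted as a regex.
    static String replaceAsRegex(String text, String userInput, String newName) {
        return text.replaceAll(">>" + userInput.replaceAll("<<", "") + "<<", newName);
    }

    // Assumption, not from the answer: quoting turns the regex back into a literal.
    static String replaceQuoted(String text, String userInput, String newName) {
        String literal = ">>" + userInput.replace("<<", "") + "<<";
        return text.replaceAll(Pattern.quote(literal), Matcher.quoteReplacement(newName));
    }

    public static void main(String[] args) {
        String text = ">>foo<< and bar<<";
        String evil = "foo|bar"; // becomes the regex ">>foo|bar<<": ">>foo" OR "bar<<"

        System.out.println(replaceLiteral(text, evil, "X")); // prints >>foo<< and bar<<
        System.out.println(replaceAsRegex(text, evil, "X")); // prints X<< and X
        System.out.println(replaceQuoted(text, evil, "X"));  // prints >>foo<< and bar<<
    }
}
```

With the regex variant, neither half of the alternation needs both delimiters any more, which is exactly the protection the >> and << were supposed to provide.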