java regex package

Regex grouping in Java


I'm looking to strip everything but the class name from a fully qualified class name. So I may have something like...

"class gqlMain.Node"

... and I'd like to end up with....

"Node"

...I'm pretty sure my pattern...

".*[\\.][^\\.]*"

...is correct, and when I simply run it as above and test with...

myMatcherObject.matches()

...it always returns true, but when I attempt to add groupings, like...

"(.*[\\.])([^\\.]*)"

...I always get a no match found error. Not sure what's going on.

ADDED:

Thanks for the quick responses, guys. Yeah, I really don't get this. My exact code is....

public String toString() {
    Pattern packagePatt = Pattern.compile("(.*[\\.])([^\\.]*)");
    // 
    System.out.println(this.compClass.getName().toString());

    Matcher packageMatch = packagePatt.matcher(this.compClass.getName().toString());

    //
    System.out.println(packageMatch.group(2));
    return packageMatch.group(2);
}

The first print statement produces a String like "gqlMain.Node", for example (I know the toString() is redundant, I added it out of exasperation). The second print statement produces an error, as would the return statement. With a debugger I can see that the groups List for the Matcher object remains empty at every index. But if I insert a...

if (packageMatch.matches()) {
    // print true
}

... I always get 'true'. This really makes no sense.


Solution

  • I wouldn't recommend scanning for identifiers that way (though I assume you wanted to avoid over-engineering). You may prefer the following solution, which is stricter about what counts as an identifier (although, frankly, I doubt that I'm matching identifiers in the most correct way either). It can also find several fully or partially qualified identifiers within a single string, but it completely ignores unqualified ones (e.g. "class" on its own is ambiguous).

    package stackoverflow;
    
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    
    import static java.lang.System.out;
    import static java.util.regex.Pattern.CASE_INSENSITIVE;
    import static java.util.regex.Pattern.compile;
    
    public final class Q11554180 {
    
        private Q11554180() {
        }
    
        //
        // (3) The same as item (1) however we're       ------------------------------------------------+
        //     capturing the group to get the class                                                     |
        //     name only                                                                                |
        // (2) At least one package name is required    ---------------------------------+              |
        // (1) We're searching valid package names only -----------------+               |              |
        //     and we do not need to capture it ?:                       |               |              |
        //                                              +----------------+--------------+|+-------------+-------------+
        //                                              |                               |||                           |
        private static final Pattern pattern = compile("(?:[\\p{Alpha}_][\\p{Alnum}_]*\\.)+([\\p{Alpha}_][\\p{Alnum}_]*)", CASE_INSENSITIVE);
    
        private static void find(CharSequence s) {
            final Matcher matcher = pattern.matcher(s);
            while ( matcher.find() ) {
                out.println(matcher.group(1));
            }
        }
    
        public static void main(String[] args) {
            find("class gqlMain.Node; class gqlMain.p1.NodeA");
            find("class gqlMain.p1.p11.NodeB");
            find("class gqlMain.p1.p11.p111.NodeC");
            find(Q11554180.class.getCanonicalName());
        }
    
    }
    

    The code above will produce the following output:

    Node
    NodeA
    NodeB
    NodeC
    Q11554180
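
As for why the code in the question fails: as far as I can tell, it calls group(2) before any match has been attempted. A Matcher has no groups to report until matches(), find() or lookingAt() has been called and has succeeded; until then group() throws an IllegalStateException ("No match found"). The following is a minimal sketch of the question's toString() logic with that check added. It keeps the question's pattern unchanged; the class name SimpleNameDemo and the helper simpleName are hypothetical names used only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the pattern is taken from the question.
public final class SimpleNameDemo {

    // The grouped pattern from the question, unchanged.
    private static final Pattern PACKAGE_PATT = Pattern.compile("(.*[\\.])([^\\.]*)");

    // Hypothetical helper: returns the part after the last dot,
    // or the input itself if it contains no dot.
    static String simpleName(String qualifiedName) {
        Matcher packageMatch = PACKAGE_PATT.matcher(qualifiedName);
        // matches() must run (and succeed) before group() may be called.
        if (packageMatch.matches()) {
            return packageMatch.group(2);
        }
        return qualifiedName;
    }

    public static void main(String[] args) {
        System.out.println(simpleName("gqlMain.Node"));
        System.out.println(simpleName("class gqlMain.Node"));
    }
}
```

Both calls in main should print Node. This would also explain why the extra if (packageMatch.matches()) check in the question printed true: that call performs the match, and group(2) would have worked after it.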