
Regular expression matching fully qualified class names


What is the best way to match fully qualified Java class names in a text?

Examples: java.lang.Reflect, java.util.ArrayList, org.hibernate.Hibernate.


Solution

  • A fully qualified Java class name is a sequence of dot-separated parts (let's call each part "N"), so it has the structure

    N.N.N.N
    

    Each "N" part must be a Java identifier. Java identifiers cannot start with a digit, but after the initial character they may use any combination of letters, digits, underscores and dollar signs:

    ([a-zA-Z_$][a-zA-Z\d_$]*\.)*[a-zA-Z_$][a-zA-Z\d_$]*
    ------------------------    -----------------------
              N                           N
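
As a rough sketch of how the expression above could be used from Java: the backslashes have to be doubled inside a string literal, and because the "N." prefix may repeat zero times, the expression also matches a bare word such as "see". When scanning running text it may therefore help to require at least one dot. The class name QualifiedNameFinder, the method findDottedNames and the DOTTED_NAME variant are illustrative assumptions, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QualifiedNameFinder {
    // One identifier: a letter, underscore or dollar sign, then letters, digits, _ or $.
    private static final String ID = "[a-zA-Z_$][a-zA-Z\\d_$]*";

    // The expression from the answer: zero or more "N." prefixes, then a final N.
    static final Pattern QUALIFIED_NAME = Pattern.compile("(?:" + ID + "\\.)*" + ID);

    // Assumed variant for searching text: at least one "N." prefix is required,
    // so plain words without a dot are skipped.
    static final Pattern DOTTED_NAME = Pattern.compile("(?:" + ID + "\\.)+" + ID);

    // Collects every dotted name found in the text, in order of appearance.
    static List<String> findDottedNames(String text) {
        List<String> names = new ArrayList<>();
        Matcher m = DOTTED_NAME.matcher(text);
        while (m.find()) {
            names.add(m.group());
        }
        return names;
    }

    public static void main(String[] args) {
        // matches() tests the whole string against the pattern.
        System.out.println(QUALIFIED_NAME.matcher("java.util.ArrayList").matches());
        // find() scans for occurrences inside a longer text.
        System.out.println(findDottedNames("see java.util.ArrayList and org.hibernate.Hibernate."));
    }
}
```

Use matches() to validate a single candidate string and find() to pull candidates out of a longer text. Neither pattern uses word boundaries, so a match can begin in the middle of a token such as "3java.util".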
    

    They also cannot be a reserved word (such as import, true or null). If you only want to check plausibility, the above is enough. If you also want to check validity, you must check each part against a list of reserved words as well.
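
One possible way to add the reserved-word check, sketched under the assumption that the java.compiler module is available: javax.lang.model.SourceVersion.isKeyword reports keywords, boolean literals and the null literal for the latest source version, so no hand-written word list is needed. The class name QualifiedNameValidator and the method isValid are made up for this example.

```java
import java.util.regex.Pattern;
import javax.lang.model.SourceVersion;

class QualifiedNameValidator {
    // Same ASCII-only expression as in the answer, with backslashes doubled for Java.
    private static final Pattern QUALIFIED_NAME =
            Pattern.compile("(?:[a-zA-Z_$][a-zA-Z\\d_$]*\\.)*[a-zA-Z_$][a-zA-Z\\d_$]*");

    // True if the candidate has the N.N.N shape and no part is a reserved word.
    static boolean isValid(String candidate) {
        if (!QUALIFIED_NAME.matcher(candidate).matches()) {
            return false; // fails the plausibility check
        }
        for (String part : candidate.split("\\.")) {
            if (SourceVersion.isKeyword(part)) {
                return false; // e.g. "import", "true" or "null"
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValid("java.util.ArrayList"));
        System.out.println(isValid("java.import.Foo"));
    }
}
```

Which words count as reserved depends on the language version; SourceVersion.isKeyword answers for the latest version known to the running JDK.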

    Java identifiers are not limited to Latin letters; they may contain any Unicode letter. If you want to allow for this as well, use Unicode character classes:

    ([\p{Letter}_$][\p{Letter}\p{Number}_$]*\.)*[\p{Letter}_$][\p{Letter}\p{Number}_$]*
    

    or, for short (the short category names are the form documented for java.util.regex.Pattern)

    ([\p{L}_$][\p{L}\p{N}_$]*\.)*[\p{L}_$][\p{L}\p{N}_$]*
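
A minimal sketch of the short Unicode form as a Java pattern follows. The class name UnicodeQualifiedName and the helper matches are assumptions for illustration, and the test strings use Unicode escapes so the source file does not depend on its encoding.

```java
import java.util.regex.Pattern;

class UnicodeQualifiedName {
    // One identifier: any Unicode letter, _ or $, then letters, numbers, _ or $.
    private static final String ID = "[\\p{L}_$][\\p{L}\\p{N}_$]*";

    static final Pattern QUALIFIED_NAME = Pattern.compile("(?:" + ID + "\\.)*" + ID);

    // True if the whole string has the N.N.N shape.
    static boolean matches(String s) {
        return QUALIFIED_NAME.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("caf\u00e9.Men\u00fc"));    // accented letters: true
        System.out.println(matches("java.util.ArrayList"));    // plain ASCII: true
        System.out.println(matches("9lives.Cat"));             // leading digit: false
    }
}
```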
    

    The Java Language Specification (section 3.8) has all the details about valid identifier names.
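
Section 3.8 defines identifier characters in terms of Character.isJavaIdentifierStart and Character.isJavaIdentifierPart, so a check without regular expressions is also possible. The following is only a sketch: the class name JlsIdentifierCheck and both method names are hypothetical, and reserved words are deliberately not rejected here.

```java
class JlsIdentifierCheck {
    // True if s is non-empty, starts with an identifier-start character and
    // continues with identifier-part characters only.
    static boolean isIdentifier(String s) {
        if (s.isEmpty()) {
            return false;
        }
        int first = s.codePointAt(0);
        if (!Character.isJavaIdentifierStart(first)) {
            return false;
        }
        // Walk by code point so characters outside the BMP are handled.
        for (int i = Character.charCount(first); i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!Character.isJavaIdentifierPart(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    // True if every dot-separated part is an identifier.
    static boolean isQualifiedName(String s) {
        // The limit -1 keeps trailing empty parts, so "a.b." is rejected.
        for (String part : s.split("\\.", -1)) {
            if (!isIdentifier(part)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isQualifiedName("java.util.ArrayList"));
        System.out.println(isQualifiedName("java..util"));
    }
}
```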

    Also see the answer to this question: Java Unicode variable names