[SOLVED] Remove alphanumeric word from string

I am trying to remove alphanumeric words (any word that contains a digit) from a string.

    String[] sentenceArray = {"India123156 hel12lo 10000 cricket 21355 sport news 000Fifa"};
    for (String s : sentenceArray) {
        System.out.println("before regex : " + s);
        String regex = "(\\d?[,/%]?\\d|^[a-zA-Z0-9_]*)";
        // Replace every match with a space, then collapse runs of spaces.
        String finalResult1 = s.replaceAll(regex, " ");
        String finalResult = finalResult1.trim().replaceAll(" +", " ");
        System.out.println("after regex : " + finalResult);
    }

Actual output: hel lo cricket sport news Fifa

But my required output is: cricket sport news

Please help. Thank you in advance.

Solution

To match the words you want to remove, together with any whitespace that follows them, you can use the following regex in case-insensitive mode:

\b(?=[a-z]*\d+)\w+\s*\b

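In Java, you can remove the matches with replaceAll. The (?i) prefix turns on case-insensitive mode, and trim() drops the space that is left behind when the last word of the string is removed:

String replaced = your_original_string.replaceAll("(?i)\\b(?=[a-z]*\\d+)\\w+\\s*\\b", "").trim();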
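
As a quick check, here is a minimal, self-contained sketch that applies the same regex to the sentence from the question. The class name AlnumWordRemover and the method name removeWordsWithDigits are made up for illustration, and the trim() call is an addition to clean up whitespace left at the ends.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
class AlnumWordRemover {
    // Case-insensitive: a word whose leading letters (if any) are followed
    // by at least one digit, plus any whitespace right after the word.
    private static final Pattern WORD_WITH_DIGIT =
            Pattern.compile("\\b(?=[a-z]*\\d+)\\w+\\s*\\b", Pattern.CASE_INSENSITIVE);

    static String removeWordsWithDigits(String input) {
        // Delete every match, then trim whitespace left at either end.
        return WORD_WITH_DIGIT.matcher(input).replaceAll("").trim();
    }

    public static void main(String[] args) {
        String sentence = "India123156 hel12lo 10000 cricket 21355 sport news 000Fifa";
        System.out.println(removeWordsWithDigits(sentence)); // prints "cricket sport news"
    }
}
```

One limitation worth knowing: the look-ahead only allows letters before the first digit, so a word such as a_1, where an underscore comes before the digit, is left in place.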

Token-by-Token Explanation

\b                       # the boundary between a word char (\w) and
                         # something that is not a word char
(?=                      # look ahead to see if there is:
  [a-z]*                 #   any letter 'a' to 'z' (also 'A' to 'Z' in
                         #   case-insensitive mode), 0 or more times
                         #   (matching the most amount possible)
  \d+                    #   digits (0-9) (1 or more times (matching
                         #   the most amount possible))
)                        # end of look-ahead
\w+                      # word characters (a-z, A-Z, 0-9, _) (1 or
                         # more times (matching the most amount
                         # possible))
\s*                      # whitespace (\n, \r, \t, \f, and " ") (0 or
                         # more times (matching the most amount
                         # possible))
\b                       # the boundary between a word char (\w) and
                         # something that is not a word char
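
To see what each token of the breakdown above actually consumes, the following sketch lists every match found in the sample sentence instead of replacing it. The class name MatchLister and the method name matches are hypothetical; the brackets in the printed output only make the trailing whitespace captured by \s* visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
class MatchLister {
    static List<String> matches(String input) {
        Pattern p = Pattern.compile("\\b(?=[a-z]*\\d+)\\w+\\s*\\b", Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(input);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            // group() is the whole match: the word plus any whitespace taken by \s*
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String sentence = "India123156 hel12lo 10000 cricket 21355 sport news 000Fifa";
        for (String match : matches(sentence)) {
            System.out.println("[" + match + "]");
        }
    }
}
```

For the sample sentence the matches are "India123156 ", "hel12lo ", "10000 ", "21355 " and "000Fifa". The words cricket, sport and news never match because the look-ahead finds no digit after their letters.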