java regex

Java regex: get all words except those within single quotes


I am reaching out because Java regex is not my strong suit. I am working with strings, and I want to extract all words except those inside single quotes.

For example,

      String myString = "IF What young > old, 'Price 1', 'Prediction One', 'Mother''s Day', good";

I tried this regex, which works as long as the quoted text contains no spaces:

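      import java.util.HashSet;
      import java.util.Set;
      import java.util.regex.Matcher;
      import java.util.regex.Pattern;

      String myRegex = "\\b(?!\\')[a-zA-Z_]+\\d*[.]?[a-zA-Z_]*\\d*(?!\\')\\b";
      Pattern p = Pattern.compile(myRegex);
      Matcher m = p.matcher(myString);
      Set<String> mySet = new HashSet<>();
      // Collect every match the pattern finds in myString
      while (m.find()) {
          mySet.add(m.group());
      }

      System.out.println("Set: " + mySet);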

This gives me:
Set: [s, young, Price, old, Prediction, IF, What, good]

But what I want is:

 Set: [young, old, IF, What, good]

As soon as I try to allow spaces inside the quotes using \\s+, it stops working. I find regex hard and confusing, and I would be very grateful for any pointers and help!

Thank you!


Solution

  • Here is a solution based on Matcher.results(), which is available since Java 9:

    List<String> list = Pattern.compile("'[^']*'|(\\w+)")
       .matcher(myString)
       .results()
       .filter(res -> res.group(1) != null)
       .map(res -> res.group(1))
       .collect(Collectors.toList());
    
    System.out.printf("List: %s%n", list);
    //=> List: [IF, What, young, old, good]
    

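    The regex has two alternatives. The first, '[^']*', matches a complete single-quoted string without capturing it. The second, (\\w+), captures any other word (letters, digits, and underscores) in group 1. Alternatives are tried left to right at each position, so a quoted string is consumed as a whole before \\w+ can match the words inside it. The filter then keeps only the matches where group 1 is non-null, which are the words outside quotes. An escaped quote such as 'Mother''s Day' is handled as two adjacent quoted strings, 'Mother' and 's Day', so neither part ends up in the result.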
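
On Java 8, where Matcher.results() is not available, the same idea can be sketched with a plain find() loop. This is a minimal sketch rather than a definitive implementation: the class name UnquotedWords and the helper unquotedWords are hypothetical names chosen for illustration, and the LinkedHashSet is an assumption, picked because the question collects into a Set and the expected output lists each word once.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnquotedWords {
    // Alternative 1 consumes a whole single-quoted string (not captured);
    // alternative 2 captures a word outside quotes in group 1.
    private static final Pattern TOKEN = Pattern.compile("'[^']*'|(\\w+)");

    // Hypothetical helper: returns the words outside single quotes,
    // de-duplicated and in order of first appearance.
    public static Set<String> unquotedWords(String input) {
        Set<String> words = new LinkedHashSet<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            // group(1) is null when the match was a quoted string
            if (m.group(1) != null) {
                words.add(m.group(1));
            }
        }
        return words;
    }

    public static void main(String[] args) {
        String myString = "IF What young > old, 'Price 1', 'Prediction One', 'Mother''s Day', good";
        System.out.println("Set: " + unquotedWords(myString));
        // prints: Set: [IF, What, young, old, good]
    }
}
```

Swapping LinkedHashSet for HashSet gives the same elements but, as in the question's output, without a guaranteed order.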