I am asking because Java regex is not my strong suit. I am working with strings and want to extract every word except those inside single quotes.
For example,
String myString = "IF What young > old, 'Price 1', 'Prediction One', 'Mother''s Day', good";
I tried using this regex, which works as long as there is no space between the words within the quotes.
String myRegex = "\\b(?!\\')[a-zA-Z_]+\\d*[.]?[a-zA-Z_]*\\d*(?!\\')\\b";
Pattern p = Pattern.compile(myRegex);
Matcher m = p.matcher(myString);
Set<String> mySet = new HashSet<>();
while (m.find()) {
    mySet.add(m.group());
}
System.out.println("Set: " + mySet);
This gives me:
Set: [s, young, Price, old, Prediction, IF, What, good]
But what I want is:
Set: [young, old, IF, What, good]
The moment I try to allow spaces within the quotes using \\s+, it stops working. I find regex hard and confusing, and I would be very grateful for any pointers or help!
Thank you!
Here is a different solution based on Matcher.results() (available since Java 9):
List<String> list = Pattern.compile("'[^']*'|(\\w+)")
    .matcher(myString)
    .results()
    .filter(res -> res.group(1) != null)
    .map(res -> res.group(1))
    .collect(Collectors.toList());
System.out.printf("List: %s%n", list);
//=> List: [IF, What, young, old, good]
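This regex matches either a single-quoted string or a word, and only the word alternative is captured in group 1. A quoted string is consumed as a whole, so the words inside it are never matched on their own, and the filter then drops every match where group 1 is null. Note that 'Mother''s Day' is consumed as two adjacent quoted strings, 'Mother' and 's Day', so the doubled quote does not leak any words either.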
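If you are on Java 8, where Matcher.results() is not available, the same idea can be written with a plain find() loop. This is only a minimal sketch using the regex from above; the class name UnquotedWords and the method name extract are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnquotedWords {
    // First alternative consumes a single-quoted string and captures nothing;
    // second alternative captures a word in group 1.
    private static final Pattern TOKEN = Pattern.compile("'[^']*'|(\\w+)");

    // Hypothetical helper: returns the words outside single quotes, in order of appearance.
    static List<String> extract(String input) {
        List<String> words = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            // group(1) is null when the quoted-string alternative matched
            if (m.group(1) != null) {
                words.add(m.group(1));
            }
        }
        return words;
    }

    public static void main(String[] args) {
        String myString = "IF What young > old, 'Price 1', 'Prediction One', 'Mother''s Day', good";
        System.out.println("List: " + extract(myString));
        // prints: List: [IF, What, young, old, good]
    }
}
```

If you want a Set without duplicates, as in the question, you could collect into a LinkedHashSet instead of an ArrayList, which also keeps the order of first appearance.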