I am trying to get a regular expression to find multiple entries of my pattern on a line. Note: I've been using Regex for about an hour... =/
For example:
<a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>
Should match twice:
1) <a href="G2532" id="1">back</a>
2) <a href="G2564" id="2">next</a>
I think the answer lies in the proper mastery of greedy vs reluctant vs possessive but I can't seem to get it to work...
I think I am close. The regex I have so far is:
(<a href=").*(" id="1">).*(</a>)
But the Regex matcher returns 1 match, the entire string...
I have a (compilable) Java regex test harness in the code below. Here are my recent (futile) attempts using that program; the output should be pretty intuitive.
Enter your regex: (<a href=").*(" id="1">).*(</a>)
Enter input string to search: <a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>
I found the text "<a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>" starting at index 0 and ending at index 63.
Enter your regex: (<a href=").*(" id="1">).*(</a>)?
Enter input string to search: <a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>
I found the text "<a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>" starting at index 0 and ending at index 63.
Enter your regex: (<a href=").*(" id="1">).*(</a>)+
Enter input string to search: <a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>
I found the text "<a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>" starting at index 0 and ending at index 63.
Enter your regex: ((<a href=").*(" id="1">).*(</a>))?
Enter input string to search: <a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>
I found the text "<a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>" starting at index 0 and ending at index 63.
I found the text "" starting at index 63 and ending at index 63.
Enter your regex: ((<a href=").*(" id="1">).*(</a>))+?
Enter input string to search: <a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>
I found the text "<a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>" starting at index 0 and ending at index 63.
Enter your regex: (((<a href=").*(" id="1">).*(</a>))+?)
Enter input string to search: <a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>
I found the text "<a href="G2532" id="1">back</a> <a href="G2564" id="2">next</a>" starting at index 0 and ending at index 63.
Here's the Java:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTestHarness {
    public static void main(String[] args) {
        // Create the reader once: a new BufferedReader per iteration can drop input it already buffered.
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(System.in))) {
            while (true) {
                System.out.print("\nEnter your regex: ");
                String regex = reader.readLine();
                System.out.print("Enter input string to search: ");
                String input = reader.readLine();
                if (regex == null || input == null) {
                    break; // end of input
                }
                Matcher matcher = Pattern.compile(regex).matcher(input);
                boolean found = false;
                // find() resumes where the previous match ended, so each pass reports the next match.
                while (matcher.find()) {
                    System.out.println("I found the text \"" + matcher.group() + "\" starting at " +
                            "index " + matcher.start() + " and ending at index " + matcher.end() + ".");
                    found = true;
                }
                if (!found) {
                    System.out.println("No match found.");
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
            System.exit(-1);
        }
    }
}
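Try this:
<a href=".*?" id=".*?">.*?</a>
I've made each quantifier reluctant (non-greedy) by adding a ? after .*, so each .*? stops at the first place the rest of the pattern can match instead of running to the last </a> on the line. I've also replaced the literal id="1" with .*?, because a pattern that hard-codes id="1" can only ever match the first link.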
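To see the difference between greedy and reluctant quantifiers on the input from the question, here is a minimal sketch. The class name GreedyVsReluctant and the helper findAll are made up for illustration, and both patterns match the id with a wildcard rather than the literal "1" (an assumption, so that the second link can match at all).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsReluctant {
    // Hypothetical helper: collects every match of the regex in the input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "<a href=\"G2532\" id=\"1\">back</a> <a href=\"G2564\" id=\"2\">next</a>";

        // Greedy: each .* takes as much as it can, so one match spans both links.
        String greedy = "<a href=\".*\" id=\".*\">.*</a>";
        // Reluctant: each .*? takes as little as it can, so each link matches on its own.
        String reluctant = "<a href=\".*?\" id=\".*?\">.*?</a>";

        System.out.println("greedy matches: " + findAll(greedy, input).size());       // 1
        System.out.println("reluctant matches: " + findAll(reluctant, input).size()); // 2
        for (String match : findAll(reluctant, input)) {
            System.out.println(match);
        }
    }
}
```

The greedy pattern still finds a match, just one that runs from the first <a to the last </a>, which is the behaviour shown in the harness output in the question.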
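But when in doubt, you can use this trick:
<a href="[^"]*" id="[^"]*">[^<]*</a>
[^"]* means any number of characters that aren't a double quote.
[^<]* means any number of characters that aren't a left angle bracket.
Because neither class can run past the delimiter that follows it, you avoid worrying about greedy vs non-greedy. (The id is matched with [^"]* here rather than the literal "1", so the second link matches too.)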
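As a rough sketch of the negated-character-class approach in Java: the class name NegatedClassLinks, the method extract, and the three capture groups (href, id, link text) are additions for illustration and not part of the pattern above. The id is matched with [^"]* so that both links qualify.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NegatedClassLinks {
    // [^"]* cannot cross a closing quote and [^<]* cannot cross the next tag,
    // so no reluctant quantifiers are needed.
    static final Pattern LINK =
            Pattern.compile("<a href=\"([^\"]*)\" id=\"([^\"]*)\">([^<]*)</a>");

    // Hypothetical helper: returns {href, id, text} for every link found in the input.
    static List<String[]> extract(String input) {
        List<String[]> links = new ArrayList<>();
        Matcher matcher = LINK.matcher(input);
        while (matcher.find()) {
            links.add(new String[] {matcher.group(1), matcher.group(2), matcher.group(3)});
        }
        return links;
    }

    public static void main(String[] args) {
        String input = "<a href=\"G2532\" id=\"1\">back</a> <a href=\"G2564\" id=\"2\">next</a>";
        for (String[] link : extract(input)) {
            System.out.println("href=" + link[0] + " id=" + link[1] + " text=" + link[2]);
        }
    }
}
```

One caveat with this sketch: [^<]* assumes the link text contains no nested tags, so a link such as <a ...><b>next</b></a> would not match.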