java regex regex-negation

Match all lines starting with a space up to a line that doesn't start with a space


I have a few lines like this:

tag1:
 line1word1 lineoneanychar
 line2word1
tag2:
 line1word1 ....
 line2word1 .....

I am trying to build a Java regex that extracts all the data under each tag, i.e.:

String parsed1 = "line1word1 lineoneanychar\nline2word1";
String parsed2 = "line1word1 ....\nline2word1 .....";

I believe the right way to do this is using something like this, but I haven't quite got it right:

    Pattern p = Pattern.compile("tag1:\n( {1}.*)\n(?!\\w+)", Pattern.DOTALL);
    Matcher m = p.matcher(clean_data);
    if(m.find()){
        System.out.println(m.group(1));
    }

Any help would be appreciated!


Solution

  • It could be something like this:

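    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class Main {
        public static void main(String[] args) {
            String input = "tag1:\n"
                    + " line1word1 lineoneanychar\n"
                    + " line2word1\n"
                    + "tag2:\n"
                    + " line1word1 ....\n"
                    + " line2word1 .....\n";

            // Match a "tagN:" line, then capture every following line that starts with whitespace.
            // The lazy .*? stops at the first line end, even though DOTALL is set.
            Pattern p = Pattern.compile("tag\\d+:$\\n((?:^\\s.*?$\\n)+)", Pattern.DOTALL | Pattern.MULTILINE);
            Matcher m = p.matcher(input);
            while (m.find()) {
                // group(1) holds the indented lines, including their leading spaces
                System.out.println(m.group(1));
            }
        }
    }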
    

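    Remember to escape backslashes in the Java string literal: the regex \d is written as "\\d".

    \d matches a digit

    \s matches any whitespace character (space, tab, newline, ...)

    ^ and $ match at the start and end of each line because Pattern.MULTILINE is set

    (?:something) is a non-capturing group: it groups part of the pattern without creating a numbered group in the matcher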
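The question asks for the text under each tag without the leading spaces, so here is a minimal sketch of a stricter variant. It is not the answer's code: the class name `TagBlocks` and the helper `parse` are made up for illustration. It assumes lines end with `\n`, that tag names consist of word characters, and that "starts with a space" means a literal space rather than any whitespace. Leaving out `Pattern.DOTALL` keeps `.` from crossing a line break.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagBlocks {
    // ^(\w+):\n       a tag name at the start of a line, a colon, a newline
    // ((?: .*\n?)+)   one or more lines that begin with a literal space
    // DOTALL is not set, so '.' never matches a line break.
    private static final Pattern BLOCK =
            Pattern.compile("^(\\w+):\\n((?: .*\\n?)+)", Pattern.MULTILINE);

    // Hypothetical helper: maps each tag name to the text of its indented lines.
    static Map<String, String> parse(String input) {
        Map<String, String> blocks = new LinkedHashMap<>();
        Matcher m = BLOCK.matcher(input);
        while (m.find()) {
            String body = m.group(2)
                    .replaceAll("(?m)^ ", "")   // drop the one leading space of each line
                    .replaceAll("\\n\\z", "");  // drop the final newline, if any
            blocks.put(m.group(1), body);
        }
        return blocks;
    }

    public static void main(String[] args) {
        String input = "tag1:\n"
                + " line1word1 lineoneanychar\n"
                + " line2word1\n"
                + "tag2:\n"
                + " line1word1 ....\n"
                + " line2word1 .....\n";

        // Print each tag followed by its extracted block.
        parse(input).forEach((tag, body) ->
                System.out.println(tag + " -> [" + body + "]"));
    }
}
```

With the sample input, `parse` maps `tag1` to `"line1word1 lineoneanychar\nline2word1"` and `tag2` to `"line1word1 ....\nline2word1 ....."`. Using a literal space instead of `\s` also means an empty line ends a block rather than being counted as an indented line.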