I am creating a regex to extract various fields from log files. I built one with the help of some tools, and it is almost complete. The only problem is that for one field it extracts only one digit instead of the full number. For better understanding, I have saved it at the link below.
Pattern:
/(?=[^P]*(?:ParNew|P.*ParNew|PSYoungGen|DefNew))^(?:).*(?P<ParNew_before_1>\d)K\->(?P<ParNew_after_1>\d+)K\((?P<young_heap_size>\d+)K\), (?P<par_new_duration>\d+\.\d+) secs\] (?P<ParNew_before_2>\d+)K\->(?P<ParNew_after_2>\d+)K\((?P<total_heap_size>\d+)/
String:
146372.273: [GC146372.274: [ParNew: 378633K->88155K(419456K), 0.0313803 secs] 9893391K->9602913K(12478080K), 0.0320299 secs] [Times: user=0.32 sys=0.01, real=0.03 secs]
Current Output:
Full match `146372.273: [GC146372.274: [ParNew: 378633K->88155K(419456K), 0.0313803 secs] 9893391K->9602913K(12478080`
Group `ParNew_before_1` `3`
Group `ParNew_after_1` `88155`
Group `young_heap_size` `419456`
Group `par_new_duration` `0.0313803`
Group `ParNew_before_2` `9893391`
Group `ParNew_after_2` `9602913`
Group `total_heap_size` `12478080`
Expected Output:
Full match `146372.273: [GC146372.274: [ParNew: 378633K->88155K(419456K), 0.0313803 secs] 9893391K->9602913K(12478080`
Group `ParNew_before_1` `378633`
Group `ParNew_after_1` `88155`
Group `young_heap_size` `419456`
Group `par_new_duration` `0.0313803`
Group `ParNew_before_2` `9893391`
Group `ParNew_after_2` `9602913`
Group `total_heap_size` `12478080`
In the above example, group `ParNew_before_1` is extracting only one digit.
There are three things I'd like to note here:
- `^` should be placed at the very start of the pattern, before the lookahead (it makes more sense to check the lookahead pattern at the start of the string only)
- `\d` won't match more than 1 digit; add `+` after it to match 1 or more
- `.*` is too greedy; use the lazy `.*?`.

Use
^(?=[^P]*(?:ParNew|P.*ParNew|PSYoungGen|DefNew)).*?(?P<ParNew_before_1>\d+)K->(?P<ParNew_after_1>\d+)K\((?P<young_heap_size>\d+)K\), (?P<par_new_duration>\d+\.\d+) secs\] (?P<ParNew_before_2>\d+)K\->(?P<ParNew_after_2>\d+)K\((?P<total_heap_size>\d+)
See this regex demo
Also, you do not need to escape `-` characters that are not inside character classes.
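To check the fix outside an online tester, here is a minimal sketch in Python, whose `re` module accepts the same `(?P<name>...)` group syntax. The pattern and the log line are taken from the question. The names `FIXED`, `GREEDY`, `LINE` and `parse_gc_line` are made up for this illustration. The `GREEDY` variant is included only to show that adding `+` alone is not enough while `.*` stays greedy.

```python
import re

# Sample log line from the question.
LINE = (
    "146372.273: [GC146372.274: [ParNew: 378633K->88155K(419456K), "
    "0.0313803 secs] 9893391K->9602913K(12478080K), 0.0320299 secs] "
    "[Times: user=0.32 sys=0.01, real=0.03 secs]"
)

# Shared tail of the pattern, with the unnecessary escapes on "-" removed.
_TAIL = (
    r"(?P<ParNew_before_1>\d+)K->(?P<ParNew_after_1>\d+)K"
    r"\((?P<young_heap_size>\d+)K\), (?P<par_new_duration>\d+\.\d+) secs\] "
    r"(?P<ParNew_before_2>\d+)K->(?P<ParNew_after_2>\d+)K"
    r"\((?P<total_heap_size>\d+)"
)
_HEAD = r"^(?=[^P]*(?:ParNew|P.*ParNew|PSYoungGen|DefNew))"

# Corrected pattern: anchor first, lazy .*?, and \d+ for the first group.
FIXED = re.compile(_HEAD + r".*?" + _TAIL)

# Same pattern but with a greedy .* (for comparison only).
GREEDY = re.compile(_HEAD + r".*" + _TAIL)


def parse_gc_line(line, pattern=FIXED):
    """Return the named groups as a dict, or None if the line does not match."""
    match = pattern.search(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(parse_gc_line(LINE)["ParNew_before_1"])          # 378633
    print(parse_gc_line(LINE, GREEDY)["ParNew_before_1"])  # 3
```

With the greedy `.*`, the engine backtracks only as far as it must, so it leaves just the last digit of `378633` for the group. The lazy `.*?` expands from the left instead and stops as soon as the digits before the first `K->` can be captured whole.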