regex, bash, pcregrep

How to force pcregrep to return the first match of a regexp


I have CI pipelines with a lot of before_script sections, and I would like to match them with a multiline regexp. I exported all the before_script sections to my-ci-jobs.txt with a Python script.

pcregrep -M 'before_script.*\n.*' my-ci-jobs.txt 
"before_script": [
    "yarn install"
"before_script": [
    "yarn install"
"before_script": [
    "yarn install"
"before_script": [
    "yarn install"
"before_script": [
    "yarn install"
"before_script": [
    "yarn install"
"before_script": [
    "yarn install"
"before_script": [
    "yarn install"
"before_script": [
    "yarn install"
"before_script": [
    "yarn install"
"before_script": [
    "yarn install"

This works fine, but sometimes before_script contains more lines, so I would like a regular expression that catches everything between before_script and the first occurrence of ],. When I implement it, however, it catches the longest match. This is my command (I will not paste the result here; it is the whole file up to the last ],):

pcregrep -M 'before_script.*(\n|.)*],' my-ci-jobs.txt

How can I make the regexp stop at the first match? Is there a better way to write a multiline regexp?


Solution

  • You almost never need (.|\n) in a regular expression; there are better ways to match any character, including line breaks.

    To match zero or more characters other than ], use the [^]]* pattern:

    pcregrep -M 'before_script[^]]*]' file
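
To see the difference on a small input, here is a minimal sketch. The file name sample-ci-jobs.txt and its contents are made up for illustration and are not the asker's real data. It runs the negated-class pattern next to a lazy-quantifier variant, (?s)before_script.*?\], which is another common way to stop at the first ],. The script picks pcregrep, or pcre2grep if only that one is installed.

```shell
# Hypothetical sample standing in for my-ci-jobs.txt (not the asker's real file).
cat > sample-ci-jobs.txt <<'EOF'
"before_script": [
    "yarn install",
    "yarn build"
],
"script": [
    "yarn test"
],
"before_script": [
    "npm ci"
],
EOF

# Use pcregrep, or pcre2grep (its PCRE2 counterpart) if only that one is installed.
PCREGREP=$(command -v pcregrep || command -v pcre2grep || true)

if [ -n "$PCREGREP" ]; then
  # Negated class: [^]]* can never cross a "]", so each match ends at the first one.
  negated=$("$PCREGREP" -M 'before_script[^]]*]' sample-ci-jobs.txt)

  # Lazy quantifier: (?s) lets "." match newlines; ".*?" expands only as far as needed.
  lazy=$("$PCREGREP" -M '(?s)before_script.*?\],' sample-ci-jobs.txt)

  printf '%s\n' "$negated"
else
  echo "pcregrep/pcre2grep not found" >&2
fi
```

Both patterns should report the two before_script blocks and skip the "script" block in between. The negated class is usually the simpler choice here because it needs no inline flag and no backtracking.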
    

    If you only need the start of the output, add | head -1; note that this keeps a single line, so a multi-line match is reduced to its first line:

    pcregrep -M 'before_script[^]]*]' file | head -1
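
Since head -1 keeps exactly one line of output, a block that spans several lines comes out as just its opening line. One possible way to keep the whole first block is to pipe through sed '/]/q' instead, which prints up to and including the first line containing ] and then quits. This is a sketch on made-up data (sample-ci-jobs.txt is a hypothetical file), not something taken from the original answer.

```shell
# Hypothetical sample standing in for my-ci-jobs.txt (not the asker's real file).
cat > sample-ci-jobs.txt <<'EOF'
"before_script": [
    "yarn install",
    "yarn build"
],
"before_script": [
    "npm ci"
],
EOF

# Use pcregrep, or pcre2grep (its PCRE2 counterpart) if only that one is installed.
PCREGREP=$(command -v pcregrep || command -v pcre2grep || true)

if [ -n "$PCREGREP" ]; then
  # head -1: only the first output line, i.e. the opening line of the first match.
  first_line=$("$PCREGREP" -M 'before_script[^]]*]' sample-ci-jobs.txt | head -1)

  # sed '/]/q': print lines until the first one containing "]", then stop.
  first_block=$("$PCREGREP" -M 'before_script[^]]*]' sample-ci-jobs.txt | sed '/]/q')

  printf '%s\n' "$first_block"
else
  echo "pcregrep/pcre2grep not found" >&2
fi
```

The sed filter relies on the closing ] being the first ] in the output, which holds for the pattern above because the match itself cannot contain an earlier one.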
    

    Pattern details