awk sed grep find

Matching a multi-line Python expression in a file using grep?


Please note that this is not a Python question. I have around 500 directories (called modules), each of which contains a __manifest__.py file that serves as the module's metadata. The file looks like the following:

{
    'name': 'Associations Management',
    'version': '0.1',
    'category': 'Marketing',
    'depends': [
        'base_setup', 
        'membership',
        'event'
    ],
    'data': ['views/views.xml'],
    'demo': [],
    'installable': True,
    'auto_install': False,
}

I'd like to match and extract (using the Linux shell only) a pattern that could look like either of the following:

'depends': ['base', 'web'],
// or multi-line as
"depends": [
    'base',
    'web',
]

I'm really interested in extracting this information using Linux commands such as grep, sed, or awk, and I'm not interested in evaluating each file with the Python interpreter. So I used the following command:

find . -iname __manifest__.py | xargs -I{} grep -H -E "('|\")depends('|\")(.?|\n)*\]\s*," {}

However, my regex doesn't match across multiple lines. I'm also worried about matching more lines than needed, as in the following:

'depends': [
        'base_setup', 
        'membership',
        'event'
    ],
    'data': ['views/views.xml'],

Thank you.


Solution

  • With GNU grep

    $ grep -zoE "'depends'"':\s*\[[^][]+]' ip.txt | tr '\0' '\n'
    'depends': [
            'base_setup', 
            'membership',
            'event'
        ]
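
In the GNU grep command above, -z makes grep treat the input as NUL-separated records (so a file without NUL bytes is one record and the bracket expression can span newlines), -o prints only the matched part, and tr turns the NUL terminators in the output back into newlines. The question also shows a double-quoted "depends" key, which the pattern above does not cover. The following is a minimal sketch of one way to accept either quote style. The temporary file and its contents are made up for illustration, and the pattern still skips an empty list such as 'depends': [] because [^][]+ needs at least one character.

```shell
#!/bin/sh
# Hypothetical sample file; it stands in for a real __manifest__.py.
tmp=$(mktemp -d)
cat > "$tmp/__manifest__.py" <<'EOF'
{
    'name': 'Demo',
    "depends": [
        'base',
        'web',
    ],
    'data': ['views/views.xml'],
}
EOF

# ['\"] accepts a single or a double quote on each side of the key.
# -z reads the file as one record, so [^][]+ can run across newlines
# and stops at the first closing bracket.
result=$(grep -zoE "['\"]depends['\"]:\s*\[[^][]+]" "$tmp/__manifest__.py" | tr '\0' '\n')
printf '%s\n' "$result"

rm -rf "$tmp"
```

Because the match stops at the first ], the following 'data' line is not included.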
    

    With ripgrep:

    $ rg -oUN "'depends'"':\s*\[[^\]\[]+]' ip.txt
    'depends': [
            'base_setup', 
            'membership',
            'event'
        ]
    

    The advantage of ripgrep is that it doesn't depend on the NUL character and doesn't have to read the entire input in one go. -U is the multiline matching option and -N turns off the line number prefix (which is on by default for terminal output). Also, both GNU grep and rg support recursive searching.
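
Since the question involves around 500 module directories, the recursive search mentioned above could replace the find | xargs pipeline. This is a rough sketch for GNU grep only, and the two-module tree (mod_a, mod_b) is invented for the demonstration. When grep searches recursively, each match is prefixed with the path of the file it came from.

```shell
#!/bin/sh
# Hypothetical module tree standing in for the real directories.
root=$(mktemp -d)
mkdir -p "$root/mod_a" "$root/mod_b"
cat > "$root/mod_a/__manifest__.py" <<'EOF'
{
    'name': 'A',
    'depends': ['base', 'web'],
}
EOF
cat > "$root/mod_b/__manifest__.py" <<'EOF'
{
    'name': 'B',
    'depends': [
        'mail',
    ],
    'data': ['views/views.xml'],
}
EOF

# -r walks the tree and --include restricts the search to manifest files.
# The order in which files are visited is not guaranteed.
found=$(cd "$root" && grep -rzoE --include='__manifest__.py' \
    "'depends'"':\s*\[[^][]+]' . | tr '\0' '\n')
printf '%s\n' "$found"

rm -rf "$root"
```

Only the first line of a multi-line match carries the file name prefix, because the prefix is added once per match rather than once per line.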


    If the data to be matched always consists of whole lines, with 'depends': [ on a single line, you could also use awk. See "How to print lines between two patterns, inclusive or exclusive (in sed, AWK or Perl)?" for explanations.

    $ awk '/\047depends\047:[[:blank:]]*\[/{f=1} f; /]/{f=0}' ip.txt
        'depends': [
            'base_setup', 
            'membership',
            'event'
        ],
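
The awk approach can also be run over every manifest found by find. The sketch below is one possible variant, not part of the original answer: it accepts either quote style around the key, prefixes each printed line with the file name, and clears the flag at the start of each file in case a previous file ended before its closing bracket. The module names and file contents are hypothetical, and [ \t] is used in place of [[:blank:]] on the assumption that some older awk implementations lack POSIX character classes.

```shell
#!/bin/sh
# Hypothetical module tree standing in for the real directories.
root=$(mktemp -d)
mkdir -p "$root/mod_a" "$root/mod_b"
cat > "$root/mod_a/__manifest__.py" <<'EOF'
{
    'name': 'A',
    "depends": [
        'base',
        'web',
    ],
    'data': ['views/views.xml'],
}
EOF
cat > "$root/mod_b/__manifest__.py" <<'EOF'
{
    'name': 'B',
    'depends': ['base'],
}
EOF

# \047 is the octal escape for a single quote, which cannot be typed
# directly inside the single-quoted awk program.
report=$(cd "$root" && find . -name '__manifest__.py' -exec awk '
    FNR == 1 { f = 0 }                              # new file: clear the flag
    /(\047|")depends(\047|"):[ \t]*\[/ { f = 1 }    # opening line, either quote style
    f { print FILENAME ":" $0 }                     # print while inside the list
    /]/ { f = 0 }                                   # a closing bracket ends the range
' {} +)
printf '%s\n' "$report"

rm -rf "$root"
```

Unlike the grep -o version, this prints whole lines, so the leading indentation and the trailing comma after the closing bracket are kept.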