python regex pycparser

How to split a C program by its function blocks?


I am trying to split a C program into its function blocks. I tried the regex library, splitting on (){, but without success, and I am not sure where to begin. For example, given:

string = """
int first(){
    if () { 

    }
}

customtype second(){
    if () { 

    }
    for(){

    }
}
fdfndfndfnlkfe
    """

I want the result to be a list with each function block as an element: ['int first(){ ... }', 'customtype second(){ ... }']

I tried the following, but it prints None:

import regex
import re

reg = r"""^[^()\n]+\([^()]*\)\s*
\{
    (?:[^{}]*|(?R))+
\}"""

print(regex.match(reg, string))

Solution

  • First of all: don't. Use a parser instead.
    Second, if you insist, and to see why you should use a parser instead, have a look at this recursive approach (which only works with the newer regex module, not the built-in re):

    ^[^()\n]+\([^()]*\)\s*
    \{
        (?:[^{}]*|(?R))+
    \}
    

    See a demo on regex101.com. This will break with comments that include curly braces.


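    In Python this would be the following. Note re.VERBOSE, needed because the pattern spans several lines; re.MULTILINE, so that ^ matches at the start of every line; and finditer rather than match, which only tries the very start of the string: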

    import regex as re
    
    reg = re.compile(r"""^[^()\n]+\([^()]*\)\s*
    \{
        (?:[^{}]*|(?R))+
    \}""", re.VERBOSE | re.MULTILINE)
    
    for function in reg.finditer(string):
        print(function.group(0))
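
To illustrate the caveat about comments containing curly braces, here is a minimal sketch of a hand-written brace counter that skips comments and string/character literals while scanning. This is not a real C parser and not the regex approach above: the function name split_functions and the sample input are made up for illustration, and it assumes that a top-level block is a function when the text right before its opening brace ends with ")". It does not handle preprocessor directives or a comment placed between ")" and "{".

```python
def split_functions(source):
    """Return top-level '...(...) { ... }' blocks from C-like source text."""
    blocks = []
    depth = 0            # current brace nesting level
    stmt_start = 0       # index where the current top-level item begins
    is_function = False  # whether the open top-level block looks like a function
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        two = source[i:i + 2]
        if two == "//":                      # line comment: skip to end of line
            end = source.find("\n", i)
            i = n if end == -1 else end + 1
            continue
        if two == "/*":                      # block comment: skip to closing */
            end = source.find("*/", i + 2)
            i = n if end == -1 else end + 2
            continue
        if ch in "\"'":                      # string or char literal
            i += 1
            while i < n and source[i] != ch:
                i += 2 if source[i] == "\\" else 1  # step over escapes
            i += 1
            continue
        if ch == "{":
            if depth == 0:
                # Heuristic (assumption): a function header ends with ')'.
                is_function = source[stmt_start:i].rstrip().endswith(")")
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                if is_function:
                    blocks.append(source[stmt_start:i + 1].strip())
                stmt_start = i + 1
        elif ch == ";" and depth == 0:       # end of a top-level declaration
            stmt_start = i + 1
        i += 1
    return blocks


sample = """
int first(){
    if (x) { puts("}"); }  /* stray } in a comment */
}

customtype second(){
    for(;;){ }
}
trailing junk
"""

blocks = split_functions(sample)
for block in blocks:
    print(block)
    print("---")
```

For anything beyond quick experiments, a real parser (pycparser, for instance) is still the safer choice, since it works from the grammar rather than from brace counting.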