python-3.x django semgrep

Semgrep: A scalable way of catching all cases of multiline f-strings


I have some logs in my codebase that have multiline f-strings, such as:

...
logger.error(
    f'...'
    f'...'
    f'...'
    f'...'
    f'...'
    f'...'
)

Some have only two f'...' literals on separate lines, others have three, and so on.

I am currently duplicating patterns to catch such logs. For example:

...
patterns:
  - pattern-either:
      - pattern: |
          logger.$METHOD(
            f'...'
            f'...'
            f'...'
          )
      - pattern: |
          logger.$METHOD(
            f'...'
            f'...'
            f'...'
            f'...'
            f'...'
          )

These catch the logs with 3 and 5 f'...' literals on multiple lines. I have to write another pattern for those with 2, 4, and so on.

Is there a scalable way to capture all of these with fewer patterns? The current approach won't scale, since there might be logs whose f-strings span 6, 7, 8, 9 or more lines.


Solution

  • A good solution has been posted here and a live demo here.

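    Instead of duplicating the lines, the rule binds the message argument to a metavariable, $X, whether it is passed positionally or as message= or msg=. Python treats adjacent string literals as a single expression, so one metavariable covers the message however many lines it spans. If $X does not match the pattern "...", the call is flagged as suspect. The full rule is: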

    rules:
      - id: test
        patterns:
          - pattern-either:
              - pattern: logger.$METHOD(..., $X, ...)
              - pattern: logger.$METHOD(..., message=$X, ...)
              - pattern: logger.$METHOD(..., msg=$X, ...)
          - metavariable-pattern:
              metavariable: $X
              patterns:
                - pattern-not: |
                    "..."
        message: Semgrep found a match $X
        languages:
          - python
        severity: WARNING
    

    All credit to lagoAbal.
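
The reason a single metavariable is enough can be checked in plain Python, independently of Semgrep. The sketch below uses the standard `ast` module to count the positional arguments of a `logger.error(...)` call. The helper name `message_arg_count` and the sample log messages are made up for illustration. However many adjacent f'...' literals the call contains, Python parses them as one argument.

```python
import ast


def message_arg_count(source: str) -> int:
    """Return the number of positional arguments of the first call in `source`."""
    tree = ast.parse(source)
    # ast.walk visits nodes breadth-first, so the outermost call comes first.
    call = next(node for node in ast.walk(tree) if isinstance(node, ast.Call))
    return len(call.args)


# Hypothetical log calls; the names inside the braces are never evaluated,
# because the source is only parsed, not executed.
TWO_LINES = """
logger.error(
    f'user {user_id} failed '
    f'with code {code}'
)
"""

FIVE_LINES = """
logger.error(
    f'user {user_id} failed '
    f'with code {code} '
    f'after {retries} retries '
    f'on host {host} '
    f'at {timestamp}'
)
"""

for sample in (TWO_LINES, FIVE_LINES):
    # Adjacent string literals are merged by the parser into one expression.
    print(message_arg_count(sample))
```

Both calls report one positional argument, which is why the rule needs no separate pattern per line count.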