
Pyparsing: How to parse SQL Hints


I am trying to parse the EBNF below (included as a comment in the code), and I'm struggling with the optional STRING part, i.e. the free-text comment that may follow a hint (written as "extra comment" in my test string).

from pyparsing import *

# SQL HINT EBNF 
'''
{ /*+ hint [ string ]
      [ hint [ string ] ]... */
| --+ hint [ string ]
      [ hint [ string ] ]...
}
'''

test_string = "/*+ALL_ROWS extra comment FIRST_ROWS CACHE*/"

LCOMMENT = Literal("/*+")
RCOMMENT = Literal("*/")

grammar = Forward()

hint_all_rows = Literal("ALL_ROWS")
hint_first_rows = Literal("FIRST_ROWS")
hint_cache = Literal("CACHE")

comment_in_hint = Word(printables)

all_hints = (hint_all_rows | hint_first_rows | hint_cache)+ ZeroOrMore(comment_in_hint)

grammar <<  all_hints  + ZeroOrMore(grammar)

all_grammar = LCOMMENT + grammar + RCOMMENT

p = all_grammar.parseString(test_string)

print(p)

Solution

  • This is the code that now runs, thanks to Paul McGuire's help in the comments on the original post. I initially got rid of the Forward when posting the answer here. But when I checked the code by attaching results names to the different elements, I noticed that my first answer classified all but the first hint as comments. So I kept the Forward and used some of Paul's other suggestions.

    from pyparsing import *
    
    # SQL HINT EBNF
    '''
    { /*+ hint [ string ]
          [ hint [ string ] ]... */
    | --+ hint [ string ]
          [ hint [ string ] ]...
    }
    '''
    
    LCOMMENT = Literal("/*+")
    RCOMMENT = Literal("*/")
    
    grammar = Forward()
    
    hint_all_rows = Keyword("ALL_ROWS")
    hint_first_rows = Keyword("FIRST_ROWS")
    hint_cache = Keyword("CACHE")
    
    comment_in_hint = Word(printables, excludeChars='*')
    
    all_hints = (hint_all_rows | hint_first_rows | hint_cache).setResultsName("Hints", listAllMatches=True) + Optional(comment_in_hint)("Comments*")
    
    grammar << all_hints + ZeroOrMore(grammar)
    
    all_grammar = LCOMMENT + grammar + RCOMMENT
    
    p = all_grammar.parseString("/*+ ALL_ROWS aaaaaaa FIRST_ROWS bbbbb */")
    
    print(p["Hints"])
    
    print(p["Comments"])
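One limitation of the grammar above is that the optional comment is a single word, so the original test string with the two-word comment "extra comment" still fails, and a hint that directly follows another hint can be swallowed as a comment. A possible variant, offered only as a sketch, uses a negative lookahead (`~hint`) so that a comment word can never be a hint, and replaces the recursive Forward with a plain repetition. The names `HINT_NAMES`, `entry`, `hint_block` and `parse_hints` are made up for this illustration, and only the `/*+ ... */` form of the EBNF is covered, not the `--+` form.

```python
from pyparsing import (Group, Keyword, Literal, MatchFirst, OneOrMore,
                       Suppress, Word, ZeroOrMore, printables)

# Only the three hints used in this thread; a real grammar would list many more.
HINT_NAMES = ["ALL_ROWS", "FIRST_ROWS", "CACHE"]
hint = MatchFirst([Keyword(name) for name in HINT_NAMES])

# A comment word is a run of printable characters (excluding '*')
# that is not itself one of the known hints.
comment_word = ~hint + Word(printables, excludeChars="*")

# One entry: a hint followed by zero or more comment words.
# The inner Group keeps the comment words together, even when there are none.
entry = Group(hint + Group(ZeroOrMore(comment_word)))

# The delimiters are suppressed so only the entries appear in the results.
hint_block = Suppress(Literal("/*+")) + OneOrMore(entry) + Suppress(Literal("*/"))


def parse_hints(text):
    """Return a list of (hint, comment) pairs from a /*+ ... */ block."""
    result = hint_block.parseString(text)
    return [(e[0], " ".join(e[1])) for e in result]


if __name__ == "__main__":
    print(parse_hints("/*+ALL_ROWS extra comment FIRST_ROWS CACHE*/"))
```

Because each hint and its comment sit in their own Group, the pairing between a hint and its comment is preserved, which the flat "Hints" and "Comments" lists do not give directly. A comment that contains a `*` would still need extra handling.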