context-free-grammarbnfebnfcontext-sensitive-grammar

BNF (EBNF) to describe table format with optional columns


Im trying the define a grammar that can be used to describe the following type of table:

**co1.......**col2.....**col3......

value.......value.......value

value.......value.......value

value.......value.......value

value.......value.......value

.....

picture of table

with **col1 and **col2 being column names. The format can optionally have additional, predefined columns (eg. let's assume **col4 and **col5 can also be included). I want to write a parser that outputs this format. Can this type of table be described using BNF or EBNF?

From what I have read so far, this is a context-sensitive grammar and thereby cannot be described by BNF or EBNF (I assume this is because row x will only include **col4 if x-1 did so as well). Is this correct? Are there any alternatives to describe the table format above?


Solution

  • If you have a variable number greater than two of columns which must have the same number of entries, or a variable number greater than two of rows that must have the same number of entries, the language cannot be context-free for the same reason a^n b^n c^n is not context-free.

    If you want an unrestricted grammar for tables, something like this would work:

    // begin with A^n B
      S := AS | AB
    
    // produce C^n D^n B
      A := CD               
     DC := CD
    
    // produce C^n . E^n B
     DB := BE
    CBE := C.EB
     BE := EB
    
    // produce C^n . E^n G^n B
      E := EG
     GE := EG
    
    // produce C^n (. E^n)+ B
     GB := BE
    EBE := E.EB
     BE := EB
    
    // produce C^n (. E^n)+
      B := (empty)
    
    // lead column headings and values to terminal symbols
      C := (rules for headers)
      E := (rules for values)
    

    The above assumes all values are of one type, for simplicity; that is, all values can be generated using the rule for E. If you want to add multiple types and coordinate with the columns involved, the grammar becomes more complicated (you need E, E', E'', etc., one for each value type, and corresponding G, G', G'', etc. and C, C', C'', etc., and duplicate productions to handle moving B around each one, but otherwise the process remains similar; and then your grammar would produce only tables with proper type matches).