I have to write a function that checks whether input strings are valid for a given language specification. I thought that this would be a standard CFG -> Chomsky Normal Form, then CYK parsing, but one of the rules in the language is preventing this from happening.
Some of the rules are straightforward, if we define terminals {a,b,c,d,e,f,P,Q,R,S}
, then valid strings are
1) Any of the lowercase terminals in isolation
2) If 'x' is a valid string, then so is Sx
But the third rule is
3) If X and Y are valid input strings, then so are PXY, QXY, RXY
where {P,Q,R}
are the remaining uppercase terminals and X and Y are nonterminals.
What would the production rule for this look like? I thought it would be something like
XY -> PXY (and QXY, RXY)
but there are two problems with this. The first is that this is not a CFG rule -- does that mean this language defines a CSG instead?
And the second is that this doesn't work, because
XY -> PXY -> PPXY
would not be a valid message in all cases.
I think that this grammar is context-free, unless I'm misinterpreting what you're saying.
First, let's let A be the nonterminal that expands out to some valid string made just using the first two rules, we get
A -> a | b | c | d | e | f
Now, your second rule says that if you can build the string ω then you can build Sω. We could encode that as
A -> SA
Finally, you've said that if you have two strings X and Y, then you can combine them together as
PXY
QXY
RXY
One way to think about this would be to generate the string P, followed by any two valid strings (same for Q or R). Thus you could add the rules
A -> PAA | QAA | RAA
This gives the final grammar
A -> a | b | c | d | e | f
A -> SA
A -> PAA | QAA | RAA
Hope this helps!