Find a context-free grammar (CFG) for the language L of all words such that each terminal in a word occurs even number of times over a possibly large alphabet Σ
My long aproach is (the only nonterminal is S):
S ⟶ ε | SS
x ∈ Σ : S ⟶ xSx
x,y ∈ Σ : S ⟶ xxSyy | yySxx | xySxy | xySyx | yxSyx | yxSyx
Is this correct? The productions generate correct words, do they generate all words?
EDIT: Can a CFG over large alphabet describe a language, where each terminal appears even number of times?
EDIT_2: If a solution exists, is it possible Chomsky Normal Form be polynomial in |Σ| ?
There is even a regular grammar to achieve this. Since every regular grammar is by definition context-free, the answer ist "yes". It's also possible to construct a finite automaton, but since you asked for a grammar...
Here's how: recall that a regular grammar allows rules of the form A -> b C or A -> b or A -> epsilon, where A and C are non-terminals and b is a terminal. This essentially means that every non-terminal generates a terminal and another non-terminal which will generate the rest of the string; we could say that every non-terminal encodes a certain property about the strings it generates.
Consider now all the subsets of the alphabet Sigma. Since Sigma is supposed to be finite, so is the set of subsets (powerset). Let the set of non-terminals be the powerset of Sigma.
Start with a rule: {} -> a {a} for every terminal a. For every non-terminal X, add the rule X -> a X-{a} if a is in X; or X -> a X+{a} if a is not in X (I'm sloppily writing + and - for set union and difference here).
Finally, add {} -> epsilon (the empty word).
The grammar encodes in its non-terminals exactly the sets of terminals which have appeared in an odd number and thus have to be "canceled out" again.