perl parsing taint marpa

Is it possible to use Perl's Marpa parser for a public network server?


The documentation of Perl's Marpa parser contains the following section about tainted data:

Marpa::R2 exists to allow its input to alter execution in flexible and powerful ways. Marpa should not be used with untrusted input. In Perl's taint mode, it is a fatal error to use Marpa's SLIF interface with a tainted grammar, a tainted input string, or tainted token values.

I am not sure I understand the consequences of this limitation. I understand that the grammar must not be tainted, but I do not understand why the input must not be tainted. To me, validating the input is the parser's job. It seems unreasonable that a parser has to trust its input.

Is it really that way? Is it impossible to implement any kind of public network service with Marpa?

I ask because one of the reference use cases is the Marpa HTML parser, and it seems contradictory to me that an HTML parser must not be used with tainted data when about 99.99% of all HTML is potentially tainted.

Can anybody explain this contradiction?


Solution

  • Marpa is actually safer than other parsers, because the language it parses is exactly that specified by the BNF. With regexes, PEG, etc., it's very hard to determine what language is actually parsed. In practice programmers tend to get a few test cases working and then give up.

    In particular, the parsing of unwanted inputs could be a major security issue -- with traditional parsers you usually don't know everything you are letting through. Rarely does a test suite check to see if inputs which should be errors are in fact accepted. Marpa parses exactly the language in its specification -- nothing less and nothing more.

    So why the scare language about taint mode? Marpa, in its most general case, can be seen as a programming language, and has exactly the same security issues. Allowing the user to execute arbitrary code is by definition insecure, and it is exactly what C, Perl, Marpa, etc. do by design. You cannot give an untrusted user a general language interface. This would be clear for C, Python, etc., but I thought someone might overlook it in the case of Marpa. Hence the scare language.

    Marpa is IMHO more secure than competing technologies. However, in the most general case, that is not secure enough.
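
To illustrate the point above about traditional approaches letting through more than intended, here is a minimal sketch in core Perl (no Marpa involved). The helper names `loose_is_number` and `strict_is_number` are hypothetical, chosen only for this example. The "obvious" regex for a string of digits also accepts a string with a trailing newline, because `$` matches just before a final newline as well as at the very end of the string.

```perl
use strict;
use warnings;

# Hypothetical helper: the "obvious" digits-only check.
# `$` also matches just before a trailing newline, so "123\n" is accepted.
sub loose_is_number {
    my ($s) = @_;
    return $s =~ /^\d+$/ ? 1 : 0;
}

# Hypothetical helper: \A and \z anchor to the true start and end of the string.
sub strict_is_number {
    my ($s) = @_;
    return $s =~ /\A\d+\z/ ? 1 : 0;
}

for my $input ("123", "123\n", "12a") {
    (my $shown = $input) =~ s/\n/\\n/g;    # make the newline visible when printing
    printf "%-6s loose=%d strict=%d\n",
        $shown, loose_is_number($input), strict_is_number($input);
}
```

A test suite that only checks that `123` passes and `12a` fails would never notice that the loose version also accepts `123` followed by a newline, which is the kind of gap described above.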