I've got a bunch of ACPI Source Language files and I want to calculate file-to-file similarity between them. I thought of using something like Perl's Parse::RecDescent, but I am stuck on two things:
1) Translating the ACPI grammar (www.acpi.info/DOWNLOADS/ACPIspec40a.pdf) into something Parse::RecDescent would understand
2) Finding a metric to compare two parsed files
Any ideas?
So you have two problems:
Parsing ACPI to build an AST. This has the usual troubles: ensuring that you have a well-defined grammar, ensuring that your parsing machinery can parse according to that grammar (often you have to bend a good grammar definition to enable the parsing machinery to process it), and building a corresponding AST. You will have these troubles with Perl parsing machinery, simply because it is a parsing engine.
Comparing the structure of the ASTs and producing a sensible answer. What you are likely to find here is that there is some literature describing roughly how to do this (using e.g. Levenshtein distance), but that the details for ASTs matter. (See "Change Distilling: Tree Differencing for Fine-Grained Source Code Change Extraction".) Finally, having determined the distance, you need to print out the deltas in some readable form.
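To make the second problem a bit more concrete, here is a minimal sketch of one possible metric, assuming the parser has already produced ASTs. It is a simple top-down ordered-tree edit distance (children are aligned Levenshtein-style, and inserting or deleting a child costs the size of its subtree), not the algorithm from the paper above and not what any particular tool does. The tuple representation and the names `node`, `size`, `tree_distance`, and `similarity` are my own assumptions for illustration, as are the node labels in the example.

```python
from functools import lru_cache

# Hypothetical AST representation: (label, (child, child, ...)).
# Tuples are hashable, so results can be memoized.
def node(label, *children):
    return (label, tuple(children))

def size(tree):
    """Number of nodes in the subtree."""
    return 1 + sum(size(child) for child in tree[1])

@lru_cache(maxsize=None)
def tree_distance(a, b):
    """Top-down edit distance between two ordered, labeled trees.

    Costs (an assumption; tune for your needs): relabeling a node costs 1,
    inserting or deleting a whole subtree costs its node count.
    """
    relabel = 0 if a[0] == b[0] else 1
    xs, ys = a[1], b[1]

    # Levenshtein-style dynamic programming over the two child sequences.
    prev = [0]
    for y in ys:
        prev.append(prev[-1] + size(y))          # insert every child of b
    for x in xs:
        cur = [prev[0] + size(x)]                # delete every child of a so far
        for j, y in enumerate(ys, start=1):
            cur.append(min(
                prev[j] + size(x),               # delete subtree x
                cur[j - 1] + size(y),            # insert subtree y
                prev[j - 1] + tree_distance(x, y),  # align x with y, recurse
            ))
        prev = cur
    return relabel + prev[-1]

def similarity(a, b):
    """Normalize the distance to a score in (0, 1]; 1.0 means identical."""
    return 1.0 - tree_distance(a, b) / (size(a) + size(b))

if __name__ == "__main__":
    # Two tiny made-up trees that differ in a single leaf label.
    t1 = node("Method", node("Name"), node("Return", node("Integer")))
    t2 = node("Method", node("Name"), node("Return", node("String")))
    print(tree_distance(t1, t2), similarity(t1, t2))
```

A metric like this only sees insertions, deletions, and relabelings. It does not detect moved subtrees or consistent identifier renaming, which is where the "details for ASTs matter" part comes in.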
However, AFAIK, my company is the only one that has reduced this to practice. See our Smart Differencer tool. The SmartDifferencers parse, build ASTs, and report changes in terms of AST elements moved, inserted, deleted, replaced, or modified by consistent identifier substitution. They depend on an underlying, very strong GLR parsing engine, which minimizes the problems of accepting new grammars. They work for many common languages but not presently for ACPI.