c++ compilation grammar

Why does the C++ compiler give errors after lines instead of on them?


This question popped into my head today at work when I was having yet another domestic affair with my compiler. Despite my buff pinky (due to all the semicolon pressing I do at work), I managed to miss one before an if statement. Obviously, this resulted in a compile error:

error C2143: syntax error : missing ';' before 'if'

So I wondered, "Well gee, why can't you tell me the line that's missing the semicolon instead of the line after the problem?" and I proceeded to experiment with other similar syntax errors:

error C2065: 'myUndeclared' : undeclared identifier

error C2143: syntax error : missing ')' before 'if'

etc...

Now, all of those errors would, similarly, take me to the line after the problem and complain about something before the if statement.

Consider the following:

SomeFunction(x) //Notice, there is no ';' here

if(bSomeCondition)
{
    ...
}

I get two compile errors:

(Line 265) error C2065: 'x' : undeclared identifier

(Line 266) error C2143: syntax error : missing ';' before 'if'

However, the first error correctly tells me the line number, despite the missing semicolon. This suggests to me that the compiler doesn't get tripped up in parsing and is able to make it past the semicolon problem. So why does the compiler insist on reporting grammatical errors this way? Other (non-grammatical) errors are reported on the lines where they are found. Does this have to do with the compiler making multiple passes? Basically, I hope someone with a working knowledge of the C++ compiler can explain specifically what the compiler is doing that necessitates reporting errors in this "before" way.


Solution

  • The short answer to the more general question of "Why do C/C++ error messages suck?" is "Sometimes C++ is really hard to parse" (it doesn't actually have a context-free grammar). However, this isn't really a valid reason: one can still make tools that record better diagnostic information than most C++ compilers do.

    The more practical answer is "Compiler authors have inherited legacy codebases which didn't value error messages", combined with a mild dose of "compiler authors are lazy", topped with "Diagnostic reporting isn't an exciting problem". Most compiler writers would rather add a new language feature or a 3% codegen performance improvement than do significant refactoring of the codebase to allow decent error reporting. The specific question about "Why aren't errors properly localised to the line that 'caused' them" is an instance of this. There's not really a technical reason compilers can't generally work out that a ';' is missing and then tell you about the source span of the last statement lacking one, even given C++'s general insensitivity to whitespace. It's just that storing that information has (largely) been ignored historically.

    That said, new compilers not hampered by decades of old code are doing much better. Have a look at the Clang compiler, which prides itself on sensible error messages. Its page on diagnostics shows how much better they are than GCC's. An example for this case:

      $ gcc-4.2 t.c
      t.c: In function 'foo':
      t.c:5: error: expected ';' before '}' token
      $ clang t.c
      t.c:4:8: error: expected ';' after expression
        bar()
             ^
             ;
    

    Or, more impressively:

      $ cat t.cc
      template<class T>
      class a {}
      class temp {};
      a<temp> b;
      struct b {
      }
      $ gcc-4.2 t.cc
      t.cc:3: error: multiple types in one declaration
      t.cc:4: error: non-template type 'a' used as a template
      t.cc:4: error: invalid type in declaration before ';' token
      t.cc:6: error: expected unqualified-id at end of input
      $ clang t.cc
      t.cc:2:11: error: expected ';' after class
      class a {}
                ^
                ;
      t.cc:6:2: error: expected ';' after struct
      }
       ^
       ;
    

    Look, it's even telling us what to type where to fix the problem! </clang_salespitch>
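
To make the earlier point concrete (that nothing technical stops a compiler from pointing at the statement that lacks the semicolon), here is a minimal sketch. It is not how MSVC, GCC or Clang work internally. It assumes a toy grammar with only `if`, braces and expression statements, and every name in it (`Token`, `Diagnostic`, `tokenize`, `skipParens`, `findMissingSemicolons`, `describe`) is made up for illustration. The only idea it demonstrates is that each token remembers where it ended, so when the parser trips over `if` it can report the position just past the previous token as well as the line where it noticed the problem.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

struct Token {
    std::string text;
    int line;       // 1-based line of the token
    int endColumn;  // 1-based column just past its last character
};

struct Diagnostic {
    std::string before;  // token the parser was looking at when it noticed
    int noticedLine;     // classic report: the line of that token
    int fixLine;         // better report: the line of the previous token...
    int fixColumn;       // ...and the column just past it, where ';' belongs
};

bool isWordChar(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Split the source into words and single-character punctuation, skipping
// whitespace and // comments, and recording where each token ends.
std::vector<Token> tokenize(const std::string& src) {
    std::vector<Token> out;
    int line = 1, col = 1;
    std::size_t i = 0;
    while (i < src.size()) {
        if (src[i] == '\n') { ++line; col = 1; ++i; continue; }
        if (std::isspace(static_cast<unsigned char>(src[i]))) { ++col; ++i; continue; }
        if (src.compare(i, 2, "//") == 0) {
            while (i < src.size() && src[i] != '\n') ++i;
            continue;
        }
        std::size_t start = i++;
        if (isWordChar(src[start]))
            while (i < src.size() && isWordChar(src[i])) ++i;
        int len = static_cast<int>(i - start);
        out.push_back({src.substr(start, i - start), line, col + len});
        col += len;
    }
    return out;
}

// If a '(' starts at pos, advance pos past the matching ')'.
void skipParens(const std::vector<Token>& toks, std::size_t& pos) {
    if (pos >= toks.size() || toks[pos].text != "(") return;
    int depth = 0;
    while (pos < toks.size()) {
        const std::string& t = toks[pos++].text;
        if (t == "(") ++depth;
        else if (t == ")" && --depth == 0) return;
    }
}

std::vector<Diagnostic> findMissingSemicolons(const std::string& src) {
    const std::vector<Token> toks = tokenize(src);
    std::vector<Diagnostic> diags;
    std::size_t pos = 0;
    while (pos < toks.size()) {
        const std::string& head = toks[pos].text;
        if (head == ";" || head == "{" || head == "}") { ++pos; continue; }
        if (head == "if") { ++pos; skipParens(toks, pos); continue; }
        // Expression statement: consume up to ';', but stop early if a token
        // that can only start or end another statement shows up first.
        int depth = 0;
        bool terminated = false;
        while (pos < toks.size() && !terminated) {
            const std::string& t = toks[pos].text;
            if (depth == 0 && (t == "if" || t == "{" || t == "}")) break;
            if (t == "(") ++depth;
            else if (t == ")" && depth > 0) --depth;
            terminated = (depth == 0 && t == ";");
            ++pos;
        }
        if (terminated) continue;
        assert(pos > 0);  // the expression consumed at least one token
        const Token& prev = toks[pos - 1];
        const bool atEnd = (pos == toks.size());
        diags.push_back({atEnd ? std::string("end of input") : toks[pos].text,
                         atEnd ? prev.line : toks[pos].line,
                         prev.line, prev.endColumn});
    }
    return diags;
}

std::string describe(const Diagnostic& d) {
    return "line " + std::to_string(d.fixLine) + ", column " +
           std::to_string(d.fixColumn) + ": expected ';' (noticed before '" +
           d.before + "' on line " + std::to_string(d.noticedLine) + ")";
}
```

Run on the snippet from the question (with a hypothetical `DoSomething();` standing in for the elided body), `describe` produces `line 1, column 16: expected ';' (noticed before 'if' on line 3)`. The "before 'if'" half is what the parser naturally knows, because that is the token it was holding when the grammar stopped matching. The "line 1, column 16" half costs only one extra integer per token. A real C++ front end has far more cases to handle, so treat this as an illustration of the bookkeeping rather than a parsing strategy.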