c++ tokenize sic

Tokenizing a SIC Assembler source


I've pretty much finished coding a SIC assembler for my systems programming class but I'm stumped on the tokenizing part.

The format (free format) is: {LABEL} OPCODE {OPERAND{,X}} {COMMENT}

The curly braces indicate that a field is optional.

Also, fields must be separated from each other by at least one space or tab.

For example, take this line of source code:

ENDFIL      LDA     EOF         COMMENT GOES HERE

The line above is fairly easy to tokenize, but the following line is giving me difficulties:

        RSUB                COMMENT GOES HERE

My code will read in the first word of the comment as if it were an OPERAND.

Here is my code:

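    // Tokenize the line. A line that does not start with whitespace begins with a LABEL.
    if(currentLine[0] != ' ' && currentLine[0] != '\t')
    {
        stringstream stream(currentLine);
        stream >> LABEL;
        stream >> OPCODE;
        stream >> OPERAND;   // problem: reads the first comment word when there is no operand
        stream.str("");

        // Flag a LABEL that is longer than 6 characters, starts with a digit, or fails alphaNum().
        if(LABEL.length() > 6 || isdigit(LABEL[0]) || !alphaNum(LABEL))
        {
            errors[1] = 1;
        }
        // The line contains nothing but a LABEL.
        else if(LABEL.length() == currentLine.length())
        {
            justLabel = true;
            errors[6] = 1;
            return;
        }
    }
    else
    {
        // No LABEL: the first token is the OPCODE.
        stringstream stream(currentLine);
        stream >> OPCODE;
        stream >> OPERAND;   // same problem as above
        stream.str("");
    }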

My professor requires that the assembler be tested with two versions of the source code--one with errors and one without.

The RSUB OPCODE doesn't take an OPERAND, so I understand that everything after RSUB can be considered a comment. But what if the erroneous source code contains a value in the OPERAND field of such an instruction, or if an OPCODE that does require an OPERAND is missing it? How do I handle these cases? I need to flag them as errors and print out the erroneous OPERAND value (or the lack of one).

My question is: How do I prevent the comment portion of the code from being considered an OPERAND?


Solution

  • In the assembly languages I've seen (as in other programming languages), a delimiter marks the start of a comment, for example a semicolon:

    ENDFIL LDA EOF ;COMMENT GOES HERE
    RSUB ;ANOTHER COMMENT GOES HERE
    

    In your syntax, however, can you tell whether something is a comment by the amount of whitespace that precedes it on the line, e.g. by the fact that two runs of whitespace (not just one) separate the opcode from the comment when the operand is omitted?

    {LABEL}<whitespace>OPCODE<whitespace>{OPERAND{,X}}<whitespace>{COMMENT}
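
If the whitespace cannot be relied on, another option is to let the opcode decide whether an operand should be read at all, as the question already suggests for RSUB. The following is only a minimal sketch of that idea, not a complete assembler front end. The names SourceLine, takesOperand, trim and tokenize are made up for illustration, and it assumes RSUB is the only mnemonic without an operand (adjust the set to your own opcode table).

```cpp
#include <set>
#include <sstream>
#include <string>

// Hypothetical container for the fields of one source line.
struct SourceLine {
    std::string label;
    std::string opcode;
    std::string operand;
    std::string comment;
};

// Assumption: RSUB is the only mnemonic that takes no operand.
// Extend this set to match the opcode table your assembler uses.
bool takesOperand(const std::string& opcode) {
    static const std::set<std::string> noOperand = {"RSUB"};
    return noOperand.count(opcode) == 0;
}

// Remove leading and trailing whitespace.
std::string trim(const std::string& s) {
    const std::string ws = " \t\r\n";
    const auto first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

SourceLine tokenize(const std::string& line) {
    SourceLine out;
    std::istringstream stream(line);

    // A label is present only when the line does not start with whitespace.
    if (!line.empty() && line[0] != ' ' && line[0] != '\t') {
        stream >> out.label;
    }
    stream >> out.opcode;

    // Read an operand only when the opcode expects one.
    if (takesOperand(out.opcode)) {
        stream >> out.operand;
    }

    // Whatever remains on the line is treated as the comment.
    std::string rest;
    std::getline(stream, rest);
    out.comment = trim(rest);
    return out;
}
```

With this sketch, an opcode that needs an operand but is followed by nothing leaves operand empty, which can be flagged as an error. Two cases stay ambiguous in a free format without a comment delimiter: a stray operand after RSUB looks exactly like a comment, and a missing operand followed by a comment still picks up the first comment word. Resolving those needs an extra convention, such as the semicolon or the whitespace rule above.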