The exercise reads "Write a program to check a C program for rudimentary syntax errors like unbalanced parentheses, brackets, and braces. Don't forget about quotes, both single and double, escape sequences, and comments."
I chose to go about solving the problem by putting parentheses, brackets, and braces on a stack and making sure everything was LIFO along with various counters for marking whether we're in a comment, quote, etc.
The issue is that I feel my code, although it works, is poorly structured and not particularly idiomatic. I tried implementing the state variables (the stack, escaped
, inString
, etc.) within a struct and breaking apart the tests into subroutines. It didn't help much. Is there a way to solve this problem in a cleaner way while still handling escaped characters and the like correctly?
#include <stdio.h>
#include <stdlib.h>
#define INITIALSTACK 8
#define FALSE 0
#define TRUE 1
typedef struct {
int position;
int maxLength;
char* array;
} stack;
int match(char, char);
stack create();
void delete(stack*);
void push(stack*, char);
char pop(stack*);
int main() {
char c, out;
stack elemStack = create();
int escaped, inString, inChar, inComment, startComment, i, lineNum;
int returnValue;
escaped = inString = inChar = inComment = startComment = 0;
lineNum = 1;
while ((c = getchar()) != EOF) {
if (c == '\n')
lineNum++;
/* Test if in escaped state or for escape character */
if (escaped) {
escaped = FALSE;
}
else if (c == '\\') {
escaped = TRUE;
}
/* Test if currently in double/single quote or a comment */
else if (inString) {
if (c == '"' && !escaped) {
inString = FALSE;
}
}
else if (inChar) {
if (escaped)
escaped = FALSE;
else if (c == '\'' && !escaped) {
inChar = FALSE;
}
}
else if (inComment) {
if (c == '*')
startComment = TRUE;
else if (c == '/' && startComment)
inComment = FALSE;
else
startComment = FALSE;
}
/* Test if we should be starting a comment, quote, or escaped character */
else if (c == '*' && startComment)
inComment = TRUE;
else if (c == '/')
startComment = TRUE;
else if (c == '"') {
inString = TRUE;
}
else if (c == '\'') {
inChar = TRUE;
}
/* Accept the character and check braces on the stack */
else {
startComment = FALSE;
if (c == '(' || c == '[' || c == '{')
push(&elemStack, c);
else if (c == ')' || c == ']' || c == '}') {
out = pop(&elemStack);
if (out == -1 || !match(out, c)) {
printf("Syntax error on line %d: %c matched with %c\n.", lineNum, out, c);
return -1;
}
}
}
}
if (inString || inChar) {
printf("Syntax error: Quote not terminated by end of file.\n");
returnValue = -1;
}
else if (!elemStack.position) {
printf("Syntax check passed on %d line(s).\n", lineNum);
returnValue = 0;
}
else {
printf("Syntax error: Reached end of file with %d unmatched elements.\n ",
elemStack.position);
for(i = 0; i < elemStack.position; ++i)
printf(" %c", elemStack.array[i]);
printf("\n");
returnValue = -1;
}
delete(&elemStack);
return returnValue;
}
int match(char left, char right) {
return ((left == '{' && right == '}') ||
(left == '(' && right == ')') ||
(left == '[' && right == ']'));
}
stack create() {
stack newStack;
newStack.array = malloc(INITIALSTACK * sizeof(char));
newStack.maxLength = INITIALSTACK;
newStack.position = 0;
return newStack;
}
void delete(stack* stack) {
free(stack -> array);
stack -> array = NULL;
}
void push(stack* stack, char elem) {
if (stack -> position >= stack -> maxLength) {
char* newArray = malloc(2 * (stack -> maxLength) * sizeof(char));
int i;
for (i = 0; i < stack -> maxLength; ++i)
newArray[i] = stack -> array[i];
free(stack -> array);
stack -> array = newArray;
}
stack -> array[stack -> position] = elem;
(stack -> position)++;
}
char pop(stack* stack) {
if (!(stack -> position)) {
printf("Pop attempted on empty stack.\n");
return -1;
}
else {
(stack -> position)--;
return stack -> array[stack -> position];
}
}
Your solution is not that bad. It is very straight forward, which is a good thing. To learn a bit more from this excercise, I would probably implement this with a state machine. E.g. you have a few states like: code
, comment
, string
etc.. then you define transitions between them. It gets much easier because you end up with logic depending on the state (so you don't have a blob of code, like in your main function). After that you can parse your code depending on the state. This means for example: If you're in a comment state, you ignore everything until you encounter an ending comment character. Then you change the state to code
for example, and so forth.
In pseudo code it could look like this:
current_state = CODE
while(...) {
switch(current_state) {
case CODE:
if(input == COMMENT_START) {
current_state = COMMENT
break
}
if(input == STRING_START) {
current_state = STRING
break
}
// handle your {, [, ( stuff...
break
case COMMENT:
if(input == COMMENT_END) {
current_state = CODE
break
}
// handle comment.. i.e. ignore everything
break
case STRING:
// ... string stuff like above with state transitions..
break
}
}
Of course this can be done with e.g. yacc. But as I stated in a comment, I wouldn't suggest you use that. Maybe you could do that if you have enough time and want to learn as much as possible, but first I would implement it "the hard way".