I was trying to write some code in C to replace a substring in a given string, following this tutorial. I managed to get it working. However, the function overflows the buffer if the replacement string is longer than the substring it replaces. In my program, I know that I will only pass dynamically allocated strings to this function, so I thought I could check whether the replacement is longer than the substring and, if it is, use the realloc function to resize the original string to make room for it. Initially the function seemed to work, but when I call it repeatedly to replace different substrings in the same string, the string becomes corrupted.
In the code below, GetFileContents retrieves the contents of the file first_page.html into a dynamically allocated string and returns that string. This function works correctly and has no issues. StrReplaceSubstringFirstOccurance replaces the substring I specify with a new string and prints the result, prefixed with Inside:.
In the if statement where I check whether the replacement string is longer than the substring, I have two different approaches to resizing the string. Approach 2 uses realloc. Approach 1 allocates a new string using malloc, copies the contents of the original string into it, calls free on the original string, and then makes the pointer to the original string point to the new one. Both approaches produce the same problem.
As you can see in the output, the first two calls to StrReplaceSubstringFirstOccurance work. On the third call (the substring is [[to]] and the replacement is NOT CUSTOM!), the function prints the correct string inside the function, but when I print the string outside of the function it shows garbage characters. On the fourth call, the function finds no more occurrences of the substring (since the original string now contains garbage characters), so it simply returns, and the garbage characters remain in the string.
Code:
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// GetFileContents() is defined elsewhere (not shown); it returns a malloc'd string.

static bool StrReplaceSubstringFirstOccurance(char* source, char* substring, char* replace) {
    char* substring_occurance = strstr(source, substring);
    if (substring_occurance == NULL) {
        printf("No substring: %s found.\n", substring);
        return false;
    }
    if (strlen(replace) > strlen(substring)) {
        size_t new_size = strlen(source) + (strlen(replace)-strlen(substring))+1;
        // Approach 1
        // char* temp = malloc(new_size);
        // memcpy(temp, source, strlen(source)+1);
        // free(source);
        // source = temp;
        // Approach 2 (the pointer returned by realloc may differ from source)
        source = realloc(source, new_size);
    }
    substring_occurance = strstr(source, substring);
    // Shift the tail (including its '\0') so the replacement fits, then copy it in.
    memmove(substring_occurance + strlen(replace),
            substring_occurance + strlen(substring),
            strlen(substring_occurance) - strlen(substring)+1);
    memcpy(substring_occurance, replace, strlen(replace));
    printf("\nInside: %s\n", source);
    return true;
}

int main(void) {
    char* first_page = GetFileContents("first_page.html");
    StrReplaceSubstringFirstOccurance(first_page, "[[say]]", "CUSTOM!");
    StrReplaceSubstringFirstOccurance(first_page, "[[say]]", "CUSTOM!");
    printf("\nFirst Print: %s\n", first_page);
    StrReplaceSubstringFirstOccurance(first_page, "[[to]]", "NOT CUSTOM!");
    printf("\nSecond Print: %s\n", first_page);
    StrReplaceSubstringFirstOccurance(first_page, "[[to]]", "NOT CUSTOM!");
    printf("\nThird Print: %s\n", first_page);
    printf("Reached the end!\n");
    return 0;
}
Output:
Inside: <!DOCTYPE html>
<html lang="en-US">
<title>CUSTOM! [[to]]</title>
<body>
[[say]]
<h1>This is a header!</h1>
[[to]]
</body>
</html>
Inside: <!DOCTYPE html>
<html lang="en-US">
<title>CUSTOM! [[to]]</title>
<body>
CUSTOM!
<h1>This is a header!</h1>
[[to]]
</body>
</html>
First Print: <!DOCTYPE html>
<html lang="en-US">
<title>CUSTOM! [[to]]</title>
<body>
CUSTOM!
<h1>This is a header!</h1>
[[to]]
</body>
</html>
Inside: <!DOCTYPE html>
<html lang="en-US">
<title>CUSTOM! NOT CUSTOM!</title>
<body>
CUSTOM!
<h1>This is a header!</h1>
[[to]]
</body>
</html>
Second Print: α#F^Ñ☻
No substring: [[to]] found.
Third Print: α#F^Ñ☻
Reached the end!
Contents of first_page.html:
<!DOCTYPE html>
<html lang="en-US">
<title>[[say]] [[to]]</title>
<body>
[[say]]
<h1>This is a header!</h1>
[[to]]
</body>
</html>
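I managed to fix this by following pmg's comments: the function now takes a pointer to the first_page variable (a char**) instead of a copy of the char* itself. C passes arguments by value, so in the original version, when realloc (or the malloc/free in Approach 1) moved the string to a new address, only the function's local copy of source was updated. The caller's first_page still pointed at the freed block, which is why printing it showed garbage. The first two calls worked because "[[say]]" and "CUSTOM!" are both 7 characters long, so no resize happened; the third call replaced the 6-character "[[to]]" with the 11-character "NOT CUSTOM!" and triggered the reallocation.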
The new code looks like this:
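#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// GetFileContents() is defined elsewhere (not shown); it returns a malloc'd string.

static bool StrReplaceSubstringFirstOccurance(char** source, const char* substring, const char* replace) {
    char* substring_occurance = strstr(*source, substring);
    if (substring_occurance == NULL) {
        printf("No substring: %s found.\n", substring);
        return false;
    }
    if (strlen(replace) > strlen(substring)) {
        size_t new_size = strlen(*source) + (strlen(replace) - strlen(substring)) + 1;
        // Keep the old block if realloc fails instead of overwriting *source with NULL.
        char* resized = realloc(*source, new_size);
        if (resized == NULL) {
            return false;
        }
        // Update the caller's pointer, since realloc may have moved the string.
        *source = resized;
    }
    // Search again: the old match pointer is invalid if the block moved.
    substring_occurance = strstr(*source, substring);
    // Shift the tail (including its '\0') so the replacement fits, then copy it in.
    memmove(substring_occurance + strlen(replace),
            substring_occurance + strlen(substring),
            strlen(substring_occurance) - strlen(substring) + 1);
    memcpy(substring_occurance, replace, strlen(replace));
    printf("\nInside: %s\n", *source);
    return true;
}

int main(void) {
    char* first_page = GetFileContents("first_page.html");
    StrReplaceSubstringFirstOccurance(&first_page, "[[say]]", "CUSTOM!");
    printf("\nFirst Print: %s\n", first_page);
    StrReplaceSubstringFirstOccurance(&first_page, "[[to]]", "NOT CUSTOM!");
    printf("\nThird Print: %s\n", first_page);
    printf("Reached the end!\n");
    free(first_page);
    return 0;
}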
And the output is:
Inside: <!DOCTYPE html>
<html lang="en-US">
<title>CUSTOM! [[to]]</title>
<body>
[[say]]
<h1>This is a header!</h1>
[[to]]
[[say]][[to]]
</body>
</html>
First Print: <!DOCTYPE html>
<html lang="en-US">
<title>CUSTOM! [[to]]</title>
<body>
[[say]]
<h1>This is a header!</h1>
[[to]]
[[say]][[to]]
</body>
</html>
Inside: <!DOCTYPE html>
<html lang="en-US">
<title>CUSTOM! NOT CUSTOM!</title>
<body>
[[say]]
<h1>This is a header!</h1>
[[to]]
[[say]][[to]]
</body>
</html>
Third Print: <!DOCTYPE html>
<html lang="en-US">
<title>CUSTOM! NOT CUSTOM!</title>
<body>
[[say]]
<h1>This is a header!</h1>
[[to]]
[[say]][[to]]
</body>
</html>
Reached the end!