I'm implementing a Skip List. It's not important what it is, but it works right now for 1000 nodes but not with 10000. I was getting SegFaults that didn't made sense, so I printf'ed some variables. To my surprise, a lot of things that shouldn't were changing, to garbage values. For example, I printed inputValue before and after function insertNode. It sometimes resets to zero, when should always be incrementing. Let's see code (skip the read file input, the problem happens at the while cycle):
int main(int argc, char** argv) {
string filename = "";
if( argc == 2 )
filename = argv[1];
else
return 0;
list = new skiplist();
fstream inputFile(filename.c_str(), ios_base::in);
inputFile >> numberofnodes;
inputFile >> list->minimumKey;
inputFile >> list->maximumKey;
printf("%d\n", numberofnodes);
printf("%d\n", list->minimumKey);
printf("%d\n", list->maximumKey);
list->Maxlevel = 1;
list->header = new node();
list->tail = new node();
list->header->key = list->minimumKey;
list->tail->key = list->maximumKey;
for ( int i=1; i<=MAXIMUMLEVEL; i++ ) {
list->header->forward[i] = list->tail;
list->tail->forward[i] = NULL;
}
int sanityCheck = 134153;
// insert nodes
int inputKey;
int inputValue = 0;
int * keys = new int[numberofnodes];
while (inputFile >> inputKey)
{
inputValue++;
keys[inputValue] = inputKey;
insertNode(inputKey, inputValue);
if(sanityCheck != 134153) // dark magic changes this value
keys[9999999999999999999999]++; // program crashes here
// it would otherwise crash on while
}
printf("\n\nNodes inserted: %d\n\n",inputValue);
I ran Valgrind. The invalid memory writes/read happened after and because of the variables changing, at least I believe so. That's why I added the sanity check. And as I thought, there were no invalid memory writes/read before trying to access keys[9999999999999999999999]. But that line can only run the int sanitycheck is changed, which I never do.
Finally, here's the code for the insertNode. I see nothing on it that could cause this:
void insertNode(int newKey, int newValue){
node * update[MAXIMUMLEVEL];
node * auxNode = list->header;
for(int i=list->Maxlevel; i >=1; i--) {
while ( auxNode->forward[i]->key < newKey ) {
auxNode = auxNode->forward[i];
}
update[i] = auxNode;
}
auxNode = auxNode->forward[1];
if ( auxNode->key == newKey ) {
auxNode->value = newValue;
} else {
int randomLevel = 1;
while ( (rand() / double(RAND_MAX)) < LEVELPROBABILITY && randomLevel < MAXIMUMLEVEL ) {
randomLevel++;
}
if ( randomLevel > list->Maxlevel ) {
for ( int i = list->Maxlevel+1; i <= randomLevel; i++ ) {
update[i] = list->header;
}
list->Maxlevel = randomLevel;
}
node * newNode = new node();
newNode->key = newKey;
newNode->value = newValue;
for ( int i=1; i<=MAXIMUMLEVEL; i++ ) {
newNode->forward[i] = NULL;
}
for ( int i=1; i<=list->Maxlevel; i++ ) {
newNode->forward[i] = update[i]->forward[i];
update[i]->forward[i] = newNode;
}
}
}
And the structures:
typedef struct node {
int key;
int value;
node * forward[MAXIMUMLEVEL+1];
}node;
struct skiplist {
int minimumKey;
int maximumKey;
int Maxlevel;
node * header;
node * tail;
};
EDIT:
#define MAXIMUMLEVEL 16
#define LEVELPROBABILITY 0.5
I'm not even using mallocs. There are pointer operations, but valgrind should detect if I did something bad right? If I was running out of memory, there would be an exception. How is it possible that an int I create and never access/write/change gets modified? Sorry for the long post, but I have no idea where the problem might be.
Valgrind output without the sanity check (keys[999...9]): http://pastebin.com/hWH3fri2
Line 155 is the while (inputFile >> inputKey)
Here's the output of clang's address sanitizer (after setting it up properly):
==15146==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffeb006bb80 at pc 0x0000004e093c bp 0x7ffeb006ba60 sp 0x7ffeb006ba58 WRITE of size 8 at 0x7ffeb006bb80 thread T0 #0 0x4e093b in insertNode(int, int) skiplist.cpp:55:27 #1 0x4e3385 in skiplist.cpp:160:9 #2 0x7f40b2fcda3f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x20a3f) #3 0x419508 in _start (a.out+0x419508) Address 0x7ffeb006bb80 is located in stack of thread T0 at offset 160 in frame #0 0x4e022f in insertNode(int, int) skiplist.cpp:35 This frams has 1 object(s): [32, 160) 'update' <== Memory access at offset 160 overflows this variable
Line 55 refers to this:
void insertNode(int newKey, int newValue){
node * update[MAXIMUMLEVEL];
node * auxNode = list->header;
for(int i=list->Maxlevel; i >=1; i--) {
while ( auxNode->forward[i]->key < newKey ) {
auxNode = auxNode->forward[i];
}
update[i] = auxNode;
}
auxNode = auxNode->forward[1];
if ( auxNode->key == newKey ) {
auxNode->value = newValue;
} else {
int randomLevel = 1;
while ( (rand() / double(RAND_MAX)) < LEVELPROBABILITY && randomLevel < MAXIMUMLEVEL ) {
randomLevel++;
}
if ( randomLevel > list->Maxlevel ) {
for ( int i = list->Maxlevel+1; i <= randomLevel; i++ ) {
update[i] = list->header; // line 55 <===================
}
list->Maxlevel = randomLevel;
}
The loop
while ( (rand() / double(RAND_MAX)) < LEVELPROBABILITY && randomLevel < MAXIMUMLEVEL ) {
randomLevel++;
}
guarantees that randomLevel <= MAXIMUMLEVEL
. If randomLevel == MAXIMUMLEVEL
, and MAXIMUMLEVEL > list->Maxlevel
, then the loop in line 54 turns into:
for ( int i = list->Maxlevel+1; i <= MAXIMUMLEVEL; i++ ) {
update[i] = list->header; // line 55 <===================
}
Note that update
is declared as node * update[MAXIMUMLEVEL];
. You'll get an out-of-bounds access.
I don't quite understand why your code seems not to access the 0th element of arrays. In my experience, it is also much easier to work with half-open-on-right ranges of the form [0, length_of_array)
which leads to loops of the form
for(int i = 0; i < length_of_array; ++i)
Note the <
instead of <=
. Consistent use of half-open-on-right ranges can dramatically reduce the number of off-by-one errors.
A quick fix is to declare update
just like node::forward
as
node * update[MAXIMUMLEVEL + 1];
Note the +1
.
A better fix is probably to rewrite the code such that it uses half-open-on-right ranges, where MAXIMUMLEVEL
gets it's interpretation from the range [0, MAXIMUMLEVEL)
and is no longer a maximum, but a supremum (and denotes the number of levels).