After successfully building the R* tree with the spatial index library by inserting 2.5 million records one by one, I tried to create the R* tree with bulk loading. I implemented a DBStream class to feed the data to the BulkLoader iteratively. Essentially, it invokes the following method, which prepares a Data object (the d variable in the code) for the BulkLoader:
void DBStream::retrieveTuple() {
    if (query.next()) {
        hasNextBool = true;
        int gid = query.value(0).toInt();
        // allocate memory for bounding box
        // this streets[gid].first returns bbox[4]
        double* bbox = streets[gid].first;
        // filling the bounding box values
        bbox[0] = query.value(1).toDouble();
        bbox[1] = query.value(2).toDouble();
        bbox[2] = query.value(3).toDouble();
        bbox[3] = query.value(4).toDouble();
        rowId++;
        r = new SpatialIndex::Region();
        d = new SpatialIndex::RTree::Data((size_t) 0, (byte*) 0, *r, gid);
        r->m_dimension = 2;
        d->m_pData = 0;
        d->m_dataLength = 0;
        r->m_pLow = bbox;
        r->m_pHigh = bbox + 2;
        d->m_id = gid;
    } else {
        d = 0;
        hasNextBool = false;
        cout << "stream is finished d:" << d << endl;
    }
}
I initialize the DBStream object and invoke the bulk loading in the following way:
// creating a main memory RTree
memStorage = StorageManager::createNewMemoryStorageManager();
size_t capacity = 1000;
bool bWriteThrough = false;
fileInMem = StorageManager
::createNewRandomEvictionsBuffer(*memStorage, capacity, bWriteThrough);
double fillFactor = 0.7;
size_t indexCapacity = 100;
size_t leafCapacity = 100;
size_t dimension = 2;
RTree::RTreeVariant rv = RTree::RV_RSTAR;
DBStream dstream;
tree = RTree::createAndBulkLoadNewRTree(SpatialIndex::RTree::BLM_STR, dstream,
*fileInMem,
fillFactor, indexCapacity,
leafCapacity, dimension, rv, indexIdentifier);
cout << "BulkLoading done" << endl;
Bulk loading calls my next() and hasNext() functions, retrieves my data, sorts it, and then segfaults in the building phase. Any clues why? The error is:
RTree::BulkLoader: Building level 0
terminate called after throwing an instance of 'Tools::IllegalArgumentException'
The problem supposedly lies in the memory allocation, plus a few bugs in the code (also somewhat related to memory allocation). First, the properties of the Data variable need to be assigned properly:
memcpy(data->m_region.m_pLow, bbox, 2 * sizeof(double));
memcpy(data->m_region.m_pHigh, bbox + 2, 2 * sizeof(double));
data->m_id = gid;
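To illustrate why the coordinates are copied with memcpy instead of pointing m_pLow and m_pHigh at an external buffer, here is a minimal sketch. It assumes the region owns its coordinate arrays and frees them itself; OwnedRegion and fillRegion are hypothetical stand-ins written for this example, not types from the library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a region that owns its coordinate arrays
// (an assumption for illustration; this is not the library's Region class).
struct OwnedRegion {
    std::size_t dimension;
    double* low;
    double* high;

    explicit OwnedRegion(std::size_t dim)
        : dimension(dim), low(new double[dim]()), high(new double[dim]()) {}

    // The region frees its own arrays, so they must never alias
    // memory that belongs to someone else.
    ~OwnedRegion() {
        delete[] low;
        delete[] high;
    }

    OwnedRegion(const OwnedRegion&) = delete;
    OwnedRegion& operator=(const OwnedRegion&) = delete;
};

// Copy a flat {xmin, ymin, xmax, ymax} box into the region's own storage.
inline void fillRegion(OwnedRegion& r, const double* bbox) {
    std::memcpy(r.low, bbox, r.dimension * sizeof(double));
    std::memcpy(r.high, bbox + r.dimension, r.dimension * sizeof(double));
}
```

Because the values are copied, later changes to the source buffer (for example when the next row overwrites it) do not affect a region that has already been filled, and the destructor only ever frees memory the region allocated itself.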
Second (and most importantly), getNext must return a new object holding all the values:
RTree::Data *p = new RTree::Data(returnData->m_dataLength, returnData->m_pData,
returnData->m_region, returnData->m_id);
return p;
The RTree deallocates this memory itself, so no further care needs to be taken here.
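The ownership rule can be sketched without the library. This is a minimal illustration, assuming the consumer deletes every object it receives; Item, VectorStream, and consumeAll are hypothetical names made up for this example and only mimic the hasNext/getNext pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical record type standing in for the library's Data object.
struct Item {
    long id;
    double low[2];
    double high[2];
};

// Hypothetical stream that hands out a freshly allocated copy per call.
class VectorStream {
public:
    explicit VectorStream(std::vector<Item> rows) : rows_(std::move(rows)) {}

    bool hasNext() const { return pos_ < rows_.size(); }

    // Returns a new heap object on every call; the caller takes ownership.
    // Returns nullptr once the stream is exhausted.
    Item* getNext() {
        if (!hasNext()) return nullptr;
        return new Item(rows_[pos_++]);
    }

private:
    std::vector<Item> rows_;
    std::size_t pos_ = 0;
};

// Stand-in consumer: drains the stream and deletes each item it receives,
// the way a loader that owns its input would.
inline std::size_t consumeAll(VectorStream& s) {
    std::size_t n = 0;
    while (s.hasNext()) {
        std::unique_ptr<Item> item(s.getNext());  // freed at end of iteration
        ++n;
    }
    return n;
}
```

If getNext returned the same internal pointer each time, the consumer would delete one object repeatedly, or hold many pointers to a record that keeps being overwritten. Returning a distinct object per call avoids both.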