I'm creating a native Node.js extension for RocksDB, and I've pinned down an issue that I cannot explain. I have the following perfectly functioning piece of code:
std::string v;
ROCKSDB_STATUS_THROWS(db->Get(*options, k, &v));
napi_value result;
NAPI_STATUS_THROWS(napi_create_buffer_copy(env, v.size(), v.c_str(), nullptr, &result));
return result;
But when I introduce an optimization that reduces one extra memcpy, I get segfaults:
std::string *v = new std::string();
ROCKSDB_STATUS_THROWS(db->Get(*options, k, v)); // <============= I get segfaults here
napi_value result;
NAPI_STATUS_THROWS(napi_create_external_buffer(env, v->size(), (void *)v->c_str(), rocksdb_get_finalize, v, &result));
return result;
Here's the Get method signature:
rocksdb::Status rocksdb::DB::Get(const rocksdb::ReadOptions &options, const rocksdb::Slice &key, std::string *value)
Any thoughts on why this issue might happen?
Thank you in advance!
Edit
Just to be sure, I've also checked the following version (it also fails):
std::string *v = new std::string();
ROCKSDB_STATUS_THROWS(db->Get(*options, k, v));
napi_value result;
NAPI_STATUS_THROWS(napi_create_buffer_copy(env, v->size(), v->c_str(), nullptr, &result));
delete v;
return result;
Edit
As requested in the comments, here is a more complete example:
#include <napi-macros.h>
#include <node_api.h>
#include <rocksdb/db.h>
#include <rocksdb/convenience.h>
#include <rocksdb/write_batch.h>
#include <rocksdb/cache.h>
#include <rocksdb/filter_policy.h>
#include <rocksdb/comparator.h>
#include <rocksdb/env.h>
#include <rocksdb/options.h>
#include <rocksdb/table.h>
#include "easylogging++.h"
INITIALIZE_EASYLOGGINGPP
...
/**
* Runs when a rocksdb_get return value instance is garbage collected.
*/
static void rocksdb_get_finalize(napi_env env, void *data, void *hint)
{
LOG_IF(logging_enabled, INFO) << LOCATION << " rocksdb_get_finalize (started)";
if (hint)
{
LOG_IF(logging_enabled, INFO) << LOCATION << " rocksdb_get_finalize (deleting value)";
delete (std::string *)hint;
}
LOG_IF(logging_enabled, INFO) << LOCATION << " rocksdb_get_finalize (finished)";
}
/**
* Gets key / value pair from a database.
*/
NAPI_METHOD(rocksdb_get)
{
LOG_IF(logging_enabled, INFO) << LOCATION << " rocksdb_get (started)";
NAPI_ARGV(3);
LOG_IF(logging_enabled, INFO) << LOCATION << " rocksdb_get (getting db argument)";
rocksdb::DB *DECLARE_FROM_EXTERNAL_ARGUMENT(0, db);
LOG_IF(logging_enabled, INFO) << LOCATION << " rocksdb_get (getting k argument)";
DECLARE_SLICE_FROM_BUFFER_ARGUMENT(1, k);
LOG_IF(logging_enabled, INFO) << LOCATION << " rocksdb_get (getting options argument)";
rocksdb::ReadOptions *DECLARE_FROM_EXTERNAL_ARGUMENT(2, options);
LOG_IF(logging_enabled, INFO) << LOCATION << " rocksdb_get (declaring v variable)";
std::string *v = new std::string();
LOG_IF(logging_enabled, INFO) << LOCATION << " rocksdb_get (getting value from database)";
ROCKSDB_STATUS_THROWS(db->Get(*options, k, v));
LOG_IF(logging_enabled, INFO) << LOCATION << " rocksdb_get (wrapping value with js wrapper)";
napi_value result;
NAPI_STATUS_THROWS(napi_create_external_buffer(env, v->size(), (void *)v->c_str(), rocksdb_get_finalize, v, &result));
LOG_IF(logging_enabled, INFO) << LOCATION << " rocksdb_get (finished)";
return result;
}
The code that calls the above method is implemented in TypeScript and runs in Node.js; here is the complete listing:
import path from 'path';
import { bindings as rocks, Unique, BatchContext } from 'rocksdb';
import { MapOf } from '../types';
import { Command, CommandOptions, CommandOptionDeclaration, Persist, CommandEnvironment } from '../command';
// tslint:disable-next-line: no-empty-interface
export interface PullCommandOptions {
}
@Command
export class ExampleCommandNameCommand implements Command {
public get description(): string {
return "[An example command description]";
}
public get options(): CommandOptions<CommandOptionDeclaration> {
const result: MapOf<PullCommandOptions, CommandOptionDeclaration> = new Map();
return result;
}
public async run(environment: CommandEnvironment, opts: CommandOptions<unknown>): Promise<void> {
// let options = opts as unknown as PullCommandOptions;
let window = global as any;
window.rocks = rocks;
const configPath = path.resolve('log.conf');
const configPathBuffer = Buffer.from(configPath);
rocks.logger_config(configPathBuffer);
rocks.logger_start();
let db = window.db = rocks.rocksdb_open(Buffer.from('test.db', 'utf-8'), rocks.rocksdb_options_init());
let readOptions = window.readOptions = rocks.rocksdb_read_options_init();
let writeOptions = window.writeOptions = rocks.rocksdb_write_options_init();
rocks.rocksdb_put(db, Buffer.from('Zookie'), Buffer.from('Cookie'), writeOptions);
// ===== The line below launches the C++ method
console.log(rocks.rocksdb_get(db, Buffer.from('Zookie'), readOptions).toString());
// ===== The line above launches the C++ method
let batch: Unique<BatchContext> | null = rocks.rocksdb_batch_init();
rocks.rocksdb_batch_put(batch, Buffer.from('Cookie'), Buffer.from('Zookie'));
rocks.rocksdb_batch_put(batch, Buffer.from('Pookie'), Buffer.from('Zookie'));
rocks.rocksdb_batch_put(batch, Buffer.from('Zookie'), Buffer.from('Zookie'));
rocks.rocksdb_batch_put(batch, Buffer.from('Hookie'), Buffer.from('Zookie'));
await rocks.rocksdb_batch_write_async(db, batch, writeOptions);
batch = null;
let proceed = true;
while (proceed) {
await new Promise(resolve => setTimeout(resolve, 1000));
}
}
}
Basically, this code implements a KeyValueDatabase->Get("Some key") method: you pass a string in and you get a string back. The issue obviously revolves around the new std::string() call, so I hoped someone could explain why it's bad to go this way. And how is it possible to move a string value from one string into another without a copy?
Answer
> But when I introduce an optimization that reduces one extra memcpy

It's unclear which extra memcpy you think you are optimizing out.
If the string is short and your std::string implementation uses the short-string optimization, then you will indeed optimize out a short memcpy. However, dynamically allocating and then deleting the std::string object is likely much more expensive than that memcpy.
If the string is long, you don't actually optimize anything at all, and instead make the code slower for no reason.
> I get segfaults:

The fact that adding v = new std::string; ... ; delete v; introduces a SIGSEGV is a likely indication that you have some other heap corruption going on, which remains unnoticed until you shift things around a bit. Valgrind is your friend.