c++ node.js segmentation-fault rocksdb node-native-addon

Why do I get segmentation faults when passing a heap-allocated std::string to rocksdb::DB::Get?


I'm creating a native Node.js extension for RocksDB and have pinned down an issue that I cannot explain. The following piece of code works perfectly:

std::string v;

ROCKSDB_STATUS_THROWS(db->Get(*options, k, &v));

napi_value result;
NAPI_STATUS_THROWS(napi_create_buffer_copy(env, v.size(), v.c_str(), nullptr, &result));
return result;

But when I introduce an optimization that removes one extra memcpy, I get segfaults:

std::string *v = new std::string();

ROCKSDB_STATUS_THROWS(db->Get(*options, k, v)); // <============= I get segfaults here

napi_value result;
NAPI_STATUS_THROWS(napi_create_external_buffer(env, v->size(), (void *)v->c_str(), rocksdb_get_finalize, v, &result));
return result;

Here's Get method signature:

rocksdb::Status rocksdb::DB::Get(const rocksdb::ReadOptions &options, const rocksdb::Slice &key, std::string *value)

Any thoughts on why this issue might happen?

Thank you in advance!

Edit

Just to be sure, I've also checked the following version (it also fails):

std::string *v = new std::string();

ROCKSDB_STATUS_THROWS(db->Get(*options, k, v));

napi_value result;
NAPI_STATUS_THROWS(napi_create_buffer_copy(env, v->size(), v->c_str(), nullptr, &result));

delete v;

Edit

As requested in the comments, here is a more complete example:

#include <napi-macros.h>
#include <node_api.h>

#include <rocksdb/db.h>
#include <rocksdb/convenience.h>
#include <rocksdb/write_batch.h>
#include <rocksdb/cache.h>
#include <rocksdb/filter_policy.h>
#include <rocksdb/comparator.h>
#include <rocksdb/env.h>
#include <rocksdb/options.h>
#include <rocksdb/table.h>

#include "easylogging++.h"

INITIALIZE_EASYLOGGINGPP

...

/**
 * Runs when a rocksdb_get return value instance is garbage collected.
 */
static void rocksdb_get_finalize(napi_env env, void *data, void *hint)
{
    LOG_IF(logging_enabled, INFO) << LOCATION << " rocksdb_get_finalize (started)";
    if (hint)
    {
        LOG_IF(logging_enabled, INFO) << LOCATION << " rocksdb_get_finalize (deleting value)";
        delete (std::string *)hint;
    }
    LOG_IF(logging_enabled, INFO) << LOCATION << " rocksdb_get_finalize (finished)";
}

/**
 * Gets key / value pair from a database.
 */
NAPI_METHOD(rocksdb_get)
{
    LOG_IF(logging_enabled, INFO) << LOCATION << " rocksdb_get (started)";

    NAPI_ARGV(3);

    LOG_IF(logging_enabled, INFO) << LOCATION << " rocksdb_get (getting db argument)";
    rocksdb::DB *DECLARE_FROM_EXTERNAL_ARGUMENT(0, db);
    LOG_IF(logging_enabled, INFO) << LOCATION << " rocksdb_get (getting k argument)";
    DECLARE_SLICE_FROM_BUFFER_ARGUMENT(1, k);
    LOG_IF(logging_enabled, INFO) << LOCATION << " rocksdb_get (getting options argument)";
    rocksdb::ReadOptions *DECLARE_FROM_EXTERNAL_ARGUMENT(2, options);

    LOG_IF(logging_enabled, INFO) << LOCATION << " rocksdb_get (declaring v variable)";
    std::string *v = new std::string();

    LOG_IF(logging_enabled, INFO) << LOCATION << " rocksdb_get (getting value from database)";
    ROCKSDB_STATUS_THROWS(db->Get(*options, k, v));

    LOG_IF(logging_enabled, INFO) << LOCATION << " rocksdb_get (wrapping value with js wrapper)";
    napi_value result;
    NAPI_STATUS_THROWS(napi_create_external_buffer(env, v->size(), (void *)v->c_str(), rocksdb_get_finalize, v, &result));
    LOG_IF(logging_enabled, INFO) << LOCATION << " rocksdb_get (finished)";
    return result;
}

The code that calls the above method is implemented in TypeScript and runs in Node.js. Here is the complete listing:

import path from 'path';
import { bindings as rocks, Unique, BatchContext } from 'rocksdb';

import { MapOf } from '../types';
import { Command, CommandOptions, CommandOptionDeclaration, Persist, CommandEnvironment } from '../command';

// tslint:disable-next-line: no-empty-interface
export interface PullCommandOptions {
}

@Command
export class ExampleCommandNameCommand implements Command {
    public get description(): string {
        return "[An example command description]";
    }

    public get options(): CommandOptions<CommandOptionDeclaration> {
        const result: MapOf<PullCommandOptions, CommandOptionDeclaration> = new Map();

        return result;
    }

    public async run(environment: CommandEnvironment, opts: CommandOptions<unknown>): Promise<void> {
        // let options = opts as unknown as PullCommandOptions;

        let window = global as any;

        window.rocks = rocks;

        const configPath = path.resolve('log.conf');
        const configPathBuffer = Buffer.from(configPath);

        rocks.logger_config(configPathBuffer);
        rocks.logger_start();

        let db = window.db = rocks.rocksdb_open(Buffer.from('test.db', 'utf-8'), rocks.rocksdb_options_init());
        let readOptions = window.readOptions = rocks.rocksdb_read_options_init();
        let writeOptions = window.writeOptions = rocks.rocksdb_write_options_init();

        // ===== The line below launches the C++ method
        rocks.rocksdb_put(db, Buffer.from('Zookie'), Buffer.from('Cookie'), writeOptions);
        // ===== The line above launches the C++ method

        console.log(rocks.rocksdb_get(db, Buffer.from('Zookie'), readOptions).toString());

        let batch: Unique<BatchContext> | null = rocks.rocksdb_batch_init();

        rocks.rocksdb_batch_put(batch, Buffer.from('Cookie'), Buffer.from('Zookie'));
        rocks.rocksdb_batch_put(batch, Buffer.from('Pookie'), Buffer.from('Zookie'));
        rocks.rocksdb_batch_put(batch, Buffer.from('Zookie'), Buffer.from('Zookie'));
        rocks.rocksdb_batch_put(batch, Buffer.from('Hookie'), Buffer.from('Zookie'));
        await rocks.rocksdb_batch_write_async(db, batch, writeOptions);

        batch = null;

        let proceed = true;

        while (proceed) {
            await new Promise(resolve => setTimeout(resolve, 1000));
        }
    }
}

Basically, this code implements a KeyValueDatabase->Get("Some key") method: you pass a string in and you get a string back. The issue clearly revolves around the new std::string() call. Could someone explain why it is bad to go this way? And how can a string value be moved from one string into another without a copy?


Solution

  • "But when I introduce an optimization that removes one extra memcpy"

    It's unclear which extra memcpy you think you are optimizing out.

    If the string is short, and you are using a std::string with the short-string optimization, then you will indeed optimize out a short memcpy. However, dynamically allocating and then deleting a std::string is likely much more expensive than that memcpy.

    If the string is long, you don't actually optimize anything at all, and instead make the code slower for no reason.
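
    Which of the two cases applies depends on the standard library and on the length of the value. As a rough sketch, the hypothetical helper below (is_stored_inline is not part of any library) reports whether a string's characters live inside the std::string object itself or in a separate heap allocation. The address-range comparison relies on implementation behavior rather than on anything the C++ standard guarantees, so treat the result as a diagnostic hint only.

    ```cpp
    #include <cstdint>
    #include <string>

    // Hypothetical helper, for illustration only.
    // Returns true when the character data lives inside the std::string
    // object itself (short-string optimization) and false when it lives in
    // a separate heap allocation.
    bool is_stored_inline(const std::string &s)
    {
        // Address range occupied by the std::string object.
        const auto object_begin = reinterpret_cast<std::uintptr_t>(&s);
        const auto object_end = object_begin + sizeof(std::string);

        // Address of the first character.
        const auto data = reinterpret_cast<std::uintptr_t>(s.data());

        return data >= object_begin && data < object_end;
    }
    ```

    On common implementations a short value such as "Cookie" is typically stored inline, while a value of a thousand characters is not.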

    "I get segfaults:"

    The fact that adding v = new std::string; ... ; delete v; introduces a SIGSEGV likely indicates that you have some other heap corruption going on, which remains unnoticed until you shift things a bit. Valgrind is your friend.
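
To illustrate why the allocation itself is an unlikely culprit, here is a minimal sketch of the same ownership hand-off with N-API and RocksDB taken out of the picture. Every name in it (ExternalBuffer, fake_get, string_finalize, make_external_buffer, release) is a hypothetical stand-in, not an API of either library; the finalizer only mirrors the shape of rocksdb_get_finalize from the question, with the napi_env parameter omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the finalizer callback type (napi_env omitted).
using finalize_fn = void (*)(void *data, void *hint);

// Hypothetical stand-in for an external buffer: it borrows `data` and
// calls `finalize(data, hint)` once when it is released.
struct ExternalBuffer
{
    const char *data;
    std::size_t size;
    finalize_fn finalize;
    void *hint;
};

// Counts finalizer invocations so the hand-off can be checked.
static int finalize_calls = 0;

// Mirrors rocksdb_get_finalize: the hint is the heap-allocated string.
static void string_finalize(void * /*data*/, void *hint)
{
    assert(hint != nullptr);
    ++finalize_calls;
    delete static_cast<std::string *>(hint);
}

// Hypothetical stand-in for db->Get(): writes the value through the pointer.
static void fake_get(std::string *value)
{
    value->assign("Cookie");
}

// Allocates the string, fills it, and hands ownership to the buffer.
ExternalBuffer make_external_buffer()
{
    std::string *v = new std::string();
    fake_get(v);
    // The string must not be modified after this point,
    // otherwise `data` could be left dangling.
    return ExternalBuffer{v->data(), v->size(), string_finalize, v};
}

// Runs the finalizer at most once and clears the buffer.
void release(ExternalBuffer &buf)
{
    if (buf.finalize)
    {
        buf.finalize(const_cast<char *>(buf.data), buf.hint);
        buf = ExternalBuffer{nullptr, 0, nullptr, nullptr};
    }
}
```

If a reduced program along these lines runs cleanly under Valgrind while the real addon does not, that points toward code outside this function, for example the macros that extract db, k, and options from the arguments, though that remains an assumption until the tool reports where the invalid access happens.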