c++stringdictionarystxxl

How to use std::string as key in stxxl::map


I am trying to use std::string as a key in the stxxl::map The insertion was fine for small number of strings about 10-100. But while trying to insert large number of strings about 100000 in it, I am getting segmentation fault.

The code is as follows:

struct CompareGreaterString {
    bool operator () (const std::string& a, const std::string& b) const {
       return a > b;
    }
    static std::string max_value() {
       return "";
    } 
};

// template parameter <KeyType, DataType, CompareType, RawNodeSize, RawLeafSize, PDAllocStrategy (optional)>
typedef stxxl::map<std::string, unsigned int, CompareGreaterString, DATA_NODE_BLOCK_SIZE, DATA_LEAF_BLOCK_SIZE> name_map;
name_map strMap((name_map::node_block_type::raw_size)*3, (name_map::leaf_block_type::raw_size)*3);
for (unsigned int i = 0; i < 1000000; i++) { /// Inserting 1 million strings
    std::stringstream strStream;
    strStream << (i);
    Console::println("Inserting: " + strStream.str());
    strMap[strStream.str()]=i;
}

In here I am unable to identify why I am unable to insert more number of strings. I am getting segmentation fault exactly while inserting "1377". Plus I am able to add any number of integers as key. I feel that the variable size of string might be causing this trouble.

Also I am unable to understand what to return for max_value of the string. I simply returned a blank string.


Solution

  • I have finally found the solution to my problem with great help from Timo bingmann, user2079303 and Martin Ba. Thank you.

    I would like to share it with you.

    Firstly stxxl supports POD only. That means it stores fixed sized structures only. Hence std::string cannot be a key. stxxl::map worked for about 100-1000 strings because they were contained in the physical memory itself. When more strings are inserted it has to write on disk which is internally causing some problems.

    Hence we need to use a fixed string using char[] as follows:

    static const int MAX_KEY_LEN = 16;
    
    class FixedString { 
    public:
        char charStr[MAX_KEY_LEN];
    
        bool operator< (const FixedString& fixedString) const {
            return std::lexicographical_compare(charStr, charStr+MAX_KEY_LEN,
                fixedString.charStr, fixedString.charStr+MAX_KEY_LEN);
        }
    
        bool operator==(const FixedString& fixedString) const {
            return std::equal(charStr, charStr+MAX_KEY_LEN, fixedString.charStr);
        }
    
        bool operator!=(const FixedString& fixedString) const {
            return !std::equal(charStr, charStr+MAX_KEY_LEN, fixedString.charStr);
        } 
    };
    
    struct comp_type : public std::less<FixedString> {
        static FixedString max_value()
        {
            FixedString s;
            std::fill(s.charStr, s.charStr+MAX_KEY_LEN, 0x7f);
            return s;
        } 
    };
    

    Please note that all the operators mainly((), ==, !=) need to be overriden for all the stxxl::map functions to work Now we may define fixed_name_map for map as follows:

    typedef stxxl::map<FixedString, unsigned int, comp_type, DATA_NODE_BLOCK_SIZE, DATA_LEAF_BLOCK_SIZE> fixed_name_map;
    fixed_name_map myFixedMap((fixed_name_map::node_block_type::raw_size)*5, (fixed_name_map::leaf_block_type::raw_size)*5);
    

    Now the program is compiling fine and is accepting about 10^8 strings without any problem. also we can use myFixedMap like std::map itself. {for ex: myFixedMap[fixedString] = 10}