c++initializationunassigned-variable

Is there a way to check a variable is already initialized in c++?


Let us say I'm initializing a vector vector<bool> V(n);. Is there a way I can know if V[n] is initialized or not? I need this for dynamic programming purposes. If the V[n] is initialized, I would utilize the value V[n] to obtain the result. If it's not initialized yet, I'd apply a function foo(.., n) or something to obtain the value of V[n]. I am asking this because I don't want to initialize a vector<int> V(n, -1); with 3 states like -1(for unassigned, or yet to find), 0(for false), and 1(for true). Instead, if there is a way to know if a variable V[n] is unassigned, I may be able to save some space for large values of n.


Solution

  • Just gathering all the comments into a readable answer.

    All the members of a vector that exist are intialised, so to solve the problem we really need to represent 3 states, Uninitialised, False, True, and create the entries as Uninitialised. We would want the vector to initially contain nodes in state Uninitialised.

    So how best to represent this tristate? Considerations: Code maintainability; speed of access; memory usage.

    vector<bool> is a special implementation of vector which /may/ be optimised to store more than 1 value per byte. It is possible to squeeze 8 bool bits into a byte. So a vector of 1000 bool will only use 125 bytes.

    If you create any other vector of data, it will store an object of the size of that data type, so char, for example, or more exactly a vector<int8_t>, would use 1 byte per entry. 1000 chars would use 1000 bytes.

    A vector<int> would use a number of bytes per entry, probably at least 4, so would cost 4000 bytes to hold 1000 elements.

    But you would only be using 3 of the possible 255 states in a char, so using a vector of char would be more efficient than a vector of int, but is still somewhat wasteful of storage vs the vector<bool>. You might not care about that, and that is a fair approach. The code generated by vector<bool> is more complex than for the normal vector, so your code would be slower..

    Let's go mad and use an enum:

      enum class State: int8_t
      {
        uninitialised = -1,
        False: 0,
        True: 1
      };
    
      std::vector<State> V(n,State::uninitialised);
    

    But what about vector<bool> ?

    The tighter forms suggested are to use 2 vectors of bool, one to say whether the entry is valid and the second to say that its value is set. This will cost 2*125 bytes, or 256 bytes for 1000 entries. That is still a saving over a vector of char.

    Or you could write your own wrapper for vector where you treat 2 consecutive entries as the valid and set flags, and you allocate it twice as large as you wanted. This has the advantage of locality of reference, and potentially the optimiser can somewhat merge consecutive questions "is it valid" then "is it set".

    So you save some storage, for the cost of some additional complexity (speed loss). You could wrap this in a class with accessors to hide the complexity.

    If you were going to do that, you could write your own wrapper round a vector<unit8_t> which divides the input index by 4 and splits the stored value into 4 2-bit tri-state values. This would possibly be slightly faster overall, as you would not separately ask the vector "is it valid" then "is it set".

    You /could/ squeeze more than 4 tristates into a byte - you can get 5, but that generates very slow code all round. The compiler knows how to divide by 4 very efficiently, and is less able to quickly divide by 5, or by powers of 3.

    These days we tend to choose speed and simplicity over space saving, so do the vector<bool> thing for fun if you like, but stick with the vector of char.

    That's all good.

    I guess the other question I have to ask, though, is under what conditions is an entry invalid? Are they made valid sequentially? If the number of valid entries an indication that the higher indices are not yet valid?

    In which case you could just start with an empty vector<bool> and push new values onto it as you need them - use index < size() to decide if the current index is valid or not? You can use reserve() to avoid the vector reallocating as it grows. This saves half the required storage, and keeps the code complexity manageable, so it is worth considering.

    Of course in your case initialised/validity may be a completely random state in which case this is not an option for you.