I've been learning C for the past couple months. I work on biological problems for which variables involving a binary state of present/missing are very common. For these situations I have been catching myself often using pointers, even for simpler variables, to get the option of initializing the pointer as NULL and allocating when setting a value. This then later on allows me to check if the value exists or not, with a consistent manner of checking that does not depend on the type in question. Now, as I create increasingly complex/nested structures, this means I have pointers to structs that contain pointers to structs and so on.
Essentially, this poses three problems in my mind. For one, I'm often doing unnecessary heap allocations when I can just use the stack instead. Secondly, my structures contain a lot of redirections (am I saying that right?), where essentially I'm hopping to a lot of different addresses in the heap (maybe in a worst case scenario even outside of the current cache line?). Thirdly, the heap allocations require later free's, which very often need separately coded destroy_foo() functions for even relatively simple structs which would not be necessary otherwise.
In my mind these have so far been the lesser evil, as I cannot think of a good/consistent way to check if a variable should or should not be used in logic. Two ways that I have seen before are a "null" sentinel value (e.g., int foo = -1;) or a separate variable that keeps track of whether the first is initialized (e.g., int is_foo_initialized = 0;). The first seems to require more intimate knowledge of the type used, and is prone to creating difficult bugs when used wrong/inconsistently in more complex situations. I have been leaning towards the boolean option as of late, even if that means a lot of 'is_initialized' variables in my structs, but using it often feels like I am expending a lot of effort to keep track of it, e.g., setting it on initialization and assignment, etc., for what in other languages is a 'simple' check, although I do realize this may not hold for C.
In the end, essentially a very simple problem, but I am not liking my current methods and I am wondering if I am missing something or am just plain wrong in how I look at this issue. Those of you more experienced in C, how do you implement null checks, what considerations do you make, and how do you safeguard yourself against obscure logic bugs?
Thanks for reading. I apologize if the situation in question is too broad for a clear answer, or if the null checking methods and considerations are simply the reality of the situation. I asked the question to check my own ignorance.
P.S. My question is not answered by Initializing variables in C because I am not asking about reasons for initializing variables, when not to do so, and what issues may arise if I don't. I ask about checking for (un)initialized/null variables, which is not covered in said question or its answers.
Where to allocate memory depends a lot on what system you are coding for. On hosted systems the general best practice is to allocate large objects on the heap. Some graphic libs etc just allocate everything on the heap like you do in order to be consistent. On freestanding (embedded) systems however, heap allocation should be avoided entirely. I'll assume a hosted system.
It isn't really meaningful to place local variables and the like on the heap - that will build up a whole lot of overhead code. Also, data which is read-only should get allocated by the compiler where it thinks is best, likely some .rodata section in read-only memory.
"I cannot think of a good/consistent way to check if a variable should or should not be used in logic."
It is usually not a good idea to burden the data with artificial error codes like -1 etc., so I can see where you are coming from. And quite often a variable can hold any possible value within its range, so there's no room for error codes. Null pointers solve that, so they could be a good solution in many cases.
The easiest solution is to simply set a separate variable to keep track of it, such as bool is_foo_initialized.
This only makes sense for a group of variables though, such as all variables belonging to one struct or library. To have a bool flag for every single variable will get messy quick.
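As a rough illustration of one flag covering a group of variables, here is a minimal sketch. The names sample_t, sample_set and sample_has_value are made up for this example and do not come from any library; the point is only that the struct can live on the stack and that the flag is touched in exactly one place.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record: a single flag covers the whole group of fields. */
typedef struct {
    bool   is_set;          /* false until sample_set() has been called */
    int    count;
    double concentration;
} sample_t;

/* The only place that writes the fields, so the flag cannot be forgotten. */
static void sample_set(sample_t *s, int count, double concentration)
{
    assert(s != NULL);
    s->count = count;
    s->concentration = concentration;
    s->is_set = true;
}

/* One consistent check, regardless of the types stored in the struct. */
static bool sample_has_value(const sample_t *s)
{
    return s != NULL && s->is_set;
}
```

A caller would declare sample_t s = {0}; to start in the "missing" state, with no heap allocation and no destroy function needed. Whether this beats a null pointer depends on how large the struct is and how often values are actually missing.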
Similarly, professional libs always reserve the return value used by public API functions for a result. They avoid mixing function outcome with data. Ideally they don't use bool either because it's too crude - "fail because reason: bad" is usually not enough info... And so professional libs tend to always use an enum or similar type to hold result codes and pass data through pointer parameters.
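To make the result-code style concrete, here is a small sketch. The enum parse_result_t and the function parse_digits are hypothetical names invented for this example, not taken from an existing library, and the function deliberately skips overflow checking to stay short.

```c
#include <stddef.h>

/* Hypothetical result codes: the return value says what happened,
   it never carries the data itself. */
typedef enum {
    PARSE_OK,
    PARSE_ERR_NULL_ARG,
    PARSE_ERR_EMPTY,
    PARSE_ERR_NOT_A_DIGIT
} parse_result_t;

/* Converts a string of decimal digits to an unsigned value.
   *out is written only when PARSE_OK is returned.
   Assumption for brevity: no overflow detection (unsigned wraps). */
static parse_result_t parse_digits(const char *text, unsigned *out)
{
    if (text == NULL || out == NULL) {
        return PARSE_ERR_NULL_ARG;
    }
    if (text[0] == '\0') {
        return PARSE_ERR_EMPTY;
    }

    unsigned value = 0;
    for (size_t i = 0; text[i] != '\0'; i++) {
        if (text[i] < '0' || text[i] > '9') {
            return PARSE_ERR_NOT_A_DIGIT;
        }
        value = value * 10u + (unsigned)(text[i] - '0');
    }

    *out = value;
    return PARSE_OK;
}
```

With this shape, every value of unsigned remains available as data, and the caller checks the returned code before touching the output, which is the same "is it there or not" question answered without a sentinel value.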
Another thing which some might find unconventional is simply to not initialize data but ensure that the first access of it is a write access. I do this all the time even in mission-critical real-time systems. Simply because setting everything to zero takes overhead time and is useless when you are certain that you assign to the variable before using it anyway.
It requires a certain programmer mindset, however. If I declare a variable foo, then the first time I use that variable in my code, I automatically start thinking "where do I set this one up", and I write the code which sets up the variable in case that part was missing. To just grab any variable in your code with no concern over what value it might have at the point when you start using it - that's just sloppy. Similarly, you always need to consider the type and value range of the variable you are using.
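A minimal sketch of what "first access is a write" can look like in practice. The functions clamp_to_byte and fill_squares are hypothetical names made up for this example. Compiler warnings such as GCC's -Wuninitialized and -Wmaybe-uninitialized can help catch a path that was missed, though they are not guaranteed to find every case.

```c
#include <stddef.h>

/* Hypothetical helper: result has no initializer because every
   branch below assigns it before the single read in the return. */
static int clamp_to_byte(int x)
{
    int result;

    if (x < 0) {
        result = 0;
    } else if (x > 255) {
        result = 255;
    } else {
        result = x;
    }
    return result;
}

/* Hypothetical helper: writes all n elements of out and reads none,
   so the caller's buffer does not need to be zeroed beforehand. */
static void fill_squares(unsigned *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = (unsigned)(i * i);
    }
}
```

A caller can then write unsigned buf[8]; fill_squares(buf, 8); without = {0}, since zeroing the array first would only be overwritten.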