c, pointers, function-pointers, void-pointers

Disadvantages of using void * to hold generic objects?


I'm a C beginner, and I have a question that came up while reading Section 19.5 of C Programming: A Modern Approach (2nd Edition) by K. N. King.

In this section, the author explains what program design is and demonstrates the related concepts by developing an implementation of the stack data structure.

The code below is part of the implementation:

typedef int Item;

void push(Stack s, Item i);

Since the Item type has been defined as int, the stack currently cannot hold items of any type other than int.

Because of this flaw, the author introduces a way to make the stack generic:

void push(Stack s, void *p);

The approach is to use void *. As far as I understand, this allows the stack to hold arbitrary pointers, i.e. pointers to objects of various types.

After introducing void *, the author mentions the following, which I don't understand:

There are two disadvantages to using void * as the item type. One is that this approach doesn't work for data that can't be represented in pointer form. Items could be strings (which are represented by a pointer to the first character in the string) or dynamically allocated structures but not basic types such as int and double.

p. 503, Chapter 19.5 Design Issues for Abstract Data Types

Question 1: What data can't be represented in pointer form? I don't think there is such data. The reason I think so is that int * is a pointer to an int object, struct foo * is a pointer to a struct foo object, and int (*)(int, char **) is the type of a pointer to the well-known main function.

Question 2: Actually, I can't work out why the author gives strings and dynamically allocated structures as examples here. I agree that it would be problematic if a string did not end with \0, since I need to know the length of the string in order to manipulate it properly. But if only a string were null-terminated, wouldn't it be okay to push the string, i.e. char * to the stack? What confuses me more is the case of dynamically allocated structures. I think that a dynamically allocated structure is just an object like any other type. So, as long as I remembered what objects I had pushed onto the stack, wouldn't it be okay to pop an item from the stack and cast it to the proper type in order to dereference it? Of course, the need to remember the types of pushed items is not confined to dynamically allocated structures, since the stack takes void *. For instance, I think that if I pushed a pointer to an int object onto the stack, a cast would be necessary to dereference it: value = *((int *) pop(stack)) .

Question 3: This question is not (directly?) related to this post. However, I think this is a good moment to ask it. In Chapter 17, the author says:

In general, we can assign a void * value to a variable of any pointer type and vice versa.

p. 416, Chapter 17.2

However, I read the following in the Suggested Improvements section of the book's website:

Actually, this is only true for pointers to object/data types; pointers to functions cannot reliably be converted to void pointers or vice versa in ISO C (they can in POSIX).

...

dereferencing a void pointer is only valid in a void context and essentially useless...

Rob Gamble

As I mentioned earlier, I'm a C beginner, but I'm also a computer beginner. For this reason, I'm curious why function pointers can't (reliably?) be converted to void pointers and vice versa. (I searched for it and found a post addressing this question, but I'm afraid the answers are somewhat difficult for me to understand.) Also, I don't understand what "void context" means.

Thank you for taking the time to read this question. I would greatly appreciate any enlightenment.


Solution

  • Question 1: What data can't be represented in pointer form? I don't think there is such data.

    If the text means "data you can't point at", then I don't think so either. A void* can be converted to and from any other object pointer type, and it can also be set to point at a pointer object.

    The only thing you can't do is assign a function pointer to a void*.

    If the author means that we should store the representation of objects inside a void*, then that's a horrible idea. It is only possible for integer types no larger than a void* anyway, so it is far from generic. And even then there are restrictions regarding alignment and so on.


    Question 2:

    if only a string were null-terminated, wouldn't it be okay to push the string, i.e. char * to the stack?

    That's fine.

    I think that a dynamically allocated structure is just an object like any other types

    It makes no difference where the data is allocated. The stack either makes a copy of it or keeps a pointer to it.

    It should be mentioned that best practice when writing any form of "abstract data type" (ADT), such as a stack, is to always make a copy of the data, with the copy allocated and handled internally by the stack, and not to store pointers to data allocated elsewhere, as that is bug-prone and hard to maintain.
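    A minimal sketch of such a copying stack, assuming a fixed capacity and a single item size chosen at initialization. The CopyStack type and the cs_ functions are hypothetical and are not King's interface. Because every item is copied with memcpy, the stack works for int and double as well as for structs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    unsigned char *data;   /* storage owned by the stack */
    size_t item_size;      /* size in bytes of every item */
    size_t capacity;       /* maximum number of items */
    size_t count;          /* number of items currently stored */
} CopyStack;

/* Assumes item_size * capacity does not overflow size_t. */
static bool cs_init(CopyStack *s, size_t item_size, size_t capacity)
{
    assert(item_size > 0 && capacity > 0);
    s->data = malloc(item_size * capacity);
    s->item_size = item_size;
    s->capacity = capacity;
    s->count = 0;
    return s->data != NULL;
}

/* Copies item_size bytes from *item into the stack's own storage. */
static bool cs_push(CopyStack *s, const void *item)
{
    if (s->count == s->capacity)
        return false;
    memcpy(s->data + s->count * s->item_size, item, s->item_size);
    s->count++;
    return true;
}

/* Copies the top item into *out and removes it from the stack. */
static bool cs_pop(CopyStack *s, void *out)
{
    if (s->count == 0)
        return false;
    s->count--;
    memcpy(out, s->data + s->count * s->item_size, s->item_size);
    return true;
}

static void cs_free(CopyStack *s)
{
    free(s->data);
    s->data = NULL;
    s->capacity = s->count = 0;
}
```

    With this design the caller may modify or destroy the original object right after cs_push, since the stack only ever reads its own copy.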


    Question 3

    I'm curious: why function pointers can't (reliably?) be converted to void pointers and vice versa?

    Because such behavior isn't defined by the C standard. Section 6.3.2.3 either explicitly speaks of object pointers, as it does in every scenario where the behavior of void* is defined, or it explicitly speaks of function pointers in a paragraph of their own.

    Notably, the parts regarding conversions between integers to/from pointers or null pointers to/from pointers apply to both object pointers and function pointers.

    Furthermore most conversions sort under the rules of assignment 6.5.16.1, where the rule regarding void pointers is as follows (emphasis mine):

    • the left operand has atomic, qualified, or unqualified pointer type, and (considering the type the left operand would have after lvalue conversion) one operand is a pointer to an object type, and the other is a pointer to a qualified or unqualified version of void, and the type pointed to by the left has all the qualifiers of the type pointed to by the right;

    The above is a formal constraint, meaning that the C compiler must implement it strictly and give the programmer a diagnostic message in case the code does not conform to C language constraints. A compiler which fails to give a diagnostic message is not conforming to the C standard.

    Regarding hardware:

    The C standard aside, CPUs come in two flavours: Harvard architecture, which (simply put) accesses data and code on separate buses, and von Neumann architecture, which (simply put) can access either data or code memory through the same bus. On Harvard architectures in particular it might be deeply problematic to assign data pointers to function pointers or vice versa, because the CPU can't handle that without using various tricks.

    But also any CPU architecture which uses a somewhat advanced memory management unit (MMU) may keep track of which parts of the memory are data and which parts are code, and then throw a hardware exception if you attempt to run code from data memory or use code memory for data storage. Such exceptions might be thrown as early as at the point of conversion and not necessarily just when you de-reference an object pointer or call a function pointer.

    And finally, the data address format is not necessarily compatible with the code address format. It is not uncommon to have systems where data is stored in one memory area, but (parts of) the code are stored in an expanded memory area beyond the normal bus range. For example, on some systems data pointers could be guaranteed to always be 16 bits but function pointers could be up to 24 bits. Stuff like this is very common in low-end 8/16 bit microcontrollers.

    Data and code might also have different alignment requirements. For example, a CPU may be able to read data which is either 16 or 32 bit aligned, but require all functions to be 32 bit aligned in terms of start address.