c bit-manipulation byte bitmask byte-shifting

C function to replace a byte at a specific index of a parameter


So I'm using the following code:

unsigned long replaceByte(unsigned long original,unsigned char newByte,int indexToReplace)
{
    int shift = 8 * indexToReplace;
    unsigned long value = newByte << shift;
    unsigned long mask = 0xff << shift;

    return (~mask & original) | value;
}

I have a given word with |w| bytes.

For example:

replaceByte(unsigned long original, unsigned char newByte, int indexToReplace)
correct answer:
replaceByte(0x12345678CDEF3456, 0xAB, 2) --> 0x1234AB78CDEF3456
                       (my code's output is: 0x12345678CDAB3456)
correct answer:
replaceByte(0x12345678CDEF3456, 0xAB, 0) --> 0xAB345678CDEF3456
                       (my code's output is: 0x12345678cdef34AB)

I thought I needed to check whether the system is big endian or little endian, because my code changes exactly the opposite bytes. For example, it changes the LSB instead of the MSB. But I realized that it does not matter, because I'm working with bit operations.

As you can see, the code here changes the wrong index:

(!) Error in index: 0. Output: 0x123456789abcdeff
Answer: 0xff3456789abcdeab

(!) Error in index: 1. Output: 0x123456789abcffab
Answer: 0x12ff56789abcdeab

(!) Error in index: 2. Output: 0x123456789affdeab
Answer: 0x1234ff789abcdeab

(!) Error in index: 3. Output: 0xffffffffffbcdeab
Answer: 0x123456ff9abcdeab

(!) Error in index: 4. Output: 0x123456789abcdeff
Answer: 0x12345678ffbcdeab

(!) Error in index: 5. Output: 0x123456789abcffab
Answer: 0x123456789affdeab

(!) Error in index: 6. Output: 0x123456789affdeab
Answer: 0x123456789abcffab

I thought about changing my code to use an array: take the number, walk over it as an array of bytes, change the byte at the needed index, and that's it. But I couldn't write that correctly, so I stuck with the shifting approach (which I can't write correctly either). This is my attempt:

unsigned long replaceByte(unsigned long original, unsigned char newByte, int indexToReplace){
    int size = (sizeof(unsigned long));
    char a[size];
    for (int i=0; i<size; i++){
        if (i=0)
            a[0] = original & 0xff;
        else
            a[i] = original>>(8*i) & 0xff;
    }
    a[indexToReplace] = newByte;
    ......// stuck
}

I'm not allowed to use long long, uint_fast64_t, reinterpret_cast or any other "external" things.

I also think I need to account for whether the code is running on a 32-bit or a 64-bit system, in order to determine the size of unsigned long (4 or 8 bytes).


Solution

  • This is prefaced by [my] top comments.

    value and mask need to be unsigned long.

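    Also, when doing the shift, both values were getting truncated [to 32 bit] due to the integer promotion rules: newByte << shift and 0xff << shift are evaluated as int, and only the result is converted to unsigned long. Shifting an int by its width or more is undefined behavior, which is why the higher indexes misbehave.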

    In the above, I forgot about value having the same problem.
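
    To make the promotion visible, here is a minimal sketch. The helper names promoted_shift_size and widen_then_shift are hypothetical and only serve as illustration. The first one reports the size of the type the compiler uses for a shift of an unsigned char; the second one widens the operand before shifting, so the shift is carried out in unsigned long.

```c
#include <stddef.h>

/* Size of the type used for `b << 1` when b is an unsigned char.
   Integer promotion converts b to int before the shift happens,
   so this is sizeof(int), not sizeof(unsigned char). */
size_t promoted_shift_size(void)
{
    unsigned char b = 0xAB;
    return sizeof(b << 1);   /* sizeof does not evaluate the expression */
}

/* Widen first, then shift: the shift is done in unsigned long.
   The caller must keep shift below the bit width of unsigned long. */
unsigned long widen_then_shift(unsigned char b, unsigned shift)
{
    unsigned long wide = b;
    return wide << shift;
}
```

    On a platform where int is 32 bits and unsigned long is 64 bits, the first function returns 4, which is why the original expressions could never reach the upper four bytes.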


    Here's an alternate way to force correct shifting:

    unsigned long
    replaceByte(unsigned long original,unsigned char newByte,int indexToReplace)
    {
        int shift = indexToReplace*8;
        unsigned long value = newByte;
        unsigned long mask = 0xff;
    
        value <<= shift;
        mask <<= shift;
    
        return (~mask & original) | value;
    }
    

    The above is what I usually do. But, the following may also work:

    unsigned long
    replaceByte(unsigned long original,unsigned char newByte,int indexToReplace)
    {
        int shift = indexToReplace*8;
        unsigned long value = ((unsigned long) newByte) << shift;
        unsigned long mask = ((unsigned long) 0xff) << shift;
    
        return (~mask & original) | value;
    }
    

    UPDATE:

    hey thanks. The provided codes bring me the following output: 0x12345678cdef34AB instead of 0xAB345678CDEF3456. I'm pretty sure it's related to the little endian thing because it's not a coincidence that instead of the MSB the LSB gets replaced.

    It's not an endian thing. It's how indexToReplace needs to be interpreted.

    Endianness only describes how the bytes of a value are laid out in memory. The processor fetches according to the endian mode in effect, so by the time we do the shift we are operating on the value itself, where the most significant byte is always "on the left" [so, no worries].

    The usual convention is that the index counts from the right (index 0 is the LSB). But, according to the [correct] data, the problem wants the index to count from the left (index 0 is the MSB).

    So, we just need to adjust the index/shift:

    unsigned long
    replaceByte(unsigned long original,unsigned char newByte,int indexToReplace)
    {
    #if 0
        int shift = indexToReplace * 8;
    #else
        int shift = ((sizeof(unsigned long) - 1) - indexToReplace) * 8;
    #endif
        unsigned long value = newByte;
        unsigned long mask = 0xff;
    
        value <<= shift;
        mask <<= shift;
    
        return (~mask & original) | value;
    }
    

    UPDATE #2:

    It recognises the "int shift = indexToReplace * 8;" as a comment for some reason, but it still works.

    That is because #if 0 is a CPP [preprocessor] directive. The condition 0 is always false, much like #ifdef NEVERWAS when we never do a #define NEVERWAS, so only the code under the #else is included.

    You may wish to use the -E and/or -P options when compiling to see the output of the preprocessor stage.

    In this instance, the only thing that the compiler will see is:

    int shift = ((sizeof(unsigned long) - 1) - indexToReplace) * 8;
    

    BUT if I try to change the "#if 0" to "#if (is_big_endian == 0)" I get a wrong result when I use "0" as the indexToReplace.

    Please try to get beyond referring to this as endian related. Once again, that is not what is happening. The code I've posted works regardless of the processor endian mode.
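
    As a rough illustration of why endianness does not enter into it, the sketch below pairs a memory-layout probe with the shift-based replacement. Both function names (stores_lsb_first and replace_byte_from_right) are hypothetical. The probe looks at how the bytes sit in memory, which does vary by machine; the replacement only uses shifts and masks on the value, so it gives the same result whatever the probe reports.

```c
/* Returns 1 if the least significant byte is stored at the lowest
   address (little endian), 0 otherwise. This inspects memory layout. */
int stores_lsb_first(void)
{
    unsigned long probe = 1UL;
    return *(const unsigned char *)&probe == 1;
}

/* Replaces the byte at `index`, counted from the right (0 = LSB).
   Works purely on the value, so memory layout is irrelevant.
   The caller must keep index below sizeof(unsigned long). */
unsigned long replace_byte_from_right(unsigned long original,
                                      unsigned char new_byte,
                                      unsigned index)
{
    unsigned long mask = 0xffUL << (8 * index);
    unsigned long value = (unsigned long)new_byte << (8 * index);
    return (original & ~mask) | value;
}
```

    For example, replace_byte_from_right(0x12345678UL, 0xAB, 0) yields 0x123456AB on both little-endian and big-endian machines.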

    Please reread the part about the correct/proper interpretation of the byte index. It is how one chooses to number the bytes.

    Once again, 99.44% of the time, it is oriented from the right (LSB to MSB). Graphically, most people use:

    | MSB |     |     |     |     |     |     | LSB |
    |  01 |  23 |  45 |  67 |  89 |  AB |  CD |  EF | DATA
    |   7 |   6 |   5 |   4 |   3 |   2 |   1 |   0 | INDEX
    

    However, for your exact problem statement, it is oriented from the left (MSB to LSB):

    | MSB |     |     |     |     |     |     | LSB |
    |  01 |  23 |  45 |  67 |  89 |  AB |  CD |  EF | DATA
    |   0 |   1 |   2 |   3 |   4 |   5 |   6 |   7 | INDEX
    

    This is unusual. It is also marginally slower, because the shift calculation needs an extra subtraction.
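
    The mapping between the two numbering schemes can be sketched as a small helper. The name shift_for_left_index is hypothetical. It uses sizeof(unsigned long), so it adapts on its own to a 4-byte or an 8-byte unsigned long, with no need to detect a 32-bit or 64-bit system separately.

```c
#include <stddef.h>

/* Shift amount, in bits, for a byte index counted from the left
   (index 0 = MSB). The caller must keep index_from_left below
   sizeof(unsigned long). */
unsigned shift_for_left_index(unsigned index_from_left)
{
    /* Index of the last byte: 7 for an 8-byte unsigned long, 3 for a 4-byte one. */
    size_t last = sizeof(unsigned long) - 1;
    return (unsigned)((last - index_from_left) * 8);
}
```

    With an 8-byte unsigned long, left index 0 maps to a shift of 56 and left index 2 maps to a shift of 40, which is the byte the question expects to change in 0x12345678CDEF3456.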

    It gives out: 0x12345678CDEF34FF instead of 0xFF345678CDEF3456

    Ultimately, whatever you did to the #if, it chose the incorrect equation.
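
    For completeness, the array idea from the question can also be made to work. The following is only a sketch under stated assumptions, not the approach used above: the name replace_byte_via_array is hypothetical, and returning the input unchanged for an out-of-range index is an assumed policy. It splits the value into bytes with shifts (MSB first), replaces one, and reassembles, so it is again independent of endianness. Note that the attempt in the question uses if (i=0), which is an assignment rather than a comparison, and a plain char array, whose elements may be signed.

```c
#include <stddef.h>

unsigned long replace_byte_via_array(unsigned long original,
                                     unsigned char new_byte,
                                     size_t index_from_left)
{
    unsigned char bytes[sizeof(unsigned long)];
    size_t n = sizeof bytes;
    size_t i;
    unsigned long result = 0;

    /* Assumed policy: an out-of-range index leaves the value unchanged. */
    if (index_from_left >= n)
        return original;

    /* Split by value: bytes[0] is the MSB, bytes[n-1] is the LSB. */
    for (i = 0; i < n; i++)
        bytes[i] = (unsigned char)(original >> (8 * (n - 1 - i)));

    bytes[index_from_left] = new_byte;

    /* Reassemble, most significant byte first. */
    for (i = 0; i < n; i++)
        result = (result << 8) | bytes[i];

    return result;
}
```

    This does more work than the mask-and-shift version, so it is mainly useful as a way to see that both approaches describe the same byte numbering.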