ccs50edx

CS50 pset4 recover - Recovered image does not match


I have been trying to do this problem for at least a week now, and can't seem to understand where is the problem, I already checked everything in google, and dont know any programmer in real life to ask them personaly, so if anyone can help me it would be great.

None of the images generated load, and it doesnt recover 50 as it is suposed to, it recovers 986.

I get this results in check50:

:) recover.c exists.

:) recover.c compiles.

:) handles lack of forensic image

:( recovers 000.jpg correctly

recovered image does not match

:( recovers middle images correctly

recovered image does not match

:( recovers 049.jpg correctly

recovered image does not match

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <cs50.h>

typedef uint8_t BYTE;

#define BLOCK_SIZE 512

int main(int argc, char *argv[])
{
    //it only accepts one comand argument in the name of an image
    if (argc != 2)
    {
        printf("Usage: ./recover IMAGE");
        return 1;
    }

    //check if it can open the image
    FILE *file = fopen(argv[1], "r");

    if (file == NULL)
    {
        printf("The image cannot be opened");
        return 1;
    }

    bool jpg_before = false;
    int counter = 0;
    FILE *image = NULL;
    char name[8];
    unsigned char buffer[BLOCK_SIZE];

    //while there is still jpegs in the file
    while (fread(buffer, BLOCK_SIZE, 1, file) == 1)
    {
        if (buffer[0] == 0xff && buffer[1] == 0xd8 && buffer[2] == 0xff && (buffer[3] & 0xe0) == 0xe0)
        {
            jpg_before = true;
        }

        if(jpg_before == true)
        {
            sprintf(name, "%03i.jpg", counter);
            counter++;
            image = fopen(name, "a");
            fwrite(buffer, BLOCK_SIZE, 1, image);
            fclose(image);
        }

    }
    fclose(file);
}

(also please keep in mind I'm new to programming, 16 years old and english is not my first lenguage)


Solution

    1. When you detect a header in the input, you set jpg_before. But, you never clear it.
    2. Once the flag is set, each block will be put into a different file.
    3. Every output file should consist of a header, followed by the associated data blocks.
    4. name[8] is a bit too small. The compiler will complain because the int could [in theory] be 10 or so digits, so the sprintf could overflow. Don't be stingy--use (e.g.): char name[20];
    5. Output file should be opened with "w" instead of "a". If the program is run twice, the second time, the output file(s) will be incorrect.

    Here is the refactored code:

    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>
    //#include <cs50.h>
    
    typedef uint8_t BYTE;
    
    #define BLOCK_SIZE 512
    
    int
    main(int argc, char *argv[])
    {
        // it only accepts one comand argument in the name of an image
        if (argc != 2) {
            printf("Usage: ./recover IMAGE");
            return 1;
        }
    
        // check if it can open the image
        FILE *file = fopen(argv[1], "r");
    
        if (file == NULL) {
            printf("The image cannot be opened");
            return 1;
        }
    
        int counter = 0;
        FILE *image = NULL;
        char name[20];
        unsigned char buffer[BLOCK_SIZE];
    
        // while there is still jpegs in the file
        while (fread(buffer, BLOCK_SIZE, 1, file) == 1) {
            if (buffer[0] == 0xff && buffer[1] == 0xd8 && buffer[2] == 0xff && (buffer[3] & 0xe0) == 0xe0) {
                if (image != NULL)
                    fclose(image);
    
                sprintf(name, "%03i.jpg", counter);
                counter++;
    
                image = fopen(name, "w");
            }
    
            fwrite(buffer, BLOCK_SIZE, 1, image);
        }
    
        if (image != NULL)
            fclose(image);
    
        fclose(file);
    }
    

    UPDATE:

    From the comments below:

    Points 2 and 3 look to be handled by the file being opened for appending. Leaving the file open is probably a better idea, though. Faster and handles point 6. – user4581301

    If H is header and D is data, for an input of (e.g): H1,D1,D2,D3,D4,H2,D5,D6,D7:

    Instead of two output files: F0:H1,D1,D2,D3,D4 and F1:H2,D5,D6,D7

    We'd have: F0:H1, F1:D1, F2:D2, F3:D3, F4:D4, F5:H2, F6:D5, F7:D6, F8:D7

    Although my refactored code was correct, the top section of my answer had an incorrect analysis of what OP's code was actually doing.

    I've fixed that. But, to make user4581301's make sense, here is the original analysis:

    1. When you detect a header in the input, you set jpg_before. But, you never clear it.
    2. You only write to the output stream for the header block, so any data is not copied. So, each output file will only be 512 bytes
    3. You immediately close the output stream after writing the header. It should be left open.
    4. Every block must go to a given output file, not just the header.
    5. name[8] is a bit too small. The compiler will complain because the int could [in theory] be 10 or so digits, so the sprintf could overflow. Don't be stingy--use (e.g.): char name[20];
    6. Output file should be opened with "w" instead of "a". If the program is run twice, the second time, the output file(s) will be incorrect.

    UPDATE #2:

    First of all thanks! But it is giving me a segmentation fault, do you have any idea why? because everything seems correct – Isa M

    From code inspection, the only place that could segfault is the fwrite call (i.e. image is NULL).

    I confirmed this by running the program under gdb [I have the cs50 recover input file]. When the program faults, just do tb to get a stack traceback.

    image could be NULL for the following reasons:

    1. The fopen for output file could fail (due to permissions, space, etc.) and return NULL. There was no check after the call as there was for opening the input file.

    2. image starts out being NULL. If there is some sort of extra file data/file header before the first jpg header (e.g. before FF/D8/FF/E0) the if will not match on the first block read. The fwrite will be called even with a NULL in image.

    Option (2) is what actually occurred because cs50's file has an extra header at the top of the file. You can see this by examining the file with a hex editor/dumper (e.g.) od or xxd:

    00000000: 00000000 00000000 00000000 00000000  ................
    *
    00000200: 63733530 2E6C792F 73757270 72697365  cs50.ly/surprise
    00000210: 00000000 00000000 00000000 00000000  ................
    *
    00000400: FFD8FFE0 00104A46 49460001 01000001  ......JFIF......
    

    The code will not see a valid header (i.e. the if matches) until offset 400. So, there are two extraneous fread calls at the start until things sync up.

    The fix is to change:

    fwrite(buffer, BLOCK_SIZE, 1, image);
    

    Into:

    if (image != NULL)
        fwrite(buffer, BLOCK_SIZE, 1, image);
    

    I've written a few answers on this problem before. However, I forgot to include this. I just wrote the code but did not test it ;-)

    To round things out, I've added more return code checking and added "rb" and "wb" to the fopen calls, just in case you're running on Windoze.

    Here is the updated/fixed code (I've tested it this time ;-):

    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>
    #include <string.h>
    #include <errno.h>
    //#include <cs50.h>
    
    typedef uint8_t BYTE;
    
    #define BLOCK_SIZE 512
    
    void
    onerr(const char *action,const char *file)
    {
    
        printf("%s -- %s -- %s\n",action,file,strerror(errno));
        exit(1);
    }
    
    int
    main(int argc, char *argv[])
    {
        // it only accepts one comand argument in the name of an image
        if (argc != 2) {
            printf("Usage: ./recover IMAGE");
            return 1;
        }
    
        // check if it can open the image
        FILE *file = fopen(argv[1], "rb");
        if (file == NULL)
            onerr("The image cannot be opened",argv[1]);
    
        int counter = 0;
        FILE *image = NULL;
        char name[20];
        unsigned char buffer[BLOCK_SIZE];
    
        // while there is still jpegs in the file
        while (fread(buffer, BLOCK_SIZE, 1, file) == 1) {
            if (buffer[0] == 0xff &&
                buffer[1] == 0xd8 &&
                buffer[2] == 0xff &&
                (buffer[3] & 0xe0) == 0xe0) {
                if (image != NULL)
                    fclose(image);
    
                sprintf(name, "%03i.jpg", counter);
                counter++;
    
                image = fopen(name, "wb");
                if (image == NULL)
                    onerr("unable to open output file",name);
            }
    
    #if 0
            fwrite(buffer, BLOCK_SIZE, 1, image);
    #else
            if (image != NULL)
                fwrite(buffer, BLOCK_SIZE, 1, image);
    #endif
        }
    
        if (image != NULL)
            fclose(image);
    
        fclose(file);
    
        return 0;
    }