Tags: c, encryption, compression, lzw

LZ compression technique


I am trying to implement LZ compression and use it to compress some files, but I am stuck on a logical problem: I don't understand how the compressed data should be written back to the file.

Suppose I find a matching string "ls" whose table entry is, say, at index 289. If I replace "ls" by 289 in the file, how should that be done? "ls" took 2 bytes, but 289 takes 3 bytes. If that is true, why is this method called compression? And if it is not, what is the correct method? I would like an answer that explains this logic in detail.

Here is the code I have written so far:

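```c
#include <stdio.h>
#include <string.h>

/* Table helpers defined elsewhere (not shown in the question); these
   signatures are assumed. lookTable() returns 5000 when the string is
   not in the table. */
void createTable(void);
void showTable(void);
int lookTable(char *s);
void addEntry(char *s);

int main(void)
{
    int id, flag, d = 0;
    char ch, a[2], newstr[1000], currstr[1000];
    FILE *fr;
    FILE *fw;

    createTable();
    fr = fopen("old.txt", "rb");
    fw = fopen("new.txt", "wb");
    if (fr == NULL || fw == NULL)
    {
        perror("fopen");
        return 1;
    }
    flag = 0;

    /* The first character is copied to the output unchanged. */
    if (fscanf(fr, "%c", &ch) != 1)
    {
        fclose(fr);
        fclose(fw);
        return 0;                    /* empty input file */
    }
    fprintf(fw, "%c", ch);
    a[0] = ch;
    a[1] = '\0';
    strcpy(currstr, a);

    /* Test the result of each read instead of feof(): feof() only becomes
       true after a read has already failed, so the last character would
       be processed twice. */
    while (fscanf(fr, "%c", &ch) == 1)
    {
        showTable();                 /* debug output */
        a[0] = ch;
        a[1] = '\0';
        strcat(currstr, a);          /* candidate: current string + ch */
        strcpy(newstr, currstr);
        id = lookTable(newstr);
        if (id != 5000)
        {
            /* Candidate is in the table: remember its index, keep extending. */
            strcpy(currstr, newstr);
            flag = 1;
            d = id;
        }
        else
        {
            if (flag == 0)
            {
                fprintf(fw, "%s", a);
            }
            else
            {
                /* Writes the index as decimal text, e.g. the three
                   characters '2', '8', '9' for 289. */
                fprintf(fw, "%d", d);
                printf("%d new data\n", d);
            }
            addEntry(newstr);
            strcpy(currstr, a);      /* restart from the new character */
            flag = 0;
        }
    }
    fprintf(fw, "%s", currstr);
    fclose(fr);
    fclose(fw);
    return 0;
}
```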

Solution

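  • Compressed data is normally not stored as a text file, so your value 289 should not be written as the text characters '2', '8', '9' but as a binary number, for example in two bytes: 289 / 256 = 1 and 289 % 256 = 33, so you write the bytes 1 and 33.

    You have to do this for every code you write. A code only saves space when the string it replaces is longer than the code itself: with 2-byte codes, replacing "ls" gains nothing, but the table entries grow longer as more input is processed, so later codes stand for strings of many bytes. With LZW you also do not need to store the translation table in the output file: the decoder rebuilds exactly the same table while reading the codes, provided both sides start from the same initial table (for example, all 256 single-byte strings).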