I am trying to implement LZ compression and use it to compress some files, but I am stuck on a logical problem: I don't understand how the data is supposed to be written back to the file. Suppose I find a matching string "ls" whose entry is already in the table at, say, index 289. If I replace "ls" with 289 in the file, how should that be done? Earlier "ls" took 2 bytes, but now 289 takes 3 bytes. If that is right, why is this method called compression? And if it is not, what is the correct method? I am looking for an answer that explains this logic in detail.
Here is the code I have written so far:
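    #include <stdio.h>
    #include <string.h>

    /* createTable(), showTable(), lookTable() and addEntry() are defined
       elsewhere (not shown). lookTable() returns 5000 when the string is
       not in the table. */

    int main()
    {
        int id, flag, d;
        char ch, a[2], newstr[1000], currstr[1000];
        FILE *fr;
        FILE *fw;

        createTable();
        fr = fopen("old.txt", "rb");
        fw = fopen("new.txt", "wb");
        if (fr == NULL || fw == NULL)
        {
            perror("fopen");
            return 1;
        }

        flag = 0;
        /* Read the first character and copy it straight to the output. */
        if (fscanf(fr, "%c", &ch) != 1)
        {
            fclose(fr);
            fclose(fw);
            return 0;               /* empty input file */
        }
        fprintf(fw, "%c", ch);
        a[0] = ch;
        a[1] = '\0';
        strcpy(currstr, a);

        /* Loop on the result of the read itself; testing feof() before
           reading would process the last character twice. */
        while (fscanf(fr, "%c", &ch) == 1)
        {
            showTable();
            a[0] = ch;
            a[1] = '\0';
            strcat(currstr, a);
            strcpy(newstr, currstr);
            id = lookTable(newstr);
            if (id != 5000)
            {
                /* Match found: keep extending the current string. */
                strcpy(currstr, newstr);
                flag = 1;
                d = id;
            }
            else
            {
                if (flag == 0)
                {
                    fprintf(fw, "%s", a);
                }
                else
                {
                    /* Writes the index as decimal text, so 289 becomes
                       the three characters '2', '8', '9'. */
                    fprintf(fw, "%d", d);
                    printf("%d new data\n", d);
                }
                addEntry(newstr);
                strcpy(currstr, a);
                flag = 0;
            }
        }
        fprintf(fw, "%s", currstr);
        fclose(fr);
        fclose(fw);
        return 0;
    }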
Typically, compressed data is not stored as a text file, so your value 289 should probably not be stored as the text '2', '8', '9', but as the number 289 in two bytes (289/256 = 1 and 289%256 = 33).
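As a minimal sketch of that idea, assuming every table index fits in 16 bits and is written high byte first: the helper names code_to_bytes and write_code are made up for illustration and are not part of the code in the question.

```c
#include <stdint.h>
#include <stdio.h>

/* Split a 16-bit table index into two bytes, high byte first.
   (Hypothetical helper, not from the question's code.) */
static void code_to_bytes(uint16_t code, unsigned char out[2])
{
    out[0] = (unsigned char)(code / 256);   /* 289 / 256 = 1  */
    out[1] = (unsigned char)(code % 256);   /* 289 % 256 = 33 */
}

/* Write one code to a file opened in binary mode ("wb").
   Returns 0 on success, -1 on a write error. */
static int write_code(FILE *fw, uint16_t code)
{
    unsigned char bytes[2];

    code_to_bytes(code, bytes);
    return fwrite(bytes, 1, sizeof bytes, fw) == sizeof bytes ? 0 : -1;
}
```

With this, write_code(fw, 289) would append the two bytes 1 and 33 to the file, where fprintf(fw, "%d", d) writes the three characters '2', '8', '9'.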
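You will obviously have to do this for all the (sub)strings you write out, and store the translation table inside the output file so that you can translate it back again, unless the decoder can rebuild the same table from the codes as it reads them, which is what LZW-style schemes do. Note that a two-byte code for a two-byte string like "ls" saves nothing by itself; the saving comes once table entries are longer than the code that replaces them.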
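For translating back, a decoder could read the file two bytes at a time and rebuild each index. This is only a sketch, under the assumption that every code was written as exactly two bytes with the high byte first; bytes_to_code and read_code are hypothetical names.

```c
#include <stdint.h>
#include <stdio.h>

/* Rebuild a 16-bit table index from its high and low bytes.
   (Hypothetical helper, assuming high-byte-first storage.) */
static uint16_t bytes_to_code(unsigned char hi, unsigned char lo)
{
    return (uint16_t)(hi * 256 + lo);   /* 1 * 256 + 33 = 289 */
}

/* Read one code from a file opened in binary mode ("rb").
   Returns 1 on success, 0 at end of file or on a truncated code. */
static int read_code(FILE *fr, uint16_t *code)
{
    unsigned char bytes[2];

    if (fread(bytes, 1, sizeof bytes, fr) != sizeof bytes)
        return 0;
    *code = bytes_to_code(bytes[0], bytes[1]);
    return 1;
}
```

Each index recovered this way is then looked up in the table to get the original string back.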