vcdiff

Explanation of RFC 3284 - VCDIFF format


I'm new to reading this type of documentation, and I'm confused about how the VCDIFF instructions work. This is the original document:

https://www.rfc-editor.org/rfc/rfc3284

This part:

  ADD:  This instruction has two arguments, a size x and a sequence
        of x bytes to be copied.
  COPY: This instruction has two arguments, a size x and an address
        p in the string U.  The arguments specify the substring of U
        that must be copied.  We shall assert that such a substring
        must be entirely contained in either S or T.
  RUN:  This instruction has two arguments, a size x and a byte b,
        that will be repeated x times.

Then the document gives an example:

     a b c d e f g h i j k l m n o p
     a b c d w x y z e f g h e f g h e f g h e f g h z z z z

     COPY  4, 0
     ADD   4, w x y z
     COPY  4, 4
     COPY 12, 24
     RUN   4, z

I don't understand what each operation does. I think the first COPY produces the first "a b c d", and the ADD then appends "w x y z", but I don't understand how the next two copies work.

I think it would be useful if someone could show what each instruction does, like "this instruction produces this string, and the next one produces this", so I can compare step by step :D

Thanks.


Solution

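  • It looks like you already know the length of the output by the time you execute these instructions. In this "language" the input S and the output T sit back to back in "memory", forming the combined string U that the COPY addresses refer to. Here S occupies offsets 0 to 15 and T starts at offset 16. So you start with: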

    abcdefghijklmnop----------------------------
    |<-     S    ->||<-            T         ->|
    

    First COPY 4 bytes starting at offset 0 in the combined string:

    ABCDefghijklmnopABCD------------------------
    |<-     S    ->||<-            T         ->|
    

    Then ADD 4 bytes, literally w x y z:

    abcdefghijklmnopabcdWXYZ--------------------
    |<-     S    ->||<-            T         ->|
    

    Then COPY 4 bytes starting at offset 4:

    abcdEFGHijklmnopabcdwxyzEFGH----------------
    |<-     S    ->||<-            T         ->|
    

    Then COPY 12 bytes starting at offset 24. This is a little tricky, because offset 24 is the "efgh" we just wrote, and the last 8 of the 12 bytes we need to read haven't been written yet. But if you copy one byte at a time, each byte is already in place by the time you read it, so the overlap doesn't matter:

                            |<- from ->|
                                |<-  to  ->|
    abcdefghijklmnopabcdwxyzEFGHEFGHEFGHEFGH----
    |<-     S    ->||<-            T         ->|
    

    Finally there is a RUN of 4 consecutive bytes all "z":

    abcdefghijklmnopabcdwxyzefghefghefghefghZZZZ
    |<-     S    ->||<-            T         ->|
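
The walkthrough above can also be checked with a few lines of Python. This is only a rough sketch, not a real VCDIFF decoder: it assumes the instructions have already been parsed into (op, size, arg) tuples, and it ignores the binary encoding, code tables, and address modes that the RFC defines. The names `apply_delta`, `S`, and `DELTA` are made up for this illustration.

```python
def apply_delta(source: bytes, instructions) -> bytes:
    """Apply already-parsed ADD/COPY/RUN instructions and return the target T."""
    # u models the combined string U: S first, then T as it grows.
    u = bytearray(source)
    for op, size, arg in instructions:
        if op == "ADD":
            # arg is the literal data; size must match its length.
            if len(arg) != size:
                raise ValueError("ADD size does not match data length")
            u += arg
        elif op == "RUN":
            # arg is a single byte, repeated size times.
            u += arg * size
        elif op == "COPY":
            # arg is an address in U. It must point at a byte that exists.
            if not 0 <= arg < len(u):
                raise ValueError("COPY address outside of U")
            # Copy one byte at a time so that a copy may read bytes
            # that this same instruction has just written.
            for i in range(size):
                u.append(u[arg + i])
        else:
            raise ValueError(f"unknown instruction {op!r}")
    # Everything after S is the target.
    return bytes(u[len(source):])


S = b"abcdefghijklmnop"
DELTA = [
    ("COPY", 4, 0),
    ("ADD", 4, b"wxyz"),
    ("COPY", 4, 4),
    ("COPY", 12, 24),
    ("RUN", 4, b"z"),
]

target = apply_delta(S, DELTA)
print(target.decode())  # abcdwxyzefghefghefghefghzzzz
```

The byte-at-a-time loop in the COPY branch is what makes `COPY 12, 24` work: a single slice such as `u[24:36]` would only find 4 bytes at that point, while the loop keeps reading bytes it appended a moment earlier.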