How do I extract the timestamp of a cluster from an MKV file by reading the file's binary data?


When I read an MKV file as binary, the ID of a cluster's timestamp element is the byte 0xE7 and the timestamp has an unsigned int value, but when I read it, it doesn't give me the correct timestamp.

#include <cstdint>
#include <cstring>
#include <iostream>

double mkVSParser::get_clusters_timestamps(char *&package, unsigned long &size)
{
      const uint8_t data_to_find = 0xE7; // the ID

      for (unsigned long i = 0; i < size; i++) // find the first 0xE7 in a cluster
      {
          if (static_cast<uint8_t>(package[i]) == data_to_find)
          {
              unsigned int timestemp;
              std::cout << "position of byte == " << i << " and id == "
                        << static_cast<unsigned int>(static_cast<uint8_t>(package[i])) << std::endl;

              memcpy(&timestemp, &package[i + 1], sizeof(unsigned int));

              std::cout << "cluster timestamp = " << timestemp << std::endl;
              return 0;
          }
      }

      return 0;
}

Is there something that I missed?


Solution

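  • MKV binary data is stored in EBML format, in which element IDs and element data sizes are variable-size integers (VINTs) that can occupy a different number of octets (bytes). A cluster's Timestamp element consists of the ID 0xE7, then the data size encoded as a VINT, then the timestamp itself as a big-endian unsigned integer of that many bytes. Copying four raw bytes right after 0xE7 therefore treats the size byte as part of the value and interprets the bytes in the machine's own byte order, which is little-endian on x86.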

    Each Variable Size Integer starts with a VINT_WIDTH followed by a VINT_MARKER. VINT_WIDTH is a sequence of zero or more bits of value 0, and is terminated by the VINT_MARKER, which is a single bit of value 1. The total length in bits of both VINT_WIDTH and VINT_MARKER is the total length in octets of the Variable Size Integer.

    The single bit 1 starts a Variable Size Integer with a length of one octet. The sequence of bits 01 starts a Variable Size Integer with a length of two octets. 001 starts a Variable Size Integer with a length of three octets, and so on, with each additional 0-bit adding one octet to the length of the Variable Size Integer.
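As a sketch of this rule, the total length can be computed from the first byte alone by scanning for the first '1' bit from the most significant end. The function name `VintLengthFromFirstByte` is hypothetical, and the sketch assumes VINTs of at most 8 octets, which is the maximum size length Matroska uses.

```cpp
#include <cstdint>

// Hypothetical helper: returns the total length in octets of an EBML
// variable-size integer, given only its first byte. The length is the
// number of leading 0-bits (VINT_WIDTH) plus one for the 1-bit (VINT_MARKER).
// Returns 0 for a first byte of 0x00: with VINTs limited to 8 octets
// (as assumed here), such a byte has no marker and is not a valid VINT.
int VintLengthFromFirstByte(uint8_t first)
{
    int length = 1;
    for (uint8_t mask = 0x80; mask != 0; mask >>= 1, length++)
        if (first & mask)
            return length; // found the VINT_MARKER at this position
    return 0;
}
```

For example, a first byte of 0x81 (1000 0001) gives a length of 1, 0x40 (0100 0000) gives 2, and 0x02 (0000 0010) gives 7.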

    The position of the first '1' bit in the first byte of a variable-size integer gives its size in bytes. If it is in the first (most significant) position,

    1XXXXXXX (I use 'X' for the other bits of the number here, besides the length part)

    then the integer is one byte long, and the bits after the first '1' bit (the 7 lower bits in this case) are the binary representation of the number. A variable-size integer that starts with

    0000001X XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX

    is seven bytes long, because its first '1' bit is in the seventh position.

    So first read the first byte of the number and find the position N of its first '1' bit, then read the whole number as N bytes, treating that first '1' bit as if it were a zero bit.

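    #include <cstdint>

    // Bit that marks a VINT of the given length (1..8) in its first byte.
    constexpr uint8_t VarSizeIntLenMark(int length)
    {
        return static_cast<uint8_t>(1 << (8 - length)); // set single bit at length's position
    }
    
    // Length in octets of the VINT starting at data, or 0 if data[0] has no marker bit.
    int VarSizeIntLen(const uint8_t* data)
    {
        for (int i = 1; i <= 8; i++)
            if (VarSizeIntLenMark(i) & data[0]) return i;
        return 0;
    }
    
    // Decodes a VINT such as an Element Data Size (element IDs keep their marker bit).
    uint64_t ReadVariableSizeInt(const uint8_t* data)
    {
        int length = VarSizeIntLen(data);
        if (length == 0) return 0; // no marker bit in the first byte: not a valid VINT
        uint64_t parsedValue = data[0] & ~VarSizeIntLenMark(length); // clear the VINT_MARKER bit
        for (int i = 1; i < length; i++) // append the remaining bytes
            parsedValue = (parsedValue << 8) + data[i];
        return parsedValue;
    }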