indexingfile-formatvisual-foxprofoxproxbase

Visual FoxPro Compact Index File Format


I am trying to understand the file format of a Visual FoxPro compact index (*.IDX). I am currently referring to Microsoft's documentation for guidance.

The index is a B-tree of 512-byte nodes. Each leaf ("exterior") node contains multiple entries. Each entry consists of four pieces of data:

The entries (without their keys) are stored at the beginning of the node, immediately after the node's 24-byte header. Their keys are not included at this location because the keys vary in length, while the row number, duplicate byte count and trailing byte counts are fixed in length. The keys are stored at the end of the node and work their way backward. For example:

How do I determine the individual lengths of the keys? The documentation does not appear to specify this. They are perfectly contiguous (no null-byte separators).

I can isolate the keys manually by visual inspection. I suspected that the trailing byte count represented the length of the key. However, it did not correlate to the lengths determined by this inspection.

I believe that the FoxPro file formats are derived from the xBase standard. Perhaps this rings a bell?


Solution

  • After discovering the XBase::Index Perl module, I have determined that the keys in the exterior node are effectively the same length as the fixed-length keys found in the interior nodes, except any trailing spaces are removed. That is what the "trailing byte count" mentioned in the documentation refers to (how many trailing spaces were truncated off the end of the key). I still have not determined what the "duplicate byte count" is, but the module at least clarified its relationship:

    variable_key_length = fixed_key_length - duplicate_byte_count - trailing_byte_count
    

    For example, suppose the fixed key length for this index was 10 bytes. Now suppose that the key "DOG " was stored in an external node. Its duplicate byte count (according to what I have observed) will most likely be zero, while its trailing byte count will be 7 (the number of spaces truncated). Therefore, only the three bytes representing "DOG" would be stored.