windowsntfsjournalusn

Estimate the number of USN records on NTFS volume


When the USN journal is used for the first time, the volume's entire set of USN records must be enumerated using the FSCTL_ENUM_USN_DATA control code. This is usually a lengthy operation.

Is there a way to estimate the number of records on the volume prior to running it, so progress can be displayed?

I'm guessing the USN data for the entire volume is generated from the MFT, with one record per file (approximately). So perhaps a way to estimate the number of active files in the MFT would work.


Solution

  • You can use FSCTL_GET_NTFS_VOLUME_DATA to get the length in bytes of the MFT. If you compare this to the number of records on a selection of representative volumes, you could estimate the average length of a single MFT record and use this to calculate an estimate for the number of records on a particular volume.

    Because the MFT contains (for example) the security information for every file, the average length will vary significantly from volume to volume, so I think you'll only get order-of-magnitude accuracy, but it may be good enough in most cases.

    Another approach would be to assume that the file reference numbers increase linearly, which is roughly true. You can use FSCTL_ENUM_USN_DATA to find out whether there are any files with a reference number above a particular guess or not; you'd need no more than 128 guesses to determine the actual maximum reference number. That would at least give you a percentage complete between 0 and 100 at any given point, it wouldn't be entirely uniform but then progress bars never are. :-)

    Additional:

    Looking more closely, on Windows 7 x64 the "next id" field returned by FSCTL_ENUM_USN_DATA (the quadword returned before the first USN_RECORD structure) isn't a file reference number after all, but the file record segment number. So, as you observed, the last id number returned, multiplied by BytesPerFileRecordSegment (1024), is equal to MftValidDataLength.

    File reference numbers appear to be made up of two parts. The low six bytes contain the file record segment number. The first record returned from each request always has a FRN whose segment number is the same as the "next id" fed into StartFileReferenceNumber, except for the first call when StartFileReferenceNumber is zero. The upper two bytes contain unspecified additional information, which is never zero.

    It seems that FSCTL_ENUM_USN_DATA accepts either a file record segment number (in which case the top two bytes are zero) or a file reference number (in which case the top two bytes are nonzero).

    One oddity is that I can't find two records with the same record segment number. This suggests that each file record is using at least 1K in the MFT, which doesn't seem reasonable.

    Anyway, the upshot is that it is probably sensible to multiply the "next id" by BytesPerFileRecordSegment and divide it by MftValidDataLength to get a percentage completed, so long as you cope gracefully if this returns a nonsensical result.