16-byte offset in MPEG-4 video export from DICOM file

Short version: Where is the 16-byte offset coming from when exporting an MPEG-4 video stream from a DICOM file with Pydicom via the following code? (And, bonus question, is it always a 16-byte offset?)

    from pathlib import Path
    import pydicom

    in_dcm_filename: str = ...
    out_mp4_filename: str = ...

    ds = pydicom.dcmread(in_dcm_filename)
    Path(out_mp4_filename).write_bytes(ds.PixelData[16:])  # 16-byte offset necessary

For reproducibility, one can use e.g. this DICOM file which I found in this old discussion on Google Groups (content warning: the video shows the open brain in a neurosurgical intervention).

Long version

I have a number of DICOM files containing surgical MPEG-4 video streams (transfer syntax UID 1.2.840.10008.1.2.4.102 – MPEG-4 AVC/H.264 High Profile / Level 4.1). I wanted to export the video streams from the DICOM files for easier handling in downstream tasks.

After a bit of googling, I found the following discussion, suggesting the use of dcmdump from DCMTK, as follows (which I was able to reproduce):

Run dcmdump +P 7fe0,0010 <in_dcm_filename> +W <out_folder>.
From the resulting two files in <out_folder>, mpeg4.dcm.0.raw and mpeg4.dcm.1.raw, discard the first one, which has a size of 0 bytes, and keep the second one (potentially changing its suffix to .mp4), which is a regular, playable video file.

From what I saw in the dcmdump command, I concluded this was just a raw dump of tag 7fe0,0010 (which is the Pixel Data attribute)¹, so I thought I could reproduce this with Pydicom. My first attempt was using Path(out_mp4_filename).write_bytes(ds.PixelData) (see code sample above for complete details); however, I ended up with a file that could not be played. I then compared a hex dump of the dcmdump result and of the Pydicom result:

    $ hd ./dcmdump.mp4 | head
    00000000  00 00 00 20 66 74 79 70  69 73 6f 6d 00 00 02 00  |... ftypisom....|
    00000010  69 73 6f 6d 69 73 6f 32  61 76 63 31 6d 70 34 31  |isomiso2avc1mp41|
    00000020  00 00 00 08 66 72 65 65  00 ce 97 1d 6d 64 61 74  |....free....mdat|
    ...
    $ hd ./pydicom.mp4 | head
    00000000  fe ff 00 e0 00 00 00 00  fe ff 00 e0 3e bc ce 00  |............>...|
    00000010  00 00 00 20 66 74 79 70  69 73 6f 6d 00 00 02 00  |... ftypisom....|
    00000020  69 73 6f 6d 69 73 6f 32  61 76 63 31 6d 70 34 31  |isomiso2avc1mp41|
    ...

From this I noticed that my Pydicom export contained 16 preceding extra bytes. Once I removed them via Path(out_mp4_filename).write_bytes(ds.PixelData[16:]), I got the exact same, playable video export as with dcmdump.

So, again, my question is: Where do these 16 extra bytes come from, what is their meaning, and am I safe simply removing them?

¹) Update: In hindsight, I should have gotten suspicious because of the two files that were created by dcmdump.

Solution

The reason why you see these bytes is that the pixel data is encapsulated. Using dcmdump shows this clearly:

    (7fe0,0010) OB (PixelSequence #=2)                      # u/l, 1 PixelData
      (fffe,e000) pi (no value available)                     #   0, 1 Item
      (fffe,e000) pi 00\00\00\20\66\74\79\70\69\73\6f\6d\00\00\02\00\69\73\6f\6d\69\73... # 13548606, 1 Item
    (fffe,e0dd) na (SequenceDelimitationItem)               #   0, 0 SequenceDelimitationItem

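If you check the leading bytes that you strip, you can see that they are the headers of the two items shown in the dump: each item starts with the item tag (fffe,e000), stored little-endian as fe ff 00 e0, followed by a 4-byte length. The first item is empty, so its 8-byte header is directly followed by the 8-byte header of the second item, which accounts for the 16 bytes. These two items are also what dcmdump writes out as two files, the first of them empty.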
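To make this concrete, here is a small sketch that decodes the 16 leading bytes from the hex dump in the question using only the standard library. It assumes little-endian item headers, as seen in that dump; the helper name `describe_item_header` is made up for illustration.

```python
import struct

# First 16 bytes of ds.PixelData, copied from the hex dump in the question.
leading = bytes.fromhex("fe ff 00 e0 00 00 00 00  fe ff 00 e0 3e bc ce 00")


def describe_item_header(buf: bytes, offset: int) -> str:
    """Decode one 8-byte item header: group, element, 32-bit length."""
    group, element, length = struct.unpack_from("<HHL", buf, offset)
    return f"({group:04x},{element:04x}) length={length}"


first_header = describe_item_header(leading, 0)
second_header = describe_item_header(leading, 8)
print(first_header)   # (fffe,e000) length=0
print(second_header)  # (fffe,e000) length=13548606
```

The second length is the same 13548606 that dcmdump reports for the second item. The offset is 16 here only because the first item is empty and the video sits in a single fragment, so a fixed `[16:]` slice is not guaranteed to work for every file.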

pydicom 2

To get the encapsulated contents, you can use encaps.defragment_data in pydicom 2.x, which returns all contained fragments in one data block:

    from pydicom import dcmread, encaps

    ds = dcmread("test_720.dcm")
    with open("test_720.mpeg4", "wb") as f:
        f.write(encaps.defragment_data(ds.PixelData))

Note that, in general, the fragments are parts of multi-frame data (most commonly one fragment per frame), and you may want to handle them separately. With MPEG-4, however, there is only one continuous data stream holding the video, so merging any fragments it may have been split into is the correct way to handle it.

The first (empty) item is the Basic Offset Table, which must always be the first item of the encapsulated data. It may be empty, and for the MPEG transfer syntaxes it always is. From the DICOM standard:

The Basic Offset Table is not used because MPEG2 contains its own mechanism for describing navigation of frames.
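For illustration, the item structure can also be walked by hand. The following is a minimal sketch, not a replacement for pydicom's encaps functions: it assumes little-endian item headers, treats the first item as the Basic Offset Table and skips it, and joins the remaining fragments. The names `iter_items` and `join_fragments` are hypothetical, and the input at the bottom is synthetic test data rather than a real DICOM file.

```python
import struct

ITEM_TAG = (0xFFFE, 0xE000)
SEQUENCE_DELIMITER_TAG = (0xFFFE, 0xE0DD)


def iter_items(buf: bytes):
    """Yield the value of each item in an encapsulated pixel data buffer."""
    offset = 0
    while offset + 8 <= len(buf):
        group, element, length = struct.unpack_from("<HHL", buf, offset)
        if (group, element) == SEQUENCE_DELIMITER_TAG:
            return  # end of the pixel data sequence
        if (group, element) != ITEM_TAG:
            raise ValueError(f"unexpected tag ({group:04x},{element:04x})")
        offset += 8  # skip the item header
        yield buf[offset:offset + length]
        offset += length


def join_fragments(buf: bytes) -> bytes:
    """Skip the first item (Basic Offset Table) and merge the rest."""
    items = iter_items(buf)
    next(items, None)  # the offset table, empty for MPEG transfer syntaxes
    return b"".join(items)


def make_item(value: bytes) -> bytes:
    """Build a synthetic item: tag, length, value (for the demo only)."""
    return struct.pack("<HHL", *ITEM_TAG, len(value)) + value


# Synthetic buffer: empty offset table, two fragments, sequence delimiter.
demo = (
    make_item(b"")
    + make_item(b"abcd")
    + make_item(b"efgh")
    + struct.pack("<HHL", *SEQUENCE_DELIMITER_TAG, 0)
)
merged = join_fragments(demo)
print(merged)  # b'abcdefgh'
```

With two fragments the payload starts at byte 16 but is interrupted by another item header after the first fragment, which is why merging by structure is more robust than slicing at a fixed offset.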

pydicom 3

In pydicom 3, encaps.defragment_data is deprecated in favor of encaps.generate_fragments, which will yield one fragment at a time. As @scaramallion pointed out in the comments, there are also more convenient new generator functions that yield only the fragments/frames with the actual data, excluding the offset table: generate_fragmented_frames and generate_frames. In this case you don't have to worry about the internal structure (e.g. the offset table):

    from pydicom import dcmread, encaps

    ds = dcmread("test_720.dcm")
    with open("test_720.mpeg4", "wb") as f:
        for frame in encaps.generate_frames(ds.PixelData):
            # for other use cases, you may save the frames separately
            f.write(frame)
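
As a quick sanity check on the exported bytes, one can look for the `ftyp` marker that the hex dump in the question shows at byte offset 4 of the playable file. This is only a heuristic sketch based on that dump, not a full validation of the container; `looks_like_mp4` is a made-up helper name.

```python
def looks_like_mp4(data: bytes) -> bool:
    """Heuristic: bytes 4 to 7 hold the box type 'ftyp', as in the dump."""
    return len(data) >= 8 and data[4:8] == b"ftyp"


# First 16 bytes of the two exports, copied from the hex dumps in the question.
dcmdump_start = bytes.fromhex("00 00 00 20 66 74 79 70  69 73 6f 6d 00 00 02 00")
raw_pixel_data_start = bytes.fromhex("fe ff 00 e0 00 00 00 00  fe ff 00 e0 3e bc ce 00")

print(looks_like_mp4(dcmdump_start))         # True
print(looks_like_mp4(raw_pixel_data_start))  # False
```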