Short version: Where is the 16-byte offset coming from when exporting an MPEG-4 video stream from a DICOM file with Pydicom via the following code? (And, bonus question, is it always a 16-byte offset?)
from pathlib import Path
import pydicom
in_dcm_filename: str = ...
out_mp4_filename: str = ...
ds = pydicom.dcmread(in_dcm_filename)
Path(out_mp4_filename).write_bytes(ds.PixelData[16:]) # 16-byte offset necessary
For reproducibility, one can use e.g. this DICOM file which I found in this old discussion on Google Groups (content warning: the video shows the open brain in a neurosurgical intervention).
I have a number of DICOM files containing surgical MPEG-4 video streams (transfer syntax UID 1.2.840.10008.1.2.4.102 – MPEG-4 AVC/H.264 High Profile / Level 4.1). I wanted to export the video streams from the DICOM files for easier handling in downstream tasks.
After a bit of googling, I found the following discussion, suggesting the use of dcmdump from DCMTK, as follows (which I was able to reproduce):
dcmdump +P 7fe0,0010 <in_dcm_filename> +W <out_folder>
Of the two resulting files in <out_folder>, mpeg4.dcm.0.raw and mpeg4.dcm.1.raw, discard the first one, which has a size of 0 bytes, and keep the second one (potentially changing its suffix to .mp4), which is a regular, playable video file.
From what I saw in the dcmdump command, I concluded this was just a raw dump of tag 7fe0,0010 (which is the Pixel Data attribute)¹, so I thought I could reproduce this with Pydicom. My first attempt was using Path(out_mp4_filename).write_bytes(ds.PixelData) (see code sample above for complete details); however, I ended up with a file that could not be played. I then compared a hex dump of the dcmdump result and of the Pydicom result:
$ hd ./dcmdump.mp4 | head
00000000 00 00 00 20 66 74 79 70 69 73 6f 6d 00 00 02 00 |... ftypisom....|
00000010 69 73 6f 6d 69 73 6f 32 61 76 63 31 6d 70 34 31 |isomiso2avc1mp41|
00000020 00 00 00 08 66 72 65 65 00 ce 97 1d 6d 64 61 74 |....free....mdat|
...
$ hd ./pydicom.mp4 | head
00000000 fe ff 00 e0 00 00 00 00 fe ff 00 e0 3e bc ce 00 |............>...|
00000010 00 00 00 20 66 74 79 70 69 73 6f 6d 00 00 02 00 |... ftypisom....|
00000020 69 73 6f 6d 69 73 6f 32 61 76 63 31 6d 70 34 31 |isomiso2avc1mp41|
...
From this I noticed that my Pydicom export contained 16 extra leading bytes. Once I removed them via Path(out_mp4_filename).write_bytes(ds.PixelData[16:]), I got the exact same, playable video export as with dcmdump.
So, again, my question is: Where do these 16 extra bytes come from, what is their meaning, and am I safe simply removing them?
¹) Update: In hindsight, I should have gotten suspicious because of the two files that were created by dcmdump.
The reason why you see these bytes is that the pixel data is encapsulated. Using dcmdump shows this clearly:
(7fe0,0010) OB (PixelSequence #=2) # u/l, 1 PixelData
(fffe,e000) pi (no value available) # 0, 1 Item
(fffe,e000) pi 00\00\00\20\66\74\79\70\69\73\6f\6d\00\00\02\00\69\73\6f\6d\69\73... # 13548606, 1 Item
(fffe,e0dd) na (SequenceDelimitationItem) # 0, 0 SequenceDelimitationItem
If you check the leading bytes that you strip, you can see that they contain the respective delimiter tags as shown in the dump output. You can also see that there are 2 items contained, the first of them empty - these are the ones you get using dcmtk.
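To make this concrete, the 16 leading bytes from the question's hex dump can be decoded by hand. The following is a minimal sketch using only the standard library; decode_item_header is a hypothetical helper name, and the sketch assumes the item headers are little-endian, as they are for encapsulated transfer syntaxes.

```python
import struct

# The 16 leading bytes copied from the question's hex dump of ds.PixelData.
prefix = bytes.fromhex("feff00e000000000" "feff00e03ebcce00")


def decode_item_header(buf: bytes, offset: int = 0) -> tuple[int, int, int]:
    """Decode one 8-byte item header into (group, element, length).

    Hypothetical helper for illustration; assumes little-endian encoding.
    """
    # Two 16-bit values (tag group, tag element) followed by a 32-bit length.
    return struct.unpack_from("<HHL", buf, offset)


first = decode_item_header(prefix, 0)
second = decode_item_header(prefix, 8)

print(f"({first[0]:04x},{first[1]:04x}) length={first[2]}")
print(f"({second[0]:04x},{second[1]:04x}) length={second[2]}")
```

Both headers carry the item tag (fffe,e000). The first has length 0 and the second has length 13548606, which matches the item sizes in the dcmdump output above.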
To get the encapsulated contents, you can use encaps.defragment_data in pydicom 2.x, which returns all contained fragments in one data block (in pydicom 3, the interface will change to yield one fragment at a time):
from pydicom import dcmread, encaps

ds = dcmread("test_720.dcm")
with open("test_720.mpeg4", "wb") as f:
    # Merge all fragments into one block, dropping the item headers
    f.write(encaps.defragment_data(ds.PixelData))
Note that in general, the fragments are parts of multi-frame data (in the most common case, one fragment per frame), and you may want to handle them separately. In the case of MPEG4 there is only one continuous datastream with the video data, and merging any fragments this may be split into is the correct way to handle this.
Note that the first (empty) item is the Basic Offset Table, which is required to be the first item of the encapsulated data. It can be empty, and for the MPEG transfer syntaxes it is always empty. From the DICOM standard:
The Basic Offset Table is not used because MPEG2 contains its own mechanism for describing navigation of frames.
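As for the bonus question: the 16 bytes are one 8-byte header for the empty Basic Offset Table plus one 8-byte header for the first fragment, so slicing with [16:] only works while the offset table is empty and the video sits in a single fragment. The following is a rough standard-library sketch of what the merging step does, so the layout is visible without relying on a library; join_fragments is a hypothetical name, not a pydicom function, and the sketch assumes little-endian item headers. Whether the buffer still ends with the Sequence Delimitation Item depends on how it was read, so the loop accepts both cases.

```python
import struct

ITEM = (0xFFFE, 0xE000)       # starts the offset table and each fragment
SEQ_DELIM = (0xFFFE, 0xE0DD)  # Sequence Delimitation Item, ends the pixel data


def join_fragments(pixel_data: bytes) -> bytes:
    """Concatenate all fragments, skipping the Basic Offset Table.

    Hypothetical helper for illustration; assumes little-endian headers.
    """
    fragments = []
    offset = 0
    item_index = 0
    while offset + 8 <= len(pixel_data):
        group, element, length = struct.unpack_from("<HHL", pixel_data, offset)
        offset += 8  # step over the 8-byte item header
        if (group, element) == SEQ_DELIM:
            break
        if (group, element) != ITEM:
            raise ValueError(
                f"unexpected tag ({group:04x},{element:04x}) at offset {offset - 8}"
            )
        if item_index > 0:  # item 0 is the Basic Offset Table, not video data
            fragments.append(pixel_data[offset:offset + length])
        offset += length
        item_index += 1
    return b"".join(fragments)
```

With the question's variables, this would be used as Path(out_mp4_filename).write_bytes(join_fragments(ds.PixelData)). For real work, the pydicom function shown above is the safer choice.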