It seems I cannot find a way to read only the value length of some long tag in pydicom. I have tried the following, and while dcmread is very fast, dataset[tag] takes half a second to load 1.5 GB of data. However, I am not interested in reading these 1.5 GB of data; the only information I am looking for is where they are located in the file (start and end offsets). How can I get this information?
# python bug.py && tuna bug.prof
import cProfile

import pydicom


def bug():
    path = "C:\\Data\\bug.ima"
    tag = pydicom.tag.Tag(0x7FE1, 0x1010)
    dataset = pydicom.dcmread(path, specific_tags=[tag], defer_size=0)
    # Slow step: dataset[tag] loads the deferred 1.5 GB value.
    element = dataset[tag]
    _a = element.file_tell
    _b = len(element.value)


cProfile.run("bug()", filename="bug.prof")
One workaround using a private API is this:
# python bug.py && tuna bug.prof
import cProfile

import pydicom


def bug():
    path = "C:\\Data\\bug.ima"
    tag = pydicom.tag.Tag(0x7FE1, 0x1010)
    dataset = pydicom.dcmread(path, specific_tags=[tag], defer_size=0)
    # Private attribute: fetch the raw element without loading its value.
    element = dataset._dict[tag]
    _a = element.value_tell
    _b = element.length


cProfile.run("bug()", filename="bug.prof")
Another workaround, using another API (at least one not marked as private), is this:
# python bug.py && tuna bug.prof
import cProfile

import pydicom


def bug():
    path = "C:\\Data\\bug.ima"
    tag = pydicom.tag.Tag(0x7FE1, 0x1010)
    dataset = pydicom.dcmread(path, specific_tags=[tag], defer_size=0)
    # Unwrap the object returned by __array__(); elements fetched from it
    # are still raw, so the value is not loaded.
    dataset = dataset.__array__().item()
    element = dataset[tag]
    _a = element.value_tell
    _b = element.length


cProfile.run("bug()", filename="bug.prof")
Is that possible using just the public API?
EDIT: I see now that the help notes deferred-read elements will be converted... so unfortunately get_item will not work as desired in this case. Probably using _dict is your best bet. I don't see that 'hidden' member ever changing, so it should be safe for the long term.
The API method you are looking for is Dataset.get_item. That will return the RawDataElement (the not-yet-decoded DataElement), on which you can use .value_tell and .length, assuming the decoding has not already been triggered by some other access.
Another option, which might offer a little speed improvement, is to model your code on the dicomfile context manager in pydicom.util.leanread, but pass defer_size through to data_element_generator and, of course, filter the generated elements by tag. That module demonstrates a simpler read without handling so many special cases, and returns simple tuples for the data elements. If you try that route, however, be aware that the code has not been updated in quite some time, so it will not be as robust as mainline pydicom.
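
To illustrate the idea of a lean, header-only scan, here is a minimal sketch that does not use pydicom at all. It is not pydicom's implementation and not a copy of leanread. The function name find_value_span is hypothetical. It assumes the whole file (file meta and dataset) is encoded as explicit VR little endian, that only top-level elements matter, and that no element before the wanted one has undefined length. Files using implicit VR or a compressed transfer syntax would need more handling.

```python
import struct
from typing import BinaryIO, Optional, Tuple

# VRs that, in explicit VR encoding, use 2 reserved bytes followed by a
# 4-byte length field; all other VRs use a 2-byte length field.
LONG_VRS = {b"OB", b"OD", b"OF", b"OL", b"OV", b"OW", b"SQ",
            b"SV", b"UC", b"UN", b"UR", b"UT", b"UV"}
UNDEFINED_LENGTH = 0xFFFFFFFF


def _read_exact(fp: BinaryIO, n: int) -> bytes:
    """Read exactly n bytes or raise if the file ends early."""
    data = fp.read(n)
    if len(data) != n:
        raise ValueError("truncated element header")
    return data


def find_value_span(fp: BinaryIO,
                    wanted: Tuple[int, int]) -> Optional[Tuple[int, int]]:
    """Return (start, end) offsets of the value of the top-level element
    `wanted`, or None if it is absent. Values are skipped, never read.

    Assumption: explicit VR little endian throughout (hypothetical helper).
    """
    fp.seek(128)  # skip the 128-byte preamble
    if fp.read(4) != b"DICM":
        raise ValueError("missing DICM prefix")
    while True:
        header = fp.read(6)  # group (2), element (2), VR (2)
        if not header:
            return None  # clean end of file, tag not found
        if len(header) < 6:
            raise ValueError("truncated element header")
        group, elem, vr = struct.unpack("<HH2s", header)
        if vr in LONG_VRS:
            (length,) = struct.unpack("<2xI", _read_exact(fp, 6))
        else:
            (length,) = struct.unpack("<H", _read_exact(fp, 2))
        if length == UNDEFINED_LENGTH:
            raise ValueError("undefined-length elements are not handled")
        start = fp.tell()  # the value begins right after the header
        if (group, elem) == wanted:
            return start, start + length
        fp.seek(length, 1)  # jump over the value without reading it
```

It could be used as `with open(path, "rb") as fp: span = find_value_span(fp, (0x7FE1, 0x1010))`. Because each value is skipped with a seek, the cost depends on the number of elements before the wanted tag rather than on the size of their values.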