How to read value length (but not value) in pydicom?

I cannot find a way to read only the value length of a long element in pydicom. I have tried the following: dcmread is very fast, but dataset[tag] takes half a second because it loads 1.5 GB of data. I am not interested in reading these 1.5 GB. The only information I need is where they are located in the file (start and end offsets). How can I get this information?

# python bug.py && tuna bug.prof
import cProfile
import pydicom

def bug():
    path = "C:\\Data\\bug.ima"
    tag = pydicom.tag.Tag(0x7FE1, 0x1010)
    dataset = pydicom.dcmread(path, specific_tags=[tag], defer_size=0)
    element = dataset[tag]  # slow step: this access loads the full deferred value
    _a = element.file_tell
    _b = len(element.value)

cProfile.run("bug()", filename="bug.prof")

One workaround using a private API is this:

# python bug.py && tuna bug.prof
import cProfile
import pydicom

def bug():
    path = "C:\\Data\\bug.ima"
    tag = pydicom.tag.Tag(0x7FE1, 0x1010)
    dataset = pydicom.dcmread(path, specific_tags=[tag], defer_size=0)
    element = dataset._dict[tag]  # private mapping: raw element, value not loaded
    _a = element.value_tell
    _b = element.length

cProfile.run("bug()", filename="bug.prof")

Another workaround, using a different API (at least one not marked as private), is this:

# python bug.py && tuna bug.prof
import cProfile
import pydicom

def bug():
    path = "C:\\Data\\bug.ima"
    tag = pydicom.tag.Tag(0x7FE1, 0x1010)
    dataset = pydicom.dcmread(path, specific_tags=[tag], defer_size=0)
    dataset = dataset.__array__().item()
    element = dataset[tag]
    _a = element.value_tell
    _b = element.length

cProfile.run("bug()", filename="bug.prof")

Is that possible using just the public API?

Solution

EDIT: I see now that the help text notes that deferred-read elements will be converted, so unfortunately get_item (suggested below) will not work as desired in this case. Using _dict is probably your best bet. I don't see that 'hidden' member ever changing, so it should be safe for the long term.

The API method you are looking for is Dataset.get_item. It returns the RawDataElement (the not-yet-decoded form of a DataElement), on which you can read .value_tell and .length, assuming the decoding has not already been triggered by some other access.

Another option, which might offer a small speed improvement, is to model your code on the dicomfile context manager in pydicom.util.leanread, but pass defer_size through to data_element_generator and filter the generated elements by tag. That module demonstrates a simpler read that does not handle as many special cases and returns plain tuples for the data elements. If you go that way, be aware that the code has not been updated in quite some time, so it will not be as robust as mainline pydicom.
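To illustrate the lean-read idea, here is a minimal standard-library sketch. It is not pydicom's leanread code, and find_value_span is a hypothetical helper name. It assumes the whole file is Explicit VR Little Endian, starts with the usual 128-byte preamble plus "DICM", and has only defined-length elements at the top level. It walks the element headers, seeks past each value without reading it, and returns the start and end offsets of the requested element's value.

```python
import struct

# Explicit-VR elements with these VRs use 2 reserved bytes + a 4-byte length;
# all other VRs use a 2-byte length directly after the VR.
LONG_VRS = {b"OB", b"OD", b"OF", b"OL", b"OV", b"OW", b"SQ",
            b"SV", b"UC", b"UN", b"UR", b"UT", b"UV"}
UNDEFINED_LENGTH = 0xFFFFFFFF


def find_value_span(fp, group, element):
    """Return (start, end) byte offsets of the value of tag (group, element).

    Hypothetical helper. Assumes Explicit VR Little Endian throughout and
    defined lengths at the top level. Returns None if the tag is not found.
    """
    fp.seek(128)  # skip the preamble
    if fp.read(4) != b"DICM":
        raise ValueError("missing DICM prefix")
    while True:
        header = fp.read(8)
        if len(header) < 8:
            return None  # end of file, tag not found
        grp, elem, vr, short_len = struct.unpack("<HH2sH", header)
        if vr in LONG_VRS:
            raw = fp.read(4)
            if len(raw) < 4:
                return None  # truncated header
            (length,) = struct.unpack("<I", raw)
        else:
            length = short_len
        if length == UNDEFINED_LENGTH:
            raise ValueError("undefined-length elements are not handled in this sketch")
        start = fp.tell()
        if (grp, elem) == (group, element):
            return start, start + length
        fp.seek(length, 1)  # skip the value without reading it
```

You would call it on a file opened in binary mode, for example find_value_span(fp, 0x7FE1, 0x1010). Files that use an implicit VR or compressed transfer syntax, or that contain undefined-length sequences before the tag, would need the extra handling that mainline pydicom provides.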