Normally I process files in Python using a with statement, as in this chunk for downloading a resource via HTTP:
with open(filename, "wb") as file:
    for chunk in request.iter_content(chunk_size=1024):
        if chunk:
            file.write(chunk)
            file.flush()
But this assumes I know the filename. Suppose I want to use tempfile.mkstemp(). This function returns an OS-level handle to an already-open file (an integer file descriptor) together with its pathname, so using open in a with statement would be wrong.
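To see what mkstemp actually hands back, here is a small sketch (the helper name inspect_mkstemp is made up for illustration). It checks that the first value is a plain integer descriptor for a file that already exists, then cleans up both the descriptor and the file, since mkstemp never deletes anything itself.

```python
import os
import tempfile


def inspect_mkstemp():
    """Hypothetical helper: report what mkstemp returns, then clean up."""
    fd, pathname = tempfile.mkstemp()
    try:
        # fd is a plain int (an OS-level descriptor), not a Python file object.
        is_int = isinstance(fd, int)
        # The file already exists on disk: mkstemp created and opened it.
        exists = os.path.exists(pathname)
        # mkstemp returns an absolute pathname.
        return is_int, exists, os.path.isabs(pathname)
    finally:
        os.close(fd)         # release the descriptor
        os.remove(pathname)  # mkstemp never deletes the file for you
```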
I've searched around a bit and found lots of warnings about being careful to use mkstemp properly. Several blog articles nearly shout when they say do NOT throw away the integer returned by mkstemp. There are discussions about the OS-level file handle being different from a Python-level file object. That's fine, but I haven't been able to find the simplest coding pattern that would ensure that mkstemp is called to get a file to be written to, as cleanly as the with open(...) pattern does.
So my question is: is there a nice way in Python to create and write to a mkstemp-generated file, perhaps using a different kind of with statement, or do I have to manually do things like fdopen or close, etc.? It seems there should be a clear pattern for this.
The simplest coding pattern for this is try/finally:
fd, pathname = tempfile.mkstemp()
try:
    dostuff(fd)
finally:
    os.close(fd)
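As a concrete sketch of what dostuff might be, assuming the data arrives as an iterable of bytes chunks (standing in for request.iter_content(chunk_size=1024) from the question), the hypothetical helper below writes straight to the descriptor with os.write. Because os.write can return after a partial write, it loops on a memoryview until each chunk is fully written. The file itself is left on disk for the caller to use and remove.

```python
import os
import tempfile


def write_chunks_to_temp(chunks):
    """Hypothetical helper: write bytes chunks to a new mkstemp file, return its path."""
    fd, pathname = tempfile.mkstemp()
    try:
        for chunk in chunks:
            if chunk:  # skip empty keep-alive chunks, as in the question
                # os.write may write fewer bytes than asked, so loop until done.
                view = memoryview(chunk)
                while view:
                    written = os.write(fd, view)
                    view = view[written:]
    finally:
        os.close(fd)  # always release the descriptor, even if writing fails
    return pathname
```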
However, if you're doing this more than once, it's trivial to wrap it up in a context manager:
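import contextlib
import os
import tempfile

@contextlib.contextmanager
def mkstemping(*args, **kwargs):
    fd, pathname = tempfile.mkstemp(*args, **kwargs)
    try:
        yield fd
    finally:
        os.close(fd)  # closes the descriptor; the file itself stays on disk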
And then you can just do:
with mkstemping() as fd:
    dostuff(fd)
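One thing to watch with mkstemping as written: it yields only the fd, so the pathname is lost inside the block and you can't reopen or delete the file afterward. A hypothetical variant that yields both values (the name mkstemping_with_path is invented here) might look like this:

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def mkstemping_with_path(*args, **kwargs):
    """Hypothetical variant of mkstemping that yields (fd, pathname).

    Keeping the pathname lets the caller reopen or delete the file later;
    the descriptor is still closed when the with block exits.
    """
    fd, pathname = tempfile.mkstemp(*args, **kwargs)
    try:
        yield fd, pathname
    finally:
        os.close(fd)
```

Usage would be something like with mkstemping_with_path(suffix=".bin") as (fd, pathname): os.write(fd, b"hello"), followed later by os.remove(pathname) once you're done with the file.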
If you really want to, of course, you can always wrap the fd up in a file object (by passing it to open, or os.fdopen in older versions). But… why go to the extra trouble? If you want an fd, use it as an fd.
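For the question's download loop, wrapping the fd takes one line, because open() accepts an integer descriptor and the resulting file object takes ownership of it: closing the file (here, leaving the with block) also closes the fd, so no separate os.close is needed. A minimal sketch, assuming chunks is any iterable of bytes (for example request.iter_content(chunk_size=1024)); the helper name is hypothetical:

```python
import os
import tempfile


def download_to_mkstemp(chunks):
    """Hypothetical helper: write bytes chunks to a mkstemp file via a file object."""
    fd, pathname = tempfile.mkstemp()
    # open() wraps the existing descriptor; the with block closes it on exit,
    # so calling os.close(fd) afterward would be an error.
    with open(fd, "wb") as file:
        for chunk in chunks:
            if chunk:
                file.write(chunk)
    return pathname
```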
And if you don't want an fd, then unless you have a good reason that you need mkstemp instead of the simpler and higher-level NamedTemporaryFile, you shouldn't be using the low-level API. Just do this:
with tempfile.NamedTemporaryFile(delete=False) as f:
    dostuff(f)
Besides being simpler to use in a with statement, this also has the advantage that it's already a Python file object instead of just an OS file descriptor (and, in Python 3.x, it can be a Unicode text file).
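A minimal sketch of the NamedTemporaryFile route in text mode; the helper name and the sample arguments are illustrative. With delete=False the file outlives the with block, and f.name still holds its path after closing, so the caller is responsible for removing it.

```python
import os
import tempfile


def write_text_temp(text):
    """Hypothetical helper: write text to a temp file that survives closing."""
    # mode="w" gives a text-mode file object (the default mode is "w+b").
    with tempfile.NamedTemporaryFile(mode="w", encoding="utf-8", suffix=".txt",
                                     delete=False) as f:
        f.write(text)
    # delete=False leaves the file on disk; the caller must os.remove() it.
    return f.name
```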
An even simpler solution is to avoid the tempfile completely.
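Almost all XML parsers have a way to parse a string instead of a file. With cElementTree (on Python 3.3+, plain xml.etree.ElementTree, which uses the C accelerator automatically; cElementTree itself was removed in 3.9), it's just a matter of calling fromstring instead of parse. So, instead of this: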
import tempfile
import xml.etree.ElementTree as ET
import requests

req = requests.get(url)
with tempfile.NamedTemporaryFile() as f:
    f.write(req.content)
    f.seek(0)
    tree = ET.parse(f)
… just do this:
req = requests.get(url)
tree = ET.fromstring(req.content)
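The two calls can be compared offline, without a temporary file or a network request. In this sketch a bytes literal stands in for req.content and io.BytesIO plays the role of the open file; note that fromstring returns the root Element while parse returns an ElementTree, hence the getroot() call.

```python
import io
import xml.etree.ElementTree as ET

# Stand-in for req.content: the raw bytes of a downloaded XML document.
content = b"<root><item>a</item><item>b</item></root>"

# fromstring parses bytes (or str) directly and returns the root Element.
root_from_string = ET.fromstring(content)

# parse needs a file or file-like object and returns an ElementTree,
# so getroot() is needed to get the equivalent Element.
root_from_file = ET.parse(io.BytesIO(content)).getroot()

items = [item.text for item in root_from_string.iter("item")]
```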
Of course the first version only needs to hold the XML document and the parsed tree in memory one after the other, while the second needs to hold them both at once, so this may increase your peak memory usage by about 30%. But this is rarely a problem.
If it is a problem, many XML libraries have a way to feed in data as it arrives, and many downloading libraries have a way to stream data bit by bit. As you might imagine, this is again true for cElementTree's XMLParser and for requests, in a few different ways. For example:
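req = requests.get(url, stream=True)
parser = ET.XMLParser()
# raw.read() returns bytes, so the sentinel must be b'' (with '' the loop never ends).
# req.raw is not decompressed; if the server gzips, req.iter_content(8192) decodes for you.
for chunk in iter(lambda: req.raw.read(8192), b''):
    parser.feed(chunk)
tree = parser.close()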
Not quite as simple as just using fromstring… but it's still simpler than using a temporary file, and probably more efficient to boot.
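The streaming loop can be exercised without a network connection by feeding any binary stream to XMLParser. In this sketch io.BytesIO stands in for req.raw, and a deliberately tiny chunk_size splits tags across chunks to show that the parser copes with that; parse_incrementally is a made-up name.

```python
import io
import xml.etree.ElementTree as ET


def parse_incrementally(stream, chunk_size=8192):
    """Hypothetical helper: feed a binary stream to XMLParser chunk by chunk."""
    parser = ET.XMLParser()
    # read() on a binary stream returns bytes, so the sentinel is b"".
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        parser.feed(chunk)
    # close() finishes parsing and returns the root Element.
    return parser.close()
```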
If that use of the two-argument form of iter confuses you (a lot of people seem to have trouble grasping it at first), you can rewrite it as:
req = requests.get(url, stream=True)
parser = ET.XMLParser()
while True:
    chunk = req.raw.read(8192)
    if not chunk:
        break
    parser.feed(chunk)
tree = parser.close()
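To convince yourself that the two forms are equivalent, both can be run side by side on the same data. This sketch uses io.BytesIO as a stand-in stream, and the function names are hypothetical:

```python
import io


def chunks_with_iter(stream, size):
    # iter(callable, sentinel) calls stream.read(size) repeatedly and stops
    # as soon as a result equals the sentinel b"".
    return list(iter(lambda: stream.read(size), b""))


def chunks_with_while(stream, size):
    # The same loop written out explicitly.
    chunks = []
    while True:
        chunk = stream.read(size)
        if not chunk:
            break
        chunks.append(chunk)
    return chunks
```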