python iterator pysam

Python consume an iterator pair-wise


I am trying to understand Python's iterators in the context of the pysam module. Calling the fetch method on an AlignmentFile object returns an iterator, iter, over the records in the file named file. I can then use various attributes to access each record, for instance its name with query_name:

import pysam
iter = pysam.AlignmentFile(file, "rb", check_sq=False).fetch(until_eof=True)
for record in iter:
  print(record.query_name)

The records happen to come in pairs, so one would like something like:

while True:
  r1 = iter.__next__() 
  r2 = iter.__next__()
  print(r1.query_name)     
  print(r2.query_name)

Calling __next__() by hand is probably not the right way to handle millions of records, but how can one use a for loop to consume the same iterator two records at a time? I looked at the grouper recipe from itertools and at the SO questions "Iterate an iterator by chunks (of n) in Python?" (itself marked as a duplicate) and "What is the most “pythonic” way to iterate over a list in chunks?", but cannot get it to work.


Solution

  • First of all, don't use the variable name iter, because that's already the name of a builtin function.

    To answer your question, simply pass the same iterator twice to itertools.izip (Python 2) or zip (Python 3). Each tuple it yields then holds two consecutive items from that iterator.

    Your code may look as simple as

    for next_1, next_2 in zip(iterator, iterator):
        print(next_1.query_name, next_2.query_name)  # or any other per-pair work
    

    edit: whoops, my original answer was the correct one all along, don't mind the itertools recipe.

    edit 2: Consider itertools.izip_longest (itertools.zip_longest in Python 3) if you deal with iterators that could yield an odd number of objects, because zip would silently drop the unpaired last one:

    >>> from itertools import izip_longest
    >>> iterator = (x for x in (1,2,3))
    >>> 
    >>> for next_1, next_2 in izip_longest(iterator, iterator):
    ...     next_1, next_2
    ... 
    (1, 2)
    (3, None)
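
For Python 3, the two approaches above can be compared side by side in a minimal sketch. It does not use pysam: Record and make_records are hypothetical stand-ins that model only the query_name attribute, and the read names are made up for illustration.

```python
from collections import namedtuple
from itertools import zip_longest

# Hypothetical stand-in for a pysam record; only query_name is modelled.
Record = namedtuple("Record", "query_name")


def make_records(names):
    """Return a one-shot iterator of records (assumed to behave like fetch())."""
    return iter([Record(name) for name in names])


names = ["read1", "read1", "read2", "read2", "read3"]  # odd number on purpose

# zip asks each argument for one item in turn. Both arguments are the same
# iterator object, so every tuple holds two consecutive records.
records = make_records(names)
strict_pairs = [(r1.query_name, r2.query_name) for r1, r2 in zip(records, records)]

# zip stops as soon as one argument is exhausted, so the unpaired last record
# is dropped. zip_longest keeps it and fills the missing partner with None.
records = make_records(names)
padded_pairs = [
    (r1.query_name, r2.query_name if r2 is not None else None)
    for r1, r2 in zip_longest(records, records)
]

print(strict_pairs)
print(padded_pairs)
```

With real data, the iterator returned by fetch would take the place of make_records(names). Whether a trailing unpaired record should be dropped or kept depends on the file, so the choice between zip and zip_longest is left to the caller.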