c++ algorithm istream-iterator

Why does std::istream_iterator<> with multiple copy_n() calls always write the first value?


I tried to copy the input line into multiple vectors:

#include <vector>
#include <sstream>
#include <istream>
#include <iterator>
#include <algorithm>
#include <iostream>

int main(){
  std::vector<int> v1, v2, v3;
  std::istringstream is ("1 2 3 4 5 6");
  std::istream_iterator<int> iit (is);
  std::copy_n(iit, 2, std::back_inserter(v1));
  std::copy_n(iit, 2, std::back_inserter(v2));
  std::copy(iit, std::istream_iterator<int>(), std::back_inserter(v3));
  std::ostream_iterator<int> oit(std::cout, ", ");
  std::copy(v1.begin(),v1.end(), oit);
  std::cout << "\n";
  std::copy(v2.begin(),v2.end(), oit);
  std::cout << "\n";
  std::copy(v3.begin(),v3.end(), oit);
  std::cout << "\n";
  return 0;

}

I expected this program to output:

1, 2, 
3, 4, 
5, 6,

But I get this:

1, 2, 
1, 3, 
1, 4, 5, 6, 

Why does copy_n always insert 1 at the beginning of each vector?


Solution

  • This comes down to a perhaps unintuitive fact of istream_iterator: it doesn't read when you dereference it, but instead when you advance (or construct) it.


    (x indicates a read)
    
    Normal forward iterators:
    
       Data:            1   2   3   (EOF)
       
       Construction
       *it              x
       ++it
       *it                  x
       ++it
       *it                      x
       ++it                         (`it` is now the one-past-the-end iterator)
       Destruction
    
    Stream iterators:
    
       Data:            1   2   3   (EOF)
       
       Construction     x
       *it
       ++it                 x
       *it
       ++it                     x
       *it
       ++it                         (`it` is now the one-past-the-end iterator)
       Destruction
    

    We still expect the data to be provided to us via *it. So, to make this work, each bit of read data has to be temporarily stored in the iterator itself until we next do *it.

    So, when you create iit, it's already pulling the first number out for you, 1. That data is stored in the iterator. The next available data in the stream is 2, which you then pull out using copy_n. In total that's two pieces of information delivered, out of a total of two that you asked for, so the first copy_n is done.

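    The next time, you're using a copy of iit in the state it was in before the first copy_n, because copy_n takes its iterator by value and never modifies iit itself. So, although the stream is ready to give you 3, you still have a copy of that 1 "stuck" in your copied stream iterator. The same thing happens on the third call, which is why every vector starts with 1.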
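The "stuck" value can be reproduced in isolation. The following is a minimal sketch, assuming an implementation that stores the last-read value inside the iterator (as the common standard libraries do); `stale_copy_demo` is a made-up name for illustration.

```cpp
#include <iterator>
#include <sstream>
#include <vector>

// Hypothetical demo: returns what a stale copy and the advanced iterator each see.
std::vector<int> stale_copy_demo() {
    std::istringstream is("1 2 3");
    std::istream_iterator<int> it(is);      // construction already reads 1
    std::istream_iterator<int> saved = it;  // the copy carries its own buffered 1
    ++it;                                   // reads 2 from the shared stream
    const int seen_by_copy = *saved;        // still the buffered 1
    ++saved;                                // reads 3; the copy never sees 2
    return {seen_by_copy, *it, *saved};
}
```

Under that assumption the function returns 1, 2, 3: the copy yields 1 and then jumps straight to 3, which is the same pattern as the "1, 3," line in the question's output.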


    Why do stream iterators work this way? Because you cannot detect EOF on a stream until you've tried and failed to obtain more data. If it didn't work this way, you'd have to do a dereference first to trigger this detection, and then what should the result be if we've reached EOF?
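As a rough illustration of the EOF point: because the read attempt happens on construction, the iterator can be compared against the end-of-stream iterator before it is ever dereferenced. The helper name `first_or` below is hypothetical.

```cpp
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical helper: returns the first int in `text`, or `fallback`
// if the very first read attempt fails.
int first_or(const std::string& text, int fallback) {
    std::istringstream is(text);
    std::istream_iterator<int> it(is);     // the read attempt happens here
    const std::istream_iterator<int> end;  // default-constructed: end-of-stream
    return it == end ? fallback : *it;     // dereference only when known valid
}
```

With an empty or non-numeric input the iterator already compares equal to `end` after construction, so the question of what `*it` should return at EOF never comes up.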

    Furthermore, we expect that any dereference operation produces an immediate result; with a container that's a given, but with streams you could otherwise be blocking waiting for data to become available. It makes more logical sense to do this blocking on the construction/increment, instead, so that your iterator is always either valid, or it isn't.


    If you drop the copies and construct a fresh stream iterator for each copy_n, you should be fine. Though I would generally recommend using only one stream iterator per stream, as that avoids anyone having to worry about this.
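One way to follow the "one stream iterator per stream" advice is to pass that single iterator by reference so it is actually advanced. This is only a sketch: `take` and `split_2_2_rest` are hypothetical helpers, not standard library functions, and a hand-written loop stands in for copy_n.

```cpp
#include <cstddef>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: moves up to n values out of `it`, advancing `it` itself.
std::vector<int> take(std::istream_iterator<int>& it, std::size_t n) {
    const std::istream_iterator<int> end;
    std::vector<int> out;
    for (; n > 0 && it != end; --n, ++it) out.push_back(*it);
    return out;
}

// Splits a line into the first two values, the next two, and the rest.
std::vector<std::vector<int>> split_2_2_rest(const std::string& line) {
    std::istringstream is(line);
    std::istream_iterator<int> it(is);  // the only iterator on this stream
    std::vector<int> v1 = take(it, 2);
    std::vector<int> v2 = take(it, 2);
    std::vector<int> v3(it, std::istream_iterator<int>());
    return {v1, v2, v3};
}
```

For the input "1 2 3 4 5 6" this gives {1, 2}, {3, 4}, {5, 6}. Note that `take` reads one value ahead into the iterator after each chunk; that is harmless here because the same iterator is reused, but it does mean the helper may block on an interactive stream.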