schemelazy-evaluationimplementationlazy-sequenceschez-scheme

Why does my lazy filtered list in scheme consume so much memory?


I'm currently learning to use some slightly more advanced features of scheme, and I've hit a road-block with lazy lists.

Basically, I'm trying to create an infinite, lazily generated list, and apply a lazy filter on it, and only take a single element. My hope was that this would consume very little memory: the filter looks at only one element at a time, and there's no need to store the previous entries. Here is my attempt at this:

(define lazy-inf-seq
  (lambda (start next)
    (delay (cons start (lazy-inf-seq (next start) next)))))

(define lazy-arithmetic-sequence
  (lambda (start d)
    (lazy-inf-seq start (lambda (v) (+ v d)))))

(define lazy-filter
  (lambda (pred seq)
    (delay
      (let loop ([sequence seq])
        (let ([forced (force sequence)])
          (cond [(null? forced) '()]
                [(pred (car forced))
                 (cons (car forced) (lazy-filter pred (cdr forced)))]
                [else (loop (cdr forced))]))))))

So, to be clear, a "lazy list" here is a procedure that, when (force)d, produces (head . tail), where head is one of the values on the list, and tail is the rest of the list (that needs to be forced in turn). I don't know if this is a "standard" lazy list in scheme or whatever, but it was the variant that made the most sense to me.

The (lazy-arithmetic-sequence a b) function produces (lazily) the infinite list a, a+b, a+2b, a+3b, ...

The lazy-filter function is the heart of the matter: it takes a predicate and a lazy list, and returns a lazy list with all the filtered elements. When forced, it goes through the input list finding the first element that should be included, and then returns that element consed with the lazy-filter of the rest of the list.

To test this out, I run this line:

(force (lazy-filter (lambda (v) (= v 1000000000)) (lazy-arithmetic-sequence 0 1)))

This is of course a rather pointless filter ("find the element with value one billion in this list from 0 to infinity"), but the point is to test the code out. The problem is that this consumes crazy amounts of memory. Within seconds its up to many gigabytes, and it shows no signs of slowing down, and I don't understand why.

I don't understand why the garbage collector doesn't reclaim the memory produced from the list. The loop in lazy-filter is tail-recursive, and there's no other references to the lazy list, so I feel like the GC should just gobble all that memory up. To make sure I even made a version that ran the garbage collector every iteration of the lazy-filter loop, and of course it didn't help.

My suspicion is that there's some reference hanging on to the head of the list that I'm not seeing. Like, the closure created by the delay in lazy-filter is somehow making the seq reference hang around, or something.

How can I rewrite this to not consume infinite amounts of memory?

I'm running Chez Scheme if that makes any difference, but I suspect that the problem is with me rather than the scheme implementation 🙂


Solution

  • Here's how to fix your problem:

    (define lazy-filter
      (lambda (pred seq)
        (delay
          (let loop ([sequence seq])
            ;; the following single line is added:   ------ NB!
            (set! seq sequence)
            (let ([forced (force sequence)])
              (cond [(null? forced) '()]
                    [(pred (car forced))
                     (cons (car forced) (lazy-filter pred (cdr forced)))]
                    [else (loop (cdr forced))]))))))

    I tried (force (lazy-filter (lambda (v) (= v 100000000)) (lazy-arithmetic-sequence 0 1))) in Racket, and it finishes up, though slowly, running in constant memory as reported by my OS, returning

    '(100000000 . #<promise:unsaved-editor:12:4>) 
    

    Without the (set! seq sequence) the memory consumption reported by OS shoots up by several gigabytes and then Racket reports that it has run out of memory and the execution is aborted.

    Some other re-writes of your code are found below, as are previous versions of this answer.


    Trying your code in Racket's debugger, we get

    enter image description here

    forced and sequence are advancing along nicely, but seq is still at the start. And no wonder, nothing is changing it.

    That's exactly as you suspected. A reference to the start of the sequence can not be released because seq is holding on to it until the result is found and returned (as the cons pair). For 100 elements it's not an issue, but for 1 billion it surely is.

    Float loop up and out of lazy-filter and the problem seems to be gone:

    enter image description here

    This code transformation technique is known as lambda lifting.

    The call to loop in lazy-filter becomes fully and manifestly tail because of it. Thanks to the tail call optimization the new call frame (for loop) can replace the old (for lazy-filter), which can now be discarded, together with its references into any data it held (here, seq).

    The debugger snapshots show what's going on when the code is being debugged. Maybe without the debugging it is compiled differently, more efficiently. Maybe A Very Smart Compiler would in effect compile it by lambda lifting so the reference to seq could be relinquished, in the first code variant just as it is in the second. Looks like your Chez Scheme though compiles it just like Racket with debugging does (note, my version of Racket is old).

    So it does seem like an implementation issue.

    You will know for sure if you try the lambda-lifted code and see whether this fixes the problem:

    (define (lazy-filter pred seq)
        (delay (lazy-filter-loop pred seq)))
    
    (define (lazy-filter-loop pred sequence)
            (let ([forced (force sequence)])
              (cond [(null? forced) '()]
                    [(pred (car forced))
                      (cons (car forced) 
                              (lazy-filter pred (cdr forced)))]
                    [else  (lazy-filter-loop pred (cdr forced))])))

    Although one could reasonably expect of Chez compiler to do this on its own. Maybe you are running interpreted code? Maybe you have the debugging information included? These are the questions to consider.

    Another way to restructure your code is

    (define lazy-filter
      (lambda (pred seq)
        (delay
          (let loop ([forced (force seq)])
              (cond [(null? forced) '()]
                    [(pred (car forced))
                      (cons (car forced) 
                              (lazy-filter pred (cdr forced)))]
                    [else  (set! seq (cdr forced))
                           (loop  (force (cdr forced)))])))))

    (the older version of the answer follows:)

    Let's see what forcing your expressions entails. I will use shorter names for your variables and functions, for more visual and immediate reading of the code.

    We'll use SSA program transformation to make a function's operational meaning explicit, and stop only on encountering a delay form.

    You don't include your delay and force definitions, but we'll assume that (force (delay <exp>)) = <exp>:

    (define (lz-seq s n)  (delay  (cons s  (lz-seq (n s) n))))
    
    (force (lz-seq s n))   =  (cons s  (lz-seq (n s) n))
    
        ;; lz-seq is a function, needs its args eval'd 
     =
        (cons s  (let* ([s2 (n s)])
                    (lz-seq s2 n)))
     =
        (let* ([s2   (n s)] 
               [lz2  (delay  (cons s2  (lz-seq (n s2) n))) ]) 
           (cons  s  lz2))
    

    We've discovered that forcing your kind of lazy sequence forces its second element as well as the first!

    (the following is incorrect:)

    And this in fact exactly explains the behavior you're observing:

    (force (lazy-filter (lambda (v) (= v 1000000000)) (lazy-arithmetic-sequence 0 1)))
        
    

    needs to find out the second element of the filtered infinite stream before it can return the first cons cell of the result, but there's only one element in the filtered sequence, so the search for the second one never ends.