ruby, memory-leaks, ruby-2.4

Does Ruby's Regexp interpolation leak memory?


I've got code that is leaking memory in a Sinatra app on Ruby 2.4.4, and I can sort of reproduce it in irb, although it's not totally stable, and I'm wondering if others have this same problem. It happens when interpolating a large string inside a regular expression literal:

class Leak
  STR = "RANDOM|STUFF|HERE|UNTIL|YOU|GET|TIRED|OF|TYPING|AND|ARE|SATISFIED|THAT|IT|WILL|LEAK|ENOUGH|MEMORY|TO|NOTICE"*100

  def test
    100.times { /#{STR}/i }
  end
end

t = Leak.new
t.test # If I run this a few times, it will start leaking about 5MB each time

Now, if I run GC.start afterward, it will usually clean up about the last 5MB (or however much the last run used). Then t.test will use only a few KB, then almost a MB, then a couple of MB, then back to 5MB each time, and once again GC.start will only collect the last 5MB.

An alternate way to build the same regexp without the leak is to replace /#{STR}/i with Regexp.new(STR, true). That works fine for me.
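For reference, here's a sketch of that workaround applied to my example class (the class and constant names are just from my example; memoizing the compiled regexp in a constant also avoids rebuilding it on every call):

```ruby
class NoLeak
  STR = "RANDOM|STUFF|HERE" * 100

  # Compile once with Regexp.new instead of interpolating into a literal.
  # Regexp::IGNORECASE is the named equivalent of passing true as the
  # second argument.
  PATTERN = Regexp.new(STR, Regexp::IGNORECASE)

  def test
    100.times { "random stuff".match?(PATTERN) }
  end
end

NoLeak.new.test
```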

Is this a legitimate memory leak in Ruby or am I doing something wrong?

UPDATE: Okay, maybe I'm misreading this. I was looking at the memory usage of the Docker container after running GC.start, which would sometimes go down. But since Ruby doesn't always release memory back to the OS, it could be that Ruby uses this memory and then, even though it's no longer retained, still doesn't return it. Using the MemoryProfiler gem, I see that total_retained is 0 even after running the test several times.

The root problem here was that we had containers crashing, theoretically due to memory usage. But perhaps it's not a memory leak, just a lack of sufficient memory to allow Ruby to consume what it wants? Are there GC settings that would help it decide to clean up before Ruby runs out of memory and crashes?
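There are environment variables that tune how aggressively Ruby grows its heap and when it triggers collections. I haven't verified that these fix my case, and the specific values below are just starting points to experiment with:

```shell
# Grow the heap more conservatively (the default growth factor is 1.8)
export RUBY_GC_HEAP_GROWTH_FACTOR=1.1
# Cap how many new object slots can be added per heap growth
export RUBY_GC_HEAP_GROWTH_MAX_SLOTS=100000
# Trigger malloc-based GC sooner (values are in bytes)
export RUBY_GC_MALLOC_LIMIT=16000000
export RUBY_GC_OLDMALLOC_LIMIT=16000000
```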

UPDATE 2: This still doesn't make sense, though: why would Ruby keep allocating more and more memory just from running the same code over and over, instead of reusing the memory it previously allocated? From what I understand, the GC is designed to run at least once before requesting more memory from the OS, so why does Ruby keep allocating more each time I run this?
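One way to check whether the GC actually runs before the heap grows is to compare GC.stat snapshots before and after the workload (these keys exist on Ruby 2.1+; the workload below is just a stand-in for the regexp test):

```ruby
def gc_snapshot
  s = GC.stat
  # :count is the total number of GC runs so far;
  # :heap_allocated_pages is how many heap pages Ruby has taken
  # from the OS (object slots live inside these pages).
  { gc_runs: s[:count], heap_pages: s[:heap_allocated_pages] }
end

before = gc_snapshot
100.times { "some work".upcase * 1000 }  # stand-in allocation-heavy workload
after = gc_snapshot

puts "GC runs:    #{before[:gc_runs]} -> #{after[:gc_runs]}"
puts "Heap pages: #{before[:heap_pages]} -> #{after[:heap_pages]}"
```

If heap_pages climbs while gc_runs barely moves, the heap is growing without collections in between.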

UPDATE 3: In my isolated test, Ruby does seem to approach a limit where it stops allocating additional memory no matter how many times I run the test (usually around 120MB). In my production code, though, I haven't hit such a limit yet: it climbs past 500MB without slowing down, possibly because there are more instances of this kind of memory usage scattered around the class. There may be a ceiling on how much memory it would use, but it seems manyfold higher than what should be required to run this code, which only uses a dozen or so MB for a single run.

UPDATE 4: I've narrowed the test case down to something that really leaks! Reading a multibyte character from a file was the key to reproducing the real problem:

str = "String that doesn't fit into a single RVALUE, with a multibyte char:" + 160.chr(Encoding::UTF_8)
File.write('weirdstring.txt', str)

class Leak
  PATTERN = File.read("weirdstring.txt").freeze

  def test
    10000.times { /#{PATTERN}/i }
  end
end

t = Leak.new

loop do
  print "Running... "

  t.test


  # If this doesn't work on your system, just comment these lines out and watch the memory usage of the process with top or something
  mem = %x[echo 0 $(awk '/Private/ {print "+", $2}' /proc/`pidof ruby`/smaps) | bc].chomp.to_i
  puts "process memory: #{mem}"
end

So... this is a real leak, right?


Solution

  • It was a memory leak!

    https://bugs.ruby-lang.org/issues/15916

    Should be fixed in one of the next releases of Ruby (2.6.4 or 2.6.5?)