perl recursion hash dbm

Infinite recursion for iterator of tied hash with perldbmfilter for filter_fetch_key


I have a Perl tied hash which uses the SDBM_File module and I need to do some character encoding conversions when storing or fetching values.

I followed the perldbmfilter documentation and in general it seems to work: I get the results from the hash properly encoded, as expected, and the file contains the byte values for my encoding, which I verified with a hex editor.

What is not working is any kind of iteration over all elements of the hash, either using keys, or each, or things like Data::Dumper, or even a simple copy operation to another hash.

What I always get looks like an infinite recursion: the iterator never seems to reach its end, and if I use each and print the iterated values, they are repeated.

I tracked the problem down to the use of filter_fetch_key and the charset conversion I'm doing there. If I comment out filter_fetch_key, or if I change the filter method to do just return shift, then the iteration works again. Either change makes the iteration terminate, but neither is a real solution: I need filter_fetch_key to hand the caller properly encoded strings.

$dbm->filter_fetch_key  (sub { $_ = $self->_normalizeCharset($_); });

sub _normalizeCharset
{
  my $self = shift || Carp::croak(...);

  #return shift;
  return ...::windows2utf(shift);
}

If I uncomment return shift, the iteration works; with it commented out, as above, it doesn't. I guess it has something to do with what happens to $_, but I have no idea what, as windows2utf just copies the given data and converts the character encoding. The same approach works for storing keys and values, and even for fetching values. Only keys are a problem, and only when I iterate, not when I ask for specific keys directly.

Any hints on what I'm doing wrong?

There's a thread on Perlmonks as well.


Solution

  • I have found the issue: during testing I noticed that filter_fetch_key calls my function once with a key value of undef, and my function returned an empty string in that case for convenience. This seems to be what causes the infinite loop. My guess is that the empty string is then treated as a real key of the hash, so the iterator never sees its end. The interesting part is that changing keys in the filter should be perfectly fine, because one of the examples in the documentation does exactly that, and my tests show that I can replace every key except the undef one with whatever I like. I can even create completely new keys, for example by prepending __. The infinite loop only appears if I return something other than undef for undef. The first version of the following method works, the second does not.

    sub _normalizeCharset
    {
      my $self  = shift || Carp::croak(...);
      my $value = shift;
      my $key   = shift || 0;

      return undef unless (defined($value));
      #return '' unless (defined($value));

      $value = "__$value" if ($key);

      return ...::windows2utf($value);
    }
    
    
    sub _normalizeCharset
    {
      my $self  = shift || Carp::croak(...);
      my $value = shift;
      my $key   = shift || 0;

      #return undef unless (defined($value));
      return '' unless (defined($value));

      $value = "__$value" if ($key);

      return ...::windows2utf($value);
    }
    

    I guess undef is a special signal that marks the end of the iterated keys, and it is simply not meant to be a regular hash key. At least I never store it anywhere myself.
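
For illustration, here is a minimal, self-contained sketch of the working variant: the fetch-key filter converts defined keys and leaves undef untouched. It makes several assumptions that are not part of my real code: Encode's decode/encode with cp1252 stands in for windows2utf, the database lives in a temporary directory under the made-up name demo, and the two sample keys are invented.

```perl
use strict;
use warnings;
use Fcntl;                        # O_RDWR, O_CREAT
use SDBM_File;
use Encode qw(decode encode);
use File::Temp qw(tempdir);

# Scratch database in a temporary directory (hypothetical file name).
my $dir = tempdir(CLEANUP => 1);
my $dbm = tie my %h, 'SDBM_File', "$dir/demo", O_RDWR | O_CREAT, 0666
    or die "Cannot tie SDBM file: $!";

# Store side: character strings -> cp1252 bytes (stand-in conversion).
$dbm->filter_store_key(sub { $_ = encode('cp1252', $_) });

# Fetch side: bytes -> character strings. An undefined key is passed
# through unchanged so that keys/each still see the end of the iteration.
$dbm->filter_fetch_key(sub {
    return unless defined $_;
    $_ = decode('cp1252', $_);
});

$h{"caf\x{e9}"}  = 1;
$h{"na\x{ef}ve"} = 2;

my @keys = sort keys %h;          # iteration over the tied hash
print scalar(@keys), " keys\n";

undef $dbm;                       # drop the object before untie
untie %h;
```

The only important line is the defined check at the top of the fetch filter; replacing it with something that turns undef into '' is what produced the endless iteration in my tests.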