I have a Perl tied hash which uses the SDBM_File module and I need to do some character encoding conversions when storing or fetching values.
I followed the documentation of perldbmfilter and in general it seems to work: I get the results from the hash properly encoded, as expected, and it stores the byte values for my encoding in the file, which I checked using a hex editor.
What is not working is any kind of iteration over all elements of the hash, either using keys, or each, or things like Data::Dumper, or even a simple copy operation to another hash.
What I always get is an infinite loop. It seems like the iterator never reaches its end, and if I use each and print the iterated values, they are repeated.
I tracked the problem down to the use of filter_fetch_key and the charset conversion I'm doing there. If I comment out filter_fetch_key, or if I change the filter method to do just return shift, then the iteration works again. Either of these fixes the iteration, but I need to use filter_fetch_key to send the caller properly encoded strings.
    $dbm->filter_fetch_key (sub { $_ = $self->_normalizeCharset($_); });

    sub _normalizeCharset
    {
        my $self = shift || Carp::croak(...);
        #return shift;
        return ...::windows2utf(shift);
    }
If I uncomment return shift, the iteration works; but commented out like above, it doesn't. I guess it has something to do with what happens to $_, but I have no idea what, as windows2utf just copies the given data and does some character encoding. This works the same way for storing keys and values and even for fetching values. Only keys are the problem, and only if I iterate, not if I directly ask for specific keys.
Any hints on what I'm doing wrong?
There's a thread on Perlmonks as well.
I have found the issue: during my tests I noticed that filter_fetch_key calls my function once with a key value of undef, and my function returns an empty string for convenience in that case. This seems to be what causes the infinite loop. I guess the empty string gets treated as a new key of the hash for some reason, and some iterator then becomes invalid or never terminates. The interesting part is that changing keys should be perfectly fine, because one of the examples in the documentation does exactly that, and my tests show that I can replace each key except the undef one with whatever I like; I can even create completely new keys, by prepending __ for example. The infinite loop only occurs if I don't return undef for undef. The first version of the following method works, the second does not.
    sub _normalizeCharset
    {
        my $self = shift || Carp::croak(...);
        my $value = shift;
        my $key = shift || 0;
        return undef unless (defined($value));
        #return '' unless (defined($value));
        $value = "__$value" if ($key);
        return ...::windows2utf($value);
    }

    sub _normalizeCharset
    {
        my $self = shift || Carp::croak(...);
        my $value = shift;
        my $key = shift || 0;
        #return undef unless (defined($value));
        return '' unless (defined($value));
        $value = "__$value" if ($key);
        return ...::windows2utf($value);
    }
I guess undef may be a special signal passed through the filter to indicate the end of the iterated keys, and is simply not meant to be a regular hash key. At least I don't store it anywhere.
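For reference, here is a minimal, self-contained sketch of a fetch-key filter that leaves undef untouched. It is not my real code: Encode's cp1252 codec stands in for windows2utf, the temporary file name is made up, and the idea that undef marks the end of iteration is only my assumption from the tests above.

```perl
use strict;
use warnings;
use Fcntl;                       # O_RDWR, O_CREAT
use SDBM_File;
use File::Temp qw(tempdir);
use Encode qw(decode encode);

# Hypothetical scratch location, only for this demonstration.
my $dir = tempdir(CLEANUP => 1);

my %h;
my $dbm = tie(%h, 'SDBM_File', "$dir/demo", O_RDWR | O_CREAT, 0666)
    or die "Cannot tie SDBM file: $!";

# Assumption: cp1252 <-> Perl strings stands in for the real conversion.
$dbm->filter_store_key(sub { $_ = encode('cp1252', $_) });

$dbm->filter_fetch_key(sub {
    # Leave undef alone: it appears to signal "no more keys".
    # Turning it into '' is what made my iteration never finish.
    return unless defined $_;
    $_ = decode('cp1252', $_);
});

$h{"caf\x{e9}"}   = 1;
$h{"na\x{ef}ve"} = 2;

# Iteration terminates because the filter passes undef through unchanged.
my @keys = sort keys %h;
print scalar(@keys), " keys found\n";

undef $dbm;                      # drop the extra reference before untie
untie %h;
```

The only line that matters is the early return in the fetch-key filter; everything else is scaffolding to make the example run on its own.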