perl, uppercase, ucfirst

Capitalizing words by uppercasing only the first letter


In Perl, there's the ucfirst function.

Is it equivalent to this:

sub uppercase {     
    my ($W) = @_;       
    $$W = uc(substr($$W,0,1)).substr($$W,1);        
}

Does it matter across Perl versions?


For context, see https://github.com/moses-smt/mosesdecoder/pull/206/files#diff-876e51db2a1ab71c1ae736182d1e5e04R63 .

Previously, uppercase was used like this:

sub process {
    my $line = $_[0];
    chomp($line);
    $line =~ s/^\s+//;
    $line =~ s/\s+$//;
    my @WORD  = split(/\s+/,$line);

    # uppercase at sentence start
    my $sentence_start = 1;
    for(my $i=0;$i<scalar(@WORD);$i++) {
      &uppercase(\$WORD[$i]) if $sentence_start;
      if (defined($SENTENCE_END{ $WORD[$i] })) { $sentence_start = 1; }
      elsif (!defined($DELAYED_SENTENCE_START{$WORD[$i] })) { $sentence_start = 0; }
    }

    # uppercase headlines {
    if (defined($SRC) && $HEADLINE[$sentence]) {
        foreach (@WORD) {
            &uppercase(\$_) unless $ALWAYS_LOWER{$_};
        }
    }

But replacing &uppercase(\$WORD[$i]) and &uppercase(\$_) with ucfirst(\$WORD[$i]) and ucfirst(\$_) seems to behave differently.


Solution

  • ucfirst is not equivalent to the following, which takes a reference and modifies the referenced string in place:

    sub uppercase {     
        my ($W) = @_;       
        $$W = uc(substr($$W,0,1)).substr($$W,1);        
    }
    

    ucfirst takes the string itself and returns a modified copy, so it is mostly[1] equivalent to the following:

    sub ucfirst {     
        my ($W) = @_;       
        return uc(substr($W,0,1)).substr($W,1);        
    }
    

    If you wanted to rewrite uppercase in terms of ucfirst, it would look like this:

    sub uppercase {     
        my ($W) = @_;
        $$W = ucfirst($$W);    
    }
    
    uppercase(\$string);
    

    That means that if you wanted to eliminate uppercase entirely, you'd replace

    uppercase(\$string);
    

    with

    $string = ucfirst($string);     # Correct
    

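    You tried using the following, which passes a reference (stringified to something like SCALAR(0x...)) and discards the return value, so $string is never changed: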

    ucfirst(\$string);              # Wrong
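
To make the difference concrete, here is a minimal sketch that runs the three call styles side by side. The sample words and the @HEADLINE array are hypothetical stand-ins, not the data from the Moses script; only uppercase is copied from the question.

```perl
use strict;
use warnings;

# Helper from the question: takes a reference and modifies the string in place.
sub uppercase {
    my ($W) = @_;
    $$W = uc(substr($$W, 0, 1)) . substr($$W, 1);
}

my @WORD = ('hello', 'world');          # hypothetical sample data

# Old style: pass a reference and rely on the in-place update.
uppercase(\$WORD[0]);

# Replacement: pass the string itself and assign the result back.
$WORD[1] = ucfirst($WORD[1]);

# In a foreach loop $_ aliases each element, so assigning to it updates the array.
my @HEADLINE = ('breaking', 'news');    # hypothetical sample data
$_ = ucfirst($_) foreach @HEADLINE;

# Mistaken form: the reference is stringified (e.g. "SCALAR(0x...)"),
# and the original string is left untouched.
my $string = 'unchanged';
my $result = ucfirst(\$string);

print "@WORD\n";        # Hello World
print "@HEADLINE\n";    # Breaking News
print "$string\n";      # unchanged
```

Applied to the loops in the question, the same pattern would presumably be $WORD[$i] = ucfirst($WORD[$i]) and $_ = ucfirst($_), keeping the existing if/unless conditions.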
    

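    1. ucfirst actually does a better job of handling more esoteric characters: it returns the titlecase form of the first character, which differs from the uppercase form for characters such as U+01F3 LATIN SMALL LETTER DZ ("ǳ").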
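
The footnote can be checked with a short sketch that compares the built-in against the hand-rolled uc/substr approach from the question. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $dz = "\x{01F3}";    # LATIN SMALL LETTER DZ

# Built-in: maps the first character to its titlecase form.
my $via_ucfirst = ucfirst($dz);

# Hand-rolled version from the question: maps it to its uppercase form.
my $via_uc = uc(substr($dz, 0, 1)) . substr($dz, 1);

printf "ucfirst: U+%04X\n", ord($via_ucfirst);   # U+01F2 (titlecase Dz)
printf "uc:      U+%04X\n", ord($via_uc);        # U+01F1 (uppercase DZ)
```

For plain ASCII words the two approaches give the same result; they only diverge for the few characters whose titlecase and uppercase mappings differ.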