I have a string like the one below:
stringinput = Sweééééôden@
I want to get output like this:
stringoutput = Sweden
The special characters ééééô and @ have to be removed.
I am using:
$stringoutput = `echo $stringinput | sed 's/[^a-z A-Z 0-9]//g'`;
I am getting a result like Sweééééôden: the @ is removed, but ééééô is not.
Can you please suggest what I have to add?
You need to put LC_ALL=C before the sed command so that the ranges in the [A-Za-z] bracket expression are interpreted according to the ASCII table:
stringoutput=$(echo "$stringinput" | LC_ALL=C sed 's/[^A-Za-z0-9]//g')
See the online demo:
stringinput='Sweééééôden@'
stringoutput=$(echo "$stringinput" | LC_ALL=C sed 's/[^A-Za-z0-9]//g')
echo "$stringoutput"
# => Sweden
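If the cleanup is needed in several places, the same command can be wrapped in a small helper. This is only a sketch: the function name strip_non_alnum is made up for illustration, and printf is used instead of echo so that input starting with a dash or containing backslashes is passed through unchanged.

```shell
# Hypothetical helper: keep only ASCII letters and digits from the first argument.
strip_non_alnum() {
    # LC_ALL=C applies to this one sed process only, so the ranges
    # A-Z, a-z and 0-9 are matched against plain ASCII bytes.
    printf '%s\n' "$1" | LC_ALL=C sed 's/[^A-Za-z0-9]//g'
}

stringoutput=$(strip_non_alnum 'Sweééééôden@')
echo "$stringoutput"
# => Sweden
```

Because LC_ALL=C is written as a prefix to sed, it is set for that command alone; the locale of the calling shell is left as it was.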
The GNU grep manual explains, in its section on bracket expressions, why ranges depend on the locale: "In the default C locale, the sorting sequence is the native character order; for example, ‘[a-d]’ is equivalent to ‘[abcd]’. In other locales, the sorting sequence is not specified, and ‘[a-d]’ might be equivalent to ‘[abcd]’ or to ‘[aBbCcDd]’, or it might fail to match any character, or the set of characters that it matches might even be erratic. To obtain the traditional interpretation of bracket expressions, you can use the ‘C’ locale by setting the LC_ALL environment variable to the value ‘C’."
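The same C-locale idea is not specific to sed. As an alternative sketch (a different tool, not the command used above), tr can delete every byte outside the wanted set when it is run under LC_ALL=C:

```shell
stringinput='Sweééééôden@'

# -c complements the set and -d deletes, so every byte that is NOT an
# ASCII letter or digit is dropped. Under LC_ALL=C, tr works byte by
# byte, which also removes the bytes that make up é and ô.
stringoutput=$(printf '%s' "$stringinput" | LC_ALL=C tr -cd 'A-Za-z0-9')

printf '%s\n' "$stringoutput"
# => Sweden
```

Note that tr -cd also deletes newlines, since they are outside the set; that makes no difference here because the command substitution would strip a trailing newline anyway.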
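In Perl, you could simply use the following (the /r modifier, which returns the modified copy instead of changing $stringinput, requires Perl 5.14 or later):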
my $stringinput = 'Sweééééôden@';
my $stringoutput = $stringinput =~ s/[^A-Za-z0-9]+//gr;
print $stringoutput;
See this online demo.