I have a file:
To jest długi string z wieloma polskimi literami ąółżęś kodowany w UTF8,
żeby
było śmieszniej, haha.
ą
a
Example gawk:
gawk '{printf "%-80s %-s\n", $0, length}' file
In gawk, I get the correct result:
To jest długi string z wieloma polskimi literami ąółżęś kodowany w UTF8, 73
żeby 5
było śmieszniej, haha. 22
ą 1
a 1
Example mawk:
mawk '{printf "%-80s %-s\n", $0, length}' file
In mawk, I get the incorrect result:
To jest długi string z wieloma polskimi literami ąółżęś kodowany w UTF8, 80
żeby 6
było śmieszniej, haha. 24
ą 2
a 1
How can I get mawk to produce the same result as gawk?
mawk is a minimal-featured awk designed for speed of execution over functionality. You should not expect it to behave exactly the same as gawk or a POSIX awk. If you're going to use mawk, you need to get a mawk manual describing how IT behaves; don't rely on any other documentation describing how other awks behave.
IMHO there is no correct result for the format string %-s, as it is meaningless to left-align a string without specifying a width within which to align it. There are also different interpretations of what length means on its own: it could be shorthand for length($0), or it could be something else in a non-POSIX awk. Some non-POSIX awks might not even have a length function and so might treat it as an undefined variable name. And how does any given awk handle non-English characters, e.g. does length count characters or bytes?
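A minimal Python sketch of the characters-vs-bytes question. Python is used only because it makes the distinction explicit, and the helper names char_length, byte_length and pad_by_bytes are hypothetical, made up for illustration. It assumes, based only on the numbers in the question, that mawk's length and %-80s padding work on UTF-8 bytes while gawk in a UTF-8 locale works on characters: each Polish letter such as ą or ł is two bytes in UTF-8, which matches the gap between the two outputs (ą gives 1 vs 2, the "było śmieszniej, haha." line gives 22 vs 24). It also checks the %-s point: with no width, the - flag has nothing to align within.

```python
# Sketch only: assumes gawk (UTF-8 locale) counts characters and mawk counts
# UTF-8 bytes, as the numbers in the question suggest. Helper names are hypothetical.

def char_length(s: str) -> int:
    """Number of Unicode characters (matches gawk's length output above)."""
    return len(s)


def byte_length(s: str) -> int:
    """Number of bytes in the UTF-8 encoding (matches mawk's length output above)."""
    return len(s.encode("utf-8"))


def pad_by_bytes(s: str, width: int) -> str:
    """Emulate %-<width>s when the width is counted in bytes:
    append spaces until the UTF-8 encoding is `width` bytes long."""
    return s + " " * max(width - byte_length(s), 0)


# "%-s" without a width: the '-' flag has nothing to align within,
# so in C-style formatting it produces the same text as "%s".
assert "%-s" % "abc" == "%s" % "abc" == "abc"

if __name__ == "__main__":
    for line in ["było śmieszniej, haha.", "ą", "a"]:
        # prints the character count, then the byte count
        print(repr(line), char_length(line), byte_length(line))
    # Byte-based padding leaves "ą" one visible column short: 79 characters, not 80.
    print(len(pad_by_bytes("ą", 80)))
```

If mawk does pad by bytes, every two-byte letter in a line also shortens that line's padded field by one visible column, so the numbers mawk prints would not line up in a column either.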
As I said, if you're going to use a non-POSIX awk, you need to check the manual for THAT awk for all of the gory details...