[SOLVED] The differences between gawk and mawk (column width)

I have a file:

To jest długi string z wieloma polskimi literami ąółżęś kodowany w UTF8, 
żeby 
było śmieszniej, haha.
ą
a

Example gawk:

gawk '{printf "%-80s %-s\n", $0, length}' file

In gawk, I get the correct result:

To jest długi string z wieloma polskimi literami ąółżęś kodowany w UTF8,         73
żeby                                                                             5
było śmieszniej, haha.                                                           22
ą                                                                                1
a                                                                                1

Example mawk:

mawk '{printf "%-80s %-s\n", $0, length}' file

In mawk, I get the incorrect result:

To jest długi string z wieloma polskimi literami ąółżęś kodowany w UTF8,  80
żeby                                                                            6
było śmieszniej, haha.                                                         24
ą                                                                               2
a                                                                                1

How can I get mawk to produce the same result as gawk?

Solution

mawk is a minimal-featured awk designed for speed of execution over functionality. You should not expect it to behave exactly the same as gawk or a POSIX awk. If you're going to use mawk, you need to get a mawk manual describing how IT behaves; don't rely on any other documentation describing how other awks behave.

IMHO there is no correct result for the formatting string %-s, as it is meaningless to align a string without specifying a width within which to align it. There are also different interpretations of what length means on its own: it could be shorthand for length($0), or it could be something else in a non-POSIX awk. Some non-POSIX awk might not even have a length function and so might treat it as an undefined variable name. And how does any given awk handle non-English characters? The mawk output in the question is consistent with counting bytes rather than characters (ą is reported as 2, which is its size in UTF-8), while gawk reports 1.
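
To take the guesswork out of the format and the length call, one option is to spell both out. This is only a minimal sketch, assuming some awk is on the PATH; fmt_explicit is a hypothetical helper name, not something from the question:

```shell
# Hypothetical helper: print each line left-aligned in 80 columns,
# followed by its length, with nothing left for the awk to interpret.
fmt_explicit() {
  awk '{
    # %d instead of %-s: a left-align flag without a width has nothing to align.
    # length($0) instead of bare length: no reliance on the shorthand.
    printf "%-80s %d\n", $0, length($0)
  }' "$@"
}
```

This only makes the intent explicit. Whether length and the %-80s padding work in bytes or in characters still depends on the awk in use and on the locale.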

As I said - if you're going to use a non-POSIX awk, you need to check the manual for THAT awk for all of the gory details...
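
If the awk you are stuck with counts bytes, one possible workaround is to count the UTF-8 characters yourself and build the padding by hand instead of relying on %-80s. The sketch below is an assumption-laden illustration, not a documented mawk feature: it assumes the input is valid UTF-8, it forces LC_ALL=C so that every awk treats the input as bytes, and pad_utf8 is a hypothetical helper name. It counts code points, so combining or double-width characters would still misalign.

```shell
# Hypothetical helper: pad each line to 80 characters (not bytes)
# and append the character count.
pad_utf8() {
  LC_ALL=C awk '{
    s = $0
    # Drop UTF-8 continuation bytes (10xxxxxx, i.e. 0x80-0xBF);
    # one byte per character remains.
    gsub(/[\200-\277]/, "", s)
    chars = length(s)

    # Build the padding manually, because %-80s would pad by bytes here.
    pad = ""
    for (i = chars; i < 80; i++) pad = pad " "

    printf "%s%s %d\n", $0, pad, chars
  }' "$@"
}
```

Usage would be pad_utf8 file, or piping text into pad_utf8. With the sample file from the question, the ą line should then be reported with length 1 and padded with 79 spaces, matching the gawk output.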