I'm working on Ubuntu 16.04 (Xenial Xerus). I found out that text editors write an additional byte to a UTF-8 text file. This caused problems for me when I tried to pass tests.
So we have a string, "Extra byte", with a size of 10 bytes in UTF-8. When I write it to a file with gedit, for example, I get a file with a size of 11 bytes. nano produces the same size. Even echo "Extra byte" > filename produces 11 bytes.
However, when we try something like this:
#include <fstream>
int main() {
    std::ofstream file("filename");
    file << "Extra byte";
    return 0;
}
or this:
with open("filename_py", 'w+', encoding='UTF-8') as file:
    file.write('Extra byte')
we get a file with a size of 10 bytes. Why?
You are seeing a newline character (often written in programming languages as \n; in ASCII it is hex 0a, decimal 10):
$ echo 'foo' > /tmp/test.txt
$ xxd /tmp/test.txt
00000000: 666f 6f0a foo.
The hex-dump tool xxd shows that the file consists of 4 bytes: hex 66 (ASCII lowercase f), hex 6f twice (lowercase o), and the newline (hex 0a).
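The same byte accounting can be checked for the string from the question. This is only a small sketch in Python; the variable names are illustrative and not part of the original post.

```python
# "Extra byte" is pure ASCII, so each character is one byte in UTF-8.
text = "Extra byte"

encoded = text.encode("utf-8")
with_newline = (text + "\n").encode("utf-8")

print(len(encoded))       # 10
print(len(with_newline))  # 11
print(with_newline[-1:])  # b'\n', the extra byte (hex 0a)
```

The 11th byte is the trailing newline that echo and the editors append; the C++ and Python snippets in the question write only the 10 bytes of the string itself.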
You can use echo's -n command-line switch to disable adding the newline:
$ echo -n 'foo' > /tmp/test.txt
$ xxd /tmp/test.txt
00000000: 666f 6f foo
or you can use printf instead, which is more portable (POSIX leaves the behaviour of echo -n implementation-defined):
$ printf 'foo' > /tmp/test.txt
$ xxd /tmp/test.txt
00000000: 666f 6f foo
Also see 'echo' without newline in a shell script.
Most text editors will also add a newline to the end of a file; how to prevent this depends on the exact editor (often you can just use delete at the end of the file before saving). There are also various command-line options to remove the newline after the fact; see How can I delete a newline if it is the last character in a file?.
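If a script is more convenient than an editor setting, removing the newline after the fact could look roughly like the following. This is a minimal sketch, not the method from the linked question; the function name strip_final_newline is hypothetical, and it assumes Unix-style \n line endings.

```python
import os


def strip_final_newline(path):
    # Hypothetical helper: removes exactly one trailing newline byte (hex 0a)
    # from the file at `path`. Returns True if a byte was removed.
    with open(path, "rb+") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        if size == 0:
            return False  # empty file, nothing to strip
        f.seek(size - 1)
        if f.read(1) != b"\n":
            return False  # file does not end with a newline
        f.truncate(size - 1)  # drop the last byte in place
        return True
```

Calling strip_final_newline("filename") on the 11-byte file produced by echo should leave the 10 bytes of "Extra byte"; a second call would return False and change nothing.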
Text editors generally add a newline because they deal with text lines, and the POSIX standard defines that text lines end with a newline:
3.206 Line
A sequence of zero or more non-<newline> characters plus a terminating <newline> character.
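Under that definition, a non-empty file whose last byte is not a newline ends in something that is not a complete line. A rough check might look like this; the function name ends_with_complete_line is made up for illustration, and treating an empty file as valid (zero lines) is an assumption of this sketch.

```python
def ends_with_complete_line(data: bytes) -> bool:
    # Hypothetical helper. An empty file contains zero lines, which this
    # sketch accepts; otherwise the final line must be newline-terminated.
    return data == b"" or data.endswith(b"\n")


print(ends_with_complete_line(b"Extra byte\n"))  # True  (what echo and editors write)
print(ends_with_complete_line(b"Extra byte"))    # False (what the C++/Python snippets write)
```

This is why the editors produce 11 bytes: they terminate the last line, while the programs in the question write exactly the bytes they are given.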