Since the recent 4.8.0, SICStus supports Unicode above ÿ within Prolog text, at least in a quoted context, that is, within a quoted token (* 6.4.2 *), a double quoted list (* 6.4 *) and a character code constant (* 6.4.4 *). As a consequence, unicorn syntax is also supported, like
| ?- U='\x1f984\'. % teh olde way
U = '🦄' ? ;
no
| ?- U = '🦄'.
U = '🦄' ? ;
no
Similarly, [user] supports unicorns directly! However, when opening a file with open/3, only a BOM-ed file, or an explicit option with open/4, sets the encoding to UTF-8. Says the manual:
The default is 'ISO-8859-1' if no encoding is specified and no encoding can be detected from the file contents.
(And it seems that not the entire file content is used for that detection.) Is there a way to change that default to UTF-8, so that I can use UTF-8 everywhere?
There is currently no way to change the default encoding used by open/3 et al. (SICStus Prolog 4.8.0).
However, if your text file starts with an Emacs-style -*- coding:utf-8; -*- line, then that line will be used to set the encoding when reading the file. This works for any text file, including Prolog source files.
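For a Prolog source file, this could look as follows. This is only an illustration: it assumes the coding line may sit inside a Prolog comment on the first line (the usual Emacs convention), and `unicorn/1` is a made-up predicate.

```prolog
% -*- coding:utf-8; -*-
% The coding line above has to be on the first line of the file.
% unicorn/1 is a hypothetical fact, used only to show a non-Latin-1 atom.

unicorn('🦄').
```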
By default, SICStus also sets the encoding according to a Unicode BOM, also for UTF-8, if a BOM is present.
See the documentation for details.
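If neither a BOM nor a coding line is an option, a small wrapper that always passes the encoding explicitly may be a workable substitute. A minimal sketch, assuming the `encoding/1` option of open/4 described in the manual; `open_utf8/3` is a hypothetical helper name, not something SICStus provides:

```prolog
% open_utf8(+File, +Mode, -Stream)
% Hypothetical helper: behaves like open/3, but always requests UTF-8
% instead of relying on the ISO-8859-1 fallback.
open_utf8(File, Mode, Stream) :-
    open(File, Mode, Stream, [encoding('UTF-8')]).
```

Code that calls `open_utf8/3` instead of open/3 then reads and writes UTF-8 regardless of the file contents. It does not affect files opened by other means, such as consult/1.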
Update: So, when opening a file, SICStus first reads a few bytes to see if there is a BOM. If not, it reads the first line and looks for an Emacs-style -*- ... -*- specification. One thing the SICStus documentation does not mention is that the first line is only looked for in the first 1024 bytes of the file, so even if there are no newlines, the whole file is not read when opened. This is how Emacs does it, even though I haven't seen it documented.
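The detection order described above can be sketched in Prolog. This is not SICStus's actual implementation, just a rough model under stated assumptions: all predicate names (`guess_encoding/2` and its helpers) are hypothetical, only the UTF-8 BOM is handled, and the encoding name is returned exactly as written in the coding line, without normalization.

```prolog
% guess_encoding(+Bytes, -Encoding)
% Bytes is a list of byte values taken from the start of a file.
guess_encoding([0xEF,0xBB,0xBF|_], 'UTF-8') :- !.   % 1. UTF-8 BOM
guess_encoding(Bytes, Encoding) :-                  % 2. coding line
    take(1024, Bytes, Prefix),       % look at no more than 1024 bytes
    first_line(Prefix, Line),
    phrase(coding_cookie(Name), Line, _),
    !,
    atom_codes(Encoding, Name).
guess_encoding(_, 'ISO-8859-1').                    % 3. documented default

% take(+N, +List, -Prefix): Prefix is the first N elements (or fewer).
take(0, _, []) :- !.
take(_, [], []) :- !.
take(N, [X|Xs], [X|Ys]) :- N1 is N-1, take(N1, Xs, Ys).

% first_line(+Codes, -Line): codes up to the first newline (code 10).
first_line([], []).
first_line([10|_], []) :- !.
first_line([C|Cs], [C|Line]) :- first_line(Cs, Line).

% coding_cookie(-Name): the value after "coding:" between two "-*-".
coding_cookie(Name) -->
    anything(_), "-*-", anything(Vars), "-*-",
    { phrase(coding_value(Name), Vars, _), Name = [_|_] }.

coding_value(Name) --> anything(_), "coding:", blanks, name_codes(Name).

anything([]) --> [].
anything([C|Cs]) --> [C], anything(Cs).

blanks --> [32], !, blanks.          % skip spaces
blanks --> [].

% A name ends at ';' (code 59), at a space, or at the end of input.
name_codes([C|Cs]) --> [C], { C =\= 59, C =\= 32 }, !, name_codes(Cs).
name_codes([]) --> [].
```

For example, `atom_codes('% -*- coding:utf-8; -*-', Cs), guess_encoding(Cs, E)` gives `E = 'utf-8'`, while input without a BOM or coding line falls through to `'ISO-8859-1'`.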