
tinytex warnings due to T1 fontenc and latin1 log file


Problem

I run tinytex::latexmk() on a file with (a) UTF-8 encoding, (b) T1 font encoding, and (c) an overfull hbox containing an accented character. This creates a log file in latin1 (ISO-8859-1) encoding, which causes warnings from tinytex.

Reproducible example

tinytex::latexmk("tiny.tex")
## [1] "tiny.pdf"
## Warning messages:
## 1: In xfun::read_utf8(log) :
##   The file tiny.log is not encoded in UTF-8. These lines contain invalid UTF-8 characters: 51
## 2: In grep("^(LaTeX|Package [[:alnum:]]+) Warning:", x) :
##   unable to translate 'r/bx/n/10 vari-able di-cot<f3>mica\T1/cmr/m/n/10 . ' to a wide string
## 3: In grep("^(LaTeX|Package [[:alnum:]]+) Warning:", x) :
##   input string 51 is invalid

where tiny.tex contains

\documentclass[a4paper]{article}
\usepackage[T1]{fontenc}
\begin{document}
\begin{itemize}
\item In this line we get an overfull hbox because the word \textbf{variable dicotómica}.
\end{itemize}
\end{document}

System information

R 4.3.1 with tinytex 0.46 and xfun 0.40, using TeX Live 2023.20230613-3 on a Debian GNU/Linux (testing) system running in an en_US.UTF-8 locale. The TinyTeX distribution is not installed, i.e., the R package tinytex calls the system TeX Live. We have seen the same kind of problem on Windows systems as well.

Workaround

I can avoid the issue by not using the T1 font encoding, i.e., by omitting or commenting out the second line in tiny.tex. Then the OT1 font encoding is used and the log message about the overfull hbox is plain ASCII.

Question

Is there anything I can do to obtain a log message in UTF-8? Or would tinytex have to deal with it (e.g., by reading bytes and then using iconv())?


Solution

  • This is a bug in the R package tinytex, which has been fixed in v0.48. Installing the latest version from CRAN should fix the problem:

    install.packages('tinytex')
    

    (Technical background and partial answer based on off-list feedback from Sebastian Meyer:)

    The log message is not actually latin1-encoded. Rather, it contains a representation of the characters used for typesetting, which text editors and similar tools may interpret as latin1. With pdfLaTeX this cannot be avoided; it can only be avoided with LuaLaTeX or XeLaTeX.

    The details are explained in the TeX Q&A "What controls the encoding of the LaTeX log file – and how to change it?", in the answer by Enrico Gregorio and further comments by David Carlisle. That discussion recommends reading log files as bytes rather than as UTF-8-encoded files.
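
To illustrate the "read as bytes" recommendation, here is a minimal base R sketch. It is not the code tinytex uses internally. The log content is a made-up two-line stand-in for tiny.log, and the variable names are only for illustration. It shows how to locate lines that are not valid UTF-8 and how to search for warnings byte-wise, so that grep() does not have to translate the strings. The final iconv() call treats the offending bytes as latin1 purely for display, which is an assumption that happens to work for the byte 0xF3 seen in the warning above.

```r
# Build a small stand-in for tiny.log. Line 2 contains the single byte 0xF3,
# which is not valid UTF-8 on its own (it is "ó" when read as latin1).
log_file <- tempfile(fileext = ".log")
con <- file(log_file, "wb")
writeBin(
  c(charToRaw("LaTeX Warning: example warning.\n"),
    charToRaw("vari-able di-cot"), as.raw(0xf3), charToRaw("mica\n")),
  con
)
close(con)

# Read the lines without assuming or converting any encoding.
x <- readLines(log_file, warn = FALSE)

# Indices of lines that are not valid UTF-8.
bad <- which(!validUTF8(x))

# Byte-wise matching: no translation to wide strings is attempted.
warn_lines <- grep("^(LaTeX|Package [[:alnum:]]+) Warning:", x,
                   value = TRUE, useBytes = TRUE)

# For display only (assumption): interpret the offending bytes as latin1.
shown <- iconv(x[bad], from = "latin1", to = "UTF-8")
```

The key design choice is useBytes = TRUE: the pattern contains only ASCII characters, so matching on raw bytes gives the same result for well-formed lines and simply skips over bytes that cannot be decoded.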