I have a text file in Portuguese that I created using PHP, which contains sentences with the character "ê" (e with circumflex accent). I'm trying to read this file in Python, but I'm encountering issues specifically with the "ê" character. I have ensured that both the PHP file and Python script are using the UTF-8 encoding.
The Python script works fine in the terminal, but when I call it from PHP with the exec() or shell_exec() function, Python cannot read the text file content properly and prints this error:
'ascii' codec can't encode character '\xea' in position 6: ordinal not in range(128)
What could be causing this issue and how can I resolve it?
Here are the details of my setup:
operating system: Linux
Python default encoding: utf-8
text file content:
Se você tem 1 laranja e 1 limão faça esse delicioso bolo!
Python code:
filename = "newfile.txt"
with open(filename, "r", encoding="utf-8") as file:
    # Read the first line of the text file
    file_content = file.readline().strip()
print(file_content)
terminal print:
php file code:
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
<title>read python</title>
</head>
<body>
<?php
$pythonScript = "read.py";
$command = "python3 " . $pythonScript;
$output = shell_exec($command);
echo $output;
?>
</body>
</html>
I appreciate any insights or suggestions on how to handle this issue. Thank you!
The initial \ufeff in the ASCII string is the byte order mark (BOM) character sometimes used as a signature for a UTF-8 file. Use encoding='utf-8-sig' to remove that. The rest of the string is correct, so the problem is the encoding of the display, not Python. If your terminal isn't configured for UTF-8 it will mis-decode the result. On Windows with Python 3.11 in the command prompt, a string with that content prints correctly: Se você tem 1 laranja e 1 limão faça esse delicioso bolo!
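To illustrate the BOM point, here is a minimal sketch that writes a temporary file starting with a BOM and reads it back with both codecs. The temporary file and the variable names are assumptions made for the demonstration, not part of the original script.

```python
import os
import tempfile

text = "Se você tem 1 laranja e 1 limão faça esse delicioso bolo!"

# Hypothetical sample file: the utf-8-sig codec writes a BOM at the start.
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8-sig", suffix=".txt", delete=False
) as tmp:
    tmp.write(text + "\n")
    path = tmp.name

try:
    # Plain utf-8 keeps the BOM as a leading '\ufeff' character;
    # strip() does not remove it because it is not whitespace.
    with open(path, "r", encoding="utf-8") as f:
        with_bom = f.readline().strip()

    # utf-8-sig drops the BOM if one is present.
    with open(path, "r", encoding="utf-8-sig") as f:
        without_bom = f.readline().strip()
finally:
    os.remove(path)

print(ascii(with_bom))     # escaped form, safe to print on any terminal
print(ascii(without_bom))
```

Reading with utf-8-sig also works for files that have no BOM, so it should be a safe default when the file's origin is uncertain.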
@MarkTolonen is right: the terminal was not configured for UTF-8. I set a UTF-8 locale for the terminal environment before calling the exec() function in PHP, and now it works.
PHP code:
$locale='pt_BR.UTF-8';
setlocale(LC_ALL,$locale);
putenv('LC_ALL='.$locale);
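A possible alternative is to fix the output encoding on the Python side instead of relying on the locale that PHP passes down. The sketch below uses an in-memory ASCII stream as a stand-in for the stdout Python gets under exec(); the name fake_stdout is hypothetical, and the assumption is that the real stdout behaves the same way when no UTF-8 locale is set.

```python
import io

line = "Se você tem 1 laranja e 1 limão faça esse delicioso bolo!"

# Stand-in for a stdout whose encoding was chosen as ASCII (assumption:
# this mirrors what happens when the parent process has no UTF-8 locale).
fake_stdout = io.TextIOWrapper(io.BytesIO(), encoding="ascii")

try:
    fake_stdout.write(line)
    error_message = None
except UnicodeEncodeError as exc:
    # Same kind of failure as the one reported in the question.
    error_message = str(exc)

# Switch the stream to UTF-8 (text streams support this since Python 3.7).
fake_stdout.reconfigure(encoding="utf-8")
fake_stdout.write(line)
fake_stdout.flush()
written = fake_stdout.buffer.getvalue()

# In the real script the equivalent call, placed before any print(), would be:
#     import sys
#     sys.stdout.reconfigure(encoding="utf-8")
```

Setting the PYTHONIOENCODING environment variable to utf-8 before launching the script should have a similar effect without touching the Python code.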