Tags: html, utf-8, special-characters, zope, zope3

Error message when trying to edit a file: "The character set specified in the content type (UTF-8) does not match file content."


I have a Zope 3 framework with an interface that lets me edit the content of files directly from the browser.

Everything was working fine until now. Unfortunately, there are some files that I can no longer edit; I get the following error message:

"The character set specified in the content type (UTF-8) does not match file content." 

Here is a screenshot:

[Image: example of error message]

For example, I downloaded via FTP one of the files that I can't edit; its header is shown below:

<meta http-equiv="Content-Type" content="text/html; charset="utf-8"" />
<meta name="generator" content="TeX4ht (http://www.tug.org/tex4ht/)" />
<meta name="originator" content="TeX4ht (http://www.tug.org/tex4ht/)" />
<!-- 3,html,xhtml,charset="utf-8" -->
<meta name="src" content="content_final.tex" />
<link rel="stylesheet" type="text/css" href="content_final.css" />
 <script type="text/javascript" src="./jquery.js">
</script>

Further down, in the body of the file, I have special characters like these:

<br />&#x00A0;<span class="sectionToc" >6.5 <a
href="section32.html#x40-2480006.5" id="QQ2-40-259">Déplacement le long d&#8217;une courbe</a></span>
<br />&#x00A0;<span class="sectionToc" >6.6 <a
href="section33.html#x41-2520006.6" id="QQ2-41-268">Tenseur de Riemann-Christoffel</a></span>

I wonder whether the issue comes from these special characters: &#x00A0; and &#8217;.

What do you think? Is my HTML file not valid UTF-8? How can I fix this error so that I can edit the file directly from the browser?

Please tell me what I have to add to my imported HTML pages, or which command to run on them (I am using vim on Debian GNU/Linux), so that they contain only Unicode characters and are fully compatible.


Solution

  • Here is my opinion, based on the information you have provided.

    This looks like a source encoding problem.
    Every text file is stored in some encoding.
    For characters beyond the basic ASCII set, many mutually incompatible encodings are in use.
    Nowadays, Unicode encodings are preferable, since the Unicode character set covers the characters of essentially all earlier encodings.

    When you type a character, your text editor stores it as a numeric code according to the encoding in effect. If the encoding used to save the file differs from the one another application expects, the character is not recognized properly.
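
The mismatch can be reproduced in a few lines. This is only a minimal sketch in Python (the language is my choice for illustration; nothing in the question uses it), and the helper `is_valid_utf8` is a hypothetical name. It shows that the same character é becomes different bytes in ISO-8859-1 and in UTF-8, and that the ISO-8859-1 byte is rejected by a strict UTF-8 decoder, which is probably the kind of check behind the error message.

```python
# Illustration only: the same character is stored as different bytes
# depending on the encoding chosen when the file is saved.
text = "é"

latin1_bytes = text.encode("iso-8859-1")  # one byte: 0xE9
utf8_bytes = text.encode("utf-8")         # two bytes: 0xC3 0xA9


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data can be decoded as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(latin1_bytes, is_valid_utf8(latin1_bytes))
print(utf8_bytes, is_valid_utf8(utf8_bytes))
```

Note that entities such as &#x00A0; or &#8217; are written with ASCII characters only, so they decode the same way in either encoding.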

    UTF-8 (as defined by Unicode) is the usual choice in modern projects.
    Hence, you should make sure that all your source files are actually stored as UTF-8.

    As I understand it, you are able to edit what you call the body of the file.
    In that case, you could open the file in your text editor and change its encoding to UTF-8, so that it matches the encoding declared for the file you downloaded via FTP and cannot edit.
    Every modern code-oriented text editor should let you choose among the well-known encodings easily.

    Then, of course, save the file.

    Another approach is to rewrite the file so that every character in it has a Unicode codepoint below 0x80, since these codepoints are encoded identically in any ASCII-compatible encoding, such as the very common ISO-8859-1.

    Since your French character é does not belong to the ASCII set, you could rewrite it with an HTML numeric character reference (the ampersand syntax), like this:

    &#x00E9;  
    

    The hexadecimal number 00E9 is the decimal 233, which is the codepoint corresponding to the character: é (Latin Small Letter E with Acute).
    Thus, your text will look like this:

    D&#x00E9;placement le long d&#8217;une courbe  
    

    If you prefer to use decimal code numbers, then write:

    D&#0233;placement le long d&#8217;une courbe  
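
For a whole file, doing this replacement by hand is tedious. As a rough sketch, assuming Python is available (my assumption, not something from the question), the standard `xmlcharrefreplace` error handler rewrites every non-ASCII character as a decimal reference. The function name `to_ascii_html` is hypothetical.

```python
def to_ascii_html(text: str) -> str:
    """Replace each non-ASCII character with a decimal numeric character reference.

    ASCII text, including references that are already present, is left as it is.
    """
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(to_ascii_html("Déplacement le long d&#8217;une courbe"))
```

The output uses the decimal form without leading zeros (`&#233;`), which is equivalent to `&#0233;` and `&#x00E9;`.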
    

    To look up the Unicode codepoint and other properties of a character, you can use the following websites:

    1. unicode-table.com
    2. amp-what.com/

    ADDED

    The OP mentioned in the comments that he actually needed to save the files as UTF-8 from vim.
    The following vim command sets the encoding used when the file is written, so save the file (:w) afterwards:

    :set fileencoding=utf-8
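
If many imported pages are affected, the same conversion could be scripted instead of opening each file in vim. The following is only a sketch under stated assumptions: Python is my choice, `convert_to_utf8` is a hypothetical helper, and ISO-8859-1 as the original encoding is a guess that you would need to confirm for your own files.

```python
from pathlib import Path


def convert_to_utf8(path: str, assumed_encoding: str = "iso-8859-1") -> bool:
    """Rewrite the file as UTF-8 if it is not already valid UTF-8.

    Returns True if the file was rewritten, False if it was left untouched.
    The default source encoding is an assumption, not a detected value.
    """
    raw = Path(path).read_bytes()
    try:
        raw.decode("utf-8")
        return False  # already valid UTF-8, nothing to do
    except UnicodeDecodeError:
        pass
    # Reinterpret the bytes in the assumed encoding and store them as UTF-8.
    text = raw.decode(assumed_encoding)
    Path(path).write_bytes(text.encode("utf-8"))
    return True
```

You would call it as `convert_to_utf8("page.html")`, where `page.html` stands for one of your own files. Keep a backup first, because a wrong guess about the original encoding silently produces wrong characters.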