DITA OT printing '#' instead of Chinese characters in PDF


I am very new to DITA OT. I downloaded DITA-OT1.5.4_full_easy_install_bin and am experimenting with it. I am trying to print a few characters in Simplified Chinese (zh-CN) into a PDF. The characters are rendered correctly in XHTML, but in the PDF they appear as "#".

On the command line I see this warning (0x611f is the code point of 感):

Warning: Glyph "?" (0x611f) not available in font "Helvetica".

Here are the things I have tried so far:

In demo\fo\fop\conf\fop.xconf:

<fonts>
     <font kerning="yes" 
         embed-url="file:///C:/Windows/Fonts/simsun.ttc" 
         embedding-mode="subset" encoding-mode="cid">
             <font-triplet name="SimSun" style="normal" weight="normal"/>
     </font>
     <auto-detect/>
     <directory recursive="true">C:\Windows\Fonts</directory>
</fonts>

In demo\fo\cfg\fo\attrs\custom.xsl:

<xsl:attribute-set name="__fo__root">
     <xsl:attribute name="font-family">SimSun</xsl:attribute>
</xsl:attribute-set>

In demo\fo\cfg\fo\font-mapping.xml, I added this block to the Sans, Serif, and Monospaced logical fonts:

<physical-font char-set="Simplified Chinese">
     <font-face>SimSun</font-face>
</physical-font>

In samples\concepts\garageconceptsoverview.xml:

<shortdesc xml:lang="zh_CN">職業道德感.</shortdesc>

And this is the command I am using to generate the PDF:
ant -Dargs.input=samples\hierarchy.ditamap -Dtranstype=pdf

Any help would be appreciated. Thanks.

[EDIT] I see that the topic.fo file generated in the temp folder does contain the Chinese characters correctly, like this:

<fo:block font-size="10pt" keep-with-next.within-page="5" start-indent="25pt">職業道德感.</fo:block>

But I do not see any font-related information anywhere in this document.


Solution

  • First of all, set xml:lang="zh-CN" on the root element of every DITA topic and map. DITA OT uses this language to choose the static texts (such as "Table X") and to decide which character set to use for the font mappings.

    Then run the publishing with the "clean.temp" parameter set to "no" (for example, by adding -Dclean.temp=no to the ant command). After the publishing finishes, open the file called "topic.fo" in the temporary files folder and check which font families it uses. Even if you set a font on the root element, other places in the XSL-FO file set font families explicitly, and those explicit settings take precedence over the value inherited from the root. So instead of setting a font on the XSL-FO root element, edit the font mappings XML file and, for each of the logical fonts "Sans" and "Serif", configure the actual font family to use for the Chinese character set, something like:

    <logical-font name="Sans">
      <!-- ... -->
      <physical-font char-set="Simplified Chinese">
        <font-face>SimSun</font-face>
      </physical-font>
      <!-- ... -->
    </logical-font>
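
    As a fuller sketch, this is roughly how the Chinese entries could sit in font-mappings.xml when both "Sans" and "Serif" get one. The surrounding <font-mappings> and <font-table> elements are assumed from the stock DITA OT 1.5.x file rather than copied from it, so keep whatever your copy already contains and only add the "Simplified Chinese" blocks:

    <font-mappings>
      <font-table>
        <!-- element names above assumed from the stock DITA OT file -->
        <logical-font name="Sans">
          <!-- existing physical-font entries stay as they are -->
          <physical-font char-set="Simplified Chinese">
            <font-face>SimSun</font-face>
          </physical-font>
        </logical-font>
        <logical-font name="Serif">
          <!-- existing physical-font entries stay as they are -->
          <physical-font char-set="Simplified Chinese">
            <font-face>SimSun</font-face>
          </physical-font>
        </logical-font>
      </font-table>
    </font-mappings>

    The font-face value is what FOP finally looks up, so it presumably has to match a font-triplet name registered in fop.xconf ("SimSun" in the question). If a logical font already has a "Simplified Chinese" entry, changing its font-face is likely enough instead of adding a second one.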
    

    More about how the font mappings work:

    https://www.oxygenxml.com/doc/versions/17.0/ug-editor/#topics/DITA-map-set-font-Apache-FOP.html
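
    For the first step, here is a minimal sketch of the root of samples\concepts\garageconceptsoverview.xml with the language set on the topic itself rather than only on <shortdesc>. The id value and the "..." title are placeholders (keep whatever the sample already has). "zh-CN" is written with a hyphen because xml:lang takes BCP 47 language tags; the <map> root of samples\hierarchy.ditamap would get the same attribute:

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
    <!-- placeholder id and title: keep the values already in the sample -->
    <concept id="garageconceptsoverview" xml:lang="zh-CN">
      <title>...</title>
      <!-- xml:lang is now inherited from the root, so it is no longer needed here -->
      <shortdesc>職業道德感.</shortdesc>
      <!-- rest of the topic unchanged -->
    </concept>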

    Update: If you insist on keeping the XSLT customization that sets "SimSun" as the font family on the root element, then you also need to define a new alias for it in font-mappings.xml:

    <aliases>
      <alias name="SimSun">SimSun</alias>
    </aliases>
    

    and then map the logical font to a physical one in the same font-mappings.xml:

    <logical-font name="SimSun">
      <physical-font char-set="Simplified Chinese">
        <font-face>SimSun</font-face>
      </physical-font>
    </logical-font>
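
    Put together, the alias route might look like the sketch below, with both pieces inside the same font table (the <font-table> element name is assumed from the stock file). The extra char-set="default" entry is an assumption, not part of the answer above: it is meant to let the non-Chinese characters in the same text (digits, the trailing period, and so on) also come from SimSun, which does contain Latin glyphs:

    <font-table>
      <aliases>
        <!-- existing aliases stay as they are -->
        <!-- maps font-family="SimSun" from custom.xsl to the logical font below -->
        <alias name="SimSun">SimSun</alias>
      </aliases>
      <!-- existing logical-font elements stay as they are -->
      <logical-font name="SimSun">
        <!-- assumption: use SimSun for characters outside the Chinese ranges too -->
        <physical-font char-set="default">
          <font-face>SimSun</font-face>
        </physical-font>
        <physical-font char-set="Simplified Chinese">
          <font-face>SimSun</font-face>
        </physical-font>
      </logical-font>
    </font-table>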