I am currently debugging some DirectWrite code I have written, as I have run into issues when testing with non-English characters, mostly with getting sequences of multiple Unicode codepoints to return the proper glyph indices.
EDIT: After further research, I believe the issue is diacritics: the extra codepoint should be combined with the base somehow. The DWRITE_SHAPING_GLYPH_PROPERTIES field isDiacritic does return 1 for the last Unicode codepoint. However, the shaping process doesn't seem to take this into account at all: GetGlyphPlacements returns zeros for both the advance and the offset of the diacritic glyph. The LSB is around -5, but that's not enough to offset it to the correct position. Does anyone know where in the shaping process DirectWrite is supposed to take diacritics into account, and how?
Consider this character: œ̃
It is displayed as one character (in most text editors), but it consists of two codepoints: U+0153 U+0303
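For reference, a quick check with Python's standard library confirms this is a base character plus a combining mark, with no precomposed form that normalization could merge them into:

import unicodedata

s = "\u0153\u0303"  # the same two codepoints as œ̃
for ch in s:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} (combining class {unicodedata.combining(ch)})")
# U+0153 LATIN SMALL LIGATURE OE (combining class 0)
# U+0303 COMBINING TILDE (combining class 230)
print(unicodedata.normalize("NFC", s) == s)  # True: no precomposed form exists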
How do I account for this in GetGlyphs(), since they are separate codepoints? In my code, it returns two different glyph indices (177, 1123) and a cluster map of (0, 0), i.e. both codepoints map to the same cluster.
This is what ends up getting rendered: [image of the rendered output]
It is consistent with both codepoints being rendered individually, but not with the actual combined character. The actual index count returned by GetGlyphs() is 2.
My questions are as follows:
Should this be returning one index from GetGlyphs()?
Should I even be getting one index, or is there some magic involved with two separate indices, where at some stage in the process they are combined into the glyph run?
If I should be getting one index, in which process/function are these indices combined? Perhaps there is a bug in my ScriptAnalysis? I am trying to narrow down where the issue may be.
Should I be using the length of the characters and not the number of codepoints?
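For context on the last question, here is how the different lengths compare for this string; only the UTF-16 code unit count matters to GetGlyphs(), while the user-perceived character is only recoverable through the cluster map:

s = "\u0153\u0303"  # œ̃
utf16_units = len(s.encode("utf-16-le")) // 2  # 2 UTF-16 code units: what GetGlyphs() counts
codepoints = len(s)                            # 2 Unicode codepoints
print(utf16_units, codepoints)                 # 2 2
# ...but one user-perceived character (a grapheme cluster per UAX #29);
# no length-based count alone recovers that grouping.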
I apologize as I am not super knowledgeable about fonts/Unicode and the inner workings of the whole shaping process.
Here is some of my code for the process I use to get the indices and advances:
text_length = len(text.encode('utf-16-le')) // 2
text_buffer = create_unicode_buffer(text, text_length)
self._text_analysis.GenerateResults(self._analyzer, text_buffer, len(text_buffer))
# Glyph buffer size estimate recommended in Microsoft's GetGlyphs documentation.
max_glyph_size = int(3 * text_length / 2 + 16)
length = text_length
clusters = (UINT16 * length)()
text_props = (DWRITE_SHAPING_TEXT_PROPERTIES * length)()
indices = (UINT16 * max_glyph_size)()
glyph_props = (DWRITE_SHAPING_GLYPH_PROPERTIES * max_glyph_size)()
actual_count = UINT32()
self._analyzer.GetGlyphs(text_buffer,
len(text_buffer),
self.font.font_face,
False, # sideways
False, # rtl
self._text_analysis.script, # scriptAnalysis
None, # localeName
None, # number substitution
None, # typographic features
None, # feature range lengths
0, # feature range count
max_glyph_size, # max glyph count
clusters, # cluster map
text_props, # text props
indices, # glyph indices
glyph_props, # glyph props
byref(actual_count) # glyph count
)
# Advances/offsets are per glyph, not per text position; size the buffers by
# the actual glyph count so a cluster expanding to extra glyphs cannot overflow them.
advances = (FLOAT * actual_count.value)()
offsets = (DWRITE_GLYPH_OFFSET * actual_count.value)()
self._analyzer.GetGlyphPlacements(text_buffer,
clusters,
text_props,
text_length,
indices,
glyph_props,
actual_count, # glyph count from GetGlyphs
self.font.font_face,
self.font.font_metrics.designUnitsPerEm, # em size (design units here)
False, False, # sideways, rtl
self._text_analysis.script,
self.font.locale,
None, # typographic features
None, # feature range lengths
0, # feature range count
advances, # out: glyph advances
offsets) # out: glyph offsets
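A dump of these buffers shows the values quoted above (advanceOffset and ascenderOffset are the DWRITE_GLYPH_OFFSET fields, per dwrite.h):

print("cluster map:", list(clusters[:text_length]))
for i in range(actual_count.value):
    print(f"glyph {i}: index={indices[i]}, advance={advances[i]}, "
          f"offset=({offsets[i].advanceOffset}, {offsets[i].ascenderOffset})")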
EDIT: Here is the rendering code:
def render_single_glyph(self, font_face, indice, advance, offset, metrics):
"""Renders a single glyph using D2D DrawGlyphRun"""
glyph_width, glyph_height, lsb, font_advance = metrics
# Slicing an array turns it into a python object. Maybe a better way to keep it a ctypes value?
new_indice = (UINT16 * 1)(indice)
new_advance = (FLOAT * 1)(advance)
run = self._get_single_glyph_run(font_face,
self.font._real_size,
new_indice, # indice,
new_advance, # advance,
pointer(offset), # offset,
False,
False)
offset_x = 0
if lsb < 0:
# Negative LSB: we shift the layout rect to the right
# Otherwise we will cut the left part of the glyph
offset_x = math.ceil(abs(lsb))
font_height = (self.font.font_metrics.ascent + self.font.font_metrics.descent) * self.font.font_scale_ratio
# Create new bitmap.
self._create_bitmap(int(math.ceil(glyph_width)),
int(math.ceil(font_height)))
# This offsets the characters if needed.
point = D2D_POINT_2F(offset_x, int(math.ceil(font_height)))
self._render_target.BeginDraw()
self._render_target.Clear(transparent)
self._render_target.DrawGlyphRun(point,
run,
self.brush,
DWRITE_MEASURING_MODE_NATURAL)
self._render_target.EndDraw(None, None)
image = wic_decoder.get_image(self._bitmap)
glyph = self.font.create_glyph(image)
glyph.set_bearings(self.font.descent, offset_x, round(advance * self.font.font_scale_ratio)) # baseline, lsb, advance
return glyph
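If the combining is supposed to happen in the glyph run itself, this is the kind of variant I am considering instead of the per-glyph rendering above: drawing the whole cluster as one run. A sketch, assuming a DWRITE_GLYPH_RUN ctypes structure matching dwrite.h is defined (which _get_single_glyph_run presumably builds internally):

def render_cluster_run(self, font_face, indices, advances, offsets, point):
    """Sketch: draw all glyphs of one cluster as a single glyph run."""
    count = len(indices)
    run = DWRITE_GLYPH_RUN()
    run.fontFace = font_face
    run.fontEmSize = self.font._real_size
    run.glyphCount = count
    run.glyphIndices = (UINT16 * count)(*indices)
    run.glyphAdvances = (FLOAT * count)(*advances)  # a mark's advance is usually 0
    run.glyphOffsets = (DWRITE_GLYPH_OFFSET * count)(*offsets)
    run.isSideways = False
    run.bidiLevel = 0
    self._render_target.DrawGlyphRun(point, byref(run), self.brush,
                                     DWRITE_MEASURING_MODE_NATURAL)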
The shaping process is controlled by your input, which is (text, font, locale, script, user features). All of these affect the results you get. To answer your questions specifically:
Should this be returning one index from GetGlyphs()?
That's mostly defined by your font: if the font maps the base + combining mark sequence to a single precomposed glyph, you get one index; if it keeps two glyphs and positions the mark with anchor rules, you get two.
Should I even be getting one index, or is there some magic involved with two separate indices, where at some stage in the process they are combined into the glyph run?
GetGlyphs() operates on a single run. Glyphs are free to form a cluster according to shaping rules defined per script, and according to transformations defined in the font.
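For a left-to-right run you can see which glyphs were grouped together by walking the cluster map GetGlyphs() filled in; a minimal sketch of that bookkeeping (no DirectWrite calls involved):

def cluster_glyph_ranges(cluster_map, text_length, glyph_count):
    """Yield (first_glyph, num_glyphs) per cluster of a left-to-right run."""
    i = 0
    while i < text_length:
        first = cluster_map[i]  # first glyph produced for text position i
        j = i + 1
        while j < text_length and cluster_map[j] == first:
            j += 1  # consecutive positions with the same value share a cluster
        next_first = cluster_map[j] if j < text_length else glyph_count
        yield first, next_first - first
        i = j

# Your case: cluster_map (0, 0) with 2 glyphs yields one cluster (0, 2),
# i.e. both glyph indices belong together and should be drawn as one run.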
If I should be getting one index, in which process/function are these indices combined? Perhaps there is a bug in my ScriptAnalysis? I am trying to narrow down where the issue may be.
Basically, if your input arguments are correct, you get what you get as output; you can't really control the core of it. What you can do is test the output for the same text and font with Uniscribe, with CoreText (macOS), and with Chromium/Firefox (HarfBuzz) to see if they differ.
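For the HarfBuzz comparison, a sketch using the uharfbuzz Python binding (the font path is illustrative; info.codepoint holds the glyph index after shaping):

import uharfbuzz as hb

blob = hb.Blob.from_file_path("SomeFont.ttf")  # use the same font file you shape with
face = hb.Face(blob)
font = hb.Font(face)
buf = hb.Buffer()
buf.add_str("\u0153\u0303")  # œ̃
buf.guess_segment_properties()
hb.shape(font, buf)
for info, pos in zip(buf.glyph_infos, buf.glyph_positions):
    print(info.codepoint, info.cluster, pos.x_advance, pos.x_offset, pos.y_offset)

If HarfBuzz also returns two glyphs on the same cluster, DirectWrite's output is consistent with other shapers and the problem is on the rendering side.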
Should I be using the length of the characters and not the number of codepoints?
I didn't get this one.