So we have records of the form header|sequence|string1|string2|directive
where string1 and string2 are arbitrary Unicode junk. Assume the input can be really trashy Unicode (I expect it to contain things like right-to-left text, unbalanced Unicode direction control characters, etc.) but not actually malicious. How can I get these strings to display in order?
The final target is an HTML website, but we believe it's best to process the data as a string for as long as possible. Blindly jamming a force-LTR before each | is not remotely acceptable, as it tends to carry into the text past the | and cause RTL text to render as LTR.
First step: replace control codes with control pictures
Second step: fix RTL nonsense ??
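For the first step, here is a minimal sketch of what I mean, assuming "control codes" means only the C0 controls (U+0000 to U+001F) and DEL. The function name is made up for illustration; C1 controls and the bidi direction controls are not touched here.

```python
def to_control_pictures(text: str) -> str:
    """Replace C0 control codes and DEL with their Control Pictures glyphs.

    Assumption: only U+0000..U+001F and U+007F are treated as control codes.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x20:
            # U+2400..U+241F mirror U+0000..U+001F one-to-one.
            out.append(chr(0x2400 + cp))
        elif cp == 0x7F:
            # U+2421 is SYMBOL FOR DELETE.
            out.append("\u2421")
        else:
            out.append(ch)
    return "".join(out)
```

Everything else, including RTL letters and the direction control characters, passes through unchanged, so this does nothing about the second step.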
I have to admit I was expecting the RTL stack to be simpler than it is. I cannot simply run the bidi algorithm myself because there's no way to know the RTL-LTR-ness of a private use character.
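To illustrate the private use problem, here is a small sketch using Python's standard unicodedata module. The helper name is hypothetical. The database does report a bidi class for a private use code point, but that is only a default; it says nothing about what the font or the data source actually uses that code point for.

```python
import unicodedata


def bidi_classes(text):
    """Return the Unicode database bidi class of each character in text."""
    return [unicodedata.bidirectional(ch) for ch in text]


# Latin letter, Hebrew alef, and a private use code point.
# The class reported for U+E000 is a database default, not a fact about
# how that character will really be drawn.
print(bidi_classes("a\u05d0\ue000"))
```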
We ended up with this kludgy method. It works. (Note that in the production code these inline styles turn into a class reference.)
<PRE><DIV DIR=LTR STYLE="display:inline-block;">|</DIV><DIV STYLE="display:inline-block;">something1</DIV><DIV DIR=LTR STYLE="display:inline-block;">|</DIV><DIV STYLE="display:inline-block;">something2</DIV><DIV DIR=LTR STYLE="display:inline-block;">|</DIV></PRE>
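A rough sketch of how that markup could be generated, assuming the record has already been split into fields and the control codes already replaced. The function and constant names are made up for illustration; html.escape is the standard library call.

```python
from html import escape

# Separator wrapped in its own LTR inline block, as in the markup above.
PIPE = '<DIV DIR=LTR STYLE="display:inline-block;">|</DIV>'


def render_fields(fields):
    """Wrap each field in its own inline block, with a pipe block between them."""
    parts = [PIPE]
    for field in fields:
        # Escape so that <, > and & in the junk cannot break the markup.
        parts.append('<DIV STYLE="display:inline-block;">' + escape(field) + '</DIV>')
        parts.append(PIPE)
    return "<PRE>" + "".join(parts) + "</PRE>"
```

Calling render_fields(["something1", "something2"]) gives the line shown above. Each field sits in its own box, so whatever direction mess is inside one field stays inside that box and the pipes stay where they are.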