I have text objects (text1
, text2
, etc) with formatting information (eg. [bold, italic]
). Now I want to concat these texts and format them in HTML.
For a simple case it would transform the following
text1[bold,italic]
text2[]
text3[bold]
into this HTML:
<b><i>text1</i></b>
text2
<b>text3</b>
But I want to join formattings where possible, not format each text individually. For example the following
text1[bold,italic]
text2[bold]
text3[]
Should result in this HTML:
<b>
<i>text1</i>
text2
</b>
text3
Additionally I want to wrap the "longest" formatting to the "outside" of the DOM. For example in the following case, italics
is the longer formatting chain and thus wraps around the bold
element:
text1[bold,italic]
text2[italic]
text3[]
Result:
<i>
<b>text1</b>
text2
</i>
text3
What would be a good algorithmic approach to solve this problem? Answers in pseudocode or any programming language would be helpful.
Here is the solution I used in the end. Thanks for all input here and elsewhere. It got me on the right track.
The solution has two parts.
To output the tags, a stack based approach is used. You loop over the items and close all formattings that are not in the current item (removing the formatting and all intermediaries from the stack):
// close all formatting that is not in the current text
$toclose = array_diff($fStack, $formatting);
foreach($toclose as $f) {
// we need to make sure all formatting is closed, but we close by popping the
// stack. This ensures we don't create invalid nesting
while (in_array($f, $fStack)) {
$result .= $this->closeFormatting(array_pop($fStack));
}
}
Then we open all formatting that is still missing on the stack and add it to the stack:
// open formatting that is in the current text but not yet open
$new = array_diff($formatting, $fStack);
foreach ($new as $f) {
$result .= $this->openFormatting($f);
$fStack[] = $f;
}
Finally you can output the text itself.
$result .= $text->__toString();
```php
After the loop close all remaining formattings on the stack.
```php
// close remaining formatting
while ($fStack) {
$result .= $this->closeFormatting(array_pop($fStack));
}
The above approach will already work, but the tags might not be ideally nested. Eg. for
text1[bold,italic]
text2[italic]
text3[]
It might produce <b><i>text1</i></b><i>text2</i>text3
. It depends on which formatting you found first in $formatting
for the first item.
To solve this problem, we need to sort the formatting information by longest chain.
Eg. the above information should be:
text1[italic, bold]
text2[italic]
text3[]
To achieve that, I do a simple chain sorting step beforehand. You go backwards through the texts (starting at the second to last) and give it the information of how many same formattings come afterwards. You then sort the formatting by chain length (highest first).
The above would end up as something like this:
text1[italic:2, bold:1]
text2[italic:1]
text3[]
loop:
function updateFormattingScores()
{
$len = count($this->texts);
if($len < 2) return;
for($i = $len - 2; $i >= 0; $i--) {
$this->texts[$i]->updateFormattingScores($this->texts[$i + 1]);
}
}
update:
public function updateFormattingScores(TextRun $nextRun)
{
$next = $nextRun->getFormatting();
foreach ($next as $key => $value) {
if($this->formatting[$key] === 0) continue; // we don't have this format
$this->formatting[$key] += $value; // add follow up chain
}
// sort by value, longest chains first
arsort($this->formatting);
}
With this run before the output, the output will always find the longer chain formatting first.