I am translating an XLIFF file using the BeautifulSoup and googletrans packages. I managed to extract all the strings, translate them, and replace them by creating a new tag with the translation, e.g.
<trans-unit id="100890::53706_004">
<source>Continue in store</source>
<target>Kontynuuj w sklepie</target>
</trans-unit>
The problem appears when the source tag has other tags inside, e.g.
<source><x ctype="x-htmltag" equiv-text="<b>" id="html_tag_191"/>Choose your product\
<x ctype="x-htmltag" equiv-text="</b>" id="html_tag_192"/>From a list: </source>
The number of these tags varies, and so does the position of the text between them, e.g. <source> text1 <x /> <x/> text2 <x/> text3 </source>. Each x tag is unique, with its own id and attributes.
Is there a way to modify the text inside the tag without having to create a new tag? I was thinking I could extract the x tags and their attributes, but the order of strings and x tags differs so much from one line to another that I'm not sure how to do that. Maybe there is another package better suited for translating XLIFF files?
You can use a for-loop to work with all the children in source.
You can duplicate each of them with copy.copy(child) and append it to target.
At the same time, you can check whether child is a NavigableString and convert it.
text = '''<source><x ctype="x-htmltag" equiv-text="<b>" id="html_tag_191"/>Choose your product\
<x ctype="x-htmltag" equiv-text="</b>" id="html_tag_192"/>From a list: </source>'''
conversions = {
'Choose your product': 'Wybierz swój produkt',
'From a list: ': 'Z listy: ',
}
from bs4 import BeautifulSoup as BS
from bs4.element import NavigableString
import copy
#soup = BS(text, 'html.parser') # has problems parsing this markup
#soup = BS(text, 'html5lib') # has problems parsing this markup
soup = BS(text, 'lxml')
# create `<target>`
target = soup.new_tag('target')
# add `<target>` after `<source>`
source = soup.find('source')
source.insert_after('', target)
# work with children in `<source>`
for child in source:
    print('type:', type(child))
    # duplicate child and add to `<target>`
    child = copy.copy(child)
    target.append(child)
    # convert text and replace in child in `<target>`
    if isinstance(child, NavigableString):
        new_text = conversions[child.string]
        child.string.replace_with(new_text)
print('--- target ---')
print(target)
print('--- source ---')
print(source)
print('--- soup ---')
print(soup)
Result (slightly reformatted to make it more readable):
type: <class 'bs4.element.Tag'>
type: <class 'bs4.element.NavigableString'>
type: <class 'bs4.element.Tag'>
type: <class 'bs4.element.NavigableString'>
--- target ---
<target>
<x ctype="x-htmltag" equiv-text="<b>" id="html_tag_191"></x>
Wybierz swój produkt
<x ctype="x-htmltag" equiv-text="</b>" id="html_tag_192"></x>
Z listy:
</target>
--- source ---
<source>
<x ctype="x-htmltag" equiv-text="<b>" id="html_tag_191"></x>
Choose your product
<x ctype="x-htmltag" equiv-text="</b>" id="html_tag_192"></x>
From a list:
</source>
--- soup ---
<html><body>
<source>
<x ctype="x-htmltag" equiv-text="<b>" id="html_tag_191"></x>
Choose your product
<x ctype="x-htmltag" equiv-text="</b>" id="html_tag_192"></x>
From a list:
</source>
<target>
<x ctype="x-htmltag" equiv-text="<b>" id="html_tag_191"></x>
Wybierz swój produkt
<x ctype="x-htmltag" equiv-text="</b>" id="html_tag_192"></x>
Z listy:
</target>
</body></html>
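The same idea can be wrapped in a function and run over every source tag in a file. This is only a minimal sketch under a few assumptions: it uses the 'xml' parser (which needs lxml installed) so that no html/body wrapper is added, it assumes the real file escapes the attribute values as &lt;b&gt;, and the names translate and add_target are made up for this example. The dictionary lookup is a stand-in for a call to a real translator such as googletrans.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString
import copy

# Sample unit; attribute values are escaped as they would be in well-formed XML.
xliff = (
    '<trans-unit id="u1"><source>'
    '<x ctype="x-htmltag" equiv-text="&lt;b&gt;" id="html_tag_191"/>'
    'Choose your product'
    '<x ctype="x-htmltag" equiv-text="&lt;/b&gt;" id="html_tag_192"/>'
    'From a list: </source></trans-unit>'
)

# Stand-in for a real translator (hypothetical data).
conversions = {
    'Choose your product': 'Wybierz swój produkt',
    'From a list:': 'Z listy:',
}


def translate(text):
    """Translate the stripped text and keep the surrounding whitespace."""
    core = text.strip()
    if not core:
        return text  # whitespace-only node, nothing to translate
    translated = conversions.get(core, core)  # unknown text stays unchanged
    return text.replace(core, translated, 1)


def add_target(soup, source):
    """Insert a translated <target> right after `source` and return it."""
    target = soup.new_tag('target')
    source.insert_after(target)
    for child in source.children:
        if isinstance(child, NavigableString):
            # text node: append the translated string
            target.append(translate(str(child)))
        else:
            # inline tag such as <x/>: copy it unchanged
            target.append(copy.copy(child))
    return target


soup = BeautifulSoup(xliff, 'xml')  # the 'xml' parser requires lxml
for source in soup.find_all('source'):
    add_target(soup, source)

print(soup.find('target'))
```

Because the strings are translated one by one in document order, the number and position of the x tags does not matter. To overwrite the text in place instead of creating a new tag, the same translate function could be applied with replace_with() to each string returned by source.find_all(string=True).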