I have a lot of data frames such as [COMPANY] in my HTML text file, and I want to exclude them when translating the text with DeepL. I use the DeepL Java library with the API, and I am not allowed to change the data frame format.
Any idea how to exclude df[TEXT] from translation?
Example text:
Dear client,
Please find enclosed [EVENT] for the order you wish to execute for your account [ACCOUNT_NAME_TEXT].
Kind regards,
[COMPANY_NAME]
HTML file:
<!DOCTYPE html>
<html>
<head>
</head>
<body>
<p>Dear client,</p>
<p>Please find enclosed the Events for the order you wish to execute for your account [ACCOUNT_NAME_TEXT].</p>
<p> </p>
<p>Kind regards,</p>
<p>[COMPANY_NAME]</p>
</body>
</html>
For now, I solved it by converting each df[TEXT] into an ignore tag before translating and converting it back to the original format afterwards. See the method below; it may help someone with the same request.
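import com.deepl.api.DeepLException;
import com.deepl.api.Formality;
import com.deepl.api.SentenceSplittingMode;
import com.deepl.api.TextResult;
import com.deepl.api.TextTranslationOptions;
import com.deepl.api.Translator;
import java.util.List;

public class DataFrameTranslator
{
    // Data frames are wrapped in this tag so that DeepL leaves their content untouched
    private static final String IGNORE_TAG = "loveIgnoreTag";
    private static final String BEGIN_IGNORE_TAG = "<" + IGNORE_TAG + ">";
    private static final String END_IGNORE_TAG = "</" + IGNORE_TAG + ">";

    // A configured DeepL client, e.g. new Translator( authKey )
    private final Translator translator;

    public DataFrameTranslator( Translator translator )
    {
        this.translator = translator;
    }

    public String translate( String source, String target, String text )
            throws DeepLException, InterruptedException
    {
        // https://www.deepl.com/docs-api/xml/ignored-tags/
        text = parseToIgnoreTag( text );
        TextTranslationOptions translationOptions = new TextTranslationOptions( )
                .setTagHandling( "xml" )
                .setFormality( Formality.PreferMore )
                .setPreserveFormatting( true )
                .setIgnoreTags( List.of( IGNORE_TAG ) )
                .setSentenceSplittingMode( SentenceSplittingMode.All );
        TextResult result = translator.translateText( text, source, target, translationOptions );
        // Turn the ignore tags back into [ and ] so the data frames are restored
        return parseToDataFrame( result.getText( ) );
    }

    // [COMPANY_NAME] -> <loveIgnoreTag>COMPANY_NAME</loveIgnoreTag>
    private String parseToIgnoreTag( String text )
    {
        return text.replace( "[", BEGIN_IGNORE_TAG ).replace( "]", END_IGNORE_TAG );
    }

    // <loveIgnoreTag>COMPANY_NAME</loveIgnoreTag> -> [COMPANY_NAME]
    private String parseToDataFrame( String result )
    {
        return result.replace( BEGIN_IGNORE_TAG, "[" ).replace( END_IGNORE_TAG, "]" );
    }
}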
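One caveat with the plain replace() calls above: they turn every [ and ] in the document into ignore tags, so an ordinary bracket (for example a footnote marker like [a]) would also be wrapped, and a lone unmatched bracket would produce a broken tag that DeepL's XML handling may not treat as intended. Below is a minimal sketch of a stricter conversion, assuming data frames always have the form [NAME] with upper-case letters, digits and underscores. The class DataFramePlaceholders and its method names are hypothetical; only java.util.regex.Pattern is standard Java.

```java
import java.util.regex.Pattern;

// Hypothetical helper: only [NAME] made of A-Z, 0-9 and _ is treated as a data frame
final class DataFramePlaceholders
{
    private static final Pattern DATA_FRAME = Pattern.compile( "\\[([A-Z0-9_]+)\\]" );
    private static final Pattern IGNORE_TAG = Pattern.compile( "<loveIgnoreTag>(.*?)</loveIgnoreTag>" );

    private DataFramePlaceholders( )
    {
    }

    // [COMPANY_NAME] -> <loveIgnoreTag>COMPANY_NAME</loveIgnoreTag>; other brackets stay as they are
    static String toIgnoreTags( String text )
    {
        return DATA_FRAME.matcher( text ).replaceAll( "<loveIgnoreTag>$1</loveIgnoreTag>" );
    }

    // Inverse conversion, applied to the translated text returned by DeepL
    static String toDataFrames( String text )
    {
        return IGNORE_TAG.matcher( text ).replaceAll( "[$1]" );
    }
}
```

For example, toIgnoreTags applied to "account [ACCOUNT_NAME_TEXT] (see note [a])" wraps only ACCOUNT_NAME_TEXT and leaves [a] alone. The two helpers could stand in for the bodies of the two parse methods in the answer, leaving the translate() method itself unchanged.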