I have a Java 17 application on Windows, built with Gradle, that unmarshals files into Java classes generated from an .xsd schema. The schema contains non-ASCII (UTF-8) characters, specifically Russian letters. At first the classes could not be generated with their original UTF-8 names, but I found a way to force the encoding during generation, which at least let me see my classes and compile the app. At runtime, however, I found out that the real problem seems to be the Unicode escapes in JAXB annotations such as @XmlElement: they make unmarshalling impossible. The app says the file is incorrect starting from the first character. Example of the generated code:
@XmlElement(name = "\u041e\u0442\u043a\u0440\u044b\u0442\u0438\u0435\u0421\u0447\u0435\u0442\u043e\u0432")
protected ОткрытиеСчетов открытиеСчетов;
The name in the quotes is supposed to be "ОткрытиеСчетов".
The fragment of my build.gradle file responsible for the JAXB generation task is:
configurations {
    jaxb
}
dependencies {
    implementation("org.glassfish.jaxb:jaxb-runtime:4.0.2")
    implementation("jakarta.xml.bind:jakarta.xml.bind-api:4.0.2")
    jaxb "com.sun.xml.bind:jaxb-xjc:4.0.2"
    jaxb "org.glassfish.jaxb:jaxb-runtime:4.0.2"
    jaxb "jakarta.xml.bind:jakarta.xml.bind-api:4.0.2"
}
tasks.register('xjcGenerate') {
    def generatedSourcesDir = file("$buildDir/generated/sources/xjc")
    outputs.dir generatedSourcesDir
    doLast {
        ant.taskdef(
            name: 'xjc',
            classname: 'com.sun.tools.xjc.XJCTask',
            classpath: configurations.jaxb.asPath
        )
        ant.xjc(
            destdir: generatedSourcesDir,
            package: 'com.example.generated'
        ) {
            schema(dir: 'src/main/resources/xsd', includes: '*.xsd')
            arg(value: "-encoding")
            arg(value: "UTF-8")
        }
    }
}
compileJava.dependsOn xjcGenerate
compileJava.options.encoding = 'UTF-8'
Example .xsd element used for generation: <xs:complexType name="ОткрытиеСчетов">
The actual error during unmarshalling is:
jakarta.xml.bind.UnmarshalException
- with linked exception:
[org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.
UPD: after the fixes to build.gradle, the remaining problem turned out to be the encoding of the XML file itself. According to this answer, UTF-8 and UTF-8 with BOM are not the same; re-saving the file as plain UTF-8 (without a BOM) helped!
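If re-saving every incoming file is not practical, the BOM could also be dropped in code before the stream reaches the unmarshaller. The following is only a minimal sketch of that idea: BomSkipper and skipUtf8Bom are hypothetical names, not part of JAXB or of the project above, and it assumes the input really is UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of JAXB or of the original project.
final class BomSkipper {

    // Returns a stream positioned after a leading UTF-8 BOM (EF BB BF), if there is one.
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int read = pushback.readNBytes(head, 0, 3);
        boolean hasBom = read == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!hasBom && read > 0) {
            pushback.unread(head, 0, read); // no BOM: put the bytes back untouched
        }
        return pushback;
    }

    public static void main(String[] args) throws IOException {
        // A tiny document preceded by a UTF-8 BOM, used only for demonstration.
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'a', '/', '>'};
        try (InputStream in = skipUtf8Bom(new ByteArrayInputStream(withBom))) {
            System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        }
    }
}
```

The returned stream can then be passed to Unmarshaller.unmarshal(InputStream) in place of the raw file stream.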
I think your problem is that, although you managed to generate the Java classes using UTF-8 encoding, JAXB annotations such as @XmlElement(name = "...") contain Unicode escapes (\uXXXX) instead of the actual UTF-8 characters, which breaks with non-Latin characters.
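One way to check whether the escapes themselves are really at fault: as far as I know, javac translates Unicode escapes in source files before compiling, so an escaped literal and the same name written natively should end up as the same string at runtime. The sketch below compares the escaped literal from the question with the name built from its code points; EscapeCheck is a made-up class name used only for illustration.

```java
// Hypothetical check class, used only for illustration.
final class EscapeCheck {

    // The escaped literal as it appears in the generated annotation.
    static final String ESCAPED =
            "\u041e\u0442\u043a\u0440\u044b\u0442\u0438\u0435\u0421\u0447\u0435\u0442\u043e\u0432";

    // The same name built from numeric code points, with no escapes or
    // non-ASCII characters in the source file.
    static String fromCodePoints() {
        int[] codePoints = {
            0x041E, 0x0442, 0x043A, 0x0440, 0x044B, 0x0442, 0x0438, 0x0435,
            0x0421, 0x0447, 0x0435, 0x0442, 0x043E, 0x0432
        };
        return new String(codePoints, 0, codePoints.length);
    }

    public static void main(String[] args) {
        // Prints whether the two forms are equal at runtime.
        System.out.println(ESCAPED.equals(fromCodePoints()));
    }
}
```

If the two strings compare equal, the escaped annotation value and the native element name are identical at runtime, and the cause of the parse error is more likely in the input file than in the generated classes.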
Try to force xjc to output native characters instead of Unicode escapes; by default, the xjc task writes annotations using Unicode escape sequences rather than native characters.
For that, you must launch xjc with the proper JVM file encoding. For example, put this in your build.gradle:
configurations {
    jaxb
}
dependencies {
    implementation("org.glassfish.jaxb:jaxb-runtime:4.0.2")
    implementation("jakarta.xml.bind:jakarta.xml.bind-api:4.0.2")
    jaxb "com.sun.xml.bind:jaxb-xjc:4.0.2"
    jaxb "org.glassfish.jaxb:jaxb-runtime:4.0.2"
    jaxb "jakarta.xml.bind:jakarta.xml.bind-api:4.0.2"
}
def xsdDir = "$projectDir/src/main/resources/xsd"
def generatedDir = "$buildDir/generated/sources/xjc"
tasks.register('xjcGenerate', JavaExec) {
    group = "build"
    description = "Generate JAXB classes from XSD"
    classpath = configurations.jaxb
    mainClass.set("com.sun.tools.xjc.XJCFacade")
    // Pass all args to XJC
    args = [
        "-d", generatedDir,
        "-p", "com.example.generated",
        "-encoding", "UTF-8",
        "-no-header",
        // JavaExec does not expand shell globs such as *.xsd, so pass the
        // directory; XJC compiles the schema files it finds there
        xsdDir
    ]
    // Run the XJC JVM itself with UTF-8 as its default encoding
    jvmArgs = ["-Dfile.encoding=UTF-8"]
    doFirst {
        // Make sure the output directory exists before XJC writes to it
        file(generatedDir).mkdirs()
    }
}
compileJava.dependsOn xjcGenerate
sourceSets.main.java.srcDir(generatedDir)
compileJava.options.encoding = 'UTF-8'
The following line makes the Gradle build itself run with UTF-8 instead of the platform default encoding. Add it to gradle.properties:
org.gradle.jvmargs=-Dfile.encoding=UTF-8
Try it and tell me if it works; otherwise, please post the error log :)